Hotspot丨DBRX reaches 132 billion parameters, the most powerful open source model changes hands
·Focus: artificial intelligence, chips, and related industries
Recently, the global open source large model landscape has seen a major shakeup. DBRX, a new open source model launched by the startup Databricks, has technically surpassed the previous leaders Llama 2, Mixtral and Grok-1 and topped the list of the world's most powerful open source large models.
This breakthrough achievement undoubtedly sets a new milestone in the field of open source models.
It is worth noting that DBRX was trained at a remarkably low cost: Databricks spent only about US$10 million and used 3,100 H100 chips to complete training within two months.
Compared with the huge investment Meta required to develop Llama 2, this figure demonstrates Databricks' strength in technical efficiency and cost control.
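A rough back-of-the-envelope calculation based on the figures reported above (3,100 H100 chips, roughly two months, about US$10 million) gives a sense of the scale; treating "two months" as 60 days is an assumption made purely for illustration.

```python
# Back-of-the-envelope estimate of DBRX training scale, using the figures
# reported above (3,100 H100 GPUs, roughly two months, ~US$10 million).
# The "two months" duration is treated as ~60 days purely for illustration.

num_gpus = 3_100          # H100 chips reported above
days = 60                 # assumed: "two months" taken as ~60 days
budget_usd = 10_000_000   # reported training budget

gpu_hours = num_gpus * days * 24
implied_rate = budget_usd / gpu_hours  # implied all-in cost per GPU-hour

print(f"Approximate GPU-hours: {gpu_hours:,}")            # ~4.46 million
print(f"Implied cost per GPU-hour: ${implied_rate:.2f}")  # ~$2.24
```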
In terms of performance, DBRX is also strong: across language understanding, programming, mathematics and logic, it surpasses the open source models LLaMA2-70B, Mixtral and Grok-1.
Even more notably, DBRX's overall performance surpasses GPT-3.5, and in programming in particular it clearly outperforms GPT-3.5.
Databricks recently launched the open source model DBRX with a parameter size of up to 132 billion.
This model adopts an advanced fine-grained MoE architecture and activates only 36 billion parameters for each input, significantly improving tokens-per-second throughput.
Through its fine-grained mixture-of-experts (MoE) architecture with a larger number of experts, DBRX significantly surpasses LLaMA 2-70B in inference speed, achieving roughly a two-fold improvement.
DBRX is a decoder-only Transformer large model trained with next-token prediction.
In an MoE model, only parts of the network are activated depending on the input, which effectively improves training and inference efficiency.
Compared with other open source MoE models such as Mixtral and Grok-1, DBRX adopts a fine-grained design and uses a larger number of small experts.
DBRX has 16 expert models, 4 of which are used at a time, while Mixtral and Grok-1 each have 8 expert models, and 2 are selected at a time.
This design gives DBRX 65 times as many possible expert combinations, greatly improving model quality.
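The "65 times" figure follows from simple combinatorics: choosing 4 of 16 experts versus 2 of 8. A quick check (illustrative only; the variable names are not from Databricks):

```python
from math import comb

# Fine-grained MoE: DBRX activates 4 of 16 experts per input,
# versus 2 of 8 for Mixtral and Grok-1.
dbrx_combinations = comb(16, 4)    # 1,820 possible expert subsets
coarse_combinations = comb(8, 2)   # 28 possible expert subsets

print(dbrx_combinations / coarse_combinations)  # 65.0 -> the "65x" figure
```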
In addition, DBRX adopts technologies such as Rotary Position Embedding (RoPE), Gated Linear Units (GLU) and Grouped Query Attention (GQA) to improve model quality, and it uses the GPT-4 tokenizer provided in the tiktoken repository.
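For readers unfamiliar with that tokenizer, below is a minimal sketch of loading the GPT-4 encoding from the tiktoken library and tokenizing a string; it illustrates the tokenizer only, not DBRX itself, and the sample text is arbitrary.

```python
# Minimal illustration of the GPT-4 tokenizer from the tiktoken repository,
# which DBRX reportedly reuses. Requires: pip install tiktoken
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")  # resolves to the cl100k_base encoding

text = "DBRX is a fine-grained mixture-of-experts model."
tokens = enc.encode(text)

print(len(tokens))          # number of tokens the model would see
print(enc.decode(tokens))   # round-trips back to the original text
```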
At the method level (pre-training data, model architecture and optimization strategy), DBRX delivers nearly four times the compute efficiency of the previous generation MPT models.
①In comprehensive evaluations, DBRX Instruct, the fine-tuned version of DBRX, performed excellently across multiple benchmarks.
In the composite benchmark test of Hugging Face Open LLM Leaderboard, DBRX Instruct topped the list with a score of 74.5%, significantly ahead of the second place Mixtral Instruct with 72.7%.
At the same time, in the Databricks Model Gauntlet, an evaluation suite that contains more than 30 tasks and spans six fields, DBRX Instruct also came out on top with a score of 66.8%, which is significantly better than the second place Mixtral Instruct's 60.7%.
②DBRX Instruct has shown particularly outstanding abilities in programming and mathematics-related tasks.
In HumanEval, a benchmark that evaluates code generation, its accuracy reached 70.1%, about 7 percentage points higher than Grok-1, about 8 percentage points higher than Mixtral Instruct, and above all evaluated LLaMA2-70B variants.
In the GSM8k math problem solving test, DBRX Instruct also achieved the best result of 66.9%, surpassing Grok-1, Mixtral Instruct and other LLaMA2-70B variants.
It is worth noting that although Grok-1 has 2.4 times as many parameters as DBRX Instruct, DBRX Instruct still leads on the above programming and mathematics tasks.
Even against CodeLLaMA-70B Instruct, a model designed specifically for programming tasks, DBRX Instruct still performs well on HumanEval.
③DBRX Instruct also performs well in multi-language understanding capabilities.
On the Massive Multitask Language Understanding (MMLU) benchmark, DBRX Instruct again delivered top performance, scoring 73.7% and surpassing all other models in this comparison.
To sum up, DBRX’s “fine-tuned version” of Instruct has performed well in multiple benchmark tests, especially in programming, mathematics and multi-language understanding.
Databricks originated from the AMPLab project at the University of California, Berkeley. The company focuses on developing Apache Spark, an open source distributed computing framework written in Scala, and pioneered the concept of the "data lakehouse".
In March 2023, riding the ChatGPT wave, the company launched the open source language model Dolly, and with the subsequent version 2.0 put forward the slogan of "the first truly open, commercially viable instruction-tuned LLM", marking what was seen as Databricks' second industry innovation.
It is worth mentioning that Jonathan Frankle was the chief scientist of the generative AI startup MosaicML.
Databricks acquired MosaicML for US$1.4 billion in June 2023, a move that prompted Frankle to resign his professorship at Harvard University and devote himself entirely to the research and development of DBRX.
Just a few days earlier, Musk had announced the open-sourcing of Grok-1, the largest open source model to date, an event that attracted widespread attention across the industry.
At the architecture level, DBRX uses 16 experts and activates 4 of them per input, while Mixtral and Grok-1 each use 8 experts and activate 2.
This choice gives DBRX 65 times as many possible expert combinations, significantly improving model quality.
DBRX also adopts technologies such as Rotary Position Embedding (RoPE), Gated Linear Units (GLU) and Grouped Query Attention (GQA), and uses the GPT-4 tokenizer provided in the tiktoken repository.
These decisions are the result of in-depth evaluation and scaling experiments by the team.
①RAG tooling will be launched soon. DBRX is of great significance to this work, and Databricks has already built in simple and efficient RAG methods (a rough sketch of the RAG pattern follows this list).
Next, Databricks will work on making DBRX the best generator model for RAG, giving users more powerful support.
②The DBRX model will be hosted on all mainstream cloud environment products, including AWS, Google Cloud (GCP) and Azure.
As an open source model, DBRX can be used freely by users according to their own needs, promoting business development and innovation.
③The DBRX model is expected to be provided through the Nvidia API Catalog and supported on the Nvidia NIM inference microservice.
This will bring users a more stable and efficient inference experience and further promote business growth and expansion.
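As a rough illustration of the RAG pattern referred to in ① above: retrieve relevant context, then hand it to a generator model such as DBRX Instruct. The keyword scoring, prompt template and sample documents below are simplified assumptions for illustration, not Databricks' built-in RAG implementation.

```python
# Minimal sketch of retrieval-augmented generation (RAG): retrieve relevant
# snippets, build a prompt, then send it to a generator such as DBRX Instruct.
# The scoring and prompt format here are illustrative assumptions only.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    return sorted(documents,
                  key=lambda d: -len(q_terms & set(d.lower().split())))[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a prompt that asks the generator to answer from the context."""
    joined = "\n".join(f"- {c}" for c in context)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:")

docs = [
    "DBRX uses a fine-grained MoE architecture with 16 experts, 4 active per input.",
    "Databricks acquired MosaicML in June 2023.",
    "DBRX Instruct scored 70.1% on HumanEval.",
]
query = "How many experts does DBRX activate per input?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)  # this prompt would then be sent to a DBRX Instruct endpoint
```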
Databricks’ focus on helping enterprises build, train and scale models that meet their specific needs is far-reaching.
This unicorn team places a high priority on enterprise adoption as it directly relates to their business model.
As part of the LLM release program, Databricks has launched two models under an open license: DBRX Base and DBRX Instruct.
DBRX Base is a pre-trained base model, while DBRX Instruct is a version fine-tuned for few-turn interactions.
It is worth mentioning that DBRX is supported by Databricks on AWS, Google Cloud, and Microsoft Azure, which means enterprises can easily download the model and run it on any graphics processing unit (GPU) of their choice.
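For teams that do want to download and run the open weights themselves, here is a hedged sketch using the Hugging Face transformers library; the repository id databricks/dbrx-instruct is assumed from public listings, and the 132-billion-parameter weights require multiple high-memory GPUs, so treat this as illustrative rather than a single-GPU recipe.

```python
# Sketch: loading the open DBRX Instruct weights with Hugging Face transformers.
# Assumptions: the weights are published as "databricks/dbrx-instruct" and the
# license has been accepted; the full model needs several high-memory GPUs.
# Requires: pip install transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"  # assumed Hugging Face repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision to reduce memory
    device_map="auto",           # shard layers across available GPUs
    # trust_remote_code=True may be required on older transformers versions
)

inputs = tokenizer("What is a data lakehouse?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```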
In addition, enterprises can choose to subscribe to DBRX and other tools, such as Retrieval Augmented Generation (RAG), to customize LLMs through Databricks' Mosaic AI Model Serving offering.
Mosaic AI Model Serving connects to DBRX via the Foundation Model APIs, enabling enterprises to access and query the LLM from serving endpoints. This gives enterprises greater customization capability and flexibility.
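As an illustration of querying such a serving endpoint, the sketch below uses an OpenAI-compatible client, which Databricks' Foundation Model APIs support; the workspace URL, token environment variable and the endpoint name "databricks-dbrx-instruct" are assumptions for illustration and should be checked against the workspace's own documentation.

```python
# Sketch: querying DBRX through a Databricks model serving endpoint using an
# OpenAI-compatible client. The base URL, token variable, and endpoint name
# "databricks-dbrx-instruct" are illustrative assumptions -- check your
# workspace's Foundation Model APIs documentation for the exact values.
# Requires: pip install openai
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],  # workspace access token (assumed env var)
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",  # placeholder URL
)

response = client.chat.completions.create(
    model="databricks-dbrx-instruct",  # assumed endpoint name
    messages=[{"role": "user",
               "content": "Summarize what makes DBRX's MoE design fine-grained."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```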
Foundation Model APIs offer two pricing models: pay per token and allocated throughput.
Pay-per-token pricing charges according to token usage, while allocated throughput is billed per hour per GPU instance.
Both rates, including cloud instance costs, start at $0.070 per Databricks unit.
At the same time, Databricks also provides corresponding pricing ranges for different GPU configurations to meet the computing needs of enterprises in different scenarios.
With the rapid progress of large AI models in 2024, innovation and breakthroughs are expected to grow exponentially.
For example, models such as OpenAI's Sora, Stable Diffusion 3, Stable Diffusion 3 Turbo, Grok-1, and Claude 3 have already been released and opened for use.
As the LLM community matures, we have reason to believe that in the near future every enterprise will be able to build its own private LLM in the emerging field of generative AI and fully explore and exploit the value of its proprietary data.
Reference for some information: Heart of the Machine, "The open source large model throne changes hands again: 132-billion-parameter DBRX goes online"; Xinzhiyuan, "The world's strongest open source model changes hands overnight: 132 billion parameters, inference 2x faster"; CSDN, "Databricks' open source 132-billion-parameter large model disrupts the landscape: Grok and LLaMA both lose"; Programming Singularity, "Musk's Grok, open sourced 10 days ago, is beaten: 132-billion-parameter DBRX goes online"; Open Source AI Project Implementation, "DBRX: The world's most powerful open source large model changes hands".