Hotspot丨DBRX reaches 132 billion parameters, the most powerful open source model changes hands

Latest update time: 2024-04-01
Foreword:
The release of DBRX marks a new high point for open-source AI language models and signals that the technology has entered a new stage.

By combining deep learning with large-scale training data, the model not only performs well in natural language processing but also shows impressive capability in areas such as code parsing and generation, complex mathematical calculation, and logical reasoning.


Author | Fang Wensan
Image source | Network

The throne of the world's most powerful open source model changes hands


Recently, the field of open-source large models has seen a major shake-up. DBRX, a new open-source model from the startup Databricks, has surpassed the previous leaders Llama 2, Mixtral, and Grok-1 on technical benchmarks and now tops the list of the world's most powerful open-source large models.


This breakthrough achievement undoubtedly sets a new milestone in the field of open source models.


Notably, DBRX was trained at a much lower cost: Databricks spent only about US$10 million and used 3,100 H100 GPUs to complete training within two months.


Compared with the enormous investment Meta made to develop Llama 2, this figure demonstrates Databricks' strength in engineering efficiency and cost control.


DBRX is also strong on performance: in language understanding, programming, mathematics, and logic, it comfortably surpasses the open-source models LLaMA2-70B, Mixtral, and Grok-1.


Even more notably, DBRX's overall performance surpasses GPT-3.5, and in programming in particular it clearly outperforms GPT-3.5.



The DBRX large model uses an MoE architecture

Databricks recently released DBRX, an open-source model with 132 billion parameters.


The model adopts a fine-grained MoE architecture and activates only 36 billion parameters for any given input, which significantly improves tokens-per-second throughput.


Thanks to its fine-grained mixture-of-experts (MoE) architecture with a larger number of experts, DBRX achieves roughly twice the inference speed of LLaMA2-70B.


DBRX is a decoder-only Transformer model trained with next-token prediction.


In an MoE model, only part of the network is activated for each query, which improves both training and inference efficiency.



Compared with other open-source MoE models such as Mixtral and Grok-1, DBRX takes a finer-grained approach and uses a larger number of smaller experts.


DBRX has 16 experts and activates 4 of them per token, whereas Mixtral and Grok-1 each have 8 experts and select 2 at a time.


This design gives DBRX 65 times as many possible expert combinations, which materially improves model quality.
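As a quick arithmetic check on that figure (a minimal sketch, not from the original article): choosing 4 of 16 experts gives C(16,4) = 1820 combinations, versus C(8,2) = 28 for an 8-expert/2-active design, a ratio of exactly 65.

```python
# Quick check of the "65x more expert combinations" claim.
from math import comb

dbrx_combos = comb(16, 4)     # 16 experts, 4 active per token -> 1820 subsets
mixtral_combos = comb(8, 2)   # 8 experts, 2 active per token  -> 28 subsets

print(dbrx_combos, mixtral_combos, dbrx_combos // mixtral_combos)  # 1820 28 65
```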


In addition, DBRX uses Rotary Position Embeddings (RoPE), Gated Linear Units (GLU), and Grouped Query Attention (GQA) to improve model quality, and it uses the GPT-4 tokenizer provided in the tiktoken repository.
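For readers who want to see that tokenizer in action, here is a minimal sketch using the tiktoken library; the sample sentence is illustrative only.

```python
# Minimal sketch: the GPT-4 tokenizer (cl100k_base) from the tiktoken library,
# the same tokenizer the article says DBRX uses. Requires: pip install tiktoken
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")   # resolves to the cl100k_base encoding
tokens = enc.encode("DBRX is a fine-grained mixture-of-experts model.")
print(len(tokens), tokens[:8])               # token count and a few token ids
print(enc.decode(tokens))                    # round-trips back to the original text
```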


Methodologically, DBRX follows the recipe of the earlier MPT models (pre-training data, model architecture, and optimization strategy), but it is nearly four times more compute-efficient.



Outstanding performance in three core competencies


① In comprehensive evaluations, DBRX Instruct, the fine-tuned version of DBRX, performed strongly across multiple benchmarks.


On the Hugging Face Open LLM Leaderboard composite benchmark, DBRX Instruct topped the list with a score of 74.5%, clearly ahead of second-place Mixtral Instruct at 72.7%.


Likewise, on the Databricks Model Gauntlet, an evaluation suite of more than 30 tasks spanning six domains, DBRX Instruct came out on top with 66.8%, well ahead of second-place Mixtral Instruct at 60.7%.


②DBRX Instruct has shown particularly outstanding abilities in programming and mathematics-related tasks.


On HumanEval, a benchmark for code generation, its accuracy reached 70.1%, about 7 percentage points higher than Grok-1, about 8 percentage points higher than Mixtral Instruct, and above all evaluated LLaMA2-70B variants.


On the GSM8K math word-problem benchmark, DBRX Instruct also achieved the best result at 66.9%, surpassing Grok-1, Mixtral Instruct, and the LLaMA2-70B variants.


Notably, although Grok-1 has 2.4 times as many parameters as DBRX, DBRX Instruct still leads on the programming and mathematics tasks above.


Even against CodeLLaMA-70B Instruct, a model built specifically for programming tasks, DBRX Instruct still performs well on HumanEval.


③ DBRX Instruct also performs well on multitask language understanding.


On MMLU, the Massive Multitask Language Understanding benchmark, DBRX Instruct again delivers top performance, scoring 73.7% and surpassing every other model in this comparison.


In summary, the fine-tuned DBRX Instruct performs well across multiple benchmarks, especially in programming, mathematics, and multitask language understanding.



Databricks once again disrupts the market


Databricks originated in the AMPLab at the University of California, Berkeley. The company focuses on the development of Apache Spark, an open-source distributed computing framework written in Scala, and pioneered the concept of the "data lakehouse".


In March 2023, riding the ChatGPT wave, the company released the open-source language model Dolly; with the subsequent version 2.0 it coined the slogan "the first truly open, commercially viable instruction-tuned LLM", marking Databricks' second industry innovation.


It is worth mentioning that Jonathan Frankle was the chief scientist of the generative AI startup MosaicML.


Databricks acquired MosaicML for US$1.4 billion in June 2023, a move that prompted Frankle to give up his professorship at Harvard University and devote himself fully to the development of DBRX.


Just days earlier, Musk had announced the release of Grok-1, the largest open-source model to date, an event that drew widespread attention across the industry.


Key to Databricks' ability to differentiate itself from the competition is the company's technology integration capabilities and proprietary data.

These two core strengths will continue to drive the creation of new and better model variants.


As noted above, DBRX uses 16 experts and activates 4 per token, whereas Mixtral and Grok-1 each use 8 experts and activate 2, giving DBRX 65 times as many possible expert combinations; it also relies on Rotary Position Embeddings (RoPE), Gated Linear Units (GLU), Grouped Query Attention (GQA), and the GPT-4 tokenizer from the tiktoken repository.

These choices are the result of the team's extensive evaluation and scaling experiments.



Next steps for the open-source model


① A RAG tool will be launched soon. DBRX is of great significance to this effort, and Databricks already provides simple, efficient built-in RAG methods.


The next step is to make DBRX the best generator model for retrieval augmented generation (RAG), giving users more powerful support; a minimal illustration of the pattern follows.
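To make the pattern concrete, here is a minimal RAG sketch. The tiny in-memory corpus and the keyword-overlap retriever are illustrative stand-ins, not Databricks' tooling; in practice the assembled prompt would be sent to a DBRX Instruct endpoint.

```python
# A minimal RAG sketch (assumptions: an in-memory corpus and a prompt that would
# be sent to a DBRX Instruct endpoint in a real deployment).
from collections import Counter

CORPUS = [
    "DBRX uses a fine-grained MoE architecture with 16 experts, 4 active per token.",
    "DBRX Instruct scored 70.1% on HumanEval and 66.9% on GSM8K.",
    "Databricks acquired MosaicML in June 2023 for about US$1.4 billion.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = Counter(query.lower().split())
    scored = [(sum(q_terms[w] for w in doc.lower().split()), doc) for doc in CORPUS]
    return [doc for score, doc in sorted(scored, reverse=True)[:k] if score > 0]

def build_prompt(query: str) -> str:
    """Assemble retrieved context plus the user question into one prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

# The generated prompt is what a DBRX Instruct endpoint would receive.
print(build_prompt("How many experts does DBRX activate per token?"))
```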


② DBRX will be hosted on all mainstream cloud platforms, including AWS, Google Cloud (GCP), and Azure.


Because it is open source, users are encouraged to use DBRX freely according to their own needs to drive business development and innovation.


③ DBRX is expected to be offered through the Nvidia API Catalog and supported on Nvidia NIM inference microservices.


This will give users a more stable and efficient inference experience and further promote business growth and expansion.



Showing large-model vendors a path to monetization


Databricks’ focus on helping enterprises build, train and scale models that meet their specific needs is far-reaching.


This unicorn places a high priority on enterprise adoption, since it bears directly on the company's business model.


As part of the LLM release program, Databricks has launched two models under an open license: DBRX Base and DBRX Instruct.


DBRX Base is the pretrained base model, while DBRX Instruct is a version fine-tuned for few-turn interactions.


It is worth mentioning that DBRX is supported on Databricks across AWS, Google Cloud, and Microsoft Azure, which means enterprises can easily download the models and run them on any GPUs of their choice.
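As an illustration of "download and run on your own GPUs", here is a hedged sketch using Hugging Face transformers; the repository name databricks/dbrx-instruct, the gated-access step, and the memory figures are assumptions about the public release rather than details from this article.

```python
# Hedged sketch: loading DBRX Instruct with Hugging Face transformers.
# Assumptions: access to the gated repo "databricks/dbrx-instruct" has been
# granted, an HF token is configured, and enough GPU memory is available
# (roughly 260+ GB in bfloat16, i.e. several 80 GB GPUs). Older transformers
# releases may additionally require trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize DBRX's MoE design in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```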


In addition, enterprises can subscribe to DBRX together with other tools, such as retrieval augmented generation (RAG), to customize LLMs through Databricks' Mosaic AI Model Serving offering.


Mosaic AI Model Serving exposes DBRX through the Foundation Model APIs, letting enterprises access and query the LLM from serving endpoints, which gives them greater customization capability and flexibility.
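As an illustration (a sketch, not official documentation): the Foundation Model APIs expose an OpenAI-compatible interface, so a serving endpoint can be queried with the standard openai client; the workspace URL, the token variable, and the endpoint name databricks-dbrx-instruct below are placeholders that must match your own deployment.

```python
# Hedged sketch: querying a DBRX serving endpoint via the OpenAI-compatible
# interface of the Databricks Foundation Model APIs. The workspace URL, the
# DATABRICKS_TOKEN environment variable, and the endpoint name are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],  # a Databricks personal access token
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

response = client.chat.completions.create(
    model="databricks-dbrx-instruct",        # assumed pay-per-token endpoint name
    messages=[{"role": "user", "content": "What is retrieval augmented generation?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```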


The Foundation Model APIs offer two pricing models: pay-per-token and provisioned throughput.


Pay-per-token pricing is billed by the tokens processed, while provisioned throughput is billed per hour per GPU instance.


Both rates, including cloud instance costs, start at $0.070 per Databricks unit (DBU).


At the same time, Databricks also provides corresponding pricing ranges for different GPU configurations to meet the computing needs of enterprises in different scenarios.


By pairing a robust business model with a large open-source model, Databricks also gives companies a ticket into the generative AI (AIGC) field.

As Databricks puts it, enterprises that use its platform can reduce the cost of building generative AI use cases on their own enterprise data while avoiding the commercial-use restrictions imposed by closed-model providers such as OpenAI.



Closing:


With large AI models advancing rapidly in 2024, innovation and breakthroughs are arriving at an accelerating pace.


For example, models such as OpenAI's Sora, Stable Diffusion 3, Stable Diffusion 3 Turbo, Grok-1, and Claude 3 have already been released and opened up for use.


As the LLM community matures, there is good reason to believe that in the near future every enterprise will be able to build its own private LLMs in the emerging field of generative AI and fully tap the value of its proprietary data.







END

