
In Depth | The main battlefield of AI chips: from training to inference?

Latest update time: 2024-02-28






Foreword:
The importance of AI inference is becoming increasingly prominent. Inference is the core technology behind the efficient operation of large on-device models and AI software. In the near future, the main market of global chip makers will shift fully to AI inference.


Author | Fang Wensan
Image source | Network

With the rise of the Groq LPU, the main battlefield of AI chips is shifting


Compared with AI training, AI inference is more closely tied to end-user scenarios: a trained large model only reaches real applications through inference.


However, current AI inference solutions based on NVIDIA GPUs are relatively expensive, and their performance and latency issues affect user experience.


Before the debut of the Groq LPU, training and inference of large AI models relied on NVIDIA GPUs and the CUDA software stack.


However, the rapid rise of the Groq LPU has led the market to speculate that the main battlefield of AI chips may shift from training to inference.


The Groq LPU inference card addresses performance and cost at the hardware level, making large-scale deployment of AI inference feasible and helping more inference applications reach production.


At the same time, growing demand for AI inference will further drive the development of cloud inference chips; in particular, more new-generation dedicated inference chips capable of replacing NVIDIA GPUs will be deployed in data centers.


During the inference phase, the AI model needs to run at extreme speed so that it can deliver more tokens to end users and respond to their instructions faster.
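
As a rough illustration of why token throughput matters, the short calculation below maps tokens per second to response time. All figures are illustrative assumptions, not benchmark results for any particular chip.

```python
# Back-of-envelope: how token throughput translates into response latency.
# All figures are illustrative assumptions, not measured benchmark results.

def response_time_s(output_tokens: int, tokens_per_second: float) -> float:
    """Time for the user to receive a complete generated reply."""
    return output_tokens / tokens_per_second

reply_tokens = 300        # assumed length of one chat reply
gpu_speed = 40.0          # assumed tokens/s for a GPU-based serving stack
lpu_speed = 400.0         # assumed tokens/s for a latency-optimized inference chip

print(f"GPU-class serving: {response_time_s(reply_tokens, gpu_speed):.1f} s per reply")
print(f"LPU-class serving: {response_time_s(reply_tokens, lpu_speed):.2f} s per reply")
# 10x the token throughput cuts the wait from 7.5 s to 0.75 s, which is why
# inference hardware is judged on tokens delivered per second per user.
```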



Driven by demand, the focus shifts from training to inference

The field of AI inference is closely tied to the needs of application terminals such as mass-market consumer electronics, so the industry's focus is expected to shift from [training] to [inference].


Even under the [massive data bombardment] of real applications, the parallel GPU computing power required for inference is far lower than that required for training.


The inference process applies a trained model to make decisions or identifications. CPUs, which excel at complex logic and control-flow tasks, are sufficient to handle many inference scenarios efficiently.
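
As a minimal sketch of this point, using a tiny hypothetical two-layer network rather than any real model, inference is just a forward pass that an ordinary CPU handles comfortably:

```python
# Minimal CPU-only inference: a forward pass through a tiny, hypothetical
# two-layer network with stand-in "trained" weights. No gradients, no updates.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((64, 16)), np.zeros(16)
W2, b2 = rng.standard_normal((16, 3)), np.zeros(3)

def infer(x: np.ndarray) -> int:
    h = np.maximum(x @ W1 + b1, 0.0)   # ReLU hidden layer
    logits = h @ W2 + b2
    return int(np.argmax(logits))      # the decision / identification step

sample = rng.standard_normal(64)
print("predicted class:", infer(sample))
```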


Currently, the AI market is mainly focused on the [training] stage, in which big data is used to train large language models, and NVIDIA has been the main beneficiary in this field.


However, as large AI models become leaner, run on devices and focus on inference tasks, chip makers' market focus will shift to [inference], that is, to model application.


Looking at industry trends, the AI computing load is expected to migrate gradually from training to inference, lowering the entry threshold for AI chips.


Chip companies covering areas such as wearable devices, electric vehicles and the Internet of Things are expected to push deeply into AI inference chips.


Data centers are also expected to take a growing interest in processors dedicated to running inference on trained models, together driving the inference market to outgrow the training market.


It is expected that within one to two years, large AI models will generate huge demand for computing power and AI chips on both the training and inference sides.


If large models are widely commercialized in the future, the demand for computing power/AI chips on the inference end will be significantly higher than that on the training end.


After a two- to three-year data center upgrade cycle for AI training, the market will see more sales from inference chip suppliers.



AI inference is gaining momentum, and companies and capital are moving toward it


AMD CEO Lisa Su believes that the market for large-model inference will eventually be far larger than the market for model training.


Intel CEO Pat Gelsinger has said that when inference happens, there is no CUDA dependency. That does not mean Intel will not compete in training, but fundamentally the inference market is where the competition lies.


Zuckerberg believes it is clear that the next generation of services requires building full general intelligence: delivering the best AI assistants and serving creators and businesses will require progress across every area of AI, from reasoning to planning to coding to memory and other cognitive abilities.


As enterprise AI applications mature, enterprises will shift more computing power from model training to AI inference.


In terms of chip requirements, training chips emphasize versatility, while inference chips are tightly bound to the specific large models that have already been trained.
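
One concrete form this binding takes is post-training quantization, where the trained FP32 weights are converted offline into the low-precision format a given inference chip executes. A minimal sketch, assuming a simple symmetric INT8 scheme:

```python
# Post-training symmetric INT8 quantization of trained weights (illustrative).
# This kind of offline, model-specific preparation is what ties an inference
# chip to a network that has already been trained.
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = float(np.abs(w).max()) / 127.0          # one scale per tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w_fp32 = np.random.default_rng(1).standard_normal((256, 256)).astype(np.float32)
q, s = quantize_int8(w_fp32)
err = float(np.abs(w_fp32 - dequantize(q, s)).mean())
print(f"mean absolute quantization error: {err:.4f}")
```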


As large-model applications deepen, inference demand has gradually migrated from the cloud to the edge and end devices, showing a trend toward customization.


In the global AI chip market, doing inference first and then training has become a mainstream path, as with Habana, the AI chip company acquired by Intel, and many AI startups in China.


Behind this choice is the catalytic effect of the downstream market: as AI model training matures and AI applications are put into use, the cloud inference market has gradually overtaken the training market.


AI computing resources are gradually shifting from training large models to inference, so a more balanced infrastructure needs to be built across client, edge and cloud.


It is estimated that more than 18 chip-design startups worldwide are dedicated to training and inference for large AI models, with cumulative financing of more than 6 billion US dollars and a combined valuation of more than 25 billion US dollars.


These startups are backed by powerful investors such as Sequoia Capital, OpenAI, Wuyuan Capital, ByteDance and others.


At the same time, technology giants such as Microsoft, Intel and AMD are stepping up their own chip-making efforts, putting Nvidia under unprecedented competitive pressure.



Competing with NVIDIA, companies are making breakthroughs in niche areas


To reduce the cost of model training and inference, the industry continues to explore chip architectures that combine high energy efficiency with high performance.


Technology giants such as Meta, Amazon and Alphabet are all developing their own AI chips.


These chips are more specialized, with clearly defined targets; by comparison, Nvidia's chips are more general-purpose.


①AMD: The newly released MI300 comprises two series. The MI300X is a large GPU with the memory bandwidth needed for leading generative AI and the training and inference performance required for large language models.


The MI300A integrates CPU and GPU, is based on the latest CDNA 3 architecture and Zen 4 CPU cores, and can deliver breakthrough performance for HPC and AI workloads.


In December last year, in addition to launching the flagship MI300X accelerator card, AMD announced that the Instinct MI300A APU had entered mass production, with deliveries expected to begin this year; once launched, it is expected to be the world's fastest HPC solution.


In July last year, Intel released the Habana Gaudi2 in Beijing, an AI chip built on a 7-nanometer process and targeted at the Chinese market. The chip can run large language models and accelerate AI training and inference.


Its performance per watt running ResNet-50 is roughly twice that of the NVIDIA A100, and its price/performance is 40% better than NVIDIA-based solutions in the AWS cloud.


②Intel: Announced cooperation with Arm to deploy its Xeon products on Arm CPUs, and launched OpenVINO, an AI inference and deployment tool suite (a usage sketch follows below).
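
For reference, the OpenVINO Python runtime follows a read/compile/infer pattern roughly like the sketch below; the model path and input shape are placeholders, and exact API details can vary between OpenVINO releases.

```python
# Rough OpenVINO usage sketch: read a converted model, compile it for a target
# device, run one inference. "model.xml" and the input shape are placeholders;
# consult the OpenVINO docs for the exact API of the release in use.
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")           # IR file produced by the converter
compiled = core.compile_model(model, "CPU")    # device plug-in: "CPU", "GPU", ...

dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
request = compiled.create_infer_request()
request.infer({0: dummy})                      # inputs keyed by index or name
output = request.get_output_tensor(0).data     # numpy view of the first output
print("output shape:", output.shape)
```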


In addition, open-source models such as Llama 2 are being released one after another, prompting more companies to use these models directly; such models can be deployed with AI inference chips alone, reducing the need for training compute.


Late last year, Intel launched new computer chips, including Gaudi3, an AI chip aimed at generative AI software.


Gaudi3 will launch this year and will compete with chips from rivals such as Nvidia and AMD to power large and power-hungry AI models.


③Meta: Plans to put its self-developed chip into production this year to cut AI accelerator-card procurement costs and reduce dependence on Nvidia.


The chip consumes only 25 watts, 0.05% of the power consumption of comparable NVIDIA products, and uses the open-source RISC-V architecture. Market sources say it is manufactured on TSMC's 7nm process.


Meta recently announced that it has built its own DLRM (deep learning recommendation model) inference chip and deployed it widely.


Known internally as [Artemis], this ASIC focuses mainly on inference and is based on the second-generation in-house chip line announced last year.


In a video, Zuckerberg laid out an updated roadmap for Meta's AI plans: Meta will build its new AI roadmap around the upcoming Llama 3, whose training is currently under way.


Llama 3 will compete with Google's recently released Gemini models, OpenAI's GPT-4, and the upcoming GPT-5.


④NVIDIA: In August last year, NVIDIA announced the new-generation GH200 Grace Hopper superchip, which will go into production in the second quarter of this year.


The GH200 and GH200 NVL pair Arm-based CPUs with Hopper GPUs to address the training and inference of large language models.


Nvidia plans to launch the B100, based on the x86 architecture, to succeed the H200, and the GB200, an inference chip based on the Arm architecture, to succeed the GH200.


In addition, NVIDIA also plans to replace L40S with B40 products to provide better AI inference solutions for enterprise customers.


Under Nvidia's plan to release the Blackwell architecture this year, the B100 GPU built on this architecture is expected to deliver a significant jump in processing capability.


Preliminary evaluation data suggest a performance improvement of more than 100% over the existing Hopper-based H200 series.


⑤Amazon: Early last year, AWS released Inferentia2 (Inf2), purpose-built for artificial intelligence, with three times the computing performance, 25% more total accelerator memory, and support for distributed inference.


Through ultra-high-speed direct connections between chips, Inf2 supports distributed inference and can handle models with up to 175 billion parameters, making AWS one of the strongest in-house players in today's AI chip market.
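
A back-of-envelope calculation shows why a 175-billion-parameter model forces inference to be distributed across multiple accelerators; this is simple arithmetic, not an AWS specification.

```python
# Why 175B parameters forces distributed inference: the weights alone exceed
# any single accelerator's on-board memory. Simple arithmetic, not AWS data.
params = 175e9

for fmt, bytes_per_param in [("FP16", 2), ("INT8", 1)]:
    weight_gb = params * bytes_per_param / 1e9
    print(f"{fmt}: ~{weight_gb:,.0f} GB of weights")
# ~350 GB in FP16 or ~175 GB in INT8 -- before counting activations and KV
# cache -- so the model must be sharded across chips over fast interconnects.
```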



Single-point breakthroughs are paying off, and domestic chips are expected to catch up


At the same time, Chinese AI chip makers such as Huawei and Tianshu Zhixin are also actively building products for large-model training, inference and AI computing power.


At present, products from Chinese manufacturers such as Cambricon, Suiyuan and Kunlun Core can compete head-on with the mainstream NVIDIA Tesla T4: their energy efficiency reaches 1.71 TOPS/W, only slightly below the T4's 1.86 TOPS/W (about 92% of it).


Denglin Technology, Tianshu Zhixin and Suiyuan Technology, which chose the GPGPU route, cover both training and inference, while ASIC players such as Pingtou Ge (T-Head) focus on either inference or training scenarios.


①Yizhu Technology: Its [all-digital compute-in-memory] high-compute-power chip, based on a CIM architecture and RRAM storage media, improves computing energy efficiency by reducing data movement while preserving computing accuracy through the all-digital compute-in-memory approach; it is suited to cloud AI inference and edge computing.
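
The energy argument behind compute-in-memory can be sketched with order-of-magnitude numbers; the per-operation energies below are illustrative assumptions, not figures from Yizhu or any specific chip.

```python
# Rough energy model for compute-in-memory: data movement, not arithmetic,
# dominates. Per-operation energies are illustrative assumptions only.
MAC_PJ = 1.0           # assumed energy per multiply-accumulate, picojoules
DRAM_READ_PJ = 200.0   # assumed energy per operand fetched from off-chip DRAM

macs = 1e9             # a 1-GMAC workload

conventional = macs * (MAC_PJ + DRAM_READ_PJ)        # every operand from DRAM
in_memory = macs * (MAC_PJ + 0.1 * DRAM_READ_PJ)     # 90% of fetches removed

print(f"conventional: {conventional / 1e9:.0f} mJ")
print(f"compute-in-memory: {in_memory / 1e9:.0f} mJ")
print(f"energy ratio: {conventional / in_memory:.1f}x")
# Under these assumptions, keeping operands inside the memory array is where
# almost all of the efficiency gain comes from.
```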


②Cambricon: The Siyuan 370, Cambricon's third-generation cloud product, is built on a 7-nanometer process and is China's first AI chip to use chiplet technology, integrating 39 billion transistors with peak computing power of up to 256 TOPS (INT8).


Cambricon mainly uses an ASIC architecture; although less general-purpose, its computing power can surpass GPUs in specific application scenarios.


Test results show that the Siyuan 590's performance approaches 90% of the A100's; the 590 supports most mainstream models, with overall performance close to 80% of the A100 level.


③Pingtou Ge: In August last year, Pingtou Ge released the first self-developed RISC-V AI platform, which supports more than 170 mainstream AI models and pushes RISC-V into the era of high-performance AI applications.


At the same time, Pingtou Ge announced an upgrade of the Xuantie C920 processor, which performs GEMM computations 15 times faster than the vector solution.
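
GEMM (general matrix multiplication) is the kernel being measured here: the dense C = A·B operation that dominates neural-network workloads. A minimal reference version for illustration only, not C920 code:

```python
# Reference GEMM (C = A @ B): the dense kernel that matrix extensions
# accelerate. Explicit loops are shown only to make the operation concrete;
# real code calls an optimized BLAS or the hardware's matrix unit.
import numpy as np

def gemm(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=np.float32)
    for i in range(m):
        for j in range(n):
            for p in range(k):
                c[i, j] += a[i, p] * b[p, j]
    return c

a = np.random.rand(8, 8).astype(np.float32)
b = np.random.rand(8, 8).astype(np.float32)
assert np.allclose(gemm(a, b), a @ b, atol=1e-4)
print("reference GEMM matches NumPy")
```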


④Biren Technology: The BR100 series is developed on Biren's own original chip architecture, uses a mature 7-nanometer process, and integrates 77 billion transistors, delivering more than 1,000 TFLOPS of 16-bit floating-point compute and more than 2,000 TOPS of 8-bit fixed-point compute, with single-chip peak compute reaching the PFLOPS level.


At the same time, the BR100 combines a number of cutting-edge chip design, manufacturing and packaging technologies, including chiplets, giving it high computing power, high energy efficiency and strong versatility.


⑤Suiyuan Technology: In the more than five years since its founding, it has built two product lines, cloud training and cloud inference, developing the Yunsui T10 and T20/T21 training products and the Yunsui i10 and i20 inference products.


According to media reports, Suiyuan Technology’s third-generation AI chip product will be launched early this year.


⑥Huawei: The Ascend 310 is a low-power chip for inference and edge computing scenarios, and the most powerful AI SoC in China for edge computing.


The Ascend 310 delivers up to 16 TOPS of on-device computing power, supports simultaneous recognition of 200 different objects including cars, people, obstacles and traffic signs, and can process thousands of pictures per second.
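
A rough sanity check on the [thousands of pictures per second] figure: the per-image operation count and utilization below are assumptions for a ResNet-50-class network, not data from Huawei.

```python
# Back-of-envelope throughput check for a 16-TOPS edge inference chip.
# The per-image operation count and utilization are assumptions
# (ResNet-50-class network), not figures from Huawei.
peak_ops = 16e12        # 16 TOPS peak (INT8)
ops_per_image = 8e9     # ~4 GMACs = ~8 G ops per 224x224 image (assumed)
utilization = 0.5       # assumed achievable fraction of peak

images_per_second = peak_ops * utilization / ops_per_image
print(f"~{images_per_second:,.0f} images/s")
# ~1,000 images/s -- the same order of magnitude as the
# "thousands of pictures per second" claim above.
```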


Huawei's Ascend series of AI chips has a distinctive advantage: a unified, scalable architecture developed in-house by Huawei.


This architecture covers everything from extremely low power consumption to extremely high computing power, so a single development effort can be deployed, migrated and coordinated across all scenarios, significantly improving software development efficiency.



Conclusion:


As large models are applied in more and more scenarios, the importance of the inference stage is becoming increasingly prominent.


We therefore need to pay attention to the computing requirements and system configuration of inference chips, reducing cost and improving ease of use, so as to promote the rapid adoption of large models across fields.








