
The Survival Guide for AI Chip Unicorns

Latest update time: 2024-04-14

Recently, the world's "AI chip unicorns" have announced new company and product developments in quick succession. Groq claims that the inference efficiency of its chip is 10 times that of NVIDIA's H100, sparking heated discussion on social media, while Astera Labs, dubbed "Little NVIDIA", listed on NASDAQ on March 20 (local time) and now carries a total market value of US$10.43 billion.

In today's AI chip market, NVIDIA leads, AMD and Intel follow closely, and cloud service providers such as Google and Microsoft have joined the ranks of in-house chip development. Amid this intensifying competition, the newly arrived "AI chip unicorns" must build up their own strengths and find a viable path to survival and profitability.

Product positioning: training or inference?

Inference is the application scenario favored by most AI chip start-ups.

At the end of 2023, while embroiled in OpenAI's boardroom struggle, CEO Sam Altman was reported to be investing in an AI chip company; it was later revealed that OpenAI would spend US$51 million to purchase Rain AI's RISC-V-based NPUs for edge-side AI inference. Etched.ai's ASIC chip for large language models focuses on AI inference, and MatX states on its official website that "we focus on low-cost large model pre-training and inference", adding: "Inference first."

Inference has become the common choice of most start-ups. Behind it lies an assessment of market growth in two different scenarios: training and inference.

On the training side, chip companies' downstream market, that is, the AI companies that purchase GPUs or other compute chips to train large models, is at risk of saturation.

For new AI companies, the threshold for joining the large-model race keeps rising. As general-purpose large models grow unchecked, training data keeps expanding and parameter counts keep climbing, which means training demands ever larger numbers of compute chips. This computing-power threshold will push the future large-model landscape toward a convergence into oligopolistic competition. Can we, like Elon Musk, invest US$500 million to purchase tens of thousands of NVIDIA H100s (at a widely cited street price of roughly US$25,000-40,000 apiece, that sum buys on the order of 12,000-20,000 cards) to train our own large model or chatbot? That is a question every AI company must weigh before entering the market.

Cloud service providers that have already reached a certain scale have more options. Those with deep software expertise and financial backing, such as Google and Microsoft, can also choose to develop their own compute chips, better tailored to their own AI products.

Compared with training, inference offers AI chip start-ups more opportunities. As the industry moves from "building large models" to "using large models", running inference on eight NVIDIA H100s or AMD MI300s is not cost-effective, and latency and energy consumption raise problems of their own; these pain points have become the focus of downstream cloud service providers in the inference phase. Small chip start-ups can break through on exactly these pain points and secure a place in the fierce competition.

Of course, not every company focuses solely on inference; some are exploring other ways to solve the problems posed by large-model training. Compared with the mature GPU approach, several companies have brought more imaginative ideas to training.

Cerebras Systems has launched the enormous WSE-3 chip. WSE-3 reportedly carries more than 4 trillion transistors on 46,225 mm² of silicon, roughly 57 times the die area of a single H100 (about 814 mm²). Compared with connecting eight or more H100s via NVLink, keeping everything on one piece of silicon reduces interconnect cost and power consumption.

Area comparison between WSE-3 and traditional GPU (Image source: Cerebras Systems)

Extropic hopes to use thermodynamics and information theory to build an AI supercomputer, and has now entered the hardware assembly stage. Lightmatter has launched the photonic processor Envise; unlike traditional silicon chips, photonic processors promise to balance performance against power consumption. "Humanity is pouring enormous energy into developing AI, that energy consumption is growing rapidly, and chip technology has reached the point where it cannot solve this problem," Lightmatter states on its official website. Conceptually dazzling as they are, both companies' products are still some way from market.

Facing the big manufacturers: competition or cooperation?

An interesting phenomenon is that companies targeting inference have made NVIDIA's products their main benchmark. The second question facing AI chip start-ups is how to position their relationships with major players such as NVIDIA.

Etched.ai's ASIC chip Sohu is built for large-model inference. "By burning the Transformer architecture into Sohu, we are creating the world's most powerful Transformer inference server," the head of Etched.ai said. Etched.ai's official website shows that, comparing systems of eight chips each, Sohu's inference throughput exceeds that of both the H100 and the A100.

The number of tokens Sohu generates per second far exceeds that of the H100 and A100 (Image source: Etched.ai)

Groq's LPU (Language Processing Unit) is claimed to deliver 10 times the inference performance of the H100 at one-tenth the cost.

In comparisons against NVIDIA, d-Matrix's Corsair performs better on data throughput, latency, and cost. Corsair reportedly interconnects eight chiplets, about 130 billion transistors in total, over PCIe 5.0, with chiplet-to-chiplet bandwidth reaching 8 TB/s, ultimately saving about 90% in cost. "All of our hardware and software are built to accelerate Transformer models and generative AI," said d-Matrix's founder and CEO.

Besides competing head-on, some companies choose instead to become partners of the big manufacturers, serving as one link in the supply chain.

Newly listed Astera Labs focuses on connectivity devices for data and memory. Jitendra Mohan, one of Astera Labs' founders, believes that as AI and machine learning develop, data connectivity will become a key issue alongside computing power. Astera Labs' official website describes the company as offering "connectivity built specifically for AI and cloud infrastructure." Its main products, the Aries PCIe/CXL smart retimers, Leo memory controllers, and Taurus smart cable modules, help enterprises connect chips, memory, and servers into GPU computing clusters. Because of this, chip and cloud vendors such as Intel, Google, and Amazon are its potential customers.

Astera Labs’ memory controller (Image source: Astera Labs)

In the current market environment, whether start-ups compete directly with the big manufacturers or become part of their supply chains, they must demonstrate differentiated strengths to survive. In other words, they must keep innovating.

The AI chip field is demonstrably producing richer design ideas. Etched.ai's Sohu burns the Transformer architecture into the chip itself (hence the name: "etched"), while Groq relies on SRAM and its TSP (Tensor Streaming Processor) to raise inference efficiency. New design concepts keep appearing, and differentiated innovation cannot stop there: whether Sohu, as an ASIC, can keep pace with the Transformer architecture's continuing optimization and upgrades, and how Groq's chips will resolve the much-debated cost question, will take time and further market development to test.

Development ecosystem: build your own or join an alliance?

If product quality determines whether an AI chip start-up can gain a foothold, then the completeness and solidity of its development ecosystem determine whether the company can grow over the long term.

CUDA has always been regarded as NVIDIA's moat. Over its years of use, CUDA has quietly raised the migration threshold for developers, and after NVIDIA moved in March to ban translation layers that let CUDA software run on third-party hardware, its hold on the market grew even firmer.

Facing this ecosystem competition, start-ups are, on one hand, building their own software. d-Matrix launched Aviator, an open-source software stack that reportedly uses open-source components to let users deploy models easily and integrates the system software into the inference server. Modular released the Mojo programming language, which supports programming a variety of hardware, including CPUs, GPUs, TPUs, and ASICs.

On the other hand, the approaches of AMD and Intel also offer points of reference.

AMD's ecosystem strategy is to make it convenient for developers to migrate. ROCm, the open-source computing platform and ecosystem AMD has built, aims to give developers a cross-platform programming model. ROCm provides APIs and libraries similar to CUDA's, so code written for NVIDIA GPUs can run on AMD GPUs with only minor modifications, reducing developers' cost of moving programs from CUDA to ROCm.
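To make those "minor modifications" concrete, here is a minimal sketch (an illustration of ours, not AMD's official porting guide): in ROCm's HIP layer, the CUDA runtime calls map almost name-for-name (cudaMallocManaged becomes hipMallocManaged, cudaDeviceSynchronize becomes hipDeviceSynchronize), while kernel code itself is unchanged, which is why AMD's hipify tools can automate most of a port. The file name and sizes below are arbitrary; it is C++ compiled with hipcc.

// vector_add_hip.cpp: a minimal HIP sketch (illustrative; names are ours).
// The CUDA original differs only in the header and the cuda*/hip* prefixes,
// which is why a CUDA-to-ROCm port is often little more than a rename.
#include <hip/hip_runtime.h>   // CUDA equivalent: <cuda_runtime.h>
#include <cstdio>

// Kernel syntax (__global__, blockIdx, threadIdx) is identical to CUDA.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // hipMallocManaged mirrors cudaMallocManaged (unified host/device memory).
    hipMallocManaged(reinterpret_cast<void**>(&a), bytes);
    hipMallocManaged(reinterpret_cast<void**>(&b), bytes);
    hipMallocManaged(reinterpret_cast<void**>(&c), bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // hipcc accepts the familiar CUDA-style <<<grid, block>>> launch syntax.
    vector_add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    hipDeviceSynchronize();    // mirrors cudaDeviceSynchronize

    printf("c[0] = %f\n", c[0]);   // expect 3.0
    hipFree(a); hipFree(b); hipFree(c);
    return 0;
}

The same source, with the header swapped and hip* renamed back to cuda*, builds with nvcc; that near-symmetry is exactly the low migration cost ROCm is selling.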

In September last year, Intel, together with Arm, Fujitsu, Google, Imagination, Qualcomm, Samsung and others, established the UXL Foundation (Unified Acceleration Foundation) to build an open ecosystem in the form of an alliance, a move the industry read as a joint bid to break free of NVIDIA's CUDA ecosystem monopoly.

UXL Foundation members (Image source: Intel)

"The foundation's goal is to unite the accelerator ecosystem around open standards and open-source software so developers can build applications that target multi-vendor, multi-architecture systems, now and in the future. If you don't need to consider the target processor when you write software, then we've done our job," said Rod Burns, UXL ecosystem vice president and Foundation Steering Committee chairman.

The foundation is reportedly built on the project specifications of oneAPI, a developer interface launched by Intel. "The specification and projects, contributed by Intel to the foundation, cover the basics developers need when writing code. The projects will operate under the UXL Foundation's principles of open governance, meaning all contributions are treated equally, and foundation council members will also have a say in public proposals and discussions about the project's future," added Rod Burns.
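For a flavor of what "not considering the target processor" looks like in practice, here is a minimal sketch in SYCL 2020, the open C++ standard behind oneAPI's DPC++ compiler (an illustration of ours under those assumptions, not an official UXL or oneAPI sample): the same source can be compiled for Intel, NVIDIA, or AMD back ends, and the runtime picks an available device.

// saxpy_sycl.cpp: a minimal SYCL 2020 sketch (illustrative; names are ours).
// No vendor-specific calls: the runtime selects whatever accelerator
// (Intel, NVIDIA, AMD, or the host CPU) is available when the program runs.
#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    sycl::queue q;   // default selector: a GPU if present, else the CPU
    const int n = 1 << 20;

    // Unified shared memory, visible to both host and device.
    float* x = sycl::malloc_shared<float>(n, q);
    float* y = sycl::malloc_shared<float>(n, q);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    const float a = 3.0f;
    // One portable parallel_for; no target-specific launch syntax.
    q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        y[i] = a * x[i] + y[i];
    }).wait();

    printf("ran on: %s, y[0] = %f\n",
           q.get_device().get_info<sycl::info::device::name>().c_str(),
           y[0]);   // expect 5.0
    sycl::free(x, q);
    sycl::free(y, q);
    return 0;
}

Nothing in the listing names a vendor; retargeting it is a compiler-flag decision rather than a rewrite, which is the portability promise the UXL Foundation is organizing around.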

In the early morning of April 10, Intel's release of Gaudi 3 made the AI chip competition fiercer still. The leaders are slugging it out, start-ups are fighting to survive, and even NVIDIA CEO Jensen Huang says he worries every day "about whether the company will go bankrupt". Facing an ever more complex environment, the "AI chip unicorns" are drawing on their own resilience to keep exploring, seeking survival, change, and growth.

