What should AI chips compete on?

Latest update time: 2022-01-30


Over the past few years, a global boom in AI chip startups has produced many self-styled "Nvidia killers". The main reason is that Nvidia, on the strength of its GPUs, has captured most of the closely watched AI chip market, especially for cloud AI chips.

These challengers have tried hard. But as Nvidia founder Jensen Huang remarked in an earlier interview, "There are 'Nvidia killers' every year, but none of them are successful." That record, together with years of discussion about how hard it is to get AI chips deployed, has forced the industry to think carefully about what AI chips should really focus on.

On this question, Lu Tao, President of Greater China and Global Chief Revenue Officer of Graphcore, a leading AI chip startup, shared his views in a recent interview with Semiconductor Industry Observer and other media.

What do AI chips compete on?

Graphcore, founded in 2016, made its debut with its distinctive IPU (Intelligence Processing Unit). CEO Nigel Toon divides AI hardware into three categories: very simple, small accelerators; ASICs; and programmable processors. The IPU belongs to the third category.

According to the latest data released by Graphcore, its newest IPU has caught up with, and in some cases surpassed, competitors in performance, as covered in Semiconductor Industry Observer's earlier report "Latest MLPerf Training 1.1: Graphcore's IPU AI Performance Outperforms Nvidia GPU". On hardware alone, then, Graphcore is confident enough to challenge Nvidia.

But Lu Tao said: "For a chip company, especially one that makes computing chips, the ecosystem is very important." He added that AI computing differs from the traditional CPU market in an important way.

"Over the past five or ten years, CPU workloads have been web services, database services or storage services, and those workload types are very clearly defined. In AI, catching up on the ecosystem is relatively difficult and takes time, because the field as a whole is highly dynamic," Lu Tao continued.

Nvidia's own history with its ecosystem illustrates the logic behind Lu Tao's remarks.

At the end of 2006, Nvidia released CUDA, its parallel computing platform and programming model. At the time, many people, including the company's own investors, did not understand the reasoning behind Jensen Huang's decision. More than a decade later, however, CUDA has become the main reason the chip giant is so hard to displace.

Many AI practitioners have told me that moving from GPUs to another hardware platform is relatively easy in itself; what makes it hard to leave the GPU is the large body of libraries and models built on CUDA. At its Fall GTC conference last November, Nvidia also revealed that the number of Nvidia developers was approaching 3 million and that CUDA had been downloaded 30 million times over the past 15 years, now at a rate of 7 million downloads a year.

Lu Tao also said that in AI computing, convincing customers to adopt "alternative" solutions will be a long process.

In his view, customers care less about a product's application features than about its performance and whether it offers a clear advantage. Next, they worry about whether migrating their software will be difficult and time-consuming. Even once a model has been migrated, they still want to know whether the solution can be deployed and scheduled at scale. That is why Lu Tao expects this to be a long and complicated process.

Still, Lu Tao sees plenty of other opportunities, mainly because the field keeps producing innovations and new research. "Take ViT: there is now a large category of computer vision models based on the Transformer, yet the Transformer was originally the underlying technology for natural language processing," he said. He added that a challenger has to make some predictions before it has a chance to move from following the leader to catching up with it. "No one dares to say their judgment is definitely right, but we still have to be brave enough to make the call and take the risk," Lu Tao said.

This willingness to take risks is also why Graphcore can support some models better than GPUs do. "Prediction, plus a little risk-taking, plus a little luck, can put you in a relatively leading position in the field," Lu Tao concluded.

Graphcore's "hardware plus software" approach

After sharing his thoughts on the challenges of deploying AI chips, Lu Tao described how Graphcore is building its products' core competitiveness along these lines. Its hardware progress was covered in depth in the article mentioned above, so we will not repeat it here.

On the software side, Graphcore's offering matured considerably over the past year.

As shown in the figure above, Lu Tao explained, the middle section is mainly the Poplar SDK: the purple layer is hardware-related and consists of the drivers, the pink layer is the Poplar software stack, and the light pink layer is middleware sitting between Poplar and the machine learning frameworks (including XLA, the graph compiler and PopART). At the framework level, Graphcore also added some "new faces" last year: support for PyTorch Lightning, Baidu's PaddlePaddle and Hugging Face was all released in 2021.

On the developer front, Graphcore added Jupyter Notebook support last year, letting programmers use the company's development platform through the same notebooks they already work in.

Graphcore is also investing in AI applications and the developer ecosystem, including visualization tools that help users inspect and optimize their applications. It also provides system-level software that makes it easy for developers to handle everything from hardware management and IPU virtualization to cluster-level and job scheduling.

Graphcore also added several partners in 2021, including Weights & Biases, Spell and Gradient.

Graphcore's 2021 work also brought progress at the deployment level. First, thanks to Graphcore's efforts, VMware's Project Radium will support the Graphcore IPU as part of its hardware disaggregation initiative. Second, Docker and Kubernetes now support the IPU as well.

"We also added four new OEM partners in 2021: Atos, NEC, Supermicro and 2CRSi. Together with our existing partners Dell and Inspur, we now have six OEM partners," Lu Tao told reporters. In 2021, Graphcore also began working with major international commercial storage vendors such as DDN, Pure Storage, Vast Data and WekaIO.

Building on this software and hardware progress, Graphcore also expanded its collaborations with many organizations in 2021.

For example, Graphcore worked with Anjie Zhongke on using the IPU for weather forecasting, precision irrigation, and disaster prevention and mitigation; with the University of Paris on cosmology applications; and with DP Technology to port the molecular dynamics simulation software DeePMD-kit to IPU hardware and explore scientific computing, drug design, materials design and new-energy scenarios built on molecular dynamics simulation. It also worked with the European Centre for Medium-Range Weather Forecasts on weather forecasting applications. In finance and insurance, the Oxford-Man Institute uses the IPU for stock price prediction, and Tractable uses it to accelerate accident and disaster recovery. In telecommunications, Graphcore and Korea Telecom launched an IPU Cloud. For sustainable urban development, Graphcore began an IPU-based collaboration with Shengzhe Technology on related applications. In medicine and life sciences, it worked with the Stanford University School of Medicine on research centered on combining medicine with privacy-preserving computing.

"In 2022, we will work with public cloud vendors in China to launch IPU offerings, and we will also release some new hardware products. We will also be watching autonomous driving," Lu Tao revealed.

Fiercer competition ahead

Graphcore achieved a great deal over the past year, but as Lu Tao acknowledged, the company still faces considerable challenges. It has done a lot of ecosystem and software work to reduce customers' migration effort, and the gap is indeed narrowing. Some users, however, want to migrate without modifying any code at all, so these challenges will not go away.

In my view, this is not a problem unique to Graphcore but one shared by every newcomer in the AI chip field. Judging from Lu Tao's account, Graphcore is relatively far ahead among them.

Lu Tao likened today's giants to climbers standing on the summit of Mount Everest, with seven steps below them: first, build a team; second, evangelize the concept; third, have a chip; fourth, have a chip and send samples to customers for testing; fifth, have a product that is deployed; sixth, have a product with many deployments; seventh, win a large market share.

"I think we are currently somewhere between the fifth and sixth steps," said Lu Tao. Although the competition has already lasted for years, he expects it to remain fierce this year: because companies have different strategies, many of them will take diverging paths starting in 2022, which is why he believes the competition will continue.

To keep climbing toward the summit, Graphcore has updated its business strategy: first, plan the product roadmap and conduct cutting-edge research; next, build good products; and then focus on revenue. "Ranked by priority, our goals are: someone uses our product; someone uses our product to serve their business; and only then business results and profit," Lu Tao told reporters.

"People have traditionally seen Graphcore as a challenger to the current market leader. In 2022, we hope to change that perception and establish our own leadership across multiple dimensions, including performance, innovation, TCO and software usability," Lu Tao said in closing.

*Disclaimer: This article is the author's original work and reflects the author's personal views. Semiconductor Industry Observer publishes it only to present a different point of view; this does not mean that Semiconductor Industry Observer agrees with or supports it. If you have any objections, please contact Semiconductor Industry Observer.
