Four DPU startups share their views

Latest update time:2022-07-14 12:22

Source: Content by Semiconductor Industry Observer (ID: icbank), reprinted with thanks from the public account Enterprise Storage Technology; author: Winnie Shao.

Recently, four leading startups in the fast-growing DPU space shared their industry insights and product plans.

Here is a summary of how they view this market and its enormous potential.

1. Dayu Zhixin: Driving underlying innovation from business needs

As the first speaker, Dayu Zhixin got things off to a good start. It positions the DPU as the third engine of cloud computing and stated its product goal very clearly: to provide easy-to-use products for a broad market.

Going first, Dayu Zhixin also took on the job of walking the audience through the history of the DPU. It derived the product logic behind the DPU from real products and real applications in the industry. That fits the company's background: Dayu's founding team all came from large public cloud companies, which makes them the team that understands the business best.

Li Shuang, CEO of Dayu Zhixin and former general manager of Meituan Cloud, has described the team's product strategy as "driving the innovation of underlying hardware such as chips from upper-level scenario needs." This is also the logic behind the emergence of the DPU itself. The DPU grew out of the public cloud business and was derived from business scenarios; it was not a product that semiconductor companies defined on their own and then pushed to customers.

This product logic also explains why Dayu Zhixin's first-generation product is built on a multi-core general-purpose ARM processor SoC. Only in the second generation was an FPGA added to accelerate the higher-bandwidth IO interface. A SoC architecture based on general-purpose ARM processors means the product focuses on business offloading (rather than on accelerating workloads and cutting costs). This is the same development path taken by AWS Nitro, the most successful DPU case to date. Do not think of the DPU as purely a hardware product. For such a highly programmable chip, most of the work is in software. At Pensando, which AMD recently acquired for US$1.9 billion, two-thirds of the employees are software engineers.


Dayu Zhixin did not disclose any information about its third-generation product in this session, saying only that research and development began at the end of last year. I believe Dr. Jiang Xiaowei, a member of the HPCA Hall of Fame who joined the company in June this year, will deliver a strong result.

2. Zhongke Yushu: Storage and Acceleration

As the second speaker, Zhongke Yushu thanked Dayu Zhixin for its comprehensive introduction to the DPU and went straight to its NVMe-oF storage solution. Whether or not it was by tacit agreement, the four technical sessions barely overlapped: Dayu covered the overall business picture, Zhongke Yushu focused on storage, Yunbao went into more detail on the control plane, and Yisixin, speaking last, focused on networking built around P4. There was none of the tit-for-tat one might have expected, where one speaker presents an OpenStack control solution and the next counters with K8s, or one introduces a 25G card and the next brings out a 100G card.

Zhongke Yushu spent quite a while on the NVMe-oF protocol, its evolution, and the difficulties of implementing it. If you are interested in NVMe-oF, the talk is well worth a listen.

I was more interested in the latest KPU 2nd generation architecture that Zhongke Yushu presented. It is mentioned in the well-known DPU white paper, and this session gave a more detailed explanation.

At first I did not quite understand Zhongke Yushu's "software-defined" accelerator technology. After asking around, my understanding is as follows. The KPU 2nd generation uses a large number of dedicated processors, and data processing is carried out by software running on these dedicated cores, so "soft" here refers to programmability. For the hard-wired ("hardcore") accelerators, "software-defined" means that the order in which the hard cores are scheduled is programmable. Whether one is programming a dedicated processor or programming the scheduling of hard cores, both are programmable and therefore software-defined, even if the terminology takes some getting used to.

FlashNOC, the on-chip network in the KPU, is similar in structure to an AXI crossbar. The P4-programmable, 128-core network engine PPE, together with the network acceleration engine NOE, which implements the full TCP/IP stack in hardware after more than two years of work, covers both the performance and the flexibility requirements of the network data plane. The database and big data acceleration engine DOE is the most distinctive of the engines and is not found in typical DPUs. I have only seen something similar in IBM's high-end CPUs.

In short, Zhongke Yushu's accelerators were impressive but not unexpected. On the one hand, the company was incubated at the Institute of Computing Technology of the Chinese Academy of Sciences and its Key Laboratory of Computer Architecture, which researches the design of special-purpose processors, so this is familiar ground. On the other hand, Zhongke Yushu's product philosophy is that the DPU should take on the computing tasks that the "CPU cannot do well and GPU cannot do", and accelerators are an effective way to get there. This philosophy cuts both ways. Accelerators are where it is easiest to differentiate, but they also run against the trend toward general-purpose standardization. Going in this direction takes strong industry leadership.



The KPU 2nd generation has been taped out and will be released in the second half of the year. It will be the first chip released among the four companies, although it does not integrate an ARM processor and is not yet a SoC solution. I hope that papers on FlashNOC and PPE will follow once the chip is released. The DPU white paper led by Zhongke Yushu is among the best in the industry, and papers backed by a real product should be of a similar standard.

3. Yunbao Intelligence: High-performance chips drive infrastructure innovation

Yunbao Intelligence's reading of DPU history is that data center bandwidth has moved from 10G/25G to 100G while server computing power has not kept pace, opening a scissors gap between the two. In addition, the CPU, which has always held the central position, is better suited to complex serial processing than to large-scale, parallel, fixed-function network data processing. In this view, the DPU is a product of technological development itself.
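The scissors gap can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative and not from the talk: it assumes worst-case 64-byte Ethernet frames, the standard 20 bytes of per-frame wire overhead (preamble, start delimiter, inter-frame gap), and a hypothetical single 3 GHz core handling every packet. The function names are made up for this example.

```python
# Per-frame overhead on the wire: 7-byte preamble + 1-byte start-of-frame
# delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_gbps: float, frame_bytes: int = 64) -> float:
    """Maximum frame rate of an Ethernet link for a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire


def cycles_per_packet(link_gbps: float, core_ghz: float = 3.0,
                      frame_bytes: int = 64) -> float:
    """Cycle budget per packet if one core (assumed clock) had to keep up."""
    return core_ghz * 1e9 / packets_per_second(link_gbps, frame_bytes)


if __name__ == "__main__":
    for gbps in (10, 25, 100):
        mpps = packets_per_second(gbps) / 1e6
        budget = cycles_per_packet(gbps)
        print(f"{gbps:>3}G: {mpps:7.2f} Mpps, {budget:6.1f} cycles/packet")
```

Under these assumptions the per-packet cycle budget shrinks tenfold between 10G and 100G, which is one way to see why fixed-function packet work tends to move off the general-purpose CPU.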

Whereas Dayu Zhixin starts from the public cloud business, Yunbao Intelligence looks more like a traditional chip company, driving innovation in the underlying hardware from the bottom up. But Yunbao Intelligence is by no means a traditional chip company. It has always emphasized "software-defined chips". What this shares with Zhongke Yushu's "software-defined accelerator technology" is that the flexibility of the DPU architecture comes from software programmability. The difference is that Yunbao has been building a software-defined chip architecture, starting from requirements, since the first day of design.

The talk stayed within what the company appears willing to disclose. Yunbao Intelligence's speaker did not state outright that the company positions the DPU as a world-class, extremely complex, high-end chip. Instead, the speaker shared a simple architecture diagram, rather conservatively, and spent more time on the software framework. Positioning the DPU as a complex high-end chip is consistent with the founder's deep background at semiconductor companies.


Yunbao Intelligence released an FPGA-based 25G network card last year. Its software stack is intended to carry over seamlessly to the subsequent 100G DPU products, so the card can serve as a lower-speed preview.


Throughout the session, Yunbao enumerated the challenges a DPU needs to solve, which shows a real understanding of the pain points of the cloud computing business, although it did not explain how it will address them one by one. I hope the Yunbao DPU chip due next year will provide the answers. A company that intends to build a world-class chip has yet to announce any hardware specifications, which only adds to the anticipation.

4. Yisixin: P4

To be fair, speaking last made it hard for Yisixin to avoid ground the earlier speakers had already covered. Yisixin wisely chose P4 as its theme. P4 is a domain-specific programming language for packet processing that simplifies both hardware design and software programming. It was originally designed for switches, and its scope has since expanded to cover network devices from the core to the edge. It is particularly well suited to overlay networks, which evolve constantly and tend to be heavily customized. Better still, if both the server network card and the switch it connects to support P4, then in theory the switch and server data planes of the entire data center can be programmed as a coordinated whole. This is a concrete realization of the idea of the data center as a computer.

Although David Patterson popularized the term DSA (domain-specific architecture) and the AI community has promoted it, it is the networking community that has achieved the most with it. P4 is an outstanding domain-specific language: a simple match-action model that accurately describes packet processing and strikes a good balance between abstraction and concreteness. It is network-specific yet protocol-independent, and it abstracts the data plane very well. It has also matured over the past decade, with Intel buying Barefoot and adding a P4 engine to its own IPU, and AMD buying Pensando. With these two major vendors behind it, P4 is on its way to becoming the de facto standard language for the data plane.
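For readers unfamiliar with the match-action model, the following is a minimal Python sketch of the idea, not P4 code and not any vendor's implementation. A table matches on one header field and runs the action bound to the matching entry, falling back to a default action on a miss. The names `MatchActionTable`, `forward`, and `drop` are hypothetical, and only exact matching is shown (P4 also defines other match kinds such as longest-prefix and ternary).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Packet = Dict[str, object]             # parsed header fields plus metadata
Action = Callable[..., None]           # an action mutates the packet in place
Entry = Tuple[Action, Tuple[object, ...]]


@dataclass
class MatchActionTable:
    """Exact-match table: look up one field, run the bound action."""
    key_field: str
    default_entry: Entry
    entries: Dict[object, Entry] = field(default_factory=dict)

    def add_entry(self, key: object, action: Action, *params: object) -> None:
        # In a real system the control plane installs entries at run time.
        self.entries[key] = (action, params)

    def apply(self, pkt: Packet) -> None:
        action, params = self.entries.get(pkt.get(self.key_field),
                                          self.default_entry)
        action(pkt, *params)


def forward(pkt: Packet, port: int) -> None:
    pkt["egress_port"] = port


def drop(pkt: Packet) -> None:
    pkt["dropped"] = True


# The "data plane program": one table keyed on destination IP, default drop.
ipv4_exact = MatchActionTable(key_field="dst_ip", default_entry=(drop, ()))
ipv4_exact.add_entry("10.0.0.2", forward, 3)

known = {"dst_ip": "10.0.0.2"}
unknown = {"dst_ip": "192.0.2.99"}
ipv4_exact.apply(known)
ipv4_exact.apply(unknown)
print(known, unknown)
```

The table definition is independent of any particular protocol: swapping `key_field` and the actions is enough to describe a different forwarding behavior, which is the property the paragraph above attributes to P4.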

Yisixin's current FPGA version and future P4 engine, Dayu Zhixin's current FPGA accelerator and the DSA network engine planned for its next-generation SoC, Zhongke Yushu's NP-style PPE, and Yunbao's fully programmable DPU engine can all perform P4-like functions. In theory, the efficiency ratio of CPU : NPU : FPGA : DSA is 1:10:20:80. Actual performance and power consumption will depend on each company's implementation ability, so let us wait for the test data.
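Taking the theoretical 1:10:20:80 ratio at face value, the relative gain between any two engine types is a simple division. The snippet below is only arithmetic on the ratio quoted above; it says nothing about measured performance, and the helper name `relative_gain` is made up for this illustration.

```python
# Theoretical efficiency ratio quoted in the article (CPU as the unit).
EFFICIENCY = {"CPU": 1, "NPU": 10, "FPGA": 20, "DSA": 80}


def relative_gain(engine: str, baseline: str = "CPU") -> float:
    """How many times more efficient `engine` is than `baseline`, in theory."""
    return EFFICIENCY[engine] / EFFICIENCY[baseline]


if __name__ == "__main__":
    for name in EFFICIENCY:
        print(f"{name:>4}: {relative_gain(name):4.0f}x vs CPU, "
              f"{relative_gain(name, 'FPGA'):5.2f}x vs FPGA")
```

On these figures a DSA engine would be 4 times as efficient as an FPGA and 8 times as efficient as an NPU, which is the scale of difference the coming test data would need to confirm or refute.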

In this session, Yisixin was forthcoming enough to publish measured (not estimated) Layer 3 forwarding performance on its current 2x25G FPGA card. That deserves credit, and the measured numbers are also very good.


Conclusion:

Whether the starting point is business offload and isolation, driving hardware innovation from the top down, or accelerating workloads from the bottom up with DSA hardware, the industry has reached a consensus on DPU chip architecture. It requires four major subsystems: a general-purpose CPU subsystem; a programmable fast data plane; acceleration engines for NVMe, RDMA, security, compression, and so on; and a high-speed IO and storage interface subsystem.

The acceleration engines are probably where vendors' designs differ most, and they will be the key technical factor determining performance and flexibility. But acceleration engines are also a double-edged sword. If the hardware is built and the software ecosystem fails to keep up, the result is worth nothing. The first generation of acceleration-focused SmartNICs, such as LiquidIO from Cavium/Marvell and Stingray from Broadcom, did not end well.

Of course, a chip as powerful as the DPU will not appear only in the form of a network card. As the market opens up, more product forms will emerge, such as firewalls, load balancers, 5G RAN controllers, and switches. For example, Asterfusion's programmable switch is effectively a deluxe combination of a P4 switch and a DPU.


The muscle that everyone has flexed in white papers, slide decks, and live broadcasts will ultimately have to prove itself in the chassis and on the rack. Talk is cheap, show me your chips.



Note: This article only represents the author's personal views and has nothing to do with any organization.


*Disclaimer: This article is originally written by the author. The content of the article is the author's personal opinion. Semiconductor Industry Observer reprints it only to convey a different point of view. It does not mean that Semiconductor Industry Observer agrees or supports this point of view. If you have any objections, please contact Semiconductor Industry Observer.

