Source: Reprinted by Semiconductor Industry Observer (ID: icbank) from the public account Enterprise Storage Technology; author: Winnie Shao. Thank you.
Recently, four leading startups in the popular DPU track shared their industry insights and product plans.
Here, let us summarize how they view this market with unlimited potential.
1. Dayu Zhixin: Driving underlying innovation from business needs
As the first speaker, Dayu Zhixin got things off to a good start. It positioned the DPU as the third engine of cloud computing and stated its product goal very clearly: to provide easy-to-use products for a broad market.
The first speaker also took on the job of introducing the history of the DPU. In the same spirit, Dayu derived the product logic behind the DPU from real products and real applications in the industry. This fits the fact that Dayu's founding team all came from large public cloud companies and is therefore the team that understands the business best.
Li Shuang, CEO of Dayu Zhixin and former general manager of Meituan Cloud, once described the team's product strategy as "driving the innovation of underlying hardware such as chips based on upper-level scenario needs." This is also the logic behind the emergence of DPU products. The DPU grew out of public cloud business and was derived from business scenarios; it is not a product that semiconductor companies defined on their own and then promoted to customers.
This product logic also explains why Dayu Zhixin's first-generation product is a multi-core ARM general-purpose processor SoC. It was not until the second generation that an FPGA was added to accelerate the higher-bandwidth I/O interfaces.
An SoC architecture built on general-purpose ARM processors means that this product focuses on business offloading (rather than on accelerating business and reducing costs). This is the same development path taken by AWS Nitro, the most successful DPU case. Do not think of the DPU as purely a hardware product. Most of the work on this highly programmable chip is in software.
At Pensando, which was just acquired by AMD for US$1.9 billion, two-thirds of the employees are software engineers.
Dayu Zhixin did not disclose any information about its third-generation product during this session, saying only that research and development started at the end of last year. I believe that Dr. Jiang Xiaowei, a member of the HPCA Hall of Fame who joined in June this year, will deliver a good answer.
2. Zhongke Yushu: Storage and Acceleration
As the second speaker, Zhongke Yushu thanked Dayu Zhixin for its comprehensive introduction to the DPU and went straight to its NVMe-oF storage solution. Whether or not it was by tacit agreement, the four technical sessions complemented each other: Dayu covered the business panorama, Zhongke Yushu focused on storage, Yunbao went into more detail on the control part, and Yisixin, the last speaker, focused on networking around P4. There was none of the hostile one-upmanship in which one speaker presents an OpenStack control solution and the next counters with K8s, or one introduces a 25G card and the next pulls out a 100G card.
Zhongke Yushu spent quite a while introducing the NVMe-oF protocol, its evolution, and its implementation difficulties. If you are interested in NVMe-oF, the talk is well worth a listen.
Of course, I am more interested in the latest second-generation KPU architecture that Zhongke Yushu demonstrated. It is mentioned in the well-known DPU white paper, and this session offered a more detailed interpretation.
At first I did not quite understand Zhongke Yushu's "software-defined" accelerator technology. After asking for advice, I understand it as follows. The second-generation KPU uses a large number of dedicated processors, and data processing is done by software code running on those dedicated processing cores; "soft" here refers to this programmability. For the "hard core" accelerators, "software-defined" refers to the fact that the order in which the hard cores are scheduled is programmable. Whether one is programming a dedicated processor or programming the hard-core schedule, both are programmable and therefore software-defined, even if the wording sounds a little forced.
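The second kind of "software definition" can be pictured as a set of fixed-function blocks whose order of invocation is set by software. The sketch below is a toy model under stated assumptions, not Zhongke Yushu's design: the block names (`strip_header`, `reverse`, `pad_to_8`) and the `run_schedule` helper are hypothetical stand-ins for hardened engines.

```python
from typing import Callable, Dict, List

# Toy model: each "hard core" is a fixed function that software cannot change.
# All names here are hypothetical and only illustrate the idea.
HardCore = Callable[[bytes], bytes]

HARD_CORES: Dict[str, HardCore] = {
    "strip_header": lambda data: data[4:],                       # drop a 4-byte header
    "reverse": lambda data: data[::-1],                          # stand-in for a fixed transform
    "pad_to_8": lambda data: data + b"\x00" * (-len(data) % 8),  # pad to a multiple of 8 bytes
}


def run_schedule(schedule: List[str], data: bytes) -> bytes:
    """Apply the fixed hard cores in the order given by the software-defined schedule."""
    for name in schedule:
        data = HARD_CORES[name](data)
    return data


payload = b"HDR0abcdef"
# Same hardware blocks, two different software-defined schedules.
print(run_schedule(["strip_header", "pad_to_8"], payload))
print(run_schedule(["strip_header", "reverse", "pad_to_8"], payload))
```

The blocks themselves never change; only the schedule list does, which is the sense in which a hardened accelerator can still be called software-defined.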
FlashNOC, the on-chip network in the KPU, is similar to an AXI crossbar structure. The 128-core network engine PPE, which is programmable with P4, works alongside the network acceleration engine NOE, the result of more than two years of work to fully harden the TCP/IP stack. Together they cover both the performance and the flexibility requirements of the network data plane. The database/big data acceleration engine DOE is the most distinctive engine; general DPUs do not have one, and I have only seen something similar in IBM's high-end CPUs.
In short, in the accelerator field the results are impressive but not unexpected. On the one hand, Zhongke Yushu was incubated at the Institute of Computing Technology of the Chinese Academy of Sciences and its Key Laboratory of Computer Architecture, which researches the design of special-purpose processors, so this is familiar ground. On the other hand, Zhongke Yushu's product philosophy is that the DPU should take on the computing tasks that "the CPU cannot do well and the GPU cannot do", and accelerators are an effective way to achieve that. This philosophy cuts both ways. Accelerators are where it is easier to stand out, but they also run against general standardization. Going in this direction requires strong industry leadership.
The second-generation KPU has been taped out and will be released in the second half of the year. It will be the earliest chip released among the four companies, although it does not integrate an ARM processor and is not yet an SoC solution. I hope that after the chip is released I will be able to read papers on FlashNOC and PPE.
The DPU white paper led by Zhongke Yushu is first-class in the industry, so papers backed by actual products should be of a high standard as well.
3. Yunbao Intelligence: High-performance chips drive infrastructure innovation
Yunbao Intelligence's reading of DPU history is that data center bandwidth has been upgraded from 10G/25G to 100G, but server computing power has not kept pace with that bandwidth growth, creating a scissors gap. In addition, the CPU, which has always occupied the core position, is better at complex serial processing and is not good at large-scale, parallel, fixed-function network data processing. In this view, technological development itself is what gave rise to the DPU.
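The scissors gap can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes minimum-size 64-byte Ethernet frames (plus 20 bytes of preamble and inter-frame gap on the wire) and a single hypothetical 3 GHz core handling every packet. Neither figure comes from the talk.

```python
# Per-packet CPU cycle budget at line rate.
# Illustrative assumptions, not figures from the talk.
FRAME_BYTES = 64          # assumption: minimum Ethernet frame size
WIRE_OVERHEAD_BYTES = 20  # preamble (8 bytes) + inter-frame gap (12 bytes)
CORE_HZ = 3_000_000_000   # assumption: one 3 GHz core


def packets_per_second(link_gbps: float) -> float:
    """Maximum packet rate of a link carrying back-to-back minimum-size frames."""
    bits_per_packet = (FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_per_packet


def cycles_per_packet(link_gbps: float) -> float:
    """Clock cycles one core can spend on each packet while keeping up with the link."""
    return CORE_HZ / packets_per_second(link_gbps)


for gbps in (10, 25, 100):
    print(f"{gbps}G: {packets_per_second(gbps) / 1e6:.1f} Mpps, "
          f"{cycles_per_packet(gbps):.0f} cycles per packet")
```

Under these assumptions the budget shrinks from roughly 200 cycles per packet at 10G to roughly 20 at 100G, which is why per-packet work is a natural candidate for offloading.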
Compared with Dayu Zhixin, which started from the public cloud business, Yunbao Intelligence looks more like a traditional chip company, driving underlying hardware innovation from the bottom up. But Yunbao Intelligence is definitely not a traditional chip company. It has always emphasized "software-defined chips". What this shares with Zhongke Yushu's "software-defined accelerator technology" is that the flexibility of the DPU architecture is achieved through software programmability. The difference is that Yunbao has been building a software-defined chip architecture since the first day of design, starting from the requirements.
This session broadly followed the company's usual disclosure principles. The Yunbao Intelligence speaker did not spell out the company's positioning of its DPU as a world-class, extremely complex high-end chip. Instead, the speaker conservatively showed a simple architecture diagram and spent more time on the software framework. Positioning the product as a complex high-end chip is also consistent with the founder's deep background in semiconductor companies.
Yunbao Intelligence released an FPGA-based 25G network card last year. Its software stack will connect seamlessly with the subsequent 100G DPU products, so the card can serve as a low-speed preview version.
Throughout the session, Yunbao presented a list of challenges that the DPU needs to solve, which genuinely reflects its understanding of the pain points of cloud computing businesses, although it did not share how it will solve them one by one. I hope the Yunbao DPU chip due next year will bring the final answer. It is also rather intriguing that a company aiming to build a world-class chip has not yet announced any hardware specifications.
4. Yisixin: Networking around P4
To be fair, as the last speaker, Yisixin had a hard time avoiding the key points already covered by the earlier speakers. It wisely chose P4 as its theme.
P4, a domain-specific programming language, provides a packet processing language that simplifies both hardware design and software programming. P4 was originally designed for switches. As it developed, its coverage expanded to all network devices from the core to the edge. It is particularly well suited to overlay networks, which evolve constantly and tend to be heavily customized. Better still, if the server network card supports P4 and the connected switch also supports P4, then in theory the switch and server data planes of an entire data center can work in concert. This is a concrete realization of the idea of the data center as a computer.
Although David Patterson popularized the term DSA and people in the AI field promoted it, it is people in the networking field who have achieved the most with it. P4 is an outstanding domain-specific language. Its simple match-action model accurately describes packet processing and strikes a good balance between abstraction and concreteness. It is both network-specific and protocol-independent, and it abstracts the data plane very well. Moreover, it has matured over the past decade: Intel bought Barefoot and added a P4 engine to its own IPU, and AMD bought Pensando. With these two major vendors behind it, P4 is on the way to becoming the de facto standard language for the data plane.
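The match-action model can be illustrated in a few lines. The sketch below is written in Python rather than P4 and is only an analogy: the `Table` class, the `forward` and `drop` actions, and the packet fields are hypothetical names chosen for illustration, and only exact-match lookup is modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Packet = Dict[str, object]
Action = Callable[[Packet], Packet]


def forward(port: int) -> Action:
    """Build an action that sends the packet out of the given port."""
    def act(pkt: Packet) -> Packet:
        return {**pkt, "egress_port": port}
    return act


def drop(pkt: Packet) -> Packet:
    """Action that marks the packet as dropped."""
    return {**pkt, "dropped": True}


@dataclass
class Table:
    """Exact-match table: look up one header field, then run the matching action."""
    key_field: str
    default_action: Action
    entries: Dict[object, Action] = field(default_factory=dict)

    def apply(self, pkt: Packet) -> Packet:
        action = self.entries.get(pkt.get(self.key_field), self.default_action)
        return action(pkt)


# The table structure is fixed by the "program"; the entries are filled in at run time.
ipv4_table = Table(key_field="dst_ip", default_action=drop)
ipv4_table.entries["10.0.0.1"] = forward(1)
ipv4_table.entries["10.0.0.2"] = forward(2)

print(ipv4_table.apply({"dst_ip": "10.0.0.2"}))
print(ipv4_table.apply({"dst_ip": "192.168.1.1"}))
```

The key field and the set of actions are chosen by the program, not by any particular protocol, which is the sense in which the model is protocol-independent.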
Yisixin's current FPGA version and future P4 engine, Dayu Zhixin's current FPGA accelerator and the DSA network engine of its next-generation SoC, Zhongke Yushu's NP-type PPE, and Yunbao's fully programmable DPU engine can all perform P4-like functions. In theory, the efficiency ratio of CPU : NPU : FPGA : DSA is 1:10:20:80. Actual performance and power consumption depend on each company's implementation capabilities. Let us wait for the test data.
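To read the quoted ratio pairwise, a small helper is enough. This is a sketch that takes the theoretical 1:10:20:80 figure at face value; the `relative_efficiency` function is a hypothetical name, and real products may differ widely.

```python
# Theoretical efficiency ratio quoted in the text: CPU : NPU : FPGA : DSA = 1 : 10 : 20 : 80.
EFFICIENCY = {"CPU": 1, "NPU": 10, "FPGA": 20, "DSA": 80}


def relative_efficiency(a: str, b: str) -> float:
    """How many times more efficient engine `a` is than engine `b`, in theory."""
    return EFFICIENCY[a] / EFFICIENCY[b]


for a, b in [("DSA", "CPU"), ("DSA", "FPGA"), ("FPGA", "NPU")]:
    print(f"{a} vs {b}: {relative_efficiency(a, b):g}x")
```

On these figures a DSA engine is 4 times as efficient as an FPGA and 8 times as efficient as an NPU, which is why implementations that start on FPGAs are often described as stepping stones toward hardened engines.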
In this session, Yisixin generously released measured (not estimated) Layer 3 forwarding performance on its current 2x25G FPGA card. That deserves encouragement, and the measured numbers are also very good.
Conclusion:
Whether they start from business offloading and isolation and drive hardware innovation from the top down, or accelerate business from the bottom up with DSA hardware, the industry has reached a consensus on DPU chip architecture. It requires four major subsystems: a general-purpose CPU subsystem; a programmable fast data plane; acceleration engines for NVMe, RDMA, security, compression and so on; and a high-speed I/O and storage interface subsystem.
The acceleration engine may be the part that best differentiates the designs of different vendors, and it will also be the key technical point that determines performance and flexibility. However, the acceleration engine is a double-edged sword. If only the hardware is developed and the software ecosystem fails to keep up, the result is nil. The first generation of acceleration-focused SmartNICs, such as LiquidIO from Cavium/Marvell and Stingray from Broadcom, did not end well.
Of course, a DPU chip this powerful will not appear only in the form of a network card. As horizons broaden, more product forms will emerge, such as firewalls, load balancers, 5G RAN controllers, and switches. For example, Asterfusion's programmable switch is a super-deluxe version of a P4 switch plus DPU.
The muscle that everyone has flexed in white papers, slide decks, and live broadcasts will eventually have to be proven in the chassis and on the racks. Talk is cheap, show me your chips.
Note: This article represents only the author's personal views and has nothing to do with any organization.
*Disclaimer: This article is originally written by the author. The content of the article is the author's personal opinion. Semiconductor Industry Observer reprints it only to convey a different point of view. It does not mean that Semiconductor Industry Observer agrees or supports this point of view. If you have any objections, please contact Semiconductor Industry Observer.