Suiyuan releases second-generation training chip: from sparks to cloud cores

Publisher: EEWorld News | Latest update: 2021-07-13 | Source: EEWORLD

Recently, Suiyuan Technology held its second offline product launch event, officially announcing the second-generation training chip Suisi 2.0, along with the training accelerator card Yunsui T20 and the training OAM module Yunsui T21. Compared with the first launch event, this event's theme, "Core Cloud Changtian", more clearly signals Suiyuan's confidence and determination. The main source of that confidence is that, in just three years since its founding, the company has launched two training chips and one inference chip, together with supporting development software, accelerator cards, supercomputing clusters and other product lines.


Zhao Lidong, CEO of Suiyuan Technology, said: "In the era of exploding AI computing power, computing power has increased more than 10 times each year. However, in the current high-end cloud training market, both the ecosystem and the products are monopolized. Suiyuan hopes to break this monopoly and build an independent ecosystem free of such dependence. The support behind this includes high-performance chips, accelerator cards, AI servers, clusters, and supporting software stacks."


Zhao Lidong, CEO of Suiyuan Technology (left), and Zhang Yalin, COO of Suiyuan Technology (right)


Three years of success at Suiyuan


When Suiyuan was founded three years ago, it had already set its vision and goals: to become a leading company in both cloud AI and high-end chips, to build a world-class local R&D and engineering team, to develop core technologies independently in China, and to build the best cloud AI products for data centers and intelligent computing centers, forming a complete closed-loop solution for training and inference.


Zhao Lidong said that Suiyuan currently has more than 500 full-time employees, 90% of whom are R&D personnel, and more than 70% of whom hold a master's or doctoral degree. So far, Suiyuan has obtained 52 patents covering chip architecture, functional modules, packaging, system design and the software stack.


In December 2019, Suiyuan released its first generation of training products, which entered commercial use in September 2020. In October of the same year, it won the China Core Annual Major Innovation Award from the Electronic Information Research Institute of the Ministry of Industry and Information Technology, the first time in the award's 15-year history that it went to an AI chip. In December 2020, Suiyuan released its first-generation inference product, Yunsui i10.


The official release of the second-generation training chip Suisi 2.0 and the second-generation training products Yunsui T20 and T21 makes Suiyuan the first company in China to enter its second generation of training products.


Make friends with the "Liaoyuan" plan


As artificial intelligence is deployed, demand will come not only from Internet companies; traditional industries will be reshaped as well. From financial security and medical care to education and smart cities, "AI+" applications are spreading, and computing power is the foundation of the whole field's development. With Moore's Law slowing, computing power is advancing in new ways: heterogeneous computing, chiplets, and advanced packaging will accelerate its progress. At the same time, computing power will become a comprehensive, cross-industry competition: not a relatively simple contest between products, but a complex contest between ecosystems.


"Suiyuan's product development and business implementation are inseparable from the cooperation of industry partners, including major industry alliances, CPU and server manufacturers, research institutes, universities, outstanding software companies, and solution companies. We carry out all-round cooperation in chip development, large-scale cluster interconnection, heterogeneous computing, green liquid cooling, domestic frameworks, software stacks, compilers, operator libraries, algorithms, and the development and application of large-scale training models." said Zhao Lidong.


For this reason, Zhao Lidong announced the launch of the "Liaoyuan" plan at the press conference. The plan has three main elements. The first is original innovation: building the foundation from scratch so as not to be controlled by others. The second is standards: actively taking part in standard-setting with domestic institutions and building a testing platform. The third is an open ecosystem covering three areas: the developer ecosystem, the industrial ecosystem, and the scientific research ecosystem.


Zhao Lidong said that in addition to the fields of deep learning computing and general artificial intelligence, the "Liaoyuan" plan will also cover scientific computing and engineering computing through general heterogeneous computing, support video encoding and decoding related to visual computing, and so on.


Interpretation of Suiyuan's new products


Zhang Yalin, COO of Suiyuan Technology, gave a detailed interpretation of the specific products of this press conference.


The first is the Green Integrated Supercomputing Intelligent Cluster 2.0 (CloudBlazer Matrix), a computing cluster jointly built by Suiyuan and its partners. It contains 8,192 Yunsui (CloudBlazer) training cards and reaches 1.3 EFLOPS of computing power. This is the first time in the world that a cluster of about 8,000 cards has exceeded exascale (EFLOPS-level) computing power. Under liquid cooling the PUE can be reduced to 1.15, greatly improving the energy efficiency of the whole cluster and supporting China's green digitalization. Each training card delivers 160 TFLOPS of single-precision tensor (TF32) computing power, cluster linearity is 80%, and the maximum interconnection bandwidth is 2.5P. By comparison, the 1.0 product supported up to 1,280 cards, 28 PFLOPS of computing power, and a maximum interconnection bandwidth of 0.25P. "All indicators have improved rapidly, which gives Suiyuan Technology the capital to help China's new infrastructure in a clustered and green way," Zhang Yalin said.
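The cluster figures above can be checked with simple arithmetic. The sketch below multiplies the card count by the quoted per-card figure and compares the result with the 1.0 cluster. The variable names are illustrative, and treating the 80% linearity as a plain multiplier on peak compute is an assumption; the article does not say how linearity was measured.

```python
# Sanity check of the cluster figures quoted in the article.
# Variable names are illustrative; inputs are the article's numbers.

CARDS_V2 = 8192          # training cards in the 2.0 cluster
TFLOPS_PER_CARD = 160    # quoted per-card compute, in TFLOPS
LINEARITY = 0.80         # quoted cluster linearity

CARDS_V1 = 1280          # maximum cards in the 1.0 cluster
PFLOPS_V1 = 28           # quoted 1.0 cluster compute, in PFLOPS
LINK_V2, LINK_V1 = 2.5, 0.25   # quoted maximum interconnect bandwidth ("P")

# Peak compute: cards x per-card TFLOPS, converted to PFLOPS and EFLOPS.
peak_pflops_v2 = CARDS_V2 * TFLOPS_PER_CARD / 1_000
peak_eflops_v2 = peak_pflops_v2 / 1_000

# Assumption: linearity acts as a simple multiplier on peak compute.
scaled_eflops_v2 = peak_eflops_v2 * LINEARITY

print(f"2.0 peak: {peak_eflops_v2:.2f} EFLOPS")
print(f"2.0 at 80% linearity (assumed model): {scaled_eflops_v2:.2f} EFLOPS")
print(f"cards: {CARDS_V2 / CARDS_V1:.1f}x, "
      f"compute: {peak_pflops_v2 / PFLOPS_V1:.1f}x, "
      f"interconnect: {LINK_V2 / LINK_V1:.0f}x vs 1.0")
```

The peak works out to about 1.31 EFLOPS, which matches the quoted 1.3 EFLOPS and suggests that the headline number is a peak figure rather than a linearity-adjusted one.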


The core of Yunsui (CloudBlazer) is the Suisi DTU 2.0, currently the largest computing chip in China, which pushes the limits of packaging. Ten dies are integrated in a 57.5 mm × 57.5 mm package, including the main chip and Samsung's HBM2E memory. This package size is the largest the packaging partner has produced to date.


Suisi DTU 2.0


Thanks to this high degree of integration, Suisi 2.0 reaches a peak single-precision (FP32) computing power of 40 TFLOPS. It also supports the single-precision tensor format TF32 at 160 TFLOPS. Half-precision BF16 and FP16 also run at 160 TFLOPS, and INT8 integer computing reaches 320 TOPS. "Suiyuan is the first in China to deliver a product covering the full range of precisions, from FP32 to TF32, BF16, FP16 and INT8," Zhang Yalin said.


In terms of the data engine, Suiyuan has built a fully programmable data flow into the chip, further strengthening its programmability and performance. Fully instruction-driven data transfer and assisted computing ensure data throughput and efficiency across different models, with efficient processing of scalars, vectors and tensors as well as multi-address broadcasting.


In terms of storage, Suisi 2.0 is the first product in China to adopt HBM2E. Four HBM2E stacks give Suisi 2.0 64GB of memory on a single chip, with a maximum storage bandwidth of 1.8 TB/s. Compared with Suisi 1.0, capacity is 4 times higher and bandwidth is 3 times higher.
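The memory figures above imply per-stack numbers and a rough Suisi 1.0 baseline. The sketch below derives them under two stated assumptions: capacity and bandwidth are split evenly across the four HBM2E stacks, and the 4-times and 3-times comparisons mean plain multiples of the 1.0 values. The bandwidth unit (TB/s) is also assumed, since the article only gives "1.8T", and the derived 1.0 values are implied by the article's ratios rather than quoted specifications.

```python
# Derive per-stack and implied first-generation memory figures.
# Inputs are the article's numbers; everything derived is an estimate.

HBM_STACKS = 4
TOTAL_CAPACITY_GB = 64
TOTAL_BANDWIDTH_TBPS = 1.8   # article writes "1.8T"; TB/s is assumed
CAPACITY_GAIN = 4            # "capacity is increased by 4 times"
BANDWIDTH_GAIN = 3           # "bandwidth is increased by 3 times"

# Assumption: capacity and bandwidth are split evenly across stacks.
capacity_per_stack_gb = TOTAL_CAPACITY_GB / HBM_STACKS
bandwidth_per_stack_tbps = TOTAL_BANDWIDTH_TBPS / HBM_STACKS

# Assumption: "N times" means the 2.0 value is N x the 1.0 value.
implied_v1_capacity_gb = TOTAL_CAPACITY_GB / CAPACITY_GAIN
implied_v1_bandwidth_tbps = TOTAL_BANDWIDTH_TBPS / BANDWIDTH_GAIN

print(f"per stack: {capacity_per_stack_gb:.0f} GB, "
      f"{bandwidth_per_stack_tbps:.2f} TB/s")
print(f"implied 1.0: {implied_v1_capacity_gb:.0f} GB, "
      f"{implied_v1_bandwidth_tbps:.1f} TB/s")
```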


Interconnection is also a key factor for multi-host, multi-card cluster training in data centers. Suisi 2.0 supports 6 inter-card interconnection ports, each with a bidirectional bandwidth of 500GB, which is 1.5 times that of Suisi 1.0.


Suiyuan Technology simultaneously launched two accelerator cards based on Suisi 2.0, Yunsui T21 and T20. T21 is a standardized OAM module adapted for liquid-cooled servers. T20 is a full-height, full-length PCIe card. Because of power consumption limits, the T20's single-precision tensor (TF32) computing power is reduced to 134.4 TFLOPS, its single-precision (FP32) computing power to 33.6 TFLOPS, and its storage bandwidth from 1.8 TB/s to 1.6 TB/s.
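A quick consistency check on the T20 derating is sketched below. It compares the two reduced T20 figures with the Suisi 2.0 chip peaks quoted earlier (40 TFLOPS FP32, 160 TFLOPS TF32), pairing the smaller T20 figure with FP32 and the larger with TF32, which is the only pairing in which both are reductions from the peak. The dictionary names are illustrative, and the TB/s bandwidth unit is an assumption.

```python
# Compare the T20 card figures with the Suisi 2.0 chip peaks.
# Names are illustrative; inputs are the article's numbers.

chip_peak_tflops = {"FP32": 40.0, "TF32": 160.0}
t20_tflops = {"FP32": 33.6, "TF32": 134.4}

# Fraction of chip peak retained by the PCIe card at each precision.
retained = {
    precision: t20_tflops[precision] / chip_peak_tflops[precision]
    for precision in chip_peak_tflops
}

# Memory bandwidth, chip peak vs T20 (unit assumed to be TB/s).
bandwidth_retained = 1.6 / 1.8

for precision, ratio in retained.items():
    print(f"{precision}: {ratio:.0%} of chip peak")
print(f"memory bandwidth: {bandwidth_retained:.0%} of chip peak")
```

Both compute figures come out at the same fraction of peak, 84%, which is consistent with a uniform reduction under the card's power limit, while memory bandwidth falls by a smaller share.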


Yunsui T20

Yunsui T21


TopsRider 2.0 is the software platform developed by Suiyuan Technology, and it has changed substantially from version 1.0. First, the programming model and operator interface have been heavily optimized, making it more efficient for users to get started. Second, the compilation technology has been improved, with considerable progress in whole-graph compilation and automated operator generation, which strengthens the Suiyuan software stack at both the operator and framework levels. Third, interconnection has been optimized, with support for 6 inter-card interconnection ports and clusters of up to 8,000 cards.


The Suiyuan TopsRider 2.0 software platform is based on a software-hardware co-designed architecture and an integrated, original design. Operators are generated automatically through machine learning. The platform supports a more efficient parallel communication library and provides a turnkey solution for the whole system. Its user programming model makes it easy to define operators and encourages more user-developed operators. A complete set of tools and a compilation system give users a graphical integrated development environment that works out of the box, with support for AI model tuning, dynamic models, and high-performance execution. The platform also offers more flexible and general support: full virtualization and automatic deployment, and the ability for four users to share a card's computing power in parallel, which helps cloud service providers deploy more quickly.
