Tencent Cloud is eyeing the chip design track
In addition to the chip design industry, Tencent will also focus on multiple high-performance computing tracks such as cloud rendering and life sciences.
As demand for migrating to and using the cloud deepens, cloud vendors are moving further into individual industries and building reference customer cases.
Not long ago, Tencent Cloud teamed up with Suishi Technology to create an industry solution for HPC (high-performance computing) scenarios for the chip design company Suiyuan Technology. This solution is based on the one-stop chip design and R&D cloud platform jointly built by Tencent and Suishi Technology. It quickly and automatically calls Tencent Cloud IaaS resources to build a simulation environment, which meets the business flexibility needs of Suiyuan Technology and improves the overall project R&D efficiency.
"It is a visible blue ocean with huge potential," said Kevin, senior manager for the high-performance computing industry at Tencent Cloud. Tencent Cloud will increase its investment in this area. Digital Intelligence Frontier learned that, in addition to the chip design industry, Tencent will also focus on several other high-performance computing tracks, such as cloud rendering and life sciences.
01
Moving the chip design industry to the cloud is becoming a trend
As a leading domestic AI chip design company, Suiyuan Technology once set a record by taping out an AI training chip with a high technical threshold in just 18 months.
However, as process nodes become more advanced, Suiyuan also faces a mismatch: its IT resources and their efficiency cannot keep up with business needs.
Chip R&D schedules are usually tight, especially for large chips; in the middle and later stages, tasks are often scheduled day by day. The industry has traditionally relied on self-built IDCs (data centers). Kevin told Digital Intelligence Frontier that this was mainly because chip processes used to be less advanced and the demand for computing power was lower.
Moreover, Vincent, head of IT at Suiyuan Technology, revealed that a great deal of evaluation and planning is done in the early stages of a chip project, including how much computing power and storage will be required. The problem is that things often change as the project advances: process improvements, functional changes, and adjustments to performance targets. Such changes create large, sudden demands for computing power. If the company tries to meet that demand by purchasing or renting servers, the time from deployment to go-live testing means the business team waits a long time before it can use the capacity, which delays R&D.
Such delays are clearly unacceptable. In particular, the epidemic of recent years has made hardware procurement cycles unpredictable, while chip project schedules are fixed, which means chip design companies bear the risk of uncertain IT assets. For example, one or two hundred servers may need to be ready within a single day, which can only be achieved by moving to the cloud. Under the original IT process, from confirming the server model to purchasing, and from racking the servers to computer room operation and maintenance, it takes 8 to 12 weeks at the fastest, and the IT capital cost is too high.
"This is an opportunity for us to go to the cloud," Vincent said.
The design cycle of a large chip exceeds 12 months and includes product definition, front-end design, IP verification, SoC verification, synthesis, and place and route. Different stages have different computing power requirements, and the verification phase is the peak. Suiyuan therefore chose to move part of its simulation and verification workload to the cloud. "The front-end IP verification process has basically been moved to the cloud. In the future, we definitely hope to move the entire elastic part to the cloud as much as possible," explained Eli, a project leader at Suiyuan Technology.
Suiyuan has many elastic job requirements, such as needing hundreds of servers configured at the same time, which places very high demands on stability and real-time response. At present, Tencent Cloud and Suishi enable customers to start running simulation jobs within an hour, so that they can run simulation and verification tasks more frequently within a limited time and improve the chance of success before tape-out. In addition, drawing on Suishi's capabilities in business scenario optimization and CAD, the solution helped Suiyuan reduce overall job running time by 50%, speeding up the R&D progress of the entire project.
Moreover, the chip design industry has now entered the 7nm or even 3nm era. A single chip can carry tens of billions of transistors, which greatly increases the demand for computing power. Chip companies therefore have pronounced peaks in their computing power needs, and chip design companies such as Suiyuan have begun to seek elastic computing power solutions from cloud vendors.
"Moving to the cloud is an industry trend," Vincent said. "Everyone is trying, but it will take some time for everything to go to the cloud."
02
The iron triangle of safety, efficiency and cost
The core assets of chip design companies are their chip code and intellectual property. Compared with many other industries, this track has higher requirements for data security.
Suiyuan Technology's position on cloud migration is that all data should be stored locally, only the elastic part should run on the cloud, and no data should be retained in between. Prompted by Suiyuan's suggestions, Tencent Cloud and Suishi therefore explored a hybrid cloud architecture based on "separation of storage and computing" and spent five or six months verifying it.
While keeping core data and code stored locally, this architecture connects to the local computing cluster through Suishi's scheduling platform, so that computing tasks can be routed flexibly to either local or cloud computing queues.
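The routing between local and cloud queues can be illustrated with a minimal sketch. This is not Suishi's scheduler: the `Queue` class, the slot counts, and the local-first policy are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """A hypothetical compute queue with a fixed number of job slots."""
    name: str
    total_slots: int
    used_slots: int = 0

    @property
    def free_slots(self) -> int:
        return self.total_slots - self.used_slots


def pick_queue(job_slots: int, local: Queue, cloud: Queue) -> Queue:
    """Assumed policy: prefer the local cluster, burst to the cloud when it is full."""
    if local.free_slots >= job_slots:
        return local
    if cloud.free_slots >= job_slots:
        return cloud
    raise RuntimeError("no queue has enough free slots")


def submit(job_slots: int, local: Queue, cloud: Queue) -> str:
    """Reserve slots on the chosen queue and return its name."""
    queue = pick_queue(job_slots, local, cloud)
    queue.used_slots += job_slots
    return queue.name
```

In this sketch only compute slots are tracked; where the job's data lives is deliberately left out, since in the architecture described above the data stays on local storage regardless of which queue runs the job.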
Chen Lintao, technical director of Suishi Technology, revealed that the storage-computing separation solution adopted here is essentially a hybrid cloud solution. In Suiyuan's project it faces additional technical challenges, such as the overall hybrid cloud architecture and its impact on the network. The requirements for latency, bandwidth throughput, and efficiency are very high, so the three parties had to work out the optimal architecture for this project together.
Vincent said frankly that because the storage-computing separation architecture keeps data local, enterprises' security concerns are reduced.
Previously, storage-computing separation took place within a single autonomous domain, for example entirely on Tencent Cloud. Suiyuan's solution instead deploys a hybrid cloud across two autonomous domains, which increases the physical distance and makes the scheduling of the various interfaces more complicated, further testing the capabilities of the cloud vendor and its partners. The Suishi platform does not change users' habits: users can call cloud resources transparently, which makes resources easier to use and reduces the learning cost of cloud migration.
This is also a problem that cloud vendors often encounter when going deep into an industry. Tencent Cloud and Suishi had originally planned to upload customer data directly to the cloud, which is convenient and efficient. After discussions, however, they found that the data security requirements of chip customers are best served by a hybrid cloud architecture with storage-computing separation. Tencent Cloud currently provides only computing power, and the Suishi platform provides automated, efficient environment construction. Suiyuan's code, know-how, and other core enterprise data are kept off the cloud. In Kevin's view, however, some non-sensitive data could in theory be moved to the cloud, with caching technology used to improve simulation efficiency.
Kevin told Digital Intelligence Frontier that early-stage start-ups have less existing data and fewer assets and are less concerned about security, so all-cloud solutions are their first choice. As they grow, however, many enterprises tend to adopt a hybrid cloud architecture.
Moreover, many chip design companies already own substantial IDC assets, and they want to keep making use of those resources. A hybrid approach lets them balance their investment in existing assets against the elasticity, flexibility, speed, and convenience of the cloud. "So from this perspective, hybrid cloud is currently a better choice."
Suiyuan, for example, has not moved all of its business to the cloud, and part of it still uses local computing power; the early stages of a project are better suited to the existing local capacity. In fact, many chip design companies still run mainly on local resources and handle only the elastic part on the cloud.
The hybrid cloud deployment method has gradually become a consensus to save IT costs.
Suiyuan has done some calculations. If it purchases its own servers and builds its own computer room, then over a financial cycle of three to five years the amortized monthly cost is lower than the monthly cost of using the cloud. However, once savings in time and manpower, efficiency gains, and the overall comprehensive cost are taken into account, the advantages of moving to the cloud are still very clear. The cloud requires no water or electricity and no in-house operation and maintenance, so those costs are saved, and the ability to deploy quickly and scale elastically lets expensive R&D personnel work more efficiently and shortens the R&D cycle.
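The comparison above can be made concrete with a small worked calculation. The sketch below is only an illustration of the reasoning: every figure in it is a made-up placeholder, not Suiyuan's or Tencent Cloud's actual numbers, and the function names are hypothetical.

```python
def on_prem_monthly(hardware_cost: float,
                    amortization_months: int,
                    facility_and_ops_per_month: float) -> float:
    """Amortized monthly cost of self-built capacity.

    facility_and_ops_per_month stands for power, cooling, and in-house
    operation and maintenance (placeholder value, not real data).
    """
    return hardware_cost / amortization_months + facility_and_ops_per_month


def cloud_monthly(price_per_server_hour: float,
                  servers: int,
                  hours_used: float) -> float:
    """Pay-as-you-go cost: only the hours actually used are billed."""
    return price_per_server_hour * servers * hours_used


# Placeholder inputs chosen only to show the shape of the comparison.
hardware_only = on_prem_monthly(3_600_000, 36, 0)         # hardware amortization alone
with_overheads = on_prem_monthly(3_600_000, 36, 50_000)   # plus facility and ops
cloud_full_month = cloud_monthly(2.0, 100, 720)           # 100 servers, whole month
cloud_peak_only = cloud_monthly(2.0, 100, 240)            # 100 servers, peak period only
```

With these placeholder inputs, hardware amortization alone comes out below the full-month cloud bill, while adding facility and operations costs, or using the cloud only during peak periods, changes the picture. The cost of engineers waiting for capacity is not modeled here.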
In addition to the storage-computing separation architecture, Tencent Cloud and Suishi have also built a complete security solution, from terminal to cloud, for chip design customers such as Suiyuan. At the terminal, Tencent Cloud's zero-trust iOA security solution lets Suiyuan's R&D personnel across the country work seamlessly in a consistent simulation environment, while providing terminal security, information protection, and protection against certain vulnerabilities.
In the cloud, Tencent's host security is used to keep the entire computing environment secure and trusted, guarding the computing process against problems such as intrusions, data leaks, and ransomware. At the transmission level, a very high-bandwidth network link between Tencent Cloud and Suiyuan keeps the entire transmission channel secure and trustworthy.
The storage-computing separation architecture and hybrid cloud deployment thus meet the need for elastic computing power and efficiency as well as the need for cost savings and data security. These are the things enterprises care about most when migrating to and using the cloud, and they are also the issues cloud vendors need to focus on and solve.
At present, the hybrid cloud architecture of "separation of storage and computing" has helped Suiyuan save considerable IT investment. The amount of task concurrency can be increased simultaneously through the elasticity of the cloud, and at the same time, some simulation cycles have been shortened by 30%-50%.
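Why higher concurrency shortens a simulation cycle can be shown with a back-of-the-envelope estimate. The sketch below assumes equal-length, independent jobs that run in full waves; the job counts, slot counts, and the `wall_clock_hours` helper are hypothetical and are not taken from Suiyuan's workloads.

```python
import math


def wall_clock_hours(num_jobs: int, concurrent_slots: int, hours_per_job: float) -> float:
    """Estimate total elapsed time when jobs run in waves of `concurrent_slots`."""
    waves = math.ceil(num_jobs / concurrent_slots)
    return waves * hours_per_job


# Hypothetical regression run: 1000 one-hour simulation jobs.
local_only = wall_clock_hours(1000, 100, 1.0)        # 100 local slots
with_cloud_burst = wall_clock_hours(1000, 200, 1.0)  # plus 100 cloud slots
reduction = 1 - with_cloud_burst / local_only
```

Under these assumptions, doubling the available slots halves the elapsed time. Real jobs vary in length and contend for licenses and storage bandwidth, so actual gains will differ.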
Of course, Eli also mentioned that, at this stage, the storage-computing separation solution co-created by the three parties meets the elastic computing power needs of part of the business. The next step is to further optimize and improve usage efficiency. "How to use cloud machines more efficiently, how to optimize in line with how the business actually uses them, and how to migrate more of the business: this is what we have to do next."
Looking ahead, GPU-accelerated chip simulation and intelligent chip design optimization are new directions for the industry. Tencent Cloud will also cooperate with domestic and foreign EDA software vendors to build an accelerated simulation ecosystem, bringing severalfold acceleration to chip simulation jobs and providing AI-based PPA optimization capabilities. At the same time, Tencent Cloud is exploring cloud-based development: deploying the chip design front-end process on the cloud and building the chip design flow entirely on the cloud, to further improve the efficiency of large-chip R&D and design. In high-concurrency scenarios, Tencent Cloud uses the large-scale scheduling capabilities of the Aochi Cloud native operating system, together with a rich variety of bare metal and GPU instances, to complete multi-generation, multi-card verification work in one stop during chip simulation verification and performance comparison testing, saving the cost of building and purchasing in-house and greatly improving deployment and testing efficiency.
© This article is the original content of Digital Intelligence Frontier (szqx1991) and is reproduced with permission
Some image sources: pixabay.com