"
Large models growing on the cloud will eventually change the landscape of cloud computing.
"
Author | He Sisi
Editor | Xixi
In 2021, the China Academy of Information and Communications Technology quietly issued a report pointing out a trend that ran counter to the mainstream view: although CPUs still accounted for the majority of global computing power shipments at the time, starting in 2021 China's intelligent computing power (GPU-based) surpassed its general-purpose computing power, making up more than 50% of the country's total computing power and growing at a rate of roughly 85%.
Two years ago, this figure had not yet attracted widespread attention in the cloud computing field, but some cloud vendors were already preparing. Baidu, for example, proposed in 2020 that cloud computing must account for the impact of artificial intelligence (AI) technology.
During this period, the value of AI technology in core production scenarios was repeatedly questioned, and the topic of intelligent computing was shelved. But in 2023, with the explosion of large models, every industry confronted a disruptive technology that may determine its future development and even the fate of individual enterprises. Demand for computing infrastructure capable of running models with hundreds of billions of parameters and above returned to the industry's agenda. GPUs underpinning intelligent computing platforms entered a second spring after the deep learning wave of 2012, and the market response has been even more enthusiastic than back then.
The most intuitive data point: NVIDIA's Q3 financial report released in 2023 showed revenue of 18.12 billion US dollars for the period, a year-on-year increase of 206%, with its market value exceeding 1.2 trillion US dollars, nearly 1 trillion US dollars higher than Intel's, making it the world's most valuable chip company. The driving force behind this is undoubtedly the large model wave that has dominated technology headlines this year.
The emergence of ChatGPT has not only changed the development pattern of the AI field, with language AI steadily rising to displace visual AI as the center-stage topic of today's artificial general intelligence (AGI) discussion. It has also changed the development pattern of cloud computing: the role of intelligent computing power will become more critical, enterprise technology architectures will gradually shift from the CPU as the computing core to intelligent computing represented by the GPU, and heterogeneous GPU+CPU+DPU+... architectures will replace the single XPU as the main computing mode of cloud computing.
Some industry insiders even predict that the first large-scale reshuffle of the domestic DPU market will be complete before 2025.
Whether or not that prediction proves accurate, what is certain is that 2023 is drawing to a close, AI technology centered on large models is changing by the day, and there is not much preparation time left for cloud computing vendors.
Before the cloud computing landscape is completely reconstructed, the new challenges posed to computing power by the era of large models still need to be thought through rationally and treated with caution.
A paradigm shift in computing
In the past year of booming large models, the most tangible impact of large models on cloud computing has probably been the industry-wide scramble for GPU computing power.
A computing power procurement practitioner told Leifeng.com that at the beginning of the year, a buyer who had gone to great lengths to reach NVIDIA's sales team, confident that deep pockets would settle the matter, offered to purchase 2,000 A100 cards, only to be rejected by NVIDIA's sales because "the quantity requested was too small." Before large model building peaked in the first half of the year, and before the chip export controls were announced, it was rumored that the giant's GPU card orders started at 4,000 units, yet plenty of peers who spent huge sums still came away disappointed.
There is no doubt that, under the influence of large models, the shift of cloud computing from the CPU-centric clouds of the Internet era to the GPU-centric clouds of the AI era has become both an industry consensus and a general trend. The chip layer underneath was the first to react: besides NVIDIA, manufacturers such as Qualcomm, Intel, and Arm have put chip design and production for large model training and inference on the agenda, preparing for the possibilities of the next era.
But beyond changes in chip types and quantities, Leifeng.com has observed that the impact of large models on cloud computing vendors actually runs along deeper dimensions.
Although GPUs have been used for training and inference of AI algorithms since the rise of deep learning in 2012, large models, built on the Transformer architecture and ultra-large parameter scales, generalize far better than the small AI models of the past, and their demand for training and inference computing power has soared exponentially. This places extremely high requirements on computing power (cluster) scale, energy efficiency, and stability; simply stacking computing power is completely unsuited to the era of large models.
Against this backdrop, past cloud service models also need to change and adjust to the times. Compared with the "expansion" of computing volume, the service-model dimension of cloud vendors has received far less attention.
Specifically, in the era of large models, to participate in a new round of competition, cloud computing vendors may need to face up to three major propositions and provide solutions:
Transformations in computing infrastructure
To compare the engineering workload of small models and large models, consider different classes of aircraft:
Although they are all airplanes, all with wings, fuselages, engines, landing gear, and tails, a toy plane, a small plane, a medium plane, and a large plane differ greatly in size and function, and in the technology, talent, and engineering effort required to build, operate, and maintain them. Correspondingly, AI models of different parameter scales require different computing infrastructure.
In the past, training a small AI model generally required only a single card, or multiple cards on a single machine, but training a large model requires thousands of GPUs running together. To extend the aircraft analogy, a single-digit number of GPUs and a cluster of tens of thousands of GPU cards are not on the same engineering scale. Nor is it realistic to rely on GPUs alone: in actual computing, GPUs usually need to be combined with CPUs, DPUs, and other processors into a very large-scale intelligent computing cluster to complete training and inference.
Building such a large-scale computing cluster is not a matter of simply stacking 10,000 graphics cards; it requires dedicated design and optimization so that the performance and stability of model training and inference meet actual needs. Take graphics card utilization as an example. In the past, the parallel utilization rate of thousand-card GPU clusters in the industry was usually between 60% and 70%. That is already a very high level, yet still not enough. The wheel of the times turns fast, and the transition from CPU to CPU+GPU+DPU has taken only a very short time; improving graphics card utilization has long been a hard problem for cloud vendors.
Facing large models, this issue becomes even more critical: as the base of graphics cards expands, a 5% or even 10% increase in utilization has an outsized impact. According to Leifeng.com, although the sales rate of some intelligent computing centers is very high, their utilization rate is extremely low, in the single digits. In other words, there is still plenty of room for cost reduction and efficiency improvement in the management of computing clusters.
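As a back-of-the-envelope illustration (a sketch with hypothetical numbers, not figures reported by any vendor), a few percentage points of utilization translate into a lot of hardware at this scale:

```python
# Hypothetical illustration: the effective capacity gained by raising GPU
# utilization a few points on a large cluster. All numbers are invented.

def effective_gpus(total_gpus: int, utilization: float) -> float:
    """Effective compute capacity, expressed in 'fully busy GPU' equivalents."""
    return total_gpus * utilization

cluster = 10_000  # a ten-thousand-card cluster, as discussed above

for util in (0.60, 0.65, 0.70):
    print(f"utilization {util:.0%}: ~{effective_gpus(cluster, util):,.0f} effective GPUs")

# Moving from 60% to 65% utilization on 10,000 cards frees the equivalent
# of 500 extra GPUs -- capacity that would otherwise have to be purchased.
```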
Moreover, as the parameter scale and training complexity of large models grow, graphics card failure rates rise too. Several engineers told Leifeng.com that a common failure when training large models on NVIDIA cards is "card drop": a graphics card suddenly loses its connection or stops working mid-run. Large model training cycles are long, and a mid-run failure can force a task that has already run for more than ten days to be restarted.
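A standard mitigation for such mid-run failures is periodic checkpointing, so that a restart resumes from the most recent saved state instead of from day one. Below is a minimal PyTorch-style sketch; the model, optimizer, checkpoint path, and save interval are all illustrative placeholders, not any vendor's actual training setup.

```python
# Minimal checkpoint/resume sketch in PyTorch. All names are illustrative.
import os
import torch

CKPT_PATH = "checkpoint.pt"  # hypothetical path

def save_checkpoint(step, model, optimizer):
    torch.save({
        "step": step,
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
    }, CKPT_PATH)

def load_checkpoint(model, optimizer):
    """Resume from the last checkpoint if one exists; otherwise start at step 0."""
    if not os.path.exists(CKPT_PATH):
        return 0
    ckpt = torch.load(CKPT_PATH)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"] + 1

model = torch.nn.Linear(1024, 1024)           # stand-in for a real model
optimizer = torch.optim.AdamW(model.parameters())
start = load_checkpoint(model, optimizer)     # a "card drop" restart lands here

for step in range(start, 100_000):
    ...  # forward / backward / optimizer step elided
    if step % 1_000 == 0:                     # checkpoint periodically
        save_checkpoint(step, model, optimizer)
```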
Therefore, reconstructing the cloud computing infrastructure system for large models is a task that cloud vendors have to promote.
Large model services become mainstream, MaaS is the trend
In the past year, reconstructing upper-layer products and applications around large model technology has become an industry consensus. Although the number of large-model-native applications in China is still far below public expectations, let alone the endless stream of apps of the mobile Internet era, discussion of an AI-native era built on large models has kept growing since the second half of this year.
Take Baidu's Wenxin Yiyan (ERNIE Bot) as an example. Baidu released figures showing that in the four months since Wenxin Yiyan was fully opened on August 31, daily call volume of the large model API on Baidu's Qianfan large model platform increased tenfold, and the calls now come not only from Internet, education, and e-commerce scenarios but also from traditional industries few would have imagined, such as marketing, mobile phones, and automobiles.
As enterprises pay more attention to applying large models, the business model of large models is also changing. Under the MaaS (Model as a Service) trend, customers' focus will shift to whether the model is good and whether the framework is good, rather than computing power alone. MaaS will thoroughly change the business model and market structure of cloud services and provide fertile ground for the explosive growth of AI-native applications across industries.
In the future, large models will very likely no longer be billed solely by API calls and per-token inference. Some vendors are already developing cloud computing services around GPUs, hoping to charge customers according to their actual usage.
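To make the contrast concrete, here is a hedged sketch comparing the two billing modes; every rate and figure below is invented for illustration and is not any vendor's actual pricing.

```python
# Comparing token-based billing with GPU-usage-based billing.
# All rates and workload figures are hypothetical.

PRICE_PER_1K_TOKENS = 0.01   # hypothetical: $0.01 per 1,000 tokens
PRICE_PER_GPU_HOUR = 2.50    # hypothetical: $2.50 per GPU-hour

def token_billed_cost(tokens: int) -> float:
    return tokens / 1_000 * PRICE_PER_1K_TOKENS

def usage_billed_cost(gpu_hours: float) -> float:
    return gpu_hours * PRICE_PER_GPU_HOUR

# An imagined workload that consumes 50M tokens and, under the usage
# model, 150 GPU-hours of actual inference time:
print(f"token-billed: ${token_billed_cost(50_000_000):,.2f}")
print(f"usage-billed: ${usage_billed_cost(150):,.2f}")
```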
Hou Zhenyu, vice president of Baidu Group, believes that MaaS revenue will be divided into at least two categories:
One is model-development revenue, that is, SFT (supervised fine-tuning) on a large general-purpose base model, which will gradually replace part of the revenue from bare computing power sold for model training. During the rush to build base large models, companies tended to buy computing power to train models from scratch; as the number of large models grows, more companies realize that training a large model from scratch is inadvisable and that secondary development on an existing general-purpose large model is more practical. This judgment squares with the industry's current reflection on the widespread "reinventing the wheel" phenomenon in the large model field.
The other is inference revenue after AI-native applications take off. Beyond early-stage training, the greater profit potential for cloud vendors lies in providing developers with powerful base large models and charging inference fees for future AI applications that reach deep into business scenarios and users. With that goal in mind, stable computing services and inference experience naturally become the watershed on which cloud vendors compete.
The application development paradigm is turned upside down
In the past decade, deploying a deep learning algorithm usually meant training a model for a single specific scenario; from data annotation to algorithm training to deployment often took weeks or even months. But with the emergence of more and more base large models with strong generalization capabilities, and the maturing of the MaaS model, AI models in the large model era no longer need to be trained from scratch: they can instead be built through supervised fine-tuning on a powerful general-purpose large model.
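As a rough sketch of what that workflow looks like, the snippet below uses the open-source Hugging Face transformers library as a stand-in; the base model name (gpt2), the dataset file, and all hyperparameters are placeholders, and this is not Baidu's or any vendor's actual fine-tuning pipeline.

```python
# Supervised fine-tuning (SFT) of a general-purpose base model,
# sketched with Hugging Face transformers. Names are placeholders.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

base = "gpt2"  # stand-in for a powerful general-purpose base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token   # gpt2 defines no pad token
model = AutoModelForCausalLM.from_pretrained(base)

# A small domain-specific corpus replaces training from scratch.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})
train = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=1),
    train_dataset=train,
    # Causal LM collator: labels are the input ids, shifted internally.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # hours on a few GPUs rather than weeks on thousands
```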
Under this change in development method, companies can focus on data from their own scenarios. Combined with the generalization advantages of general-purpose large models, the computing power scale and training time that industry users need to develop large model applications will be greatly reduced, resulting in faster iteration. In this mode, the utilization of computing resources will also improve substantially.
Specifically, the distinctive understanding, generation, logic, and memory capabilities of large models will also upend the entire technology stack, data flow, and business flow, giving rise to new scenarios (such as personal assistants and code generation), new architectures (such as retrieval-augmented generation, RAG), and a new development ecosystem.
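To make the retrieval-augmented generation pattern concrete, here is a minimal skeleton; the toy corpus, the keyword-overlap retrieval, and the generate() stub are all illustrative stand-ins for what would, in a real system, be a vector database and a hosted model API.

```python
# Minimal retrieval-augmented generation (RAG) skeleton.
# The corpus, scoring, and generate() stub are illustrative only.

CORPUS = {
    "doc1": "Qianfan is a large model service platform.",
    "doc2": "RAG grounds a model's answer in retrieved documents.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Toy keyword-overlap retrieval; production systems use vector search."""
    def score(text: str) -> int:
        return len(set(query.lower().split()) & set(text.lower().split()))
    ranked = sorted(CORPUS.values(), key=score, reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Stub standing in for a hosted LLM call (e.g., a MaaS inference API)."""
    return f"[model answer grounded in a prompt of {len(prompt)} chars]"

def rag_answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)

print(rag_answer("What does RAG do?"))
```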
To adapt to this new AI application development paradigm, a flexible and innovative cloud computing system and cloud service facilities are better suited to the future commercialization of large models. Large models were born in large-scale cloud computing clusters, but as industry needs change, cloud computing must also adjust its posture and center on customers to keep pace with the development of large models.
Baidu's Solution: A Lesson in Reconstruction
Facing the difficulty of training large models and their high computing power requirements, cloud computing vendors at home and abroad have been formulating their own thinking and measures this year.
How can a cloud vendor keep up with the large model era? The question is not small, and there is more than one solution. But whatever each company's answer, it cannot avoid the essence of large models: the competition among large models is not a 100-meter sprint but a 5,000-meter run, or even a half-marathon. In the spirit of handling urgent matters with deliberation, the key for cloud vendors lies not only in rapid response but also in comprehensive layout and carefully clearing the mines ahead.
Take the idling of computing power centers as an example. A cloud computing salesperson told Leifeng.com that in the first half of this year, some operators and small intelligent computing centers grabbed batches of graphics cards without knowing how to use them. From a cloud vendor's perspective, the ideal outcome is long-term leasing of computing power; if demand is unclear after a short-term lease ends, the computing resources may sit idle and go to waste.
Moreover, the industry's current focus is on large model development and applications, with little attention paid to the refined operation of computing centers; during large model training, the management of computing resources is likewise fairly coarse-grained. If a cloud vendor only chases hot spots without long-term planning and management, the essence behind the wasted resources is the collapse of the business model.
Recently, Baidu held the 2023 Baidu Intelligent Computing Conference, where Leifeng.com learned that Baidu's reconstruction of cloud computing follows a strategy of cost reduction and efficiency improvement, precise strikes, and all-around layout. Given Baidu's technical genes, with both the Wenxin large model and China's earliest practical experience in exploring cloud-intelligence integration, it is reasonable for Baidu to lay out its intelligent cloud on multiple fronts and advance steadily. This is a move adapted to the needs of the cloud computing industry, and it is also Baidu's strength.
Specifically, the reconstruction of Baidu Intelligent Cloud is reflected in three aspects:
First, in reconstructing intelligent computing infrastructure, Baidu Intelligent Cloud launched the Baige·AI Heterogeneous Computing Platform 3.0.
The development of the Baige·AI heterogeneous computing platform can be traced back to 2009, when Baidu first used GPUs for AI acceleration and then kept expanding its cluster scale, laying the foundation for the eventual launch of the Baige platform. Baige·AI heterogeneous computing platform 1.0 was released in 2021 and upgraded to 2.0 in 2022.
Compared with 1.0 and 2.0, the upgraded 3.0 is designed mainly for large model training and inference scenarios, with improvements in efficiency, stability, and ease of operation and maintenance: effective training time above 98% on 10,000-card-scale tasks and bandwidth effectiveness of up to 95%. The Baige heterogeneous computing platform can accelerate training and inference of open-source large models by up to 30% and 60% respectively.
To address the supply imbalance of intelligent computing power in the AI-native era, Baidu Intelligent Cloud released an intelligent computing network platform. The platform connects intelligent computing nodes worldwide, such as intelligent computing centers, supercomputing centers, and edge nodes built by Baidu and third parties, pooling dispersed and heterogeneous computing resources into a unified computing network resource pool. Baidu's self-developed computing power scheduling algorithm then intelligently analyzes the status, performance, utilization, and other indicators of each resource and schedules computing power in a unified way, achieving flexible, stable, and efficient use of intelligent computing resources.
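The kind of decision such a platform makes can be sketched as follows; the node data and the greedy least-loaded policy below are invented for illustration, since Baidu's self-developed scheduling algorithm is not public.

```python
# Toy scheduler over a pooled, heterogeneous set of compute nodes.
# Node data and policy are illustrative, not Baidu's actual algorithm.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_gpus: int
    utilization: float  # current load, 0.0 - 1.0

NODES = [
    Node("intelligent-computing-center-A", free_gpus=512, utilization=0.72),
    Node("supercomputing-center-B", free_gpus=128, utilization=0.35),
    Node("edge-node-C", free_gpus=8, utilization=0.10),
]

def schedule(gpus_needed: int) -> Node | None:
    """Greedy policy: among nodes with capacity, pick the least loaded."""
    candidates = [n for n in NODES if n.free_gpus >= gpus_needed]
    if not candidates:
        return None  # a real system would split the job or queue it
    return min(candidates, key=lambda n: n.utilization)

job = schedule(gpus_needed=64)
print(f"dispatching to: {job.name if job else 'no single node fits'}")
```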
At the same time, to meet the requirements of AI-native scenarios, Baidu Intelligent Cloud continues to enhance the product capabilities of Baidu Taihang Computing, releasing a new generation of cloud servers, a high-performance computing platform, a new generation of gateway platforms, and more, and it provides ubiquitous intelligent computing power through distributed clouds.
On the data infrastructure side, Baidu's Canghai storage was upgraded with a unified technology base that supports larger-scale, higher-performance computing scenarios. Baidu also released the cloud-native database GaiaDB 4.0 and a database smart cockpit, and upgraded the Serverless capabilities of its big data management platform, among other updates.
To strengthen the service capabilities of its intelligent infrastructure, Baidu Intelligent Cloud started a number of initiatives early this year: upgrading the Yangquan data center to an intelligent computing center in March, launching the country's first large model data annotation center in August, and partnering with multiple local governments to jointly build intelligent computing centers and AI data annotation bases.
Second, comprehensively upgrading the MaaS service platform.
Amid the shift to the MaaS model, Baidu Intelligent Cloud has upgraded the Qianfan large model platform, aiming to help enterprises select and use large models more rationally and effectively, and to create an efficient, easy-to-use model capability scheduling environment for developing upper-layer AI applications.
At the Intelligent Computing Conference, Baidu announced Qianfan’s latest “report card.”
Since the Wenxin large model was fully opened to the public on August 31, the daily call volume of the large model API on the Qianfan large model platform has increased tenfold. The Qianfan platform has now served more than 40,000 enterprise users and helped them fine-tune nearly 10,000 large models.
Compared with Qianfan Platform 2.0, the upgraded platform increases the number of preset models to 54, the most in the country, with targeted enhancements to model capabilities; it adds new functions such as data statistical analysis and data quality inspection, which, combined with a visual data-cleaning pipeline, can produce high-quality data fuel for large model scenarios; and it introduces a dual automated-plus-manual model evaluation mechanism that greatly improves the efficiency and quality of model evaluation.
In addition, to help customers customize their own large models faster, the Qianfan platform is rapidly iterating its full-process toolchain for model development. Tests found that, compared with training large models on a self-built system, training on the Qianfan platform can cut costs by up to 90%.
Third, fully opening the AI-native application workbench, AppBuilder.
At the 2023 Baidu Cloud Intelligence Conference and the Intelligent Computing Conference, Baidu Group Vice President Hou Zhenyu pointed out that the typical system architecture of the AI-native era includes at least three parts: model, data, and application. So, after reconstructing the intelligent computing infrastructure and the MaaS service platform, fully opening the AI-native application workbench Qianfan AppBuilder closes an important loop in Baidu's AI-native application ecosystem.
Qianfan AppBuilder distills the common patterns, tools, and processes of developing AI-native applications on large models into a workbench, helping developers focus on their own business instead of spending unnecessary energy on the development process itself.
To serve developers at different levels, AppBuilder offers two product forms: a "code mode" for users who need in-depth AI-native application development capabilities, and a "low-code mode" suited to quickly customizing and launching intelligent products, so that enterprises and developers can build AI-native applications quickly and efficiently.
Should a cloud vendor develop its own large models in the large model era? Over the past year, the relationship between large model makers and cloud vendors has indeed been intriguing. But in the business world, gold diggers and shovel sellers are often not at odds; what's more, only those who have panned for gold know which shovel works best. Baidu's experience is that cloud computing underpins large models, and large models in turn feed back into cloud computing.
Since Baidu has a layout across the model, computing, and application layers, on Baidu's technology platform large models can be connected end to end, from underlying computing power to upper-layer applications, enabling better iteration.
With this technical support, when Baidu released the ERNIE-Bot-Turbo version on June 6, inference performance had improved 50-fold; on July 7, Wenxin Large Model 3.5 was released, with a 50% improvement in effectiveness, a 2-fold increase in training speed, and a 30-fold increase in inference speed; on August 2, the Baidu Qianfan large model platform was upgraded, cutting the model's inference cost by another 50%.
One figure provided by Hou Zhenyu: since the release of Wenxin Yiyan in March, inference costs have dropped to 1% of their original level.
If large models are the key to the AI era, then that key depends on support at three layers: models, computing power, and applications. Whether for the development of large models or of cloud computing, Baidu's reconstruction treats the three as one discussion rather than taking them apart, which keeps Baidu's large model layout balanced and lets the whole advance in step.
The large model industry has only just begun. In truth, whether large model unicorns or Internet giants with both cloud and models, everyone is still exploring, crossing the river by feeling for the stones.
There is more than one solution to reconstructing cloud computing for the large model era, and Baidu has been first to hand the industry an answer sheet. As an AI company with more than a decade of deep involvement in artificial intelligence, "cloud for AI" is both Baidu's destiny and its advantage. Beyond a comprehensive layout and a steady pace, perhaps it is long-termism that best fits the requirements of the large model era: fast runners win the 100-meter dash, but marathons demand patience and tenacity. Cloud computing is heading into 2024; who takes the lead will be decided by what is done today.