China's "war of a hundred models" is gaining momentum, and computing power is in short supply worldwide. GPU vendors, as the main suppliers of AI compute, have been the first to profit from the large-model boom, and CPUs have ridden the wave as well. After years of investing in AI, CPU and GPU makers can finally, as the saying goes, make money lying down.
Everyone wants a share of the AI chip market, and Microsoft is no exception. Yesterday, after several years of preparation, Microsoft's own artificial intelligence (AI) chip finally arrived, following Google's and Amazon's. So, can it threaten the position of the "red, green and blue" trio (AMD, Nvidia and Intel)?
Authors: Wang Zhaonan, Fu Bin
Produced by Electronic Engineering World (ID: EEworldbbs)
Microsoft heads into AI chips and CPUs
First, let's take a look at what products Microsoft has released.
Microsoft's self-developed chips come in two flavors: the Microsoft Azure Maia 100, an AI chip (ASIC) dedicated to cloud training and inference, and the Microsoft Azure Cobalt 100, the first CPU (central processing unit) designed by Microsoft. Both will first be used to support Microsoft's own cloud services.
Beyond the chips, at its Ignite global technology conference that day Microsoft also unveiled new Microsoft 365 Copilot features, Security Copilot demonstrations, and the latest Azure capabilities. But the biggest draw was Microsoft's first AI chip, Maia 100, which will power its Azure cloud data centers and lay the foundation for its various artificial intelligence services.
The short version: one is an AI accelerator (ASIC), the other is a CPU.
Next, let’s take a look at the technical details of Microsoft’s two chips.
Maia 100 is Microsoft's first AI chip, designed for large language model training and inference in the Microsoft cloud. Built on TSMC's 5 nm process with 105 billion transistors, it is anything but lightweight. It is optimized for AI and generative AI, and supports Microsoft's first sub-8-bit data types (the MX data types).
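The MX ("microscaling") formats are block-scaled low-precision types: a small block of values shares one scale factor, so each element can be stored in 8 bits or fewer. The snippet below is a minimal, hypothetical sketch of that idea (a shared power-of-two scale per block of 32 values with int8 elements); it only illustrates the concept and is not Microsoft's actual implementation.

```python
import numpy as np

BLOCK = 32  # assumed block size; MX-style formats group elements into small blocks

def mx_quantize(block: np.ndarray):
    """Quantize one block of floats to int8 plus a shared power-of-two scale."""
    max_abs = np.max(np.abs(block))
    if max_abs == 0:
        return np.zeros(BLOCK, dtype=np.int8), 0
    # Shared scale: smallest power of two such that all values fit in the int8 range.
    exp = int(np.ceil(np.log2(max_abs / 127.0)))
    scale = 2.0 ** exp
    q = np.clip(np.round(block / scale), -127, 127).astype(np.int8)
    return q, exp

def mx_dequantize(q: np.ndarray, exp: int) -> np.ndarray:
    """Recover approximate float values from the int8 elements and the shared exponent."""
    return q.astype(np.float32) * (2.0 ** exp)

# Example: quantize a random block and measure the error introduced.
x = np.random.randn(BLOCK).astype(np.float32)
q, exp = mx_quantize(x)
x_hat = mx_dequantize(q, exp)
print("max abs error:", np.max(np.abs(x - x_hat)))
```

Because the scale is shared per block rather than per tensor, an outlier only degrades the precision of its own block, which is part of what makes sub-8-bit formats practical for training and inference.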
Microsoft executives described the chip as follows:
● Rani Borkar, the vice president in charge of Azure's chip division, said Microsoft has already tested Maia 100 on Bing and Office AI products, and ChatGPT developer OpenAI is also testing it. Microsoft is building racks of Maia 100 accelerators, which will be made available to power external workloads via the Azure cloud next year.
● Microsoft Chairman and CEO Satya Nadella said: "Our goal is to ensure that we and our partners can bring the ultimate efficiency, performance and scale to customers." Maia 100 is designed to run large language models and help AI systems process large amounts of data faster; it will first power Microsoft's own AI applications before being made available to partners and customers.
● Scott Guthrie, executive vice president of Microsoft's Cloud and AI group, said: "We believe Maia 100 gives us a way to provide customers with faster, lower-cost, higher-quality solutions."
In short: a lower-cost, more power-efficient chip purpose-built for AI.
Microsoft Chairman and CEO Satya Nadella (source: livestream screenshot)
Cobalt 100 is a cloud-native, Arm-based chip optimized for the performance, power and cost-effectiveness of general-purpose workloads. It has 128 cores and is billed as "the fastest CPU among all cloud computing vendors." It is already running parts of Microsoft's business and will become available next year.
How does it perform? Microsoft says preliminary tests show Cobalt 100 improves data-center performance by 40% over the commercial Arm servers it currently uses.
Microsoft has not yet disclosed the details of Cobalt 100, but it is rumored to be based on Arm's "Genesis" Neoverse CSS N2 IP.
Published data shows that Neoverse CSS N2 scales to 24, 32 or 64 cores per die and provides interfaces for DDR5, LPDDR5, PCIe, CXL and other IP. The corresponding die areas are 53 mm² (24 cores), 61 mm² (32 cores) and 198 mm² (64 cores).
Choosing Arm is a key element of Microsoft's sustainability goals: the aim is to optimize "performance per watt" across the data center, which essentially means getting more computing power out of every unit of energy consumed.
Microsoft has long wanted an alternative to the x86 architecture in its server fleet, saying back in 2017 that it aimed to have Arm servers account for 50% of its server computing power.
Microsoft was an early customer of Cavium/Marvell's "Vulcan" ThunderX2 Arm server CPUs a few years ago, and was expected to be a major buyer of the follow-on "Triton" ThunderX3 before Marvell mothballed that project in late 2020 or early 2021.
In 2022, Microsoft adopted Ampere Computing's Altra family of Arm CPUs and began deploying them in its server fleet in large numbers, but rumors have persisted that Microsoft was developing its own Arm server CPU.
Internet giants all love making chips
By 2023, internet giants designing their own chips is nothing new. Put bluntly, relying entirely on outside suppliers for CPUs or AI chips, sometimes with only one or two companies to buy from, is a frightening position to be in. The intent behind Maia 100 and Cobalt 100 is equally plain: to confront the dominance of the "red, green and blue" trio, the shortage of top-end AI chips, and the grip of the x86 architecture on cloud services.
It is worth noting that Google and Amazon both got there before Microsoft. So how are those two doing now?
First, let’s look at Google.
Google began deploying its self-developed tensor processing unit (TPU) for AI in 2016, and by September this year it had reached the fifth generation, Cloud TPU v5e, which is designed to deliver the cost-effectiveness and performance needed for medium- and large-scale training and inference. TPU v5e pods balance performance, flexibility and efficiency: up to 256 chips can be interconnected, with aggregate bandwidth exceeding 400 Tb/s and INT8 performance of 100 petaOps, allowing the platform to flexibly support a range of inference and training requirements.
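As a rough sanity check, dividing the pod-level figures quoted above by the 256-chip pod size gives ballpark per-chip numbers (a back-of-the-envelope estimate derived from this article's figures, not official per-chip specifications):

```python
# Back-of-the-envelope per-chip figures derived from the pod-level numbers above.
pod_chips = 256
pod_int8_petaops = 100        # aggregate INT8 throughput of a full v5e pod
pod_bandwidth_tbps = 400      # aggregate interconnect bandwidth (terabits/s)

int8_tops_per_chip = pod_int8_petaops * 1000 / pod_chips   # petaOps -> TOPS
interconnect_per_chip_tbps = pod_bandwidth_tbps / pod_chips

print(f"~{int8_tops_per_chip:.0f} INT8 TOPS per chip")        # ~391 TOPS
print(f"~{interconnect_per_chip_tbps:.2f} Tb/s per chip")     # ~1.56 Tb/s
```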
Google is already using TPU chips at scale to support its products, such as the chatbot Bard and Google Cloud Platform; more than 90% of Google's AI training work now runs on these chips. TPU systems also support Google's core businesses, including its search engine.
Thomas Kurian, CEO of Google Cloud, has said the latest TPU chips are becoming one of Google Cloud's biggest selling points. Beyond Anthropic, other high-profile AI startups such as Hugging Face and AssemblyAI are also using Google TPU chips at scale.
From a technical perspective, unlike general-purpose GPUs such as NVIDIA's A100/H100, Google's TPU was designed from the outset to focus on deep learning, specifically to accelerate neural network training and inference across the board. NVIDIA's A100 and H100 are general-purpose GPUs in the broad sense: they have general compute capability and suit a variety of workloads, including but not limited to high-performance computing (HPC), deep learning, and large-scale data analysis.
Compared with NVIDIA's general-purpose GPUs, the TPU relies on low-precision computation, which greatly reduces power consumption and speeds up computation without hurting deep learning quality. For designers of mid-sized LLMs it is often sufficient, so they may not need to rely on the high-end NVIDIA A100/H100 at all. The TPU also uses designs such as systolic arrays to optimize matrix multiplication and convolution. Because Google's TPU focuses narrowly on AI training and inference, parts of the architecture are streamlined, which is part of the reason its power consumption, memory bandwidth and FLOPS are significantly lower than NVIDIA's H100.
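To make the "low-precision computation" point concrete, here is a minimal sketch of how a matrix multiply can be carried out on int8 operands with a per-tensor scale and accumulated in int32, the general pattern accelerators use to trade a little numerical precision for much cheaper arithmetic (an illustrative example, not Google's actual TPU implementation):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: float32 -> int8 plus a float scale."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def int8_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Multiply in int8, accumulate in int32, then rescale back to float32."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)  # cheap integer multiply-accumulates
    return acc.astype(np.float32) * (sa * sb)

# Compare against a full-precision reference.
a = np.random.randn(64, 128).astype(np.float32)
b = np.random.randn(128, 32).astype(np.float32)
err = np.abs(int8_matmul(a, b) - a @ b).max()
print("max abs error vs float32 matmul:", err)
```

The systolic array mentioned above is the hardware counterpart: it streams these integer multiply-accumulates through a grid of MAC units so operands are reused as they flow, which is why matrix multiplication and convolution map onto it so efficiently.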
Next, let's look at Amazon.
Amazon Cloud Technology (AWS) announced in 2020 the launch of its self-developed chip Trainium for training AI models.
Earlier, Amazon had launched its first machine learning chip, Amazon Inferentia, which, as the name suggests, is used for inference. Inference workloads in real applications are very large; the performance and throughput of the Inferentia chip meet those requirements, and the cost of Inf1 instances is far lower than that of GPU-based alternatives.
While inference dominates the workload, most companies also run training jobs. Machine learning training typically requires expensive GPUs, so training costs are usually very high. To bring those costs down, Amazon launched the Amazon Trainium chip; Trn1 instances (and clusters) built on it are claimed to offer the fastest, lowest-cost training service in the cloud.
A Trn1 instance offers up to 13.1 TB/s of memory bandwidth, 3.4 PFLOPS of compute, up to 840 TFLOPS of FP32, a 4 GHz clock frequency, and 55 billion transistors.
According to Amazon, when training deep learning models, Trn1 instances using Trainium chips cost up to 40% less than P4d instances using NVIDIA A100s, and are up to 50% faster.
AWS's self-developed Trainium chips are gradually carving out a place in large-model AI training: they already have hundreds of internal and external customers and are on track to overtake Google's TPU as the second-largest player.