
AI computing: why use GPUs?

Latest update: 2024-01-09

The industry usually divides semiconductor chips into digital chips and analog chips. Digital chips account for the larger share of the market, roughly 70%.

Digital chips can be further subdivided into logic chips, memory chips, and microcontroller units (MCUs).


This article will focus on logic chips.

Logic chips are, to put it bluntly, computing chips. They contain various logic gate circuits that carry out arithmetic and logical operations, and they are among the most common types of chips.

The CPUs, GPUs, FPGAs, and ASICs we hear about all the time are all logic chips. The "AI chips" that are so popular right now mainly refer to these.


CPU (Central Processing Unit)

Let's start with the most familiar one: the CPU, whose full name is Central Processing Unit.

CPU

As we all know, the CPU is the heart of the computer.

Modern computers are all based on the von Neumann architecture, born in the 1940s. This architecture includes an arithmetic unit (also known as the arithmetic logic unit, ALU), a controller (control unit, CU), memory, input devices, and output devices.

Von Neumann architecture

When data arrives, it is first placed in memory. The controller then fetches the relevant data from memory and hands it to the arithmetic unit for processing. Once the computation is done, the result is written back to memory.

This process has a rather interesting name: "Fetch - Decode - Execute - Memory Access - Write Back."
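As a rough illustration, here is that cycle as a toy Python sketch; the three-instruction "program", the memory cells, and the accumulator register are all invented for the example.

```python
# Toy von Neumann cycle: fetch an instruction, decode it, execute it,
# touch memory if needed, and write the result back. Purely illustrative.
memory = {"a": 2, "b": 3, "result": 0}          # data lives in memory
program = [("LOAD", "a"), ("ADD", "b"), ("STORE", "result")]
acc = 0                                          # accumulator register

for instruction in program:                      # Fetch
    op, operand = instruction                    # Decode
    if op == "LOAD":                             # Execute + Memory Access
        acc = memory[operand]
    elif op == "ADD":
        acc += memory[operand]
    elif op == "STORE":                          # Write Back
        memory[operand] = acc

print(memory["result"])                          # prints 5 (2 + 3)
```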

As you can see, the CPU takes on the two core roles of arithmetic unit and controller.

Specifically, the arithmetic unit (comprising adders, subtractors, multipliers, and dividers) performs arithmetic and logical operations; it does the real work. The controller is responsible for reading instructions from memory, decoding them, and executing them.

In addition to the arithmetic unit and controller, the CPU also includes components such as the clock module and registers (along with caches).


The clock module manages the CPU's timing, providing it with a stable time base. By emitting signals at a fixed frequency, it drives all operations in the CPU and coordinates the work of each module.

Registers are small, high-speed memories inside the CPU used to temporarily hold instructions and data. Acting as a "buffer" between the CPU and main memory (RAM), they are far faster than ordinary memory and keep memory speed from dragging down the CPU's work.

Register capacity and access speed affect how often the CPU must reach out to memory, and thus the efficiency of the whole system. We will come back to this when we discuss memory chips.

CPUs are generally classified by instruction set architecture into x86 and non-x86. x86 designs are essentially complex instruction set computers (CISC), while non-x86 designs are mostly reduced instruction set computers (RISC).

PCs and most servers use the x86 architecture, dominated by Intel and AMD. Non-x86 architectures come in many varieties and have risen rapidly in recent years, including ARM, MIPS, Power, RISC-V, and Alpha. We will introduce these in detail later.


GPU (Graphics Processing Unit)

Let’s take a look at the GPU.

The GPU, whose full name is Graphics Processing Unit, is the core component of the graphics card.

A GPU is not the same thing as a graphics card. Besides the GPU, a graphics card also includes video memory, the VRM voltage-regulation module, MRAM chips, the bus, fans, peripheral interfaces, and so on.

graphics card

In 1999, NVIDIA was the first to propose the concept of the GPU.

The GPU was proposed because gaming and multimedia boomed in the 1990s, placing far higher demands on computers' 3D graphics processing and rendering capabilities. Traditional CPUs could not keep up, so the GPU was introduced to share the load.

By form, GPUs can be divided into discrete GPUs (dGPU) and integrated GPUs (iGPU), commonly known as discrete graphics and integrated graphics.

The GPU is also a computing chip, so like the CPU it includes components such as arithmetic units, controllers, and registers.

However, because the GPU is mainly responsible for graphics processing tasks, its internal architecture is very different from that of the CPU.


As shown in the figure above, a CPU has relatively few cores (each containing an ALU), a few dozen at most, but large caches and complex control units (CU).

The reason for this design is that the CPU is a general-purpose processor. As the computer's main core, its duties are very complex: it must handle different kinds of data computation while also responding to human-computer interaction.

Complex conditions and branches, along with synchronization and coordination between tasks, generate a great deal of branch jumping and interrupt handling. The CPU therefore needs large caches to hold the state of many tasks (reducing latency when switching between them) and a more complex controller for logic control and scheduling.

The CPU's strength is management and scheduling; its raw number-crunching power is comparatively modest (ALUs occupy only about 5%-20% of the chip).

If we picture a processor as a restaurant, the CPU is a full-service restaurant with dozens of senior chefs. It can cook every cuisine, but with so many cuisines to coordinate and prepare, dishes come out relatively slowly.

The GPU is completely different.

The GPU is designed for graphics processing, and its task is clear and singular: graphics rendering. Graphics are composed of massive numbers of pixels, large-scale data of a highly uniform type with no dependencies between elements.

The GPU's job is therefore to complete parallel operations on huge amounts of homogeneous data in the shortest possible time; the "chores" of scheduling and coordination are minimal.
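To make the contrast concrete, here is a loose sketch in plain Python (the event types and pixel values are invented for illustration): CPU-style work is full of branches and per-task decisions, while GPU-style work applies one identical, independent operation to every element.

```python
# CPU-style workload: branch-heavy control flow with per-item decisions --
# the kind of work that needs big caches and a complex controller.
def cpu_style(events):
    stats = {"io": 0, "interrupt": 0, "other": 0}
    for ev in events:
        if ev == "io":
            stats["io"] += 1
        elif ev == "interrupt":
            stats["interrupt"] += 1
        else:
            stats["other"] += 1
    return stats

# GPU-style workload: one uniform operation per element, no dependencies,
# so in principle every element could go to its own core at the same time.
def gpu_style(pixels, gain=1.2):
    return [min(p * gain, 255) for p in pixels]

print(cpu_style(["io", "interrupt", "tick", "io"]))  # {'io': 2, 'interrupt': 1, 'other': 1}
print(gpu_style([10, 100, 250]))                     # [12.0, 120.0, 255]
```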

Parallel computing, of course, requires more cores.

As the earlier figure shows, the number of GPU cores far exceeds the CPU's, reaching thousands or even tens of thousands (hence the term "many-core").

The RTX 4090 has 16,384 stream processors

The GPU's building block, the Streaming Multiprocessor (SM), is an independent task-processing unit.

The entire GPU is divided into multiple such streaming-processing sections, each containing hundreds of cores. Each core is like a simplified CPU, capable of integer and floating-point arithmetic, plus queuing and result collection.

The GPU's controller is simple and its caches are few; ALUs can make up more than 80% of the chip.

Although a single GPU core is weaker than a CPU core, the sheer number of cores makes the GPU extremely well suited to high-intensity parallel computing. For the same transistor budget, its compute power exceeds the CPU's.
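As a sketch of what "many cores at once" looks like in code, the snippet below launches one GPU thread per pixel. It assumes Python with NumPy and Numba installed and a CUDA-capable GPU; the kernel and the numbers are illustrative.

```python
import numpy as np
from numba import cuda

@cuda.jit
def brighten(pixels, out, gain):
    i = cuda.grid(1)                 # this thread's global index
    if i < pixels.size:              # guard threads past the end of the data
        out[i] = min(pixels[i] * gain, 255.0)

pixels = np.random.randint(0, 256, 1_000_000).astype(np.float32)
out = np.zeros_like(pixels)
threads_per_block = 256
blocks = (pixels.size + threads_per_block - 1) // threads_per_block
# Roughly a million threads, each handling exactly one pixel.
brighten[blocks, threads_per_block](pixels, out, 1.2)
```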

Back to the restaurant analogy: the GPU is like a specialty restaurant with thousands of junior chefs. It serves only certain cuisines, but because there are so many chefs and the prep is simple, everyone cooks in parallel and the food arrives much faster.

CPU vs GPU


GPU and AI computing

As everyone knows, companies are now scrambling to buy GPUs for AI computing, and NVIDIA has made a fortune from it. Why is that?

The reason is simple: AI computing, like graphics computing, consists of large numbers of high-intensity parallel computing tasks.

Deep learning is currently the most mainstream artificial intelligence approach. Process-wise, it involves two stages: training and inference.


In training, a complex neural network model is built by feeding in huge amounts of data. In inference, the trained model is used to draw conclusions from large amounts of new data.

Because training involves massive datasets and complex deep neural network structures, it requires computation at enormous scale and places high demands on a chip's compute performance. Inference consists mostly of simple, repetitive calculations but demands low latency.

The underlying operations, including matrix multiplication, convolution, recurrent layers, and gradient computations, can all be decomposed into large numbers of parallel tasks, which greatly shortens the time to complete them.
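Matrix multiplication shows why: each element of the output is an independent dot product, so thousands of them can be computed at the same time. A minimal NumPy sketch (sizes chosen arbitrarily):

```python
import numpy as np

def matmul_as_independent_tasks(A, B):
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    # Every (i, j) below is a self-contained task with no dependence on
    # any other -- exactly the workload shape a GPU is built for.
    for i in range(m):
        for j in range(n):
            C[i, j] = A[i, :] @ B[:, j]
    return C

A, B = np.random.rand(64, 32), np.random.rand(32, 48)
assert np.allclose(matmul_as_independent_tasks(A, B), A @ B)
```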

With their powerful parallel computing capability and memory bandwidth, GPUs handle training and inference tasks well and have become the industry's preferred solution for deep learning.

Currently, most companies use NVIDIA GPU clusters for AI training. With proper optimization, a single GPU card can deliver computing power equivalent to dozens or even hundreds of CPU servers.

An NVIDIA HGX A100 8-GPU assembly

In the inference segment, however, the GPU's market share is not as high. We will get to the reasons later.

Using GPUs for computation beyond graphics dates back to 2003.

That year, the concept of GPGPU (General-Purpose computing on GPU) was first proposed: harnessing the GPU's computing power for broader, more general scientific computation outside graphics processing.

Building on the traditional GPU, the GPGPU was further optimized to make it better suited to high-performance parallel computing.

In 2009, several Stanford researchers caused a sensation by demonstrating for the first time the use of GPUs to train deep neural networks.

A few years later, in 2012, Alex Krizhevsky and Ilya Sutskever, two students of Geoffrey Hinton (often called the father of neural networks), used a "deep learning + GPU" approach to build the deep neural network AlexNet, lifting the recognition success rate from 74% to 85% and winning the ImageNet challenge in one stroke.

From left: Ilya Sutskever, Alex Krizhevsky, Geoffrey Hinton

This fully ignited the "AI + GPU" wave. NVIDIA followed up quickly, pouring in resources and improving GPU performance 65-fold within three years.

Beyond raw computing power, NVIDIA has also actively built a developer ecosystem around its GPUs. Its CUDA (Compute Unified Device Architecture) platform provides a complete development environment and toolchain, helping developers use GPUs for deep learning and high-performance computing far more easily.
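To give a feel for how much the ecosystem hides, here is a minimal sketch using PyTorch, one of the popular frameworks built on top of CUDA (assumes PyTorch is installed; falls back to the CPU if no CUDA GPU is present):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b          # one line of math, dispatched to thousands of GPU threads
print(device, c.shape)
```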

This early, careful groundwork ultimately let NVIDIA reap enormous dividends when AIGC took off. Its market value currently stands at US$1.22 trillion (nearly six times Intel's), making it the veritable "uncrowned king" of AI.


So, does computing in the AI era belong entirely to the GPU? The FPGAs and ASICs we often hear about also seem to be capable computing chips. How do they differ, and what are their respective strengths?

Feel free to share your thoughts in the comments~




-END-


This article is reprinted by Global Internet of Things Observation from "Xianzao Classroom." The content reflects the independent views of the author, not the position of Global Internet of Things Observation, and is shared for learning and exchange only. For any questions, please contact info@gsi24.com.

