"Computing power, storage power, and transportation power" are the three swords of today's data centers. As the number of parameters in generative AI and multimodal large models grows exponentially, their demand for computing capacity and high-bandwidth memory is becoming more urgent.
HBM (High Bandwidth Memory) is highly favored for its enormous memory bandwidth and has gradually become the GPU's "strongest ally." In other words, to bring out the full capability of a GPU, the HBM subsystem must be done well.
Today, JEDEC is moving toward finalizing the HBM4 memory specification, and the industry is accelerating HBM4 commercialization. In September this year, Rambus announced the industry's first HBM4 controller IP. As a vendor that has moved quickly on HBM4, how does Rambus understand HBM4, and how did it complete product iteration so fast? Recently, Rambus shared more details with EEWorld at a communication meeting.
Why HBM Wins in the AI Era
According to Rambus researcher and distinguished inventor Dr. Steven Woo, AI work can broadly be divided into two phases: AI training and AI inference. AI training is among the most challenging and difficult tasks in computing today, and the volume of data managed and processed at this stage is enormous. Completing training faster means the AI model can be put into use earlier, maximizing return on investment. AI inference, in turn, has high performance requirements, especially for inference speed and accuracy.
The two phases place different demands on memory: during training, memory needs to be fast enough, large enough in capacity, and compact enough; during inference, memory also needs lower latency and higher bandwidth, because inference results must be delivered quickly, almost in real time.
This trend is reflected in market data. According to research, demands on memory speed, capacity, and size have been growing more than tenfold per year, a trend that has shown no sign of slowing since 2012.
For example, GPT-3, released in 2020, uses 175 billion parameters, while the latest models, such as GPT-4o released in May this year, are reported to exceed 1.5 trillion parameters, nearly a 9-fold increase. In the same period, per-device memory capacity has only roughly doubled. This means that to run these AI models, additional GPUs and AI accelerators must be deployed to meet the demand for memory capacity and bandwidth.
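A quick, illustrative check of the gap: dividing the quoted parameter counts by the memory-capacity growth gives a rough multiplier for how many more accelerators are needed (the 2x memory-growth figure is from the text; the calculation is a sketch, not a sizing method).

```python
# Illustrative model-size vs. memory-capacity gap, using figures quoted above.
gpt3_params = 175e9           # GPT-3 parameter count
gpt4_params = 1.5e12          # GPT-4-class parameter count (reported)
param_growth = gpt4_params / gpt3_params       # ~8.6x more parameters
memory_growth = 2.0                            # per-device capacity growth (from text)
devices_needed = param_growth / memory_growth  # ~4.3x more accelerators
print(f"{param_growth:.1f}x parameters, {devices_needed:.1f}x more devices")
# prints "8.6x parameters, 4.3x more devices"
```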
At present, mainstream DRAM comes in several forms: DDR, LPDDR, GDDR, and HBM. DDR is the memory of the traditional PC world. LPDDR is low-power DDR designed for mobile devices, and it also performs well in AI edge-inference systems. GDDR is DDR designed for graphics processing and is now widely used in AI inference tasks; because it strikes a good balance between bandwidth, cost, and reliability, it also sees use in automotive and networking applications. HBM offers extremely high bandwidth density, far beyond ordinary DRAM on the market, making it well suited to AI training, high-performance computing, and networking.
Many AI systems will choose the corresponding memory type based on the specific needs of the actual application scenario. But from a performance perspective, HBM is far ahead.
Why does HBM have advantages in bandwidth, capacity and speed?
Steven Woo explained the structure: in an HBM package, the DRAM stack and the processor sit on a silicon interposer, the interposer sits on the package substrate, and the substrate is soldered to the PCB. The DRAM stack uses a multi-layer architecture to raise memory bandwidth, capacity, and energy efficiency, and its direct proximity to the processor enables extremely fast transfers. More importantly, taking HBM3 as an example, 1024 wires connect the DRAM stack to the SoC, and thousands of signal paths far exceed what a standard PCB can support. Because the silicon interposer is fabricated much like an integrated circuit, it can carry signal traces at very fine pitch, providing the number of signal lines the HBM interface requires.
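To see why the 1024-wire interface matters, per-stack bandwidth follows directly from bus width and per-pin rate. A minimal check, assuming HBM3's 6.4 Gb/s maximum per-pin data rate:

```python
# Per-stack HBM3 bandwidth from the 1024-bit interface described above.
bus_width_bits = 1024    # wires between the DRAM stack and the SoC
pin_rate_gbps = 6.4      # HBM3 per-pin data rate (assumed maximum)
bandwidth_gb_s = bus_width_bits * pin_rate_gbps / 8  # bits -> bytes
print(f"{bandwidth_gb_s:.1f} GB/s per stack")
# prints "819.2 GB/s per stack"
```

No PCB can route 1024 traces to a single device at this pitch, which is why the interposer is essential.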
It is precisely this sophisticated structural design and the stacking approach of HBM DRAM that lets HBM deliver extremely high memory bandwidth, excellent energy efficiency, and very low latency in a minimal footprint.
The rapid development of the HBM specification is driven by the growing demand for AI applications as they evolve from machine learning to more general and widely deployed AI. These applications pose critical performance and efficiency challenges to the underlying computing infrastructure.
What is different about HBM4?
If you follow memory news, you will have noticed the giants' recent focus on HBM4: SK Hynix's next-generation HBM4 taped out in October and will be used in NVIDIA's Rubin R100 AI GPU, with NVIDIA, TSMC, and SK Hynix forming a "triangle alliance" to develop the next-generation AI GPU + HBM4; Samsung will manufacture the logic die for its next-generation HBM4 memory on its 4nm node, entering mass production at the end of 2025, and will also develop "custom HBM4" solutions for Meta and Microsoft; Micron is focusing on hybrid bonding technology for HBM4.
Regarding HBM's development, Steven Woo explained that from the first-generation HBM through HBM2, HBM2E, HBM3, and HBM3E to the latest HBM4, the most obvious generational change is the sharp increase in single-stack bandwidth; a single HBM3E device now exceeds 1.2 TB/s. Judging from current market trends, HBM3 devices have become very popular, and major DRAM manufacturers SK Hynix, Micron, and Samsung have all announced HBM3E devices with data rates of up to 9.6 Gb/s.
Although JEDEC has not yet finalized the HBM4 specification, it is already certain that per-stack bandwidth will exceed HBM3E's. A single HBM4 stack will reach 1.6 TB/s, and the final figure may be higher. A GPU with 8 HBM4 devices would therefore achieve an aggregate memory bandwidth of more than 13 TB/s.
Compared with HBM3, the number of channels per stack doubles and the physical footprint grows. HBM4 will support 6.4 Gb/s speeds initially, with higher data rates under discussion. Although the per-pin rate is lower than HBM3E's, widening the interface from 1024 bits to 2048 bits significantly increases overall bandwidth.
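The quoted figures check out directly: doubling the interface to 2048 bits at 6.4 Gb/s per pin yields both the per-stack number and, for the 8-stack GPU example above, the aggregate:

```python
# HBM4 bandwidth from the quoted 2048-bit interface at 6.4 Gb/s per pin.
interface_bits = 2048
pin_rate_gbps = 6.4
per_stack_tb_s = interface_bits * pin_rate_gbps / 8 / 1000  # bits -> bytes -> TB/s
aggregate_tb_s = 8 * per_stack_tb_s                         # 8-stack GPU example
print(f"per stack: {per_stack_tb_s:.2f} TB/s, 8 stacks: {aggregate_tb_s:.1f} TB/s")
# prints "per stack: 1.64 TB/s, 8 stacks: 13.1 TB/s"
```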
Details behind the HBM4 Controller IP
As the industry introduces faster and faster HBM memory devices, Rambus plays an important role in this process as a memory controller IP provider. Its industry-first HBM4 controller IP is designed to accelerate next-generation AI workloads.
Steven Woo explained that Rambus' HBM4 controller IP provides 32 independent channel interfaces with a total data width of 2048 bits. At this data width and a 6.4 Gb/s data rate, HBM4's total memory throughput reaches 1.64 TB/s, double that of HBM3.
The Rambus HBM4 controller core accepts commands over a simple local interface and converts them into the command sequences the HBM4 device requires, while also handling all initialization, refresh, and power-down functions. The core queues multiple commands in a command queue, which allows both efficient handling of short bursts to highly random addresses and optimal bandwidth utilization for long bursts across contiguous address space. The command queue also performs look-ahead activate, precharge, and auto-precharge at the right time, further improving overall throughput. The reorder function is fully integrated into the controller command queue, improving throughput while minimizing gate count.
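To illustrate the reordering idea, a conceptual sketch (not the Rambus implementation): a controller can serve requests that hit the currently open DRAM row before row misses, avoiding unnecessary precharge/activate cycles. The request format here is hypothetical.

```python
# Conceptual sketch of command-queue reordering (not the actual Rambus design):
# requests that hit the currently open row are served first, since a row miss
# forces a precharge + activate before the column access can proceed.
def reorder_for_row_hits(queue, open_row):
    """Return the queue with open-row hits first; original order is preserved
    within each group (a simple form of command reordering)."""
    hits = [req for req in queue if req["row"] == open_row]
    misses = [req for req in queue if req["row"] != open_row]
    return hits + misses

pending = [
    {"row": 7, "col": 1},
    {"row": 3, "col": 4},   # row miss: would force precharge/activate
    {"row": 7, "col": 2},
]
scheduled = reorder_for_row_hits(pending, open_row=7)
print([r["row"] for r in scheduled])   # prints [7, 7, 3]
```

A real controller must also bound how long a miss can be deferred (to avoid starvation) and respect DRAM timing parameters; those details are omitted here.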
To summarize briefly, the Rambus HBM4 controller IP features:

- Support for all standard HBM4 channel densities (up to 32 Gb)
- Data rates of up to 10 Gb/s per pin
- Refresh management (RFM) support
- Look-ahead command processing to maximize memory bandwidth and minimize latency
- Integrated reorder function
- High clock rates with minimal routing restrictions
- Self-refresh and power-down low-power modes
- HBM4 RAS support
- Built-in hardware-level performance activity monitor
- DFI compatibility
- End-to-end data parity
- AXI or native interface to user logic
More importantly, the controller can be delivered standalone or integrated with the customer's choice of third-party PHY. Delivery includes the core (source code), testbench (source code), complete documentation, expert technical support, and maintenance updates.
Steven Woo emphasized that, like the previous-generation HBM3E controller, the HBM4 controller IP is a modular, highly configurable solution. "Based on customers' unique needs in their application scenarios, we provide customized services covering size, performance, and functionality. Key optional features include ECC, RMW (read-modify-write), and error scrubbing. In addition, to ensure customers can choose from a variety of third-party PHYs and apply them to their systems as needed, we have partnered with leading PHY suppliers so that customers can achieve first-time-right tape-outs during development."
Challenges in implementing HBM4
Latest update time: 2024-11-17 03:30