The savior of big chips: heterogeneous integration

Latest update time：2024-11-05

The launch of ChatGPT in 2022 has sparked exponential growth in artificial intelligence (AI) and high-performance computing (HPC) applications, making AI increasingly important to everyday life. Large AI models excel at handling complex tasks, but they require large training data sets and large computing systems. These large computing workloads result in larger chips with higher power densities, making it more difficult to design energy-efficient architectures. However, even as traditional scaling slows, the demand for computing continues to grow.

Therefore, heterogeneous integration (HI) of chips is critical to achieving the high system throughput (tera operations per second, or TOPS) and energy efficiency (TOPS/W) needed to meet growing computing demand. Splitting a system on chip (SoC) into multiple chiplets and integrating them in a single package can significantly improve the design flexibility, functionality, bandwidth, throughput, and latency of the system. The chiplets can be brought closer together horizontally, vertically, or in both directions, allowing more memory or logic to be integrated in a single package. In addition, reducing the die size and performing known good die (KGD) testing before packaging give tighter control over chip performance, thereby improving yield and reducing overall cost.
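The yield argument can be illustrated with a rough sketch. It uses the simple Poisson yield model, which is a common textbook approximation and not necessarily the model the authors have in mind. The defect density, die areas, and chiplet count are assumed values chosen only for illustration, and assembly yield is ignored.

```python
import math

def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of defect-free dies under a simple Poisson model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)

# Assumed, illustrative numbers (not taken from the article).
D0 = 0.1                 # defects per cm^2
monolithic_area = 8.0    # cm^2 for one large SoC
n_chiplets = 4
chiplet_area = monolithic_area / n_chiplets

y_mono = poisson_yield(monolithic_area, D0)
y_chiplet = poisson_yield(chiplet_area, D0)

# With KGD testing only good chiplets are packaged, so the silicon spent
# per working system is the total area divided by the per-die yield.
si_per_good_mono = monolithic_area / y_mono
si_per_good_chiplets = n_chiplets * chiplet_area / y_chiplet

print(f"monolithic yield: {y_mono:.2f}, chiplet yield: {y_chiplet:.2f}")
print(f"silicon per good system: {si_per_good_mono:.1f} cm^2 vs "
      f"{si_per_good_chiplets:.1f} cm^2")
```

Under these assumptions the smaller dies yield better, so less silicon is consumed per working system. A real cost comparison would also have to include packaging and test costs.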

HI is a potential solution for implementing high-performance systems dedicated to training large generative AI models. By integrating chips such as high bandwidth memory (HBM), central processing unit (CPU) and graphics processing unit (GPU) into one package, throughput, latency and energy efficiency are significantly improved, and the limitations of traditional 2D monolithic chip design are overcome.

Today, semiconductor companies such as Nvidia, Intel, and AMD have leveraged HI technologies in their products to run real-time generative AI models and train LLMs (large language models) with billions of parameters. In this review, we first introduce current and emerging HI technologies and discuss their advantages and current limitations. We then survey recent commercial deployments of HI architectures designed for high-computation AI workloads by semiconductor companies such as Cerebras, Nvidia, AMD, Intel, and Tesla.

Finally, we also summarize the recent advances in glass core packaging and evaluate their advantages and limitations.

Current Trends in Heterogeneous Integration Technologies

The main motivation for partitioning SoC into chiplets is to improve system functionality and reduce manufacturing costs. To improve the performance of these chiplet-based systems, multiple innovations have emerged in multi-chip HI architectures. We classify multi-die architectures as 2D, 2.5D, or 3D based on the definition of the IEEE Electronic Packaging Society (EPS) Heterogeneous Integration Roadmap and provide an overview in Figure 1. Table 1 summarizes current heterogeneous integration technologies.

Multi-chip module architecture

Multi-chip modules (MCMs) are one of the earliest multi-chip 2D architectures. Chips are placed side by side on an organic substrate to reduce wire length and increase package bandwidth, thereby improving system performance and design flexibility. This is one of the simplest integration technologies, but the interconnect density of MCMs can be limited by the use of traditional organic substrates and coarse solder-based bonding techniques. Solder-based interconnects (such as C4 bumps) are difficult to scale to finer pitches because adjacent interconnects can short during the bonding process, which limits system performance. Large AI systems require low latency and efficient memory access, so the limited interconnect density of MCMs can become a bottleneck when scaling to larger systems.

Interposer architecture

These challenges have led to the emergence of 2.5D architectures, which utilize substrates such as glass, silicon interposers, or local silicon bridges to increase lateral interconnect density. Fine-pitch microbumps and through-silicon via (TSV) technologies can increase the interconnect density of chips stacked on glass or silicon interposers.

However, as computing demands grow, scaling the interposer to large-scale AI systems can be costly.

Therefore, bridge-based architectures, such as Intel’s Embedded Multi-die Interconnect Bridge (EMIB), utilize local silicon embedded in the package substrate and multiple routing layers to achieve finer routing pitches. Inter-die signals are located in the local silicon bridge, and power/ground interconnects and other signals are located in the organic package, eliminating the need for TSVs and simplifying the assembly process.

Similar to EMIB, the elevated fanout bridge (EFB) uses a local silicon bridge to increase the interconnect density between chips, and the bridge is located above the package substrate. This approach can further reduce assembly cost and complexity. Compared with 3D HI, bridge-based technology has higher design functionality, lower design complexity, and simpler thermal management, so it is expected to be used in large-scale AI systems. However, traditional interconnect technologies such as microbumps may limit their system performance. This has led to new bonding technologies such as copper-to-copper bonding as a potential solution to overcome this limitation.

Wafer Level Packaging

Wafer-level packaging (WLP) technologies are important for advanced chiplet-based architectures because they can achieve high interconnect density, reduce interconnect delay, and increase bandwidth. By fanning out chip I/O signals instead of using traditional interconnects (such as wire bonding or C4 bumps), high integration density can be achieved, making WLP suitable for high-performance systems. In traditional WLP, known good dies (KGDs) are encapsulated in epoxy mold compound (EMC) to form a reconstituted wafer.

However, EMC can cause manufacturing problems due to the mismatch in coefficient of thermal expansion (CTE) between the EMC and the chip, resulting in warping and chip shift/misalignment, and the low thermal conductivity of the material makes power dissipation difficult for high-power systems. Therefore, alternative materials have been proposed to embed/package the chip.

3D Architecture

3D HI technology is a promising approach to meet the computing needs of AI systems. Using TSVs and fine-pitch interconnect technologies such as microbumps or hybrid bonding, 3D stacking can achieve high-bandwidth and low-latency systems. Many semiconductor companies have developed their own 3D architectures, including Intel's Foveros, Samsung's X-Cube, and AMD's 3D V-Cache product, which uses TSMC's System on Integrated Chips (SoIC) technology. SoIC technology divides the SoC into multiple chiplets that can be reintegrated into various 3D configurations. This allows flexible integration of passive and active chips with different technology nodes, materials, and die sizes (see Figure 2) to support memory bandwidths exceeding 20 Tbps.

Compared to traditional 3D IC microbumps, hybrid bonding increases bonding density by a factor of 16 and reduces electrical parasitic effects such as IR drop, lowering the energy consumed per bit. In addition to finer interconnect pitch, SoIC technology also has higher metal wiring density and thinner bonding layers, which can improve thermal performance. However, the technology faces challenges similar to those of traditional 3D ICs. Shrinking the hybrid bonding pitch is becoming increasingly difficult due to strict surface cleanliness and chemical mechanical polishing (CMP) requirements.
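The relationship between pitch and bonding density can be sketched as follows. For a regular square grid, the number of connections per unit area scales with the inverse square of the pitch, so a 16-fold density gain corresponds to a pitch that is four times finer. The two pitch values below are assumptions chosen to show that ratio and are not figures from the article.

```python
def bond_density_per_mm2(pitch_um: float) -> float:
    """Connections per mm^2 for a square grid with the given pitch (micrometres)."""
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    per_mm = 1000.0 / pitch_um   # connections along one millimetre
    return per_mm ** 2

def density_gain(old_pitch_um: float, new_pitch_um: float) -> float:
    """Factor by which density increases when moving to a finer pitch."""
    return bond_density_per_mm2(new_pitch_um) / bond_density_per_mm2(old_pitch_um)

# Assumed pitches, for illustration only.
microbump_pitch_um = 36.0
hybrid_bond_pitch_um = 9.0   # four times finer

gain = density_gain(microbump_pitch_um, hybrid_bond_pitch_um)
print(f"density gain: {gain:.1f}x")
```

Because the gain depends only on the pitch ratio, any pair of pitches that differ by a factor of four gives the same result.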

It is important to note that 3D system bandwidth is determined by the total number of chips in the stack and the size of the bottom die. While increasing the number of chips in a 3D stack is desirable for increasing memory bandwidth or computing power, assembly complexity and cost can increase significantly. Heat dissipation and mechanical stability also become more difficult. Liquid cooling has been proposed as a potential solution to aid heat dissipation, but this area is beyond the scope of this article.

Recently, other 3D architectures using WLP technology have also emerged. M.-J. Li et al. proposed a wafer-level chiplet reconstitution technology called 3D integrated chiplet encapsulation (3D-ICE), in which multiple chips are encapsulated in low-temperature SiO2 to form a reconstituted SiO2 layer, as shown in Figure 3. This SiO2 layer can then be post-processed to achieve high-density 3D HI. Similarly, Intel proposed the quasi-monolithic chip (QMC) as a new 3D HI architecture, in which the chip is also encapsulated in an ultra-thick silicon dioxide layer. SiO2 has multiple advantages as a packaging material. Its low-loss characteristics facilitate high-speed signal transmission. Because no curing is required, there is essentially no chip shift or misalignment. It is also compatible with existing CMOS manufacturing processes, blurring the boundary between packaging processing and device processing.

Although SiO2 has excellent electrical properties, the material has low thermal conductivity, which can lead to poor thermal performance. Therefore, A. Victor et al. proposed a chip reassembly process with an integrated heat sink. A 30 µm thick passive chip is encapsulated in 15 µm thick ICP-PECVD SiO2. The oxide deposited on top of the chip is etched away, and then 36 µm of copper is electroplated on the chip. The monolithic copper heat sink helps reduce the maximum junction temperature of the chip layer, thus solving the electrical and thermal performance trade-off faced by most FOWLP solutions.

Heterogeneous integration trends for artificial intelligence

Current landscape of HI products

The rapid development of AI has driven multiple commercial deployments of HI architectures specifically designed to accelerate the largest AI workloads. In this section, we survey recently reported industry offerings and summarize their specifications in Table 2.

In 2024, Cerebras launched the WSE-3, a wafer-scale AI accelerator that is twice as fast as the WSE-2 and designed to train models 10 times larger than GPT-4 and Claude. Interestingly, Cerebras uses traditional device scaling and wafer-level integration to go beyond Moore's Law. With TSMC's 5nm technology, four trillion transistors are manufactured on a single wafer, and the chip size is about 57 times that of a GPU. However, the compute and memory components are separated to enable memory capacity expansion, so that a single WSE-3 system can store and train a model with 24 trillion parameters more efficiently than a cluster of 10,000 GPUs.

Unlike Cerebras, other semiconductor companies are using advanced packaging technologies to design large-scale AI systems. Nvidia announced the GB200 Grace Blackwell chip, which consists of two Blackwell GPUs and one Grace CPU. The chip is designed for large language models with more than 10 trillion parameters, has 384 GB of off-chip memory, and has a total device power of 2700 W. To achieve this, Nvidia used TSMC's chip-on-wafer-on-substrate (CoWoS-L) packaging technology. This packaging technology uses local silicon interconnect (LSI) chips and a reconstituted interposer to achieve high-performance systems with large integration area, high bandwidth, and low latency.

AMD uses a chiplet approach in its MI300X package, combining interposer technology with 3D stacking to achieve high performance and memory bandwidth. The MI300X consists of multiple GPU chiplets, I/O dies, and 192 GB of high-bandwidth memory (HBM), with a total device power of 750 W. The CPU complex die (CCD) and accelerator complex die (XCD) are stacked in 3D on the I/O die (IOD) to achieve low signal latency. Finally, a large silicon interposer integrates the 3D stacks with the HBM chips to form a high-performance system.

Intel's Gaudi-3 accelerator uses the company's Embedded Multi-die Interconnect Bridge (EMIB) technology to integrate two Intel compute dies with 128 GB of HBM for large-scale AI systems. As with other bridge-based interposer technologies, EMIB allows Intel to increase design functionality and reduce assembly costs. Although the Gaudi-3 accelerator is not as powerful as Nvidia's H100, it is a cost-effective high-performance system.

Finally, Tesla entered the AI market with Dojo, a chip optimized for training large neural networks.

With a total device power of 400 W, which is much lower than that of competing products, Dojo is designed for real-time data processing in driving situations. Tesla is using TSMC's Integrated Fan-Out System on Wafer (InFO-SoW) technology to achieve a high-density, low-latency system.

In summary, as the size and complexity of AI models continue to grow, the industry has shifted toward established and emerging HI technologies.
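The figures quoted in this section can be collected into a small data structure for side-by-side comparison. This is only a sketch: the class and field names are hypothetical, the values are limited to those stated in the text above, and any figure the text does not give is left as None and not filled in.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Accelerator:
    """One product from this section; None marks a figure the text does not state."""
    name: str
    vendor: str
    integration: str
    power_w: Optional[float]
    memory_gb: Optional[float]

PRODUCTS = [
    Accelerator("WSE-3", "Cerebras", "wafer-scale integration", None, None),
    Accelerator("GB200", "Nvidia", "CoWoS-L", 2700, 384),
    Accelerator("MI300X", "AMD", "silicon interposer + 3D stacking", 750, 192),
    Accelerator("Gaudi-3", "Intel", "EMIB", None, 128),
    Accelerator("Dojo", "Tesla", "InFO-SoW", 400, None),
]

# Rank only the products whose total device power is quoted in the text.
by_power = sorted(
    (p for p in PRODUCTS if p.power_w is not None),
    key=lambda p: p.power_w,
)
for p in by_power:
    print(f"{p.name:8s} {p.vendor:8s} {p.power_w:6.0f} W  ({p.integration})")

# Products with at least one figure missing from the text.
incomplete = [p.name for p in PRODUCTS if p.power_w is None or p.memory_gb is None]
print("incomplete entries:", incomplete)
```

The products differ in scope (a wafer-scale engine, a CPU plus GPU module, single accelerators), so the ranking is a convenience for reading the text and not a like-for-like benchmark.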

Inter-chip interface and communication protocol

As the number of chips in a single system increases, die-to-die (D2D) interfaces become increasingly important for data movement between the various components. AMD's Infinity Fabric and Intel's Advanced Interface Bus (AIB) are D2D interfaces used in their AI accelerator products to minimize latency and maximize bandwidth.

However, as systems become more diverse and combine chips from different vendors, the Universal Chiplet Interconnect Express (UCIe) protocol has begun to emerge as a common industry standard. Standard D2D protocols are essential for design flexibility and scalability, especially for large-scale AI, HPC, and networking systems. Figure 4 shows a summary of different standard protocols for heterogeneous computing.

Glass packaging

The emergence of glass core substrate packaging

AI applications typically require larger interposers and very high-density interconnects to achieve high bandwidth. These stringent requirements, coupled with reliability and performance, require the development and implementation of advanced packaging technologies to build large packages.

With the demand for more advanced packaging technologies for AI and HPC applications, the use of glass as a core substrate has recently attracted great attention due to its many advantages. Intel recently demonstrated its first glass substrate test chip and announced its roadmap towards glass packaging to meet the demand for more powerful computing (Figure 5(a)). Absolics Inc., a subsidiary of South Korea's SKC, has also begun preparing for small-volume manufacturing (SVM) of its glass substrates (Figure 5(b)), targeting hyperscalers such as Amazon, Meta, and Microsoft as potential customers.

Advantages of glass core packaging

Glass-based interposers enhance the bandwidth capabilities of semiconductor packages for AI applications by improving signal integrity, supporting high-density interconnects, integrating optical communications, optimizing thermal management, and ensuring reliability and scalability. These properties make glass interposers an important component for enabling high-performance computing and realizing advanced AI capabilities. The smooth surface/very low surface roughness of glass allows for scaling of fine lines and spaces, which is critical for achieving very high-density interconnects.

In addition, the surface structure of glass, which is composed of Si-O bonds, promotes the adhesion of various polymer materials used as dielectric resins and photosensitive resins. Combining the low dielectric constant of glass with low-dielectric-constant build-up layers in the multi-layer interposer structure can significantly reduce the latency of the system. This plays a vital role in minimizing signal propagation delay and reducing crosstalk between adjacent interconnects, which is particularly beneficial for high-speed electronic devices and co-packaged optical devices.
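The link between dielectric constant and propagation delay can be sketched with the ideal relation for a wave travelling in a homogeneous dielectric, where the velocity is c divided by the square root of the relative permittivity. Real interconnects see an effective permittivity that depends on geometry, so this is only a first-order estimate. The two permittivity values below are hypothetical and do not describe any specific material in the article.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in vacuum, in mm per picosecond

def delay_ps_per_mm(relative_permittivity: float) -> float:
    """Ideal propagation delay per millimetre in a homogeneous dielectric."""
    if relative_permittivity < 1.0:
        raise ValueError("relative permittivity must be at least 1")
    return math.sqrt(relative_permittivity) / C_MM_PER_PS

# Hypothetical dielectrics, for illustration only.
low_k = 3.0
high_k = 6.0

d_low = delay_ps_per_mm(low_k)
d_high = delay_ps_per_mm(high_k)
print(f"eps_r = {low_k}: {d_low:.2f} ps/mm")
print(f"eps_r = {high_k}: {d_high:.2f} ps/mm")
print(f"delay ratio: {d_high / d_low:.3f}")
```

Because delay scales with the square root of the permittivity, halving the dielectric constant shortens the delay by a factor of about 1.41, not 2.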

In addition, glass substrates reduce the capacitance between interconnects, enabling faster signal transmission and improving overall system performance. In speed-critical applications such as data centers, telecommunications, and high-performance computing, the use of glass substrates can greatly improve system efficiency and increase data throughput.

In addition, the low dielectric constant of glass also supports excellent impedance control, which is critical to maintaining signal integrity throughout the circuit. This feature is particularly beneficial in RF applications, where precise impedance matching is critical to optimizing power transfer and minimizing signal loss. Glass substrates ensure consistent electrical properties across the entire substrate surface, enabling the design and production of high-frequency circuits with higher reliability and performance.

In addition, compared to organic packaging, glass has excellent dimensional stability, which helps improve interlayer accuracy, which is the key to achieving very high interconnect density in multi-layer glass interposers. This not only helps reduce pad size, but also helps shrink fine lines and traces to <1 µm, thereby increasing the number of I/Os in each redistribution layer in the multi-layer interposer. In addition, the coefficient of thermal expansion (CTE) of glass substrates is in the range of 3-12 ppm/°C. This can alleviate the CTE mismatch problem between glass and silicon (CTE = 3 ppm/°C) chips and glass and printed wiring boards (CTE = 17 ppm/°C).
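The CTE figures above can be turned into a rough mismatch estimate. The sketch below uses the first-order relation in which unconstrained mismatch strain equals the CTE difference multiplied by the temperature change. The CTE values come from the text. The 100 °C temperature swing and the three sample glass CTEs are assumptions for illustration.

```python
def mismatch_strain_ppm(cte_a_ppm: float, cte_b_ppm: float, delta_t_c: float) -> float:
    """First-order thermal mismatch strain in ppm: |CTE difference| x temperature change."""
    return abs(cte_a_ppm - cte_b_ppm) * delta_t_c

CTE_SILICON = 3.0   # ppm/°C, from the text
CTE_BOARD = 17.0    # ppm/°C, printed wiring board, from the text
DELTA_T = 100.0     # °C, assumed temperature swing

# Silicon mounted directly on the board, with no intermediate substrate.
direct = mismatch_strain_ppm(CTE_SILICON, CTE_BOARD, DELTA_T)
print(f"silicon to board directly: {direct:.0f} ppm")

# Sample glass CTEs within the 3-12 ppm/°C range given in the text.
for cte_glass in (3.0, 7.5, 12.0):
    to_chip = mismatch_strain_ppm(cte_glass, CTE_SILICON, DELTA_T)
    to_board = mismatch_strain_ppm(cte_glass, CTE_BOARD, DELTA_T)
    print(f"glass {cte_glass:4.1f} ppm/°C: chip side {to_chip:.0f} ppm, "
          f"board side {to_board:.0f} ppm")
```

A glass CTE near the low end of the range matches the chip, while a value near the high end moves the mismatch away from the board interface. This is why a tunable CTE is useful when choosing the substrate.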

The ability to structure glass is another advantage of glass core substrates for packaging and interposer applications.

Glass structures can be any of the following types: (a) Through Glass Vias (TGV), (b) Blind Glass Cavities (BGC), or (c) Through Glass Cavities (TGC). TGVs can be formed by Laser Induced Deep Etching (LIDE), in which the glass is first locally modified by a laser and then wet-etched, which minimizes the accumulation of microcracks during manufacturing. BGCs and TGCs can be easily formed by laser processing, followed by a wet etching process when necessary. These cavities are important because chips can be embedded in them, a process called Glass Panel Embedding (GPE). Cavities of the required size are manufactured, and chips are placed into them using automated pick-and-place tools with an accuracy of a few microns. The GPE process is well suited for heterogeneous integration, where chips of different sizes and functions, including passive components such as capacitors and magnetic inductors, are built into the package. In this approach, capacitors and inductors are kept close to where they are needed, for applications such as power delivery and integrated voltage regulators (IVRs). Figure 6 shows a typical process flow used in GPE.

With advanced GPE processes, thermal solutions can be easily integrated into the package to remove heat. For example, for GPE with TGC, thermal insulation and heat sinks can be attached to the back of the glass substrate. For BGC, heat sinks can be added after thinning/grinding the substrate to remove heat. GPE architecture can be easily adapted from 2.5D architecture to include 3D integration, where one of the following approaches can be used:

(a) For example, a logic chip can be embedded in a glass cavity together with RDLs on the top and bottom of the glass core, and then a memory chip can be assembled on top to produce a 3D structure with short interconnect distances and much smaller form factors, thereby significantly reducing the height of the package;

(b) Passive chips can be embedded in structured glass, and multiple chips can be assembled on the glass packaging structure through flip chip technology;

(c) In addition, GPE enables advanced packaging concepts such as co-packaged optics, where the electronic chip can be embedded in a glass cavity (with the aforementioned thermal solution on the back of the chip) and the photonic chip (PIC) assembled on top of the package. With the PIC mounted on top, the fiber coupler can be easily installed from the top, as well as any required thermal solution.

Finally, in addition to its various superior properties, glass has fewer restrictions on the substrate format in the package. While silicon can only be processed in round wafers, glass can achieve panel processing, thereby reducing costs. For example, a 300 mm wafer can accommodate 2,500 packages of 6 mm x 6 mm size, while a 600 mm x 600 mm panel can accommodate 12,000 packages.
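The wafer-versus-panel comparison can be checked with a rough area-only estimate. The sketch below ignores edge exclusion, saw streets, and partial packages at the wafer edge, so the wafer figure is an upper bound. The function names are hypothetical. The absolute counts it produces differ from those quoted above, which presumably rest on assumptions not stated here, but the roughly fivefold advantage of the panel is consistent.

```python
import math

def wafer_count_upper_bound(diameter_mm: float, pkg_w_mm: float, pkg_h_mm: float) -> int:
    """Area-only upper bound on packages per round wafer (no edge loss modelled)."""
    wafer_area = math.pi * (diameter_mm / 2.0) ** 2
    return math.floor(wafer_area / (pkg_w_mm * pkg_h_mm))

def panel_count(panel_w_mm: float, panel_h_mm: float,
                pkg_w_mm: float, pkg_h_mm: float) -> int:
    """Packages on a rectangular panel using a simple grid with no spacing."""
    return int(panel_w_mm // pkg_w_mm) * int(panel_h_mm // pkg_h_mm)

wafer = wafer_count_upper_bound(300.0, 6.0, 6.0)
panel = panel_count(600.0, 600.0, 6.0, 6.0)

print(f"300 mm wafer (upper bound): {wafer}")
print(f"600 mm x 600 mm panel:      {panel}")
print(f"panel / wafer ratio:        {panel / wafer:.2f}")
```

The rectangular panel also avoids the partial packages lost around the edge of a round wafer, which a fuller model would subtract from the wafer count.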

Current limitations of glass

The inherent fragility of glass substrates presents significant challenges, especially as the industry adopts thinner substrates to meet the demand for higher device integration and performance. Thin glass sheets, sometimes as thin as 100µm or less, are particularly susceptible to damage during handling and manufacturing. This risk of cracking or shattering under pressure highlights the need for specialized equipment and customized processes designed to safely handle this material.

In addition to processing difficulties, glass exhibits relatively low heat dissipation. Although glass conducts heat better than organic laminates, it conducts heat poorly compared to silicon. To overcome the limitations associated with the low thermal conductivity of glass, methods have been demonstrated to incorporate copper structures such as through-package vias (TPVs), copper bumps, and copper traces in redistribution layers (RDLs) into glass substrates [107]. In addition, next-generation thermal interface materials (TIMs) for embedded and substrate-based packaging are also under active development, with a focus on reducing the thermal interface resistance to achieve maximum heat transfer from the chip.
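The benefit of adding copper structures can be sketched with a parallel rule-of-mixtures estimate for through-thickness heat flow, in which copper vias and glass conduct side by side. The conductivity values below are assumed, order-of-magnitude figures (about 1 W/m·K for glass and about 400 W/m·K for copper), and the via area fractions are hypothetical. The model ignores spreading resistance and interface effects.

```python
def effective_k_through_thickness(via_area_fraction: float,
                                  k_via: float = 400.0,
                                  k_matrix: float = 1.0) -> float:
    """Parallel (rule-of-mixtures) thermal conductivity in W/m·K.

    via_area_fraction is the share of the substrate area occupied by copper vias.
    The default conductivities are assumed, approximate values.
    """
    if not 0.0 <= via_area_fraction <= 1.0:
        raise ValueError("via_area_fraction must be between 0 and 1")
    return via_area_fraction * k_via + (1.0 - via_area_fraction) * k_matrix

# Hypothetical copper fill fractions.
for fraction in (0.0, 0.01, 0.05, 0.10):
    k_eff = effective_k_through_thickness(fraction)
    print(f"copper fraction {fraction:4.0%}: k_eff = {k_eff:6.2f} W/m·K")
```

Even a few percent of copper dominates the through-thickness conduction in this simple model, which is consistent with the approach of using copper vias and traces to compensate for the low conductivity of glass.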

Acknowledgements

The authors of this article are gratefully acknowledged: Madison Manley, Ashita Victor, Hyunggyu Park, Ankit Kaul, Mohanalingam Kathaperumal, and Muhannad S. Bakir of the Georgia Institute of Technology.

*Disclaimer: This article is originally written by the author. The content of the article is the author's personal opinion. Semiconductor Industry Observer reprints it only to convey a different point of view. It does not mean that Semiconductor Industry Observer agrees or supports this point of view. If you have any objections, please contact Semiconductor Industry Observer.
