A one-trillion-transistor GPU is coming: an article co-authored by TSMC's chairman
Editor's note
In previous talks, TSMC has repeatedly discussed its roadmap toward a trillion transistors. Today, an article titled "How We'll Reach a 1 Trillion Transistor GPU" appeared on the IEEE website, describing how TSMC intends to reach the goal of a trillion-transistor chip.
It is worth noting that the article is signed by Mark Liu (Liu Deyin) and H.-S. Philip Wong. Liu Deyin is the chairman of TSMC, and H.-S. Philip Wong is a professor in the School of Engineering at Stanford University and chief scientist at TSMC.
Here, we translate this article for the benefit of readers.
The following is the text of the article:
In 1997, IBM's Deep Blue supercomputer defeated world chess champion Garry Kasparov. It was a groundbreaking demonstration of supercomputer technology and a first glimpse of how high-performance computing might one day surpass human intelligence. Over the next 10 years, we began to use artificial intelligence for many practical tasks, such as facial recognition, language translation, and recommending movies and products.
Fifteen years after that, artificial intelligence has advanced to the point where it can "synthesize knowledge." Generative AI, such as ChatGPT and Stable Diffusion, can compose poetry, create art, diagnose disease, write summary reports and computer code, and even design integrated circuits that rival those made by humans.
Artificial intelligence has an enormous opportunity to become a digital assistant for all human endeavors. ChatGPT is a good example of how AI has democratized the use of high-performance computing, bringing benefits to everyone in society.
All of these remarkable AI applications rest on three factors: innovations in efficient machine-learning algorithms, the availability of large amounts of data on which to train neural networks, and progress in energy-efficient computing through advances in semiconductor technology. Despite its ubiquity, this last contribution to the generative AI revolution has not received the recognition it deserves.
Over the past three decades, the major milestones in artificial intelligence have been enabled by the leading semiconductor technology of the time and would have been impossible without it. Deep Blue was implemented with a mix of 0.6-micrometer and 0.35-micrometer node chip-manufacturing technology; the deep neural network that won the ImageNet competition and ushered in the current era of machine learning was built with 40-nanometer technology; AlphaGo conquered the game of Go using 28-nanometer technology; the initial version of ChatGPT was trained on computers built with 5-nanometer technology; and the latest version of ChatGPT is powered by servers using even more advanced 4-nanometer technology. Every layer of the computer systems involved, from software and algorithms down to architecture, circuit design, and device technology, acts as a multiplier for AI performance. But it's fair to say that the underlying transistor device technology has driven the advances in the layers above it.
If the AI revolution is to continue at its current pace, it will require even more contributions from the semiconductor industry. Within a decade, it will need a GPU with 1 trillion transistors, that is, 10 times as many devices as is typical in today's GPUs.
The growing size of AI models has increased the compute and memory-access requirements of AI training by orders of magnitude over the past five years. Training GPT-3, for example, takes the equivalent of more than 5 billion billion operations per second sustained for an entire day (that is, 5,000 petaflop-days) and 3 terabytes (3 TB) of memory capacity.
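To make that scale concrete, here is a quick back-of-the-envelope check in Python (our own illustration, not part of the original article): sustaining 5 billion billion operations per second for one day is exactly 5,000 petaflop-days, or about 4.3 × 10^23 total operations.

```python
# Back-of-the-envelope check (our illustration, not from the article):
# sustaining ~5e18 operations per second for one day equals 5,000 petaflop-days.

OPS_PER_SECOND = 5e18           # "5 billion billion" operations per second
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
PETAFLOP = 1e15                 # operations per second in one petaflop

total_ops = OPS_PER_SECOND * SECONDS_PER_DAY
petaflop_days = OPS_PER_SECOND / PETAFLOP   # sustained for exactly one day

print(f"total operations: {total_ops:.2e}")       # ~4.32e+23
print(f"petaflop-days:    {petaflop_days:,.0f}")  # 5,000
```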
Both the computing power and memory access required for new generative AI applications continue to grow rapidly. We now need to answer a burning question: How can semiconductor technology keep pace?
From integrated devices to integrated chiplets
Since the invention of the integrated circuit, semiconductor technology has focused on shrinking feature sizes so that we can cram more transistors into a thumbnail-size chip. Today, integration has risen a level higher: we are moving beyond 2D scaling into 3D system integration, combining many chips into a tightly integrated, massively interconnected system. This is a paradigm shift in semiconductor-technology integration.
In the era of artificial intelligence, the power of a system is directly proportional to the number of transistors integrated in the system. One of the major limitations is that photolithography chip fabrication tools are designed to fabricate ICs no larger than about 800 square millimeters, the so-called reticle limit. But we can now extend the size of integrated systems beyond the limits of photolithography reticles. By connecting multiple chips to a larger interposer (a piece of silicon with built-in interconnects), we can integrate a system that contains many more devices than is possible on a single chip. For example, TSMC's CoWoS (chip-on-wafer-on-substrate) technology can accommodate up to six reticle area computing chips, as well as a dozen high-bandwidth memory (HBM) chips.
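As a rough illustration of what this buys (our own arithmetic, using only the figures quoted above), a CoWoS package can carry several times the silicon of any single die:

```python
# Illustrative arithmetic (ours), using only the figures quoted above:
# total compute silicon a CoWoS package can carry versus one reticle-limit die.

RETICLE_LIMIT_MM2 = 800   # approximate lithography reticle limit, per the article
COMPUTE_CHIPS = 6         # up to six reticle-area compute chips per CoWoS package
HBM_STACKS = 12           # plus roughly a dozen HBM chips

compute_area_mm2 = COMPUTE_CHIPS * RETICLE_LIMIT_MM2
print(f"compute silicon per package: {compute_area_mm2} mm^2, "
      f"{COMPUTE_CHIPS}x a single reticle-limit die, plus {HBM_STACKS} HBM stacks")
```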
CoWoS is TSMC's advanced chip-on-wafer-on-substrate packaging technology, and it is already in production. Examples include the Nvidia Ampere and Hopper GPUs, each of which consists of a GPU chip and six high-bandwidth-memory cubes, all on a silicon interposer. The GPU chips themselves are approximately as big as chip-manufacturing tools currently allow. Ampere has 54 billion transistors and Hopper has 80 billion; the transition from 7-nanometer technology to the denser 4-nanometer technology allowed roughly 50 percent more transistors to be packed into essentially the same area. Ampere and Hopper are the workhorses of today's large language model (LLM) training; training ChatGPT required tens of thousands of such processors.
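A quick consistency check (ours, not the authors'): Hopper's transistor count relative to Ampere's matches the quoted ~50 percent density gain.

```python
# Consistency check (ours): Hopper vs. Ampere transistor counts against the
# quoted ~50 percent density gain from the 7 nm -> 4 nm transition.

AMPERE_TRANSISTORS = 54e9   # 7 nm-class process
HOPPER_TRANSISTORS = 80e9   # 4 nm-class process, roughly the same die area

gain = HOPPER_TRANSISTORS / AMPERE_TRANSISTORS - 1
print(f"transistor-count increase: {gain:.0%}")  # ~48%, close to the quoted 50%
```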
HBM is an example of another key semiconductor technology of increasing importance for AI: the ability to integrate systems by stacking chips on top of one another, which we at TSMC call SoIC (system-on-integrated-chips). An HBM consists of a stack of vertically interconnected DRAM chips atop a control logic IC. It uses vertical interconnects called through-silicon vias (TSVs) to pass signals through each chip, and solder bumps to form the connections between the memory chips. Today, high-performance GPUs use HBM extensively.
Looking ahead, 3D SoIC technology can provide a "bumpless alternative" to today's conventional HBM, offering far denser vertical interconnections between the stacked chips. Recent developments have demonstrated HBM test structures with 12 layers of chips stacked using hybrid bonding, a copper-to-copper connection with higher density than solder bumps can provide. Bonded at low temperature on top of a larger base logic chip, this memory system has a total thickness of just 600 µm.
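To get a feel for how thin these dies must be, here is our own rough arithmetic (ignoring the base logic chip's share of the stack, so the real dies are thinner still):

```python
# Rough arithmetic (ours): what a 600-um stack of 12 hybrid-bonded DRAM dies
# implies about per-layer thickness. The base logic chip's share is ignored
# here for simplicity, so the actual dies are thinner still.

TOTAL_THICKNESS_UM = 600   # total stack thickness quoted above
DRAM_LAYERS = 12

per_layer_um = TOTAL_THICKNESS_UM / DRAM_LAYERS
print(f"average thickness per bonded DRAM layer: <= {per_layer_um:.0f} um")
```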
For high-performance computing systems consisting of scores of chips running large AI models, high-speed wired communications can quickly limit computing speeds. Today, optical interconnects are used to connect server racks in data centers. We will soon need optical interfaces based on silicon photonics, packaged with GPUs and CPUs. This will allow for energy- and area-efficient scaling of bandwidth to enable direct optical GPU-to-GPU communication so that hundreds of servers can act as a single giant GPU with unified memory.
Due to the demand for artificial intelligence applications, silicon photonics will become one of the most important enabling technologies in the semiconductor industry.
Towards a trillion-transistor GPU
As mentioned, typical GPU chips used for AI training have already reached the reticle field limit, with about 100 billion transistors apiece. Continuing the trend of increasing transistor counts will require multiple chips, interconnected through 2.5D or 3D integration, to perform the computation. Integrating multiple chips via CoWoS or SoIC and related advanced packaging technologies allows a much larger total number of transistors per system than can be squeezed into a single chip. The AMD MI300A, for example, is made with exactly this technology.
The AMD MI300A accelerated processor unit not only utilizes CoWoS but also TSMC's 3D technology SoIC. The MI300A combines GPU and CPU cores and is designed to handle the largest artificial intelligence workloads. The GPU performs intensive matrix multiplication operations for AI, while the CPU controls the operations of the entire system, and high-bandwidth memory (HBM) serves both. Nine compute chips built in 5nm technology are stacked on top of four base chips in 6nm technology, which are dedicated to cache and I/O traffic. The base chip and HBM sit on top of the silicon interposer. The computing portion of the processor consists of 150 billion transistors.
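A minimal sketch (ours) tallying the chiplet composition described above; note that the HBM stack count below is our assumption, since the article does not give one:

```python
# A minimal sketch (ours) tallying the MI300A chiplet composition described
# above. The HBM stack count is our assumption; the article does not give it.

mi300a = {
    "compute_chiplets": {"count": 9, "node": "5 nm"},  # GPU and CPU compute dies
    "base_chiplets":    {"count": 4, "node": "6 nm"},  # cache and I/O dies
    "hbm_stacks":       {"count": 8},                  # assumption, not from the article
    "compute_transistors": 150e9,                      # quoted for the compute portion
}

dies_in_package = (mi300a["compute_chiplets"]["count"]
                   + mi300a["base_chiplets"]["count"]
                   + mi300a["hbm_stacks"]["count"])
print(f"dies integrated in one package: {dies_in_package}")
```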
We predict that within ten years, multi-chip GPUs will have more than 1 trillion transistors.
We need to connect all these chiplets together in a 3D stack, but fortunately the industry has been able to quickly shrink the pitch of vertical interconnects, thereby increasing connection density. And there's plenty of room for more. We see no reason why interconnect density can't increase by an order of magnitude, or even more.
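The quadratic payoff from pitch scaling is worth spelling out (our illustration, not the authors'): areal interconnect density scales as the inverse square of the pitch, so a roughly 3.2x pitch shrink alone yields an order-of-magnitude density gain.

```python
import math

# Worked relation (our illustration): vertical-interconnect density per unit
# area scales as 1 / pitch^2, so density gains compound quadratically as the
# bond pitch shrinks.

def density_gain(pitch_shrink: float) -> float:
    """Areal density improvement when the interconnect pitch shrinks by `pitch_shrink`x."""
    return pitch_shrink ** 2

required_shrink = math.sqrt(10.0)   # pitch shrink needed for a 10x density gain
print(f"pitch shrink for a 10x density gain: {required_shrink:.2f}x")   # ~3.16x
print(f"density gain from a 2x pitch shrink: {density_gain(2.0):.0f}x") # 4x
```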
Trends in GPUs' energy-efficient performance
So, how do all these innovative hardware technologies improve system performance?
If we look at the steady improvement in a metric called energy-efficient performance (EEP), we can see a trend already under way in server GPUs. EEP is a combined measure of the energy efficiency and speed of a system. Over the past 15 years, the semiconductor industry's EEP has improved by about a factor of three every two years. We believe this trend will continue at its historical rate. It will be driven by innovation on many fronts, including new materials, device and integration technology, extreme ultraviolet (EUV) lithography, circuit design, system-architecture design, and the co-optimization of all these technical elements.
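Compounding that historical rate (our own arithmetic): a threefold gain every two years, sustained over 15 years, multiplies EEP by roughly 3^7.5, or about 3,800x.

```python
# Compounding check (our arithmetic): a 3x EEP gain every two years,
# sustained over 15 years, multiplies energy-efficient performance by ~3,800x.

GAIN_PER_PERIOD = 3.0
PERIOD_YEARS = 2.0
SPAN_YEARS = 15.0

cumulative = GAIN_PER_PERIOD ** (SPAN_YEARS / PERIOD_YEARS)
print(f"cumulative EEP gain over {SPAN_YEARS:.0f} years: ~{cumulative:,.0f}x")
```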
In particular, EEP gains will increasingly come through the advanced packaging technologies discussed here. In addition, concepts such as system-technology co-optimization (STCO), in which the different functional parts of a GPU are separated onto their own chiplets and each part is built with the best-performing and most economical technology for it, will become increasingly important.
A Mead-Conway moment for 3D integrated circuits
In 1978, Caltech professor Carver Mead and Lynn Conway of the Xerox Palo Alto Research Center invented a computer-aided design method for integrated circuits. They used a set of design rules to describe chip scaling so that engineers could easily design very-large-scale integration (VLSI) circuits without knowing much about the underlying process technology.
3D chip design needs the same capability. Today, designers need to understand chip design, system-architecture design, and hardware and software optimization. Manufacturers need to understand chip technology, 3D IC technology, and advanced packaging technology. As we did in 1978, we again need a common language that describes these technologies in a way that electronic design tools can understand. Such a hardware description language gives designers a free hand to design 3D IC systems without worrying about the underlying technology. It is on the way: an open-source standard called 3Dblox has already been embraced by most of today's technology companies and electronic design automation (EDA) companies.
The future beyond the tunnel
In the era of artificial intelligence, semiconductor technology is a key enabler of new capabilities and applications of artificial intelligence. New GPUs are no longer limited by the standard sizes and form factors of the past. New semiconductor technologies are no longer limited to shrinking next-generation transistors on a two-dimensional plane. Integrated AI systems can be composed of as many energy-efficient transistors as possible, efficient system architectures for specialized computing workloads, and optimized relationships between software and hardware.
The development of semiconductor technology over the past 50 years has been like walking inside a tunnel. The road ahead was clear because there was a well-defined path, and everyone knew what had to be done: shrink the transistor.
Now we have reached the end of that tunnel. From here on, semiconductor technology will be harder to develop. Yet beyond the tunnel, many more possibilities lie ahead. We are no longer bound by the confines of the past.
Original link
https://spectrum.ieee.org/trillion-transistor-gpu
*Disclaimer: This article is original to its authors, and its content reflects their personal views. Semiconductor Industry Watch reprints it only to convey a different perspective; this does not mean Semiconductor Industry Watch agrees with or endorses those views. For any objections, please contact Semiconductor Industry Watch.