Since the dawn of the artificial-intelligence era, China has been one of Nvidia's biggest customers.
In 2022, roughly 25% of Nvidia's revenue came from Chinese customers, especially the major Internet companies, which snapped up Nvidia's compute products such as the H800.
However, when the United States updated its export controls on advanced semiconductors and computing equipment in October 2023, Nvidia's China-specific GPUs, the A800 and H800, were barred from export to China after November 17. Worse still, under Washington's new rules, the vast majority of high-compute products can no longer be exported to China at all.
Nvidia, however, had its own ideas. A revenue stream worth 25% of sales cannot simply be cut off overnight, so Nvidia decided to play a cat-and-mouse game with the US government and steal the cheese from right under the cat's nose. It began laying out a new product: the H20. Note that this is "H" followed by "twenty", not the chemical formula for water.
On paper, the H20's floating-point throughput is only 296 TFLOPS and its performance density only 2.9, numbers befitting an ultra-low-performance product. Curiously, companies still want to buy it, and it keeps alarming both US regulators and Nvidia's competitors. As Silicon Star noted in an earlier article, "Nvidia's U.S. rivals have begun to use China to attack Nvidia", those rivals have criticized the company for not being American enough, criticism Nvidia considers illogical; and in the latest development, U.S. Commerce Secretary Gina Raimondo called out Nvidia by name and warned it to stop designing AI chips for China that bypass export controls.
So how does Nvidia play this cat-and-mouse game?
3A090 is an ECCN code in Washington's export controls, referring to a specific class of high-performance integrated circuits. When a chip's bidirectional input/output transfer rate exceeds 600 GB per second, or its computing power exceeds 4800 TOPS, it falls under 3A090, meaning it is prohibited from export to China.
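As a minimal sketch, here is the threshold logic exactly as the paragraph above states it. The real BIS rule is phrased in terms of "total processing performance" and contains further tiers, all omitted here; the sample inputs are illustrative, not official figures.

```python
# A minimal sketch of the 3A090 thresholds as quoted above.
# The real BIS rule is written in terms of "total processing
# performance" (TPP) and has additional tiers; both are omitted here.

IO_LIMIT_GBPS = 600        # bidirectional I/O transfer rate, GB/s
COMPUTE_LIMIT_TOPS = 4800  # computing power, TOPS

def falls_under_3a090(io_rate_gbps: float, compute_tops: float) -> bool:
    """True if a chip trips either threshold quoted in the article."""
    return io_rate_gbps > IO_LIMIT_GBPS or compute_tops > COMPUTE_LIMIT_TOPS

# Illustrative (not official) numbers:
print(falls_under_3a090(io_rate_gbps=900, compute_tops=2000))  # True: I/O too fast
print(falls_under_3a090(io_rate_gbps=400, compute_tops=2000))  # False: under both limits
```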
The last time the United States imposed export measures on advanced semiconductors, Nvidia's flagship A100 and H100 landed on the list of products banned for export to China. Nvidia's response then was to launch nominally lower-end versions, the A800 and H800, built on the same architecture and offered in the same high-bandwidth socket (that is, SXM) versions. Their performance was almost identical to the original A100 and H100; even the memory used HBM2e and HBM3, the top memory chips of the day. It was rather like Sun Wukong and the six-eared macaque: a double indistinguishable from the original.
When the previous ban was issued, the United States prohibited only specific product models, which is precisely what made the H800 and A800 possible. So this time the new rules add stricter restrictions on total computing power and performance density, sweeping every high-end tensor-computing GPU used in the AI industry into the controls.
To put it more bluntly: high-performance GPUs may not be sold, and low-performance GPUs are not worth buying.
However, here's the interesting part: the two terms the ban restricts, "performance density" and "total computing power", are actually a word game.
What is performance density? Internationally there are two conventions. The first, commonly used by AI companies, is based on floating-point throughput (FLOPS): floating-point operations per second divided by the number of transistors per unit area. The second is based on MIPS: millions of instructions per second divided by the number of transistors per unit area.
We all know the saying: times have changed. Take the HBM3e on Nvidia's newly launched H200. This memory uses 3D stacking, piling dies upward in three-dimensional space to increase capacity. Judged by area alone, its performance density is very high; judged by volume it is still high, just not as high as the area-based figure. So if you want the performance density to come out smaller, simply compute it per unit volume: in division, the larger the denominator, the smaller the result.
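A toy calculation makes the denominator game concrete. Every number below is invented for illustration and is not a real measurement of any Nvidia part:

```python
# Toy numbers only - chosen to illustrate the denominator game,
# not real measurements of any Nvidia part.
flops = 1.0e15          # floating-point operations per second
die_area_mm2 = 800.0    # die footprint, mm^2
stack_height_mm = 2.0   # height of the 3D memory stack, mm

density_by_area = flops / die_area_mm2                        # per mm^2
density_by_volume = flops / (die_area_mm2 * stack_height_mm)  # per mm^3

# Any stack taller than 1 mm makes the volume-based figure smaller:
# a bigger denominator shrinks the quotient.
print(f"{density_by_area:.2e} FLOPS/mm^2")    # 1.25e+12
print(f"{density_by_volume:.2e} FLOPS/mm^3")  # 6.25e+11
```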
In addition, MIPS is usually larger than FLOPS, because the instruction stream contains integer (INT) operations alongside floating-point ones. Floating point is itself divided by storage size into single precision (32-bit), double precision (64-bit), and other formats. Since traditional FLOPS accounting often counts only single and double precision, a GPU built for tensor computation can likewise report only single- and double-precision throughput in its benchmarks, so that however large its MIPS figure may be, its reported FLOPS stays very low. After all, in division, the smaller the numerator, the smaller the result.
And there are still fancier games to play with your computing power. "Total computing power" here refers to the sum of the throughput of every core. The H20, or rather the Hopper architecture behind the whole H series, carries a variety of cores, such as TensorFloat-32 (TF32) cores built specifically for tensor math and brain-floating-point (BF16) cores. As we just saw, if FLOPS accounting covers only single and double precision, then a total-computing-power test can likewise count only the single- and double-precision cores and simply leave the tensor cores switched off. After all, when adding natural numbers, the fewer the addends, the smaller the sum.
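A toy tally shows both the numerator game and the addend game at once. The per-core-type figures below are invented for illustration, not real Hopper specifications:

```python
# Toy per-core-type throughput figures (TFLOPS) - invented for
# illustration, not real Hopper specifications.
core_tflops = {
    "fp64": 30.0,          # double precision
    "fp32": 60.0,          # single precision
    "tf32_tensor": 500.0,  # tensor cores
    "bf16_tensor": 1000.0,
}

# Count every core type and the total is large:
full_total = sum(core_tflops.values())                      # 1590.0

# Count only the single/double-precision cores - the same trick as
# the FLOPS numerator game - and the reported total collapses:
reported_total = core_tflops["fp64"] + core_tflops["fp32"]  # 90.0

print(full_total, reported_total)
```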
To sum up: a compute-chip maker can quite easily make a chip's reported numbers very low. But this remains only a hypothesis, because Nvidia's ultimate goal is not merely to slip past 3A090; it is to sell products and make a profit. A product with genuinely abysmal performance has no market at all, and no real value even if it gets built.
The real secret behind the H20
Consider the RTX 4080, a merely upper-mid-range gaming graphics card: its floating-point throughput reaches 320 TFLOPS, and its performance density is 6.8. The H20, billed as a GPU for tensor computation, manages 296 TFLOPS and a performance density of 2.9. It is as if the only son of a billionaire, blessed from birth with the world's best resources, two eggs on his breakfast pancake, never once having to lick the yogurt lid, grows up unable to do single-digit addition and subtraction without his CPU drying out and smoking.
But how would you respond if I told you the H20's die measures 814 square millimeters, exactly the same as the H100's? This is not a case of wasting good steel on the back of the blade. On the contrary, it is the H20's hidden attribute, and it is precisely this that makes me all the more convinced there is something fishy about Nvidia's H20 numbers.
The die is the bare silicon of a chip, and generally speaking, the higher a chip's performance, the larger its die. The RTX 4080's die is 379 square millimeters, for example, while the RTX 4090, currently the best-performing gaming graphics card, has a 609-square-millimeter die. So the H20 is not actually a low-end chip; in manufacturing terms, at least, it sits in the first tier.
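In fact the article's two density figures can be reproduced if "performance density" is read as total processing performance (TPP, roughly throughput times operand bit width, echoing the BIS rule) divided by die area. This reading is my reconstruction, not something the article spells out:

```python
# Reconstructing the article's performance-density figures, assuming
# density = TPP / die area, where TPP ~ TOPS x operand bit width
# (echoing the BIS-style aggregate). This reading is an assumption,
# not something the article states.

def performance_density(tops: float, bit_width: int, die_area_mm2: float) -> float:
    """Hypothetical density metric: (TOPS x bit width) per mm^2 of die."""
    return tops * bit_width / die_area_mm2

# 8-bit operations assumed for both parts (also an assumption):
print(round(performance_density(296, 8, 814), 1))  # 2.9 -> the H20 figure
print(round(performance_density(320, 8, 379), 1))  # 6.8 -> the RTX 4080 figure
```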
The mouse is neither as strong as the cat nor faster. If it does not want to be caught, it must find ways to stay hidden.
It is true that the H20's floating-point throughput is very low, but the H20 SXM carries a full 96 GB of memory, and, more terrifying still, its memory bandwidth reaches 4 TB/s. By comparison, the H100 SXM, with its 1979 TFLOPS of floating-point throughput, has only 80 GB of memory and 3.4 TB/s of bandwidth. In artificial intelligence, and especially in today's red-hot large language models, memory is the key to running a model: every billion parameters consumes roughly 3 to 5 GB, and if memory overflows, model quality suffers seriously and unpredictably. In other words, when it comes to practical applications, the H20 can host larger language models than the H100.
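A back-of-the-envelope check using the article's own rule of thumb (3 to 5 GB of memory per billion parameters); only the division below is new:

```python
# Back-of-the-envelope model sizing from the article's rule of thumb:
# every 1B parameters consumes roughly 3-5 GB of memory.

def max_params_billions(mem_gb: float, gb_per_billion: float) -> float:
    """Largest model (in billions of parameters) that fits in mem_gb."""
    return mem_gb / gb_per_billion

for name, mem in [("H20 SXM", 96), ("H100 SXM", 80)]:
    lo = max_params_billions(mem, 5.0)  # pessimistic: 5 GB per 1B params
    hi = max_params_billions(mem, 3.0)  # optimistic: 3 GB per 1B params
    print(f"{name}: fits roughly {lo:.0f}B-{hi:.0f}B parameters")
# H20 SXM:  fits roughly 19B-32B parameters
# H100 SXM: fits roughly 16B-27B parameters
```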
You may ask: the H20's floating-point capability is weak, so what good is memory alone? Won't it run slowly? If this were 2022, that would indeed be a big problem; no AI company would consider an inefficient GPU, since it would drag down the whole training process. But this is 2023, Nvidia's TensorRT-LLM has been released, and the H200, which ships with TensorRT-LLM support, arrives in 2024.
TensorRT-LLM is optimization software that helps the GPU work through complex computations quickly; it sits with the GPU somewhat the way a driver sits with a gaming graphics card. Take the H100: with TensorRT-LLM, its throughput when summarizing articles from media websites is fully twice what it was before, and on the 70-billion-parameter Llama 2 it runs 77% faster than without. Neither the A800 nor the H800 currently on the market ships with TensorRT-LLM support; the H20 very likely will.
Although Nvidia has always centered on hardware sales, its software capabilities are extraordinary. DLSS, for example, is software built specifically to "deceive", though the target of the deception is not the user but the graphics card. When a scene demands heavy graphics computation, DLSS hands the frame to the card at a very low resolution, in effect telling it: "This little bit of work you can manage; don't worry about the rest." It then upscales the output back to high resolution, greatly easing the load on the card and thereby improving the delivered picture.
Back to the present: TensorRT-LLM is likewise software that takes pressure and burden off the GPU, letting it deliver performance that should not, on paper, belong to it. What's more, if Nvidia really has hidden the H20's true numbers, the H20's actual performance could well exceed the H100's.
You might think Nvidia is acting like a gecko, shedding its tail to survive.
In fact, Nvidia has no intention of "castrating" its products at all. What it wants is another path around the oversight to reach its goal.
After all, if the H20's computing power were truly that low, no buyer would want it even if it could be exported to China. When the cat blocks one mouse hole, the mouse can still find a way out, because there is never only one exit.
It’s not just Nvidia’s own cat-and-mouse game
Nvidia has a good friend named SK Hynix. The HBM3e memory on Nvidia's latest flagship, the H200 SXM, comes from SK Hynix, and the two are now jointly developing HBM4 in a bid to upend the entire industry. Nvidia is also one of SK Hynix's largest customers; if Nvidia loses the Chinese market, SK Hynix's losses will be enormous too.
Most important of all, the GPU straddles software and hardware at once, creating a sales system with extremely high added value. Take Nvidia's Hopper architecture, the one used by the H100, H200, H800, and H20: it allows multiple GPUs of the same architecture to be linked in parallel so that computing resources are allocated more effectively. AI companies, as a rule, buy GPUs in bulk rather than one at a time. So the GPU's first kind of added value shows itself the moment an AI company expands its computing power: the company is pushed to keep buying GPUs from the same maker.
The second kind of added value lies in algorithm development. Different GPU products, such as AMD's MI series and Intel's Gaudi, differ not only in floating-point throughput and performance density but also in instruction sets, logic design, low-level languages, and more. An algorithm that runs on the H100 may not suit the MI300X at all. In other words, if development starts on one company's products, later development will most likely stay with that company's products, often down to the very same architecture.
The third kind of added value flows the other way, from the AI companies back to the GPU makers. Algorithm development always runs into problems, and when those problems are fed back, the GPU maker learns exactly what to improve in the next generation. Take the HBM4 effort mentioned above: Nvidia and SK Hynix must understand precisely where today's GPUs fall short in today's workloads in order to build products that can upend the industry.
It is these layers of added value that bind GPU companies and AI companies together in a tangled web of mutual dependence. Nvidia therefore cannot afford to lose the Chinese market, not just for the 25% of sales but for these forms of added value that matter even more than the revenue figures. China's AI capability is growing very fast, and Nvidia understands the stakes perfectly well.
In the cat-and-mouse game, the mice cooperate with one another too: some draw the cat's attention while others carry off the cheese. And one more thing: both cat and mouse understand that, opposed as they are, there is a gray zone that keeps the balance between them, neither black nor white, in which both can survive. The cat cannot kill every mouse at once, or the cat loses its purpose; and the mice cannot grow too rampant, or they squeeze the cat's living space.
-END-
The content of this article is for communication and learning purposes only and does not constitute any investment advice. If you have any questions, please contact us at info@gsi24.com.