NVIDIA surprised many people, including long-time NVIDIA watchers, when it announced it would spend $6.9 billion to acquire data center networking company Mellanox. This is by far NVIDIA's largest acquisition ever; its previous purchases were much smaller, and the targets were often acquired at fire-sale prices. The closest deal in significance was its 2001 purchase of the assets of competitor 3dfx, when NVIDIA was a much smaller company.
As I explained in a previous article, the purchase of the 3dfx assets (and the hiring of 100 of its employees) was an easier move to understand, because those assets could be put to work immediately in NVIDIA's core business: PC graphics processors. Mellanox, by contrast, has long been in a completely different business, data center networking. Its products complement NVIDIA's, with no overlap.
With this acquisition, NVIDIA is saying that it is no longer just a GPU company. With its accelerator business growing rapidly and now extending into the network, NVIDIA is a data center company.
Mellanox CEO Eyal Waldman and NVIDIA CEO Jensen Huang on stage at GTC 2019
There are many interesting aspects to the Mellanox acquisition: NVIDIA's deeper entry into Israel's tech industry; Mellanox's other compute-related assets (EZchip and Tilera); how Jensen Huang's management style will play out in Israel; and Mellanox's support for the CCIX accelerator interconnect protocol versus NVIDIA's own NVLink. Later posts will explore each of these topics in depth. But for now, let's explore this new NVIDIA.
How NVIDIA Became a Data Center Company
It all started around 2006 with a discovery at Stanford University. Researchers there were using graphics processing units (GPUs) for computationally intensive workloads and found that GPUs delivered a significant improvement in performance per watt over traditional processors (CPUs).
It turned out that the many small computing elements used to process pixels (texture processing) could be pressed into service for crude scientific computing. The field was initially called GPU compute. At the same time, graphics itself was becoming more complex, and full-featured math processing capabilities were added to GPUs. Some people at NVIDIA, including Professor Bill Dally and the late John Nickolls, saw an opportunity to expand the use of GPUs and play a major role in the high-performance computing (HPC) market. As a result, NVIDIA built on its Quadro product line for graphics computing, added more HPC-oriented features to its GPUs, and launched the Tesla product line specifically for numerical computing.
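The appeal of those many small compute elements is data parallelism: the same scalar operation applied independently to millions of elements at once. A minimal sketch of that pattern (illustrative only, not NVIDIA's API; the function name is my own) is the classic SAXPY operation:

```python
import numpy as np

def saxpy(a, x, y):
    # Each output element depends only on the matching input elements,
    # so every element could be computed by a separate GPU thread in parallel.
    return a * x + y

x = np.arange(4, dtype=np.float32)  # [0, 1, 2, 3]
y = np.ones(4, dtype=np.float32)
print(saxpy(2.0, x, y))             # [1. 3. 5. 7.]
```

On a GPU, each lane of this computation maps onto one of the thousands of small processing elements; on a CPU, the same loop runs over a handful of wide cores.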
The company also developed the CUDA programming framework for its GPUs, and has never supported it on any other vendor's GPUs. AMD, the main competing GPU supplier, chose to wait for OpenCL to mature, but that software developed much more slowly. On this foundation, NVIDIA became very successful in HPC and rose to the top of the TOP500 list of supercomputers; its GPUs power the two fastest supercomputers in the world.
NVIDIA CEO Jensen Huang shows off the company's growth in supercomputing
Because of NVIDIA's work on GPU computing for HPC, some researchers in the AI field decided to use GPUs to accelerate new machine learning algorithms called deep convolutional neural networks (DCNNs). The combination of the new DCNNs and GPUs made training and inference of AI neural networks faster and more accurate than before. This drove a Cambrian explosion of AI research and applications, with NVIDIA leading the trend. The company quickly adapted its GPUs to these new workloads, adding new math functions and even specialized processing elements called Tensor Cores. NVIDIA also developed a software library called cuDNN, optimized for CUDA and deep neural networks.
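The core operation that cuDNN and Tensor Cores accelerate is the convolution at the heart of a DCNN. A hedged sketch of that operation (a direct, unoptimized version in plain Python; real libraries use highly tuned GPU kernels, and deep-learning frameworks technically compute cross-correlation):

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image and take a weighted sum at each
    # position -- the basic building block of a convolutional layer.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=image.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]], dtype=np.float32)
edge = np.array([[1, -1],
                 [1, -1]], dtype=np.float32)  # crude vertical-edge filter
print(conv2d(img, edge))
```

Each output position is independent of the others, which is why this workload, like pixel shading before it, maps so naturally onto a GPU's many compute elements.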
With the explosion of AI research, each major cloud vendor has also developed its own framework: Google has TensorFlow, Facebook has PyTorch/Caffe2, and so on. Even with this fragmentation of AI frameworks, the field is still growing rapidly. Because researchers keep inventing new algorithms, a flexible approach has long-term cost-of-ownership benefits. This is where flexible accelerators such as GPUs (or FPGAs) excel, as they adapt easily to new algorithms. In his GTC 2019 keynote, Jensen called this architecture "PRADA": programmable acceleration of multiple domains from one architecture. This architectural compatibility lets customers build on an installed base of software and systems and reduces infrastructure cost.
Jensen Huang explains his acronym PRADA
From Chips to Systems
In his keynote, Huang proposed that data science is the fourth pillar of the scientific method. NVIDIA recognizes that data scientists and AI researchers are in short supply, so the productivity of these people is very important. To sustain this momentum, it is important to bring resources to a wider range of developers. The company has therefore designed a series of DGX workstations and servers, fully loaded with CUDA-X tools and libraries for ML research, and is expanding its reach to data scientists with new data science platforms from multiple system OEMs, including Dell, HP, and Lenovo.
Even with new systems and tools, the industry still faces the challenge of sifting new and existing data for business and scientific insights; data science must grapple with simply having too much data. As we enter the era of self-driving cars, vehicles will generate billions of bytes of information that need to be processed. That's why NVIDIA believes more and more data centers will need to build in AI processing to sort through all this data.
Supercomputers and HPC
In HPC, NVIDIA focuses on delivering maximum computing performance to solve very large problems (scale up). Hyperscale data centers, by contrast, typically run many computing tasks simultaneously (scale out). The needs of data science fall somewhere in between: large datasets and many users, with both scale-up and scale-out characteristics.
To meet these different needs, NVIDIA has worked with Mellanox on many server projects to provide rack-level networking. Mellanox's success made it an acquisition target for various chip and cloud companies, reportedly including Intel and Microsoft. Rather than going to one of those suitors, however, Mellanox sought a friendlier partner in NVIDIA, and Jensen Huang seized the fleeting opportunity to become Mellanox's white knight.
With workloads for data analytics frameworks like Hadoop, Spark, and RAPIDS increasingly containerized and run at hyperscale, data centers are seeing exponential growth in rack-to-rack traffic, often referred to as east-west communication. That makes low-latency networking critical to creating the compute fabric.
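Why latency, and not just bandwidth, dominates east-west traffic can be seen with a back-of-envelope model (my own illustrative numbers, not vendor data): the time to move a message between racks is roughly the fabric latency plus size divided by bandwidth, and for the small, frequent transfers of distributed analytics the latency term swamps the transfer term.

```python
def transfer_time_us(size_bytes, latency_us, bandwidth_gbps):
    # Simple model: total time = fixed latency + serialization time.
    # Gbit/s * 1e3 converts to bits per microsecond.
    return latency_us + size_bytes * 8 / (bandwidth_gbps * 1e3)

small = 4 * 1024  # a 4 KiB shard, typical of chatty east-west traffic
print(transfer_time_us(small, latency_us=1.0, bandwidth_gbps=100))   # low-latency fabric
print(transfer_time_us(small, latency_us=50.0, bandwidth_gbps=100))  # high-latency fabric
```

At 100 Gbit/s the 4 KiB payload itself takes only ~0.33 µs on the wire, so cutting fabric latency from 50 µs to 1 µs speeds up the transfer by roughly 38x even though bandwidth is unchanged.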
Mellanox's networking technology can make data centers flexible enough to adapt to these changing workloads. A key Mellanox development is offloading networking tasks from CPUs to accelerators, and in the future the company plans to add AI to its switch products to move data more efficiently.
For scale-up applications like HPC, the goal is to make multiple GPUs act like one giant GPU. That's where NVIDIA's NVLink comes in, tying multiple GPUs together. For the broader infrastructure, there are the Tesla T4 cards: 70 W, half-height PCIe cards that fit into a 2U rack chassis, so they can be added to existing data centers in large numbers. The T4 is NVIDIA's most flexible data center offering; it can be used for inference, training (at different speeds than the V100), data science, video transcoding, and VDI (virtual desktop) applications.
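The "one giant GPU" idea boils down to sharding a large problem across devices, reducing locally, and combining partial results over the interconnect. A minimal sketch of that pattern (the function name and device simulation are my own; real NVLink systems do this in hardware and driver software):

```python
import numpy as np

def parallel_sum(data, n_devices):
    # Split the array into one shard per "device"...
    shards = np.array_split(data, n_devices)
    # ...reduce each shard locally (this is the part each GPU does)...
    partials = [shard.sum() for shard in shards]
    # ...then combine the partial results across the interconnect.
    return sum(partials)

data = np.arange(1_000_000, dtype=np.float64)
print(parallel_sum(data, n_devices=8))  # same result as data.sum()
```

The faster and lower-latency the link between devices, the cheaper that final combining step is, which is exactly what NVLink provides between GPUs and what Mellanox provides between servers and racks.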
Going forward, NVIDIA will pay more attention to inference in cloud and edge applications, the area where it competes most fiercely with Intel.
While there are many contenders for the AI accelerator throne, NVIDIA remains king of the hill with the largest installed base. With the Mellanox acquisition, it has expanded its reach in the data center.