NVIDIA DGX SuperPOD helps JD Explore Academy win the WMT competition with its Vega-MT model!

Publisher: EE小广播 | Last updated: 2023-01-17 | Source: EEWORLD | Keywords: NVIDIA

Model training time was cut to two weeks, with roughly double the per-GPU compute and double the scaling efficiency.


Image courtesy of JD.com


With the help of NVIDIA DGX SuperPOD, JD Explore Academy trained Vega-MT, a model with nearly 5 billion parameters that shone at the 17th International Machine Translation Competition (WMT) in 2022. Vega-MT took first place in seven translation tracks: Chinese-English (BLEU 33.5, chrF 0.611), English-Chinese (BLEU 49.7, chrF 0.446), German-English (BLEU 33.7, chrF 0.585), English-German (BLEU 37.8, chrF 0.643), Czech-English (BLEU 54.9, chrF 0.744), English-Czech (BLEU 41.4, chrF 0.651), and English-Russian (BLEU 32.7, chrF 0.584).
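BLEU and chrF are the two automatic metrics WMT reports. As a rough illustration of the latter, here is a deliberately simplified chrF sketch (character n-grams only, whitespace stripped); the official sacreBLEU implementation used by WMT additionally handles tokenization, word n-grams, and corpus-level aggregation:

```python
from collections import Counter

def char_ngrams(text, n):
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hyp, ref, max_n=6, beta=2.0):
    # Simplified chrF: character n-gram precision/recall combined into
    # an F-beta score (beta=2 weights recall), averaged over n = 1..max_n.
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if sum(h.values()) == 0 or sum(r.values()) == 0:
            continue
        overlap = sum((h & r).values())  # clipped n-gram matches
        precisions.append(overlap / sum(h.values()))
        recalls.append(overlap / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    rc = sum(recalls) / len(recalls)
    if p + rc == 0:
        return 0.0
    return (1 + beta ** 2) * p * rc / (beta ** 2 * p + rc)

print(chrf("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

An exact match scores 1.0; the competition scores above (e.g. 0.744 for Czech-English) are corpus-level values on held-out test sets.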


As a large-scale AI infrastructure, NVIDIA DGX SuperPOD provides a complete, state-of-the-art platform. Compared with the previous V100 cluster, DGX SuperPOD delivered nearly twice the single-GPU compute, and its performance also scales almost linearly with cluster size, roughly doubling scaling efficiency as well. Across multiple nodes this yielded an overall speedup of about 4x. A training run that previously took several months for a model of comparable size and complexity was therefore shortened to two weeks, giving researchers more time to optimize the model.
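The arithmetic behind the quoted figures, as a quick sanity check (both factors are taken from the article; real multi-node speedups also depend on workload and parallelization strategy):

```python
single_gpu_speedup = 2.0  # A100 vs. V100 per-GPU compute, per the article
scaling_speedup = 2.0     # improvement in multi-node scaling efficiency
overall = single_gpu_speedup * scaling_speedup
print(overall)            # 4.0 -> the ~4x multi-node improvement cited
```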


Customer Profile and Application Background


JD.com is a supply-chain-based technology and service company. JD Explore Academy, guided by the group's mission of "powered by technology, for a more productive and sustainable world", is JD Group's R&D department dedicated to exploring cutting-edge technology. Building on the technical work of JD's business groups and units and pooling resources and capabilities from across the group, it serves as an ecological platform for research and collaborative innovation. The Academy works in three broad fields of artificial intelligence, namely quantum machine learning, trustworthy AI, and super deep learning, pursuing disruptive innovation from basic theory upward to support the digital-intelligence industry and social change. It applies original technology across JD Group's entire industry chain, including retail, logistics, health, and technology scenarios, aiming to build a source of technological strength, achieve a leap from quantitative to qualitative change, and lead the industry forward.


The International Machine Translation Competition (WMT) is recognized by the global academic community as the premier machine translation competition. It is organized under the Association for Computational Linguistics (ACL) and is the association's top event in the field. Since 2006, each WMT has been a stage for universities, technology companies, and research institutions worldwide to showcase their machine translation capabilities, and it has witnessed the steady progress of machine translation technology.


This major achievement by JD Explore Academy at WMT further validates the strength of large natural language processing models in understanding, generation, and cross-lingual modeling.


Customer Challenges


Machine translation faces many challenges. A handful of common languages are widely used and rich in data, while low-resource languages, though essential for cross-border e-commerce, lack data, so training on small datasets is difficult. Mining the relationships between languages is also hard: the complexity and ambiguity of language generation, the diversity of expression, and differences in cultural background between languages are all unavoidable problems in machine translation competitions.


From GPT-1's 117 million parameters in 2018 to today's large-scale language models with up to trillions of parameters, the marked accuracy gains of large models across language tasks are helping us build intelligent systems with a richer understanding of natural language.


Vega-MT uses several advanced techniques, including multidirectional pre-training, an extremely large Transformer, cycle translation, and bidirectional self-training, to fully exploit knowledge from both bilingual and monolingual data. In addition, strategies such as noisy channel reranking and generalization-oriented fine-tuning are used to enhance the robustness of the Vega-MT system and the trustworthiness of its translations.


However, training large models still poses many difficulties. Previously, a single GPU sufficed for training models for typical tasks, but large models require multiple nodes working together to complete training, which places new demands on existing GPU clusters. Take the well-known GPT-3 as an example: it was trained on 45 TB of data and has up to 175 billion parameters. Even with mixed precision, its training state occupies roughly 2.8 TB of GPU memory, requiring more than 35 GPUs just to hold the model.
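The ~2.8 TB figure can be reproduced with a back-of-the-envelope estimate. A common accounting for mixed-precision training with Adam is 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights and two fp32 optimizer moments); activations are ignored. This is a sketch under those assumptions, not GPT-3's actual memory layout:

```python
def training_state_tb(n_params: float, bytes_per_param: int = 16) -> float:
    """Rough GPU-memory footprint of the training state, in TB.

    bytes_per_param = 16 assumes mixed-precision Adam:
    fp16 weights (2 B) + fp16 grads (2 B) + fp32 master weights (4 B)
    + two fp32 Adam moments (8 B). Activations are not counted.
    """
    return n_params * bytes_per_param / 1e12

gpt3 = training_state_tb(175e9)
print(f"{gpt3:.1f} TB")                  # 2.8 TB, matching the article
print(f"{gpt3 * 1e12 / 80e9:.0f} GPUs")  # ~35 GPUs with 80 GB each
```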


The training challenges therefore center on single-GPU compute and multi-GPU, multi-node communication. With training spanning multiple nodes, data transfer, task scheduling, parallelization, and resource utilization all become critical.


Application Solution


When building AI infrastructure, challenges arise on every front: compute resources, networking, storage, and even the top-level software used for task scheduling. These aspects are not independent and must be considered together.


The NVIDIA DGX SuperPOD used by JD Explore Academy is a comprehensive, complete high-performance solution. The SuperPOD AI cluster is built from DGX servers, HDR InfiniBand 200 Gb/s adapters, and NVIDIA Quantum QM8790 switches. The compute and storage networks are isolated, ensuring both optimal compute performance and efficient interconnect between nodes and GPUs, maximizing distributed training efficiency.


In terms of compute, a single node delivers up to 2.4 PFLOPS. On a single node, BERT trains in just 17 minutes, Mask R-CNN in 38 minutes, RetinaNet in 83 minutes, and Transformer-XL Base in 181 minutes. Meanwhile, Multi-Instance GPU (MIG) technology can partition each GPU into multiple instances, each with its own memory, cache, and streaming multiprocessors, and with fault isolation between instances. This further improves GPU utilization and accommodates tasks with different compute requirements.


At the network level, Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) technology moves aggregation operations from the CPU into the switch fabric, eliminating repeated data transfers between nodes. This greatly reduces the traffic reaching the aggregation point and the time spent in MPI collectives, and makes communication efficiency largely independent of the node count, further ensuring compute scalability. It also frees the CPU from communication processing, letting valuable CPU resources focus on computation and improving overall cluster throughput.
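To see why in-network aggregation decouples communication cost from node count, compare per-node traffic for a host-based ring all-reduce against switch-side aggregation. This is a minimal bandwidth-only model (latency terms and tree depth are ignored), not a description of SHARP's wire protocol:

```python
def per_node_traffic(n_nodes: int, msg_bytes: float) -> tuple[float, float]:
    # Ring all-reduce: each node sends 2*(N-1)/N of the message total,
    # spread over 2*(N-1) steps, so step count (latency) grows with N.
    ring = 2 * (n_nodes - 1) / n_nodes * msg_bytes
    # In-network aggregation (e.g. SHARP): each node sends its data once
    # up the switch tree and receives the reduced result once,
    # independent of N.
    in_network = msg_bytes
    return ring, in_network

for n in (2, 8, 32):
    ring, sharp = per_node_traffic(n, 1.0)
    print(n, round(ring, 3), sharp)
```

The application code is unchanged either way: NCCL/MPI collectives simply run faster when the fabric performs the reduction.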


At the storage level, training a model typically requires reading the training data from storage many times, and slow reads affect training throughput. DGX SuperPOD uses a high-performance multi-tier storage architecture that balances performance, capacity, and cost. With GPUDirect RDMA technology, data moves directly between storage, network devices, and the GPU, bypassing the CPU for high-speed, low-latency transfers.


At the software level, building a cluster and keeping it running smoothly requires monitoring and scheduling software. Base Command Manager is a cluster management system that handles cluster configuration, user access management, resource monitoring, and logging, and schedules jobs through Slurm. In addition, NGC offers a large catalog of AI, HPC, and data-science resources, giving users easy access to powerful software, container images, and a variety of pre-trained models.


Meanwhile, the JD Explore Academy team monitors and manages the cluster around the clock to keep long-running training jobs healthy, and tracks resource utilization to ensure the compute resources on every node are fully used. Thanks to this scheduling and monitoring work and the high reliability of DGX SuperPOD, none of the training nodes failed during the 20 days of model training (two weeks of pre-training plus five days of fine-tuning), and training completed successfully.


Results and Impact


Vega-MT has been successfully deployed in the Omni-Force AIGC mini program that JD.com released during the National Day holiday. Users type text and the mini program generates corresponding images; with Vega-MT's support, it accepts input in multiple languages, including Chinese, English, and Spanish.


JD Explore Academy said: "With the support of NVIDIA DGX SuperPOD, JD Explore Academy can iterate models quickly and bring high-accuracy models into production fast, further improving user experience, reducing costs, and increasing effectiveness and business value. NVIDIA DGX SuperPOD's support in winning the WMT competition not only raises the company's profile but also helps JD.com become a brand that users trust even more."
