NVIDIA launches inference platform for large language models and generative AI workloads

Publisher: EE小广播 | Last updated: 2023-03-22 | Source: EEWORLD | Keywords: NVIDIA



Google Cloud, D-ID, and Cohere are using the new platforms for a variety of generative AI services, including chatbots, text-to-image generation, and AI video.




SANTA CLARA, Calif. - GTC - March 21, 2023 - NVIDIA today launched four inference platforms optimized for a range of rapidly emerging generative AI applications, helping developers quickly build AI-powered, specialized applications that deliver new services and insights.


These platforms combine NVIDIA's full-stack inference software with the latest NVIDIA Ada, Hopper, and Grace Hopper processors, including the NVIDIA L4 Tensor Core GPU and NVIDIA H100 NVL GPU launched today. Each platform is optimized for in-demand workloads such as AI video, image generation, large language model deployment, and recommender inference.


NVIDIA founder and CEO Jensen Huang said: "The rise of generative AI requires more powerful inference computing platforms. The number of applications for generative AI is infinite, limited only by human imagination. Arming developers with the most powerful and flexible inference computing platforms will accelerate the creation of new services that will improve our lives in ways not yet imaginable."


Accelerate a diverse set of inference workloads for generative AI


Each platform includes an NVIDIA GPU and specialized software optimized for specific generative AI inference workloads:


NVIDIA L4 for AI Video delivers 120x more AI video performance than CPUs while improving energy efficiency by 99%. Serving as a universal GPU for nearly any workload, it offers enhanced video decoding and transcoding, video streaming, augmented reality, generative AI video, and more.


NVIDIA L40 for Image Generation is optimized for graphics and AI-enabled 2D, video, and 3D image generation. The L40 platform serves as the engine of NVIDIA Omniverse™, a platform for building and operating metaverse applications in the data center, and delivers 7x the inference performance for Stable Diffusion and 12x the Omniverse performance of the previous generation.


NVIDIA H100 NVL for Large Language Model Deployment is ideal for deploying large language models (LLMs) like ChatGPT at scale. The new H100 NVL, with 94GB of memory and Transformer Engine acceleration, delivers up to 12x faster inference performance on GPT-3 at data-center scale compared to the prior-generation A100.


NVIDIA Grace Hopper for Recommendation Models is ideal for graph recommendation models, vector databases, and graph neural networks. With a 900 GB/s NVLink-C2C connection between CPU and GPU, Grace Hopper can deliver data transfers and queries 7x faster than PCIe Gen 5.


The software layer of these platforms features the NVIDIA AI Enterprise software suite, which includes NVIDIA TensorRT™, a software development kit for high-performance deep learning inference, and NVIDIA Triton Inference Server™, open-source inference-serving software that helps standardize model deployment.
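As a concrete sketch of what Triton's standardized deployment looks like: each model in a Triton model repository sits alongside a `config.pbtxt` file describing its backend and tensor interface. The model name, backend, and tensor shapes below are hypothetical examples for illustration, not details from the announcement:

```
name: "image_classifier"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Triton scans the model repository directory (`<repository>/<model_name>/config.pbtxt` plus numbered version subdirectories holding the model files) and serves every model it finds over HTTP and gRPC, which is what makes deployment uniform across frameworks.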


Early adopters and support


Google Cloud, a key cloud partner and an early customer of NVIDIA's inference platforms, is integrating the L4 platform into its machine learning platform, Vertex AI, and is the first cloud service provider to offer L4 instances, with a private preview of its G2 virtual machines launched today.


NVIDIA and Google today separately announced the first two organizations using L4 on Google Cloud: Descript and WOMBO. Descript uses generative AI to help creators produce videos and podcasts, while WOMBO offers "Dream", an AI-powered text-to-digital-art app.


Another early adopter, Kuaishou, offers a short-video application that uses GPUs to decode incoming live-stream video, capture key frames, and optimize audio and video. It then applies a large Transformer-based model to understand multimodal content, improving click-through rates for hundreds of millions of users worldwide.


Yu Yue, senior vice president of Kuaishou, said: "The community served by the Kuaishou recommendation system has more than 360 million daily active users, who contribute 30 million UGC videos every day. At the same total cost of ownership, NVIDIA GPUs increased end-to-end throughput by 11x and reduced latency by 20% compared to CPUs."


D-ID, a leading generative AI technology platform, uses NVIDIA L40 GPUs to generate photorealistic digital humans from text, helping professionals improve video content while reducing the cost and hassle of large-scale video production.


Or Gorodisky, vice president of research and development at D-ID, said: "The L40's performance is amazing. With it, we have doubled our inference speed. D-ID is thrilled to make this new hardware part of our offering, enabling real-time streaming of AI humans with unprecedented performance and resolution while reducing our compute costs."


Leading AI production studio Seyhan Lee uses generative AI to develop immersive experiences and engaging creative content for the film, broadcast and entertainment industries.


Seyhan Lee co-founder Pinar Demirdag said: "The L40 GPU delivers incredible performance improvements for our generative AI applications. With the L40's inference capability and memory capacity, we can deploy very advanced models and provide customers with innovative services at remarkable speed and precision."


Cohere, a pioneer in language AI, runs a platform that lets developers build natural language models while keeping data private and secure.


Aidan Gomez, CEO of Cohere, said: "With NVIDIA's new high-performance H100 inference platform, we can provide better and more efficient services to our customers with advanced generative models, driving the development of conversational AI, multilingual enterprise search, information extraction, and many other NLP applications."


Availability


The NVIDIA L4 GPU is available in private preview on Google Cloud Platform and is also available through a global network of more than 30 computer makers.


NVIDIA L40 GPUs are available now through the world's leading system providers, and the number of partner platforms will continue to grow this year.


The Grace Hopper Superchip is sampling now, with full production expected in the second half of the year. The H100 NVL GPU will also be available in the second half of the year.


NVIDIA AI Enterprise is now available through major cloud marketplaces and through dozens of system providers and partners. It provides customers with NVIDIA enterprise-grade support, regular security reviews, and API stability for NVIDIA Triton Inference Server™, TensorRT™, and more than 50 pre-trained models and frameworks.


Try the NVIDIA inference platforms for generative AI in hands-on labs available free on NVIDIA LaunchPad. Sample labs include training and deploying a customer-service chatbot, deploying an end-to-end AI workload, tuning and deploying a language model on H100, and deploying a fraud detection model with NVIDIA Triton.
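The Triton-based labs above revolve around its HTTP/gRPC inference API, which follows the KServe v2 protocol. As a minimal sketch of what a client sends (the input name, shape, and data below are hypothetical, and would need to match the deployed model's configuration), the request body can be built with nothing but the Python standard library:

```python
import json

def build_infer_request(input_name, datatype, shape, data):
    """Build a KServe v2 inference request body, the JSON format Triton's
    HTTP endpoint (POST /v2/models/<model>/infer) expects."""
    return json.dumps({
        "inputs": [{
            "name": input_name,    # must match the input name in config.pbtxt
            "datatype": datatype,  # e.g. "FP32", "INT64", "BYTES"
            "shape": list(shape),
            "data": list(data),    # values flattened in row-major order
        }]
    })

# Hypothetical single-sample request with three FP32 features
body = build_infer_request("INPUT__0", "FP32", [1, 3], [0.1, 0.2, 0.3])
print(body)
```

In practice one would POST this body to a running Triton server (or use the official `tritonclient` package, which wraps the same protocol); the server replies with an `outputs` array in the same tensor format.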

