NVIDIA launches inference platform for large language models and generative AI workloads

Publisher: EE小广播 | Last updated: 2023-03-22 | Source: EEWORLD | Keywords: NVIDIA



Google Cloud, D-ID, and Cohere are using the new platforms for a variety of generative AI services, including chatbots, text-to-image generation, and AI video.




SANTA CLARA, Calif. - GTC - March 21, 2023 - NVIDIA today launched four inference platforms optimized for a range of rapidly emerging generative AI applications, helping developers quickly build AI-powered, specialized applications that deliver new services and insights.


These platforms combine NVIDIA's full-stack inference software with the latest NVIDIA Ada, Hopper, and Grace Hopper processors, including the NVIDIA L4 Tensor Core GPU and NVIDIA H100 NVL GPU launched today. Each platform is optimized for in-demand workloads such as AI video, image generation, large language model deployment, and recommender inference.


NVIDIA founder and CEO Jensen Huang said: "The rise of generative AI requires more powerful inference computing platforms. The number of applications for generative AI is infinite, limited only by human imagination. Arming developers with the most powerful and flexible inference computing platforms will accelerate the creation of new services that will improve our lives in ways not yet imaginable."


Accelerate a diverse set of inference workloads for generative AI


Each platform includes an NVIDIA GPU and specialized software optimized for specific generative AI inference workloads:


NVIDIA L4 for AI Video delivers 120x more AI video performance than CPUs while improving energy efficiency by 99%. Serving as a universal GPU for nearly any workload, it offers enhanced video decoding and transcoding, video streaming, augmented reality, generative AI video, and more.


NVIDIA L40 for Image Generation is optimized for graphics and AI-enabled 2D, video, and 3D image generation. The L40 platform serves as the engine of NVIDIA Omniverse™, a platform for building and operating metaverse applications in the data center, and delivers 7x the inference performance for Stable Diffusion and 12x the Omniverse performance of the previous generation.


NVIDIA H100 NVL for Large Language Model Deployment is ideal for deploying large language models (LLMs) like ChatGPT at scale. The new H100 NVL, with 94GB of memory and Transformer Engine acceleration, delivers up to 12x faster inference performance on GPT-3 at data-center scale compared to the prior-generation A100.


NVIDIA Grace Hopper for Recommendation Models is ideal for graph recommendation models, vector databases, and graph neural networks. With a 900 GB/s NVLink-C2C connection between CPU and GPU, Grace Hopper can deliver data transfers and queries 7x faster than PCIe Gen 5.


The software layer of these platforms features the NVIDIA AI Enterprise software suite, which includes NVIDIA TensorRT™, a software development kit for high-performance deep learning inference, and NVIDIA Triton Inference Server™, open-source inference-serving software that helps standardize model deployment.
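As a concrete sketch of what Triton's standardized deployment looks like: each model in a Triton model repository sits alongside a `config.pbtxt` file describing its backend and tensor interface. The model name, backend, and tensor shapes below are hypothetical examples for illustration, not details from the announcement:

```
name: "image_classifier"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Triton scans the model repository directory (`<repository>/<model_name>/config.pbtxt` plus numbered version subdirectories holding the model files) and serves every model it finds over HTTP and gRPC, which is what makes deployment uniform across frameworks.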


Early adopters and support


Google Cloud, a key cloud partner and an early customer of NVIDIA's inference platforms, is integrating the L4 platform into its machine learning platform, Vertex AI, and is the first cloud service provider to offer L4 instances, with a private preview of its G2 virtual machines launched today.


NVIDIA and Google today separately announced the first two organizations using L4 on Google Cloud: Descript and WOMBO. Descript uses generative AI to help creators produce videos and podcasts, while WOMBO offers "Dream", an AI-powered text-to-digital-art app.


Another early adopter, Kuaishou, offers a short-video application that uses GPUs to decode incoming live-stream video, capture key frames, and optimize audio and video. It then applies a large Transformer-based model to understand multimodal content, improving click-through rates for hundreds of millions of users worldwide.


Yu Yue, senior vice president of Kuaishou, said: "The community served by the Kuaishou recommendation system has more than 360 million daily active users, who contribute 30 million UGC videos every day. At the same total cost of ownership, NVIDIA GPUs increased end-to-end throughput by 11x and reduced latency by 20% compared to CPUs."


D-ID, a leading generative AI technology platform, uses NVIDIA L40 GPUs to generate photorealistic digital humans from text, helping professionals improve video content while reducing the cost and hassle of large-scale video production.


Or Gorodisky, vice president of research and development at D-ID, said: "The L40's performance is amazing. With it, we have doubled our inference speed. D-ID is thrilled to make this new hardware part of our offering, enabling real-time streaming of AI humans with unprecedented performance and resolution while reducing our compute costs."


Leading AI production studio Seyhan Lee uses generative AI to develop immersive experiences and engaging creative content for the film, broadcast and entertainment industries.


Seyhan Lee co-founder Pinar Demirdag said: "The L40 GPU delivers incredible performance improvements for our generative AI applications. With the L40's inference capability and memory capacity, we can deploy very advanced models and provide customers with innovative services at remarkable speed and precision."


Cohere, a pioneer in language AI, runs a platform that lets developers build natural language models while keeping data private and secure.


Aidan Gomez, CEO of Cohere, said: "With NVIDIA's new high-performance H100 inference platform, we can provide better and more efficient services to our customers with advanced generative models, driving the development of conversational AI, multilingual enterprise search, information extraction, and many other NLP applications."


Availability


The NVIDIA L4 GPU is available in private preview on Google Cloud Platform and is also available through a global network of more than 30 computer makers.


NVIDIA L40 GPUs are available now through the world's leading system providers, and the number of partner platforms will continue to grow this year.


The Grace Hopper Superchip is sampling now, with full production expected in the second half of the year. The H100 NVL GPU will also be available in the second half of the year.


NVIDIA AI Enterprise is now available through major cloud marketplaces and through dozens of system providers and partners. It provides customers with NVIDIA enterprise-grade support, regular security reviews, and API stability for NVIDIA Triton Inference Server™, TensorRT™, and more than 50 pre-trained models and frameworks.


Try the NVIDIA inference platforms for generative AI in hands-on labs available free on NVIDIA LaunchPad. Sample labs include training and deploying a customer-service chatbot, deploying an end-to-end AI workload, tuning and deploying a language model on H100, and deploying a fraud detection model with NVIDIA Triton.
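The Triton-based labs above revolve around its HTTP/gRPC inference API, which follows the KServe v2 protocol. As a minimal sketch of what a client sends (the input name, shape, and data below are hypothetical, and would need to match the deployed model's configuration), the request body can be built with nothing but the Python standard library:

```python
import json

def build_infer_request(input_name, datatype, shape, data):
    """Build a KServe v2 inference request body, the JSON format Triton's
    HTTP endpoint (POST /v2/models/<model>/infer) expects."""
    return json.dumps({
        "inputs": [{
            "name": input_name,    # must match the input name in config.pbtxt
            "datatype": datatype,  # e.g. "FP32", "INT64", "BYTES"
            "shape": list(shape),
            "data": list(data),    # values flattened in row-major order
        }]
    })

# Hypothetical single-sample request with three FP32 features
body = build_infer_request("INPUT__0", "FP32", [1, 3], [0.1, 0.2, 0.3])
print(body)
```

In practice one would POST this body to a running Triton server (or use the official `tritonclient` package, which wraps the same protocol); the server replies with an `outputs` array in the same tensor format.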

