NVIDIA launches inference platforms for large language models and generative AI workloads
Google Cloud, D-ID, and Cohere are using the new platforms for a wide range of generative AI services, including chatbots, text-to-image content, AI video, and more.
SANTA CLARA, Calif. - GTC - March 21, 2023 PT - NVIDIA today launched four inference platforms. These platforms are optimized for a variety of rapidly emerging generative AI applications and can help developers quickly build AI-driven professional applications that provide new services and insights.
These platforms combine NVIDIA's full-stack inference software with the latest NVIDIA Ada, Hopper and Grace Hopper processors, including the NVIDIA L4 Tensor Core GPU and NVIDIA H100 NVL GPU, both launched today. Each platform is optimized for workloads with surging demand, such as AI video, image generation, large language model deployment, and recommendation system inference.
NVIDIA founder and CEO Jensen Huang said: "The rise of generative AI requires more powerful inference computing platforms. The number of generative AI applications is unlimited; the only limit is human imagination. Providing developers with the most powerful, flexible inference computing platforms will accelerate the creation of new services that will improve our lives in unprecedented ways."
Accelerate a diverse set of inference workloads for generative AI
Each platform includes an NVIDIA GPU and specialized software optimized for specific generative AI inference workloads:
NVIDIA L4 for AI video delivers 120 times the AI video performance of CPUs, along with 99% better energy efficiency. A general-purpose GPU for nearly any workload, it offers enhanced video decoding and transcoding, video streaming, augmented reality, generative AI video, and more.
NVIDIA L40 for image generation is optimized for graphics and AI-powered 2D, video and 3D image generation. The L40 platform is the engine behind NVIDIA Omniverse™, a platform for building and running metaverse applications in the data center. Compared to the previous generation, it delivers 7 times the Stable Diffusion inference performance and 12 times the Omniverse performance.
NVIDIA H100 NVL for large language model deployment is an ideal platform for deploying large language models (LLMs) like ChatGPT at scale. The new H100 NVL has 94GB of memory and Transformer Engine acceleration, improving GPT-3 inference performance by up to 12 times at data center scale compared to the previous-generation A100.
NVIDIA Grace Hopper for recommendation models is an ideal platform for graph recommendation models, vector databases, and graph neural networks. With the CPU and GPU connected at 900 GB/s via NVLink-C2C, Grace Hopper delivers data transfers and queries 7 times faster than PCIe 5.0.
The software layer of these platforms is the NVIDIA AI Enterprise software suite, which includes NVIDIA TensorRT™, a software development kit for high-performance deep learning inference, and NVIDIA Triton Inference Server™, open-source inference-serving software that helps standardize model deployment.
Early adopters and support
Google Cloud is a key cloud partner and an early customer of NVIDIA's inference platforms. The company is integrating the L4 platform into its machine learning platform, Vertex AI, and is the first cloud provider to offer L4 instances, with a private preview of its G2 virtual machines launching today.
NVIDIA and Google separately announced today two of the first organizations to use L4 on Google Cloud: Descript and WOMBO. The former uses generative AI to help creators produce videos and podcasts, and the latter offers "Dream", an AI-powered text-to-digital-art app.
Another early adopter, Kuaishou, offers a short video application that uses GPUs to decode incoming live streaming videos, capture key frames, and optimize audio and video. It then uses a large Transformer-based model to understand multimodal content, thereby improving click-through rates for hundreds of millions of users around the world.
Yu Yue, senior vice president of Kuaishou, said: "The Kuaishou recommendation system serves a community of more than 360 million daily users, who contribute 30 million UGC videos every day. At the same total cost of ownership, NVIDIA GPUs have increased end-to-end throughput by 11 times and reduced latency by 20% compared to CPUs."
D-ID, a leading generative AI technology platform, uses NVIDIA L40 GPUs to generate realistic digital humans from text, giving a face to any content. This helps professionals improve their video content while reducing the cost and hassle of large-scale video production.
Or Gorodisky, Vice President of Research and Development at D-ID, said: "The performance of the L40 is amazing. With it, we have doubled our inference speed. D-ID is very excited to have this new hardware as part of our products, enabling real-time streaming of AI humans at unprecedented performance and resolution while reducing our computing costs."
Leading AI production studio Seyhan Lee uses generative AI to develop immersive experiences and engaging creative content for the film, broadcast and entertainment industries.
Seyhan Lee co-founder Pinar Demirdag said: "The L40 GPU brings amazing performance improvements to our generative AI applications. With the L40's inference power and memory capacity, we can deploy very advanced models and provide innovative services to customers with amazing speed and precision."
Cohere, a pioneer in language AI, runs a platform that enables developers to build natural language models while protecting data privacy and security.
Aidan Gomez, CEO of Cohere, said: "With NVIDIA's new high-performance H100 inference platform, we can use advanced generative models to provide customers with better and more efficient services, driving the development of NLP applications such as conversational AI, multilingual enterprise search, information extraction and more."
Availability
The NVIDIA L4 GPU is now available in private preview on Google Cloud Platform, and is also available through a global network of more than 30 computer manufacturers.
NVIDIA L40 GPUs are available now through the world's leading system providers, and the number of partner platforms will continue to grow this year.
The Grace Hopper Superchip is now sampling and is expected to enter full production in the second half of the year. The H100 NVL GPU is also expected to be available in the second half of the year.
NVIDIA AI Enterprise is now available through major cloud marketplaces and through dozens of system providers and partners. NVIDIA AI Enterprise provides customers with NVIDIA enterprise-grade support, regular security reviews, and API stability for NVIDIA Triton Inference Server™, TensorRT™, and more than 50 pre-trained models and frameworks.
Try out the NVIDIA inference platform for generative AI with hands-on labs available for free on NVIDIA LaunchPad. Sample labs include training and deploying a customer service chatbot, deploying an end-to-end AI workload, tuning and deploying a language model on the H100, and deploying a fraud detection model using NVIDIA Triton.