and software bring the power of personal computing to AI PCs, taking productivity, collaboration, and innovation to a whole new level. Generative AI applications, such as AI chat, typically live in the cloud because of their high processing requirements. In this blog, we will explore the building blocks of Ryzen™ AI and show how easy it is to build an AI chatbot that runs at full performance entirely on a Ryzen AI notebook.
Full-Stack Ryzen™ AI Software
Ryzen AI is equipped with a dedicated neural processing unit (NPU) for AI acceleration, integrated on-chip with the CPU cores. The Ryzen AI Software Development Kit (SDK) enables developers to take models trained in PyTorch or TensorFlow and run them on a PC powered by Ryzen AI. Offloading AI tasks and workloads to the NPU frees up CPU resources and delivers the best performance at lower power consumption. Learn more about Ryzen AI.
The SDK includes tools and runtime libraries for optimizing and deploying AI inference on NPUs. Installation is simple, and the kit comes with a wide range of pre-quantized, ready-to-deploy models from the AMD Model Zoo. Developers can start building their applications in minutes, unleashing the full potential of AI acceleration on Ryzen AI PCs.
Building an AI Chatbot
AI chatbots require so much processing power that they usually live in the cloud. You can run a ChatGPT client on a PC, but the local application merely sends the prompt over the Internet to a server for LLM processing and displays the response it receives.
A local, efficient AI chatbot, however, does not require cloud support. You can take the open-source, pre-trained OPT-1.3B model from Hugging Face and deploy it on a Ryzen AI laptop in a simple three-step process, using the pre-built Gradio chatbot application.
Step 1: Download the pre-trained OPT-1.3B model from Hugging Face
Step 2: Quantize the downloaded model from FP32 to INT8
Step 3: Deploy the chatbot application using the model
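For orientation, the three steps above map naturally onto a single command-line driver with mode flags. This is a hypothetical outline, not the actual run.py from the repository; only --model_name, --download, --target, and --local_path appear later in this blog, and the --quantize flag is an assumption:

```python
import argparse

# Hypothetical outline of a run.py-style driver for the three steps.
# The real script's internals differ; the --quantize flag is assumed.
def build_parser():
    p = argparse.ArgumentParser(description="OPT-1.3B chatbot pipeline")
    p.add_argument("--model_name", default="opt-1.3b")
    p.add_argument("--download", action="store_true",
                   help="Step 1: fetch the pre-trained FP32 model")
    p.add_argument("--quantize", action="store_true",
                   help="Step 2: smooth-quantize, then convert to INT8")
    p.add_argument("--target", choices=["cpu", "aie"], default="cpu",
                   help="Step 3: run on the CPU or on the NPU (aie)")
    p.add_argument("--local_path", default=None,
                   help="folder holding the INT8 ONNX model")
    return p

# Example: the Step 1 invocation used later in this blog.
args = build_parser().parse_args(["--model_name", "opt-1.3b", "--download"])
print(args.model_name, args.download)
```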
Prerequisites
First, make sure you meet the following prerequisites:
An AMD Ryzen AI laptop with Windows® 11
Anaconda; if needed, get it from here
The Ryzen AI AIE driver and software; follow the simple one-click installation here
Supporting material for this blog is posted in the AMD GitHub repository.
Next, clone the repository, or download and unzip Chatbot-with-RyzenAI-1.0.zip into the root directory where you installed the Ryzen AI software. In this case, it is C:\Users\ahoq\RyzenAI.
cd C:\Users\ahoq\RyzenAI
git clone https://github.com/alimulh/Chatbot-with-RyzenAI-1.0
# Activate the conda environment created when installing Ryzen AI. In my case, it is ryzenai-1.0-20231204-120522
conda activate ryzenai-1.0-20231204-120522
# Install the Gradio package using the requirements.txt file. The chatbot browser application is built with Gradio
pip install -r requirements.txt
# Initialize the path
setup.bat
Now you can create a chatbot in 3 steps:
Step 1: Download the pre-trained model from Hugging Face
In this step, download the pre-trained OPT-1.3B model from Hugging Face. You can modify the run.py script to download a pre-trained model from your own repository instead. OPT-1.3B is a large, ~1.3B-parameter model, so the download time depends on your network speed. In this case, it took about 6 minutes.
cd Chatbot-with-RyzenAI-1.0
python run.py --model_name opt-1.3b --download
The downloaded model is saved in the folder opt-1.3b_pretrained_fp32 as shown below.
Step 2: Quantize the downloaded model from FP32 to INT8
After the download is complete, quantize the model using the following command:
python run.py --model_name opt-1.3b --quantize
Quantization is a two-step process. First, the FP32 model is "smooth quantized" to reduce the accuracy loss during quantization. Smooth quantization essentially identifies outliers in the activations and adjusts the weights accordingly, so that if outliers are clipped during quantization, the error introduced is negligible. SmoothQuant was invented by the research group of Dr. Song Han, a professor in the EECS department at MIT. Below is a visual demonstration of how the smooth quantization technique works.
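The scale-migration idea behind smooth quantization can be illustrated in plain Python. This is a toy sketch, not the SDK's implementation: for a linear layer, dividing each activation channel by a per-channel factor and multiplying the matching weight row by the same factor leaves the output mathematically unchanged while flattening activation outliers.

```python
# Toy illustration of SmoothQuant-style scale migration.
# For a linear layer y = x @ W, we can rewrite it as
# y = (x / s) @ (s * W) for any positive per-channel scales s.
# Choosing s from the activation magnitudes removes activation
# outliers, so INT8 quantization of (x / s) loses less accuracy.

def matmul(x, W):
    """Multiply a vector x (length n) by a matrix W (n rows)."""
    cols = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(cols)]

# One activation vector with an outlier channel, and a 3x2 weight matrix.
x = [0.5, 120.0, 0.8]            # channel 1 is an outlier
W = [[0.2, -0.1], [0.01, 0.03], [-0.4, 0.5]]

# Per-channel smoothing scales (here simply the activation magnitudes).
s = [abs(v) for v in x]

x_smooth = [x[i] / s[i] for i in range(len(x))]                # outliers gone
W_smooth = [[s[i] * w for w in W[i]] for i in range(len(W))]   # scales folded into weights

y_ref = matmul(x, W)
y_smooth = matmul(x_smooth, W_smooth)

# The rewritten layer is numerically equivalent...
assert all(abs(a - b) < 1e-9 for a, b in zip(y_ref, y_smooth))
# ...but the activation range shrank from 120.0 to 1.0.
print(max(map(abs, x)), max(map(abs, x_smooth)))
```

The real technique picks the scales to balance difficulty between activations and weights, but the equivalence shown here is the core trick.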
You can learn more about the smooth quantization (SmoothQuant) technique here. After smooth quantization, the model is saved, along with its quantization configuration file, in the "model_onnx" folder inside the opt-1.3b_smoothquant folder. Here is a screenshot of the smooth quantization log:
Smooth quantization takes about 30 seconds to complete. Once it finishes, the quantizer converts the model to INT8, and the INT8 quantized model is saved in the "model_onnx_int8" folder inside the "opt-1.3b_smoothquant" folder. Quantization is an offline process: it takes about 2-3 minutes and only needs to be done once. Below is a screenshot of the INT8 quantization log:
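For intuition, the FP32-to-INT8 conversion can be sketched in a few lines of plain Python. This is a simplified symmetric-quantization illustration, not the SDK's quantizer, which also calibrates activations and emits an ONNX model:

```python
# Simplified symmetric INT8 weight quantization: map FP32 values
# to integers in [-127, 127] with a single scale, then dequantize
# to see the rounding error the model will incur.

def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.64, 0.001, -0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# All quantized values fit in a signed 8-bit range.
assert all(-127 <= v <= 127 for v in q)
# The round-trip error is bounded by half a quantization step.
assert all(abs(w - r) <= scale / 2 + 1e-12 for w, r in zip(weights, restored))
```

Storing one byte per weight instead of four is what shrinks the model and lets the NPU's integer datapaths run it efficiently.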
Step 3: Evaluate the model and deploy it using the chatbot application
Next, evaluate the quantized model and run it on the NPU target using the following command. Note that the model path is set to the location where we saved the INT8 quantized model in the previous step.
python run.py --model_name opt-1.3b --target aie --local_path ./opt-1.3b_smoothquant/model_onnx_int8
During the first run, the model is automatically compiled by the in-line compiler. Compilation is also a two-step process. First, the compiler identifies which layers can execute on the NPU and which must execute on the CPU, creating two sets of subgraphs: one for the NPU and one for the CPU. It then generates a set of instructions for each subgraph targeting the corresponding execution unit. These instructions are executed by two ONNX Execution Providers (EPs), one for the CPU and one for the NPU. After the first compilation, the compiled model is saved in a cache, so subsequent deployments skip compilation. Below is a screenshot of the model information printed during compilation.
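The partitioning step can be pictured with a toy graph walk. This is an illustrative sketch only; the real compiler works on ONNX graphs, and the set of NPU-supported operators below is invented for the example:

```python
# Toy sketch of operator partitioning: split a linear sequence of
# ops into maximal runs assigned to the NPU or the CPU, the way an
# execution-provider compiler carves a model into subgraphs.

NPU_SUPPORTED = {"MatMul", "Add", "Relu", "Gelu"}  # assumed op coverage

def partition(ops):
    subgraphs = []
    for op in ops:
        target = "NPU" if op in NPU_SUPPORTED else "CPU"
        if subgraphs and subgraphs[-1][0] == target:
            subgraphs[-1][1].append(op)       # extend the current run
        else:
            subgraphs.append((target, [op]))  # start a new subgraph
    return subgraphs

model_ops = ["MatMul", "Add", "Relu", "Softmax", "MatMul", "Gelu"]
for target, ops in partition(model_ops):
    print(target, ops)
```

Each contiguous run becomes one subgraph handed to the matching execution provider, which is why minimizing CPU-only ops reduces costly NPU-CPU handoffs.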
After compilation, the model runs on both the NPU and the CPU. The test prompt is applied, and the response from the OPT-1.3B model shows the correct answer. Keep in mind that we downloaded and deployed a publicly available pre-trained model, so its accuracy is subjective and may not always be as expected. We strongly recommend fine-tuning any publicly available model before production deployment. Below is a screenshot of the test prompt and response:
Now, let's start the chatbot using the INT8 quantized model saved in the path opt-1.3b_smoothquant/model_onnx_int8:
python gradio_app/opt_demo_gui.py --model_file ./opt-1.3b_smoothquant/model_onnx_int8
As shown in the command prompt, the chatbot application is running on localhost, port 1234.
Open your browser and browse to http://localhost:1234.
On the browser application, set max_output_token=64 and enter the prompt “What does AMD do?” into the text box. The chatbot outputs the response shown below. It also calculates the KPI (Key Performance Indicator) as tokens/sec. In this case, it is about 4.7 tokens per second.
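The tokens-per-second KPI is easy to reproduce yourself. This is a generic sketch (the demo app's exact bookkeeping may differ, and fake_generate is a stand-in for a real model's generate call): time the generation and divide the number of newly produced tokens by the elapsed seconds.

```python
import time

def tokens_per_second(generate, prompt_tokens, max_new_tokens):
    """Time a generation call and return (output tokens, tokens/sec)."""
    start = time.perf_counter()
    output_tokens = generate(prompt_tokens, max_new_tokens)
    elapsed = time.perf_counter() - start
    new_tokens = len(output_tokens) - len(prompt_tokens)
    return output_tokens, new_tokens / elapsed

# Stand-in for model.generate(): echoes the prompt plus dummy tokens.
def fake_generate(prompt_tokens, max_new_tokens):
    time.sleep(0.05)  # pretend inference takes 50 ms
    return prompt_tokens + ["tok"] * max_new_tokens

out, tps = tokens_per_second(fake_generate, ["What", "does", "AMD", "do", "?"], 16)
print(f"{tps:.1f} tokens/sec")
```

Only newly generated tokens are counted; prompt tokens are excluded, which matches how LLM throughput is usually reported.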
Congratulations, you have successfully built a private AI chatbot based on OPT-1.3B, a large language model (LLM), that runs entirely on a laptop.
Conclusion
AMD Ryzen™ AI full-stack tools enable users to easily create previously unattainable experiences on AI PCs: developers with AI applications, creators with innovative and engaging content, and business owners with tools to optimize workflows and efficiency.
We are excited to bring this technology to our customers and partners. If you have any questions or need clarification, we would love to hear from you. Check out our GitHub repository for example designs, join our discussions, or send us an email at amd_ai_mkt@amd.com.