Building a Chatbot with Ryzen™ AI Processors


AMD Ryzen™ AI hardware and software bring the power of personal computing to AI PCs, taking productivity, collaboration, and innovation to a whole new level. Generative AI applications, such as AI chatbots, typically live in the cloud because of their high processing requirements. In this blog, we will explore the building blocks of Ryzen™ AI and show how easy it is to leverage them to build an AI chatbot that runs at full performance locally on a Ryzen AI laptop.

Full-Stack Ryzen™ AI Software

Ryzen AI processors are equipped with a dedicated neural processing unit (NPU) for AI acceleration, integrated on-chip with the CPU cores. The Ryzen AI SDK (Software Development Kit) enables developers to take models trained in frameworks such as PyTorch or TensorFlow and run them on laptops powered by Ryzen AI. By offloading AI tasks and workloads to the NPU, the SDK frees up CPU and GPU resources and ensures optimal performance at lower power consumption. Learn more about Ryzen AI.

The SDK includes tools and runtime libraries for optimizing and deploying AI inference on NPUs. Installation is simple, and the kit comes with a wide range of pre-quantized, ready-to-deploy models from the AMD Model Zoo. Developers can start building their applications in minutes, unleashing the full potential of AI acceleration on Ryzen AI PCs.
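To give a feel for the deployment flow, here is a minimal sketch of running a quantized ONNX model through ONNX Runtime's Vitis AI Execution Provider, which is how the SDK dispatches work to the NPU. The model path and the vaip_config.json location are placeholders for your own installation, not fixed names from the SDK.

# A minimal sketch, assuming a quantized ONNX model and a Ryzen AI
# install that provides vaip_config.json; paths are placeholders.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model_int8.onnx",
    providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
    provider_options=[{"config_file": "vaip_config.json"}, {}],
)

# Run one inference; input name, shape, and dtype depend on the model
# (float32 is assumed here for illustration).
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
outputs = session.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})
print(outputs[0].shape)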

Building an AI Chatbot

AI chatbots require so much processing power that they usually live in the cloud. You can run a ChatGPT client on a PC, but the local application simply sends your prompt over the Internet to a server where the LLM processes it, then displays the response it receives.

In this case, however, the chatbot is local, efficient, and requires no cloud support. You can take the open-source, pre-trained OPT-1.3B model from Hugging Face and deploy it on a Ryzen AI laptop in a simple three-step process, using a pre-built Gradio chatbot application:

Step 1: Download the pre-trained opt-1.3b model from Hugging Face

Step 2: Quantize the downloaded model from FP32 to INT8

Step 3: Deploy the Chatbot Application Using the Model

Prerequisites

First, you need to make sure you meet the following prerequisites.

AMD Ryzen AI laptop with Windows® 11

Anaconda; if needed, get it from here

Ryzen AI AIE Driver and Software. Follow the simple one-click installation here

Supporting material for this blog is posted in the AMD GitHub repository.

Next, clone the repository, or download and unzip Chatbot-with-RyzenAI-1.0.zip, into the root directory where you installed the Ryzen AI SW. In this case, it is C:\Users\ahoq\RyzenAI.

cd C:\Users\ahoq\RyzenAI

git clone https://github.com/alimulh/Chatbot-with-RyzenAI-1.0

#Activate the conda environment created when installing RyzenAI. In my case it is ryzenai-1.0-20231204-120522

conda activate ryzenai-1.0-20231204-120522

# Install the Gradio package using the requirements.txt file. The chatbot browser application is built with Gradio

pip install -r requirements.txt

# Initialize the path

setup.bat

Now you can create a chatbot in 3 steps:

Step 1: Download the pre-trained model from Hugging Face

In this step, you download the pre-trained OPT-1.3B model from Hugging Face. You can modify the run.py script to download a pre-trained model from your own or any other repository. OPT-1.3B is a large model with roughly 1.3 billion parameters, so the download time depends on your network speed. In this case, it took about 6 minutes.

cd Chatbot-with-RyzenAI-1.0

python run.py --model_name opt-1.3b --download

The downloaded model is saved in the folder opt-1.3b_pretrained_fp32 as shown below.
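For reference, the download step boils down to the standard Hugging Face transformers calls sketched below. run.py's exact logic may differ; the output folder name mirrors the one this blog observed.

# A hedged sketch of the download step using the transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-1.3b"  # the pre-trained OPT-1.3B checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save the FP32 model locally, mirroring the opt-1.3b_pretrained_fp32 folder.
model.save_pretrained("opt-1.3b_pretrained_fp32")
tokenizer.save_pretrained("opt-1.3b_pretrained_fp32")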

Step 2: Quantize the downloaded model from FP32 to INT8

After the download is complete, quantize the model using the following command:

python run.py --model_name opt-1.3b --quantize

Quantization is a two-step process. First, the FP32 model is "smooth quantized" to reduce accuracy loss during quantization. Smooth quantization essentially identifies outliers in the activations and adjusts the weights accordingly, so that the error introduced when those outliers are clipped during quantization is negligible. SmoothQuant was invented by one of AMD's pioneering researchers, Dr. Song Han, a professor in the EECS department at MIT. Below is a visual demonstration of how the smooth quantization technique works.
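To make the idea concrete in code, here is a toy NumPy illustration (not the SDK's implementation) of how SmoothQuant migrates activation outliers into the weights with per-channel scales while leaving the layer's output mathematically unchanged; alpha=0.5 is the paper's default migration strength.

# Toy SmoothQuant illustration: scale activations down, weights up.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
X[:, 3] *= 50.0                 # channel 3 carries activation outliers
W = rng.normal(size=(8, 16))

alpha = 0.5
s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)

X_smooth = X / s                # activations become easier to quantize
W_smooth = W * s[:, None]       # weights absorb the scale

assert np.allclose(X @ W, X_smooth @ W_smooth)   # layer output is unchanged
print(np.abs(X[:, 3]).max(), np.abs(X_smooth[:, 3]).max())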

You can learn more about the smooth quantization (SmoothQuant) technique here. After the smooth quantization process, the smoothed model is saved, along with its configuration file, in the "model_onnx" folder inside the "opt-1.3b_smoothquant" folder. Here is a screenshot of the smooth quantization log:

Smooth quantization takes about 30 seconds to complete. Once it finishes, the quantizer converts the model to INT8. The INT8 quantized model is then saved in the "model_onnx_int8" folder inside the "opt-1.3b_smoothquant" folder. Quantization is an offline process; it takes about 2-3 minutes and only needs to be done once. Below is a screenshot of the INT8 quantization log:
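As a rough picture of what INT8 post-training quantization involves, here is a hedged sketch using ONNX Runtime's generic static quantizer. The Ryzen AI flow wraps its own quantization tooling inside run.py, so the reader class, sample data, and paths below are illustrative only.

# Generic static INT8 quantization sketch with ONNX Runtime.
import numpy as np
from onnxruntime.quantization import (CalibrationDataReader, QuantType,
                                      quantize_static)

class PromptCalibrationReader(CalibrationDataReader):
    """Feeds a few calibration batches; replace with real prompt data."""
    def __init__(self, samples):
        self._iter = iter(samples)
    def get_next(self):
        return next(self._iter, None)

# Calibration inputs keyed by the graph's input name (illustrative).
samples = [{"input_ids": np.ones((1, 32), dtype=np.int64)}]

quantize_static(
    "opt-1.3b_smoothquant/model_onnx/model.onnx",       # smoothed FP32 model
    "opt-1.3b_smoothquant/model_onnx_int8/model.onnx",  # INT8 output
    PromptCalibrationReader(samples),
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
)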

Step 3: Evaluate the model and deploy it using the chatbot application

Next, evaluate the quantized model and run it on the NPU target using the following command. Note that the model path is set to the location where we saved the INT8 quantized model in the previous step:

python run.py --model_name opt-1.3b --target aie --local_path .\opt-1.3b_smoothquant\model_onnx_int8

During the first run, the model is automatically compiled by the built-in compiler. Compilation is also a two-step process. First, the compiler identifies the layers that can execute on the NPU and the layers that must execute on the CPU, and creates two sets of subgraphs, one for the NPU and one for the CPU. Then it creates a set of instructions for each subgraph targeting the corresponding execution unit. These instructions are executed by two ONNX Execution Providers (EPs), one for the CPU and one for the NPU. After the first compilation, the compiled model is saved in a cache, which avoids recompilation in subsequent deployments. Below is a screenshot where the model information is printed during compilation.
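You can check which execution providers a session actually resolved to via onnxruntime, as in the sketch below. The cache-related option names (cacheDir, cacheKey) vary by Ryzen AI SDK release, so treat them as illustrative assumptions rather than fixed API.

# Inspecting the EPs in use; cache options are assumptions that may
# differ across SDK versions.
import onnxruntime as ort

session = ort.InferenceSession(
    "opt-1.3b_smoothquant/model_onnx_int8/model.onnx",
    providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
    provider_options=[
        {"config_file": "vaip_config.json",
         "cacheDir": "./compile_cache",   # reuses the first-run compilation
         "cacheKey": "opt-1.3b-int8"},
        {},
    ],
)
print(session.get_providers())  # EPs in use, NPU-backed EP listed first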

After compilation, the model runs on both the NPU and the CPU. The test prompt is applied, and the response from the OPT-1.3B LLM shows the correct answer. Keep in mind that we downloaded and deployed a publicly available pre-trained model, so its accuracy is subjective and may not always be as expected. We strongly recommend fine-tuning a publicly available model before production deployment. Below is a screenshot of the test prompt and response:

Now, let's start the chatbot using the INT8 quantized model saved in the path .\opt-1.3b_smoothquant\model_onnx_int8:

python gradio_app\opt_demo_gui.py --model_file .\opt-1.3b_smoothquant\model_onnx_int8

As shown in the command prompt, the chatbot application is running on localhost, port 1234.

Open your browser and browse to http://localhost:1234.
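The Gradio side of the application can be as small as the following sketch. Here chat_fn is a hypothetical placeholder for where the real opt_demo_gui.py invokes the INT8 model; the port matches the one above.

# Minimal Gradio chatbot shell; chat_fn is a hypothetical stand-in.
import gradio as gr

def chat_fn(message, history):
    # Placeholder: the real app runs the INT8 OPT-1.3B model here.
    return f"(model response to: {message})"

demo = gr.ChatInterface(chat_fn)  # a ready-made chat UI
demo.launch(server_name="127.0.0.1", server_port=1234)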

In the browser application, set max_output_token=64 and enter the prompt “What does AMD do?” into the text box. The chatbot outputs the response shown below. It also reports a KPI (Key Performance Indicator) in tokens/sec; in this case, about 4.7 tokens per second.
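The tokens/sec KPI is simply the number of newly generated tokens divided by wall-clock time. A minimal sketch, with generate standing in as a hypothetical name for whatever decode call the app uses:

# Tokens/sec KPI: new tokens divided by elapsed time.
import time

def tokens_per_second(generate, prompt_ids, max_new_tokens=64):
    start = time.perf_counter()
    output_ids = generate(prompt_ids, max_new_tokens)  # prompt + new tokens
    elapsed = time.perf_counter() - start
    return (len(output_ids) - len(prompt_ids)) / elapsed

# For example, ~4.7 tokens/sec for INT8 OPT-1.3B on the NPU in this run.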

Congratulations, you have successfully built a private AI chatbot. It runs entirely on your laptop, powered by OPT-1.3B, an LLM (Large Language Model).

Conclusion

AMD Ryzen™ AI full-stack tools enable users to easily create previously unattainable experiences on AI PCs: developers can build AI applications, creators can produce innovative and engaging content, and business owners can optimize workflows and efficiency.

We are excited to bring this technology to our customers and partners. If you have any questions or need clarification, we would love to hear from you. Check out our GitHub repository for example designs, join our discussions, or send us an email at amd_ai_mkt@amd.com.
