Building a Chatbot with Ryzen™ AI Processors


AMD Ryzen™ AI hardware and software bring the power of personal computing to AI PCs, taking productivity, collaboration, and innovation to a whole new level. Generative AI applications, such as AI chatbots, typically live in the cloud because of their high processing requirements. In this blog, we will explore the building blocks of Ryzen™ AI and show how easy it is to leverage them to build an AI chatbot that runs at full performance locally on a Ryzen AI laptop.

Full-Stack Ryzen™ AI Software

Ryzen AI processors are equipped with a dedicated neural processing unit (NPU) for AI acceleration, integrated on-chip with the CPU cores. The Ryzen AI SDK (Software Development Kit) enables developers to take models trained in frameworks such as PyTorch or TensorFlow and run them on laptops powered by Ryzen AI. By offloading AI tasks and workloads to the NPU, the SDK frees up CPU and GPU resources and ensures optimal performance at lower power consumption. Learn more about Ryzen AI.

The SDK includes tools and runtime libraries for optimizing and deploying AI inference on NPUs. Installation is simple, and the kit comes with a wide range of pre-quantized, ready-to-deploy models from the AMD Model Zoo. Developers can start building their applications in minutes, unleashing the full potential of AI acceleration on Ryzen AI PCs.
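To give a feel for the deployment flow, here is a minimal sketch of running a quantized ONNX model through ONNX Runtime's Vitis AI Execution Provider, which is how the SDK dispatches work to the NPU. The model path and the vaip_config.json location are placeholders for your own installation, not fixed names from the SDK.

# A minimal sketch, assuming a quantized ONNX model and a Ryzen AI
# install that provides vaip_config.json; paths are placeholders.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model_int8.onnx",
    providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
    provider_options=[{"config_file": "vaip_config.json"}, {}],
)

# Run one inference; input name, shape, and dtype depend on the model
# (float32 is assumed here for illustration).
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
outputs = session.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})
print(outputs[0].shape)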

Building an AI Chatbot

AI chatbots require so much processing power that they usually live in the cloud. You can run a ChatGPT client on a PC, but the local application simply sends your prompt over the Internet to a server where the LLM processes it, then displays the response it receives.

In this case, however, the chatbot is local, efficient, and requires no cloud support. You can take the open-source, pre-trained OPT-1.3B model from Hugging Face and deploy it on a Ryzen AI laptop in a simple three-step process, using a pre-built Gradio chatbot application:

Step 1: Download the pre-trained opt-1.3b model from Hugging Face

Step 2: Quantize the downloaded model from FP32 to INT8

Step 3: Deploy the Chatbot Application Using the Model

Prerequisites

First, you need to make sure you meet the following prerequisites.

AMD Ryzen AI laptop with Windows® 11

Anaconda; if needed, get it from here

Ryzen AI AIE Driver and Software. Follow the simple one-click installation here

Supporting material for this blog is posted in the AMD GitHub repository.

Next, clone the repository, or download and unzip Chatbot-with-RyzenAI-1.0.zip, into the root directory where you installed the Ryzen AI SW. In this case, it is C:\Users\ahoq\RyzenAI.

cd C:\Users\ahoq\RyzenAI

git clone https://github.com/alimulh/Chatbot-with-RyzenAI-1.0

#Activate the conda environment created when installing RyzenAI. In my case it is ryzenai-1.0-20231204-120522

conda activate ryzenai-1.0-20231204-120522

# Install the Gradio package using the requirements.txt file. The chatbot browser application is built with Gradio

pip install -r requirements.txt

# Initialize the path

setup.bat

Now you can create a chatbot in 3 steps:

Step 1: Download the pre-trained model from Hugging Face

In this step, you download the pre-trained OPT-1.3B model from Hugging Face. You can modify the run.py script to download a pre-trained model from your own or any other repository. OPT-1.3B is a large model with roughly 1.3 billion parameters, so the download time depends on your network speed. In this case, it took about 6 minutes.

cd Chatbot-with-RyzenAI-1.0

python run.py --model_name opt-1.3b --download

The downloaded model is saved in the folder opt-1.3b_pretrained_fp32 as shown below.
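For reference, the download step boils down to the standard Hugging Face transformers calls sketched below. run.py's exact logic may differ; the output folder name mirrors the one this blog observed.

# A hedged sketch of the download step using the transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-1.3b"  # the pre-trained OPT-1.3B checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save the FP32 model locally, mirroring the opt-1.3b_pretrained_fp32 folder.
model.save_pretrained("opt-1.3b_pretrained_fp32")
tokenizer.save_pretrained("opt-1.3b_pretrained_fp32")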

Step 2: Quantize the downloaded model from FP32 to INT8

After the download is complete, quantize the model using the following command:

python run.py --model_name opt-1.3b --quantize

Quantization is a two-step process. First, the FP32 model is "smooth quantized" to reduce accuracy loss during quantization. Smooth quantization essentially identifies outliers in the activations and adjusts the weights accordingly, so that the error introduced when those outliers are clipped during quantization is negligible. SmoothQuant was invented by one of AMD's pioneering researchers, Dr. Song Han, a professor in the EECS department at MIT. Below is a visual demonstration of how the smooth quantization technique works.
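To make the idea concrete in code, here is a toy NumPy illustration (not the SDK's implementation) of how SmoothQuant migrates activation outliers into the weights with per-channel scales while leaving the layer's output mathematically unchanged; alpha=0.5 is the paper's default migration strength.

# Toy SmoothQuant illustration: scale activations down, weights up.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
X[:, 3] *= 50.0                 # channel 3 carries activation outliers
W = rng.normal(size=(8, 16))

alpha = 0.5
s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)

X_smooth = X / s                # activations become easier to quantize
W_smooth = W * s[:, None]       # weights absorb the scale

assert np.allclose(X @ W, X_smooth @ W_smooth)   # layer output is unchanged
print(np.abs(X[:, 3]).max(), np.abs(X_smooth[:, 3]).max())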

You can learn more about the smooth quantization (SmoothQuant) technique here. After the smooth quantization process, the smoothed model is saved, along with its configuration file, in the "model_onnx" folder inside the "opt-1.3b_smoothquant" folder. Here is a screenshot of the smooth quantization log:

Smooth quantization takes about 30 seconds to complete. Once it finishes, the quantizer converts the model to INT8. The INT8 quantized model is then saved in the "model_onnx_int8" folder inside the "opt-1.3b_smoothquant" folder. Quantization is an offline process; it takes about 2-3 minutes and only needs to be done once. Below is a screenshot of the INT8 quantization log:
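As a rough picture of what INT8 post-training quantization involves, here is a hedged sketch using ONNX Runtime's generic static quantizer. The Ryzen AI flow wraps its own quantization tooling inside run.py, so the reader class, sample data, and paths below are illustrative only.

# Generic static INT8 quantization sketch with ONNX Runtime.
import numpy as np
from onnxruntime.quantization import (CalibrationDataReader, QuantType,
                                      quantize_static)

class PromptCalibrationReader(CalibrationDataReader):
    """Feeds a few calibration batches; replace with real prompt data."""
    def __init__(self, samples):
        self._iter = iter(samples)
    def get_next(self):
        return next(self._iter, None)

# Calibration inputs keyed by the graph's input name (illustrative).
samples = [{"input_ids": np.ones((1, 32), dtype=np.int64)}]

quantize_static(
    "opt-1.3b_smoothquant/model_onnx/model.onnx",       # smoothed FP32 model
    "opt-1.3b_smoothquant/model_onnx_int8/model.onnx",  # INT8 output
    PromptCalibrationReader(samples),
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
)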

Step 3: Evaluate the model and deploy it using the chatbot application

Next, evaluate the quantized model and run it on the NPU target using the following command. Note that the model path is set to the location where we saved the INT8 quantized model in the previous step:

python run.py --model_name opt-1.3b --target aie --local_path .\opt-1.3b_smoothquant\model_onnx_int8

During the first run, the model is automatically compiled by the built-in compiler. Compilation is also a two-step process. First, the compiler identifies the layers that can execute on the NPU and the layers that must execute on the CPU, and creates two sets of subgraphs, one for the NPU and one for the CPU. Then it creates a set of instructions for each subgraph targeting the corresponding execution unit. These instructions are executed by two ONNX Execution Providers (EPs), one for the CPU and one for the NPU. After the first compilation, the compiled model is saved in a cache, which avoids recompilation in subsequent deployments. Below is a screenshot where the model information is printed during compilation.
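You can check which execution providers a session actually resolved to via onnxruntime, as in the sketch below. The cache-related option names (cacheDir, cacheKey) vary by Ryzen AI SDK release, so treat them as illustrative assumptions rather than fixed API.

# Inspecting the EPs in use; cache options are assumptions that may
# differ across SDK versions.
import onnxruntime as ort

session = ort.InferenceSession(
    "opt-1.3b_smoothquant/model_onnx_int8/model.onnx",
    providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
    provider_options=[
        {"config_file": "vaip_config.json",
         "cacheDir": "./compile_cache",   # reuses the first-run compilation
         "cacheKey": "opt-1.3b-int8"},
        {},
    ],
)
print(session.get_providers())  # EPs in use, NPU-backed EP listed first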

After compilation, the model runs on both the NPU and the CPU. The test prompt is applied, and the response from the OPT-1.3B LLM shows the correct answer. Keep in mind that we downloaded and deployed a publicly available pre-trained model, so its accuracy is subjective and may not always be as expected. We strongly recommend fine-tuning a publicly available model before production deployment. Below is a screenshot of the test prompt and response:

Now, let's start the chatbot using the INT8 quantized model saved in the path .\opt-1.3b_smoothquant\model_onnx_int8:

python gradio_app\opt_demo_gui.py --model_file .\opt-1.3b_smoothquant\model_onnx_int8

As shown in the command prompt, the chatbot application is running on localhost, port 1234.

Open your browser and browse to http://localhost:1234.
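The Gradio side of the application can be as small as the following sketch. Here chat_fn is a hypothetical placeholder for where the real opt_demo_gui.py invokes the INT8 model; the port matches the one above.

# Minimal Gradio chatbot shell; chat_fn is a hypothetical stand-in.
import gradio as gr

def chat_fn(message, history):
    # Placeholder: the real app runs the INT8 OPT-1.3B model here.
    return f"(model response to: {message})"

demo = gr.ChatInterface(chat_fn)  # a ready-made chat UI
demo.launch(server_name="127.0.0.1", server_port=1234)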

In the browser application, set max_output_token=64 and enter the prompt “What does AMD do?” into the text box. The chatbot outputs the response shown below. It also reports a KPI (Key Performance Indicator) in tokens/sec; in this case, about 4.7 tokens per second.
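The tokens/sec KPI is simply the number of newly generated tokens divided by wall-clock time. A minimal sketch, with generate standing in as a hypothetical name for whatever decode call the app uses:

# Tokens/sec KPI: new tokens divided by elapsed time.
import time

def tokens_per_second(generate, prompt_ids, max_new_tokens=64):
    start = time.perf_counter()
    output_ids = generate(prompt_ids, max_new_tokens)  # prompt + new tokens
    elapsed = time.perf_counter() - start
    return (len(output_ids) - len(prompt_ids)) / elapsed

# For example, ~4.7 tokens/sec for INT8 OPT-1.3B on the NPU in this run.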

Congratulations, you have successfully built a private AI chatbot. It runs entirely on your laptop, powered by OPT-1.3B, an LLM (Large Language Model).

Conclusion

AMD Ryzen™ AI full-stack tools enable users to easily create previously unattainable experiences on AI PCs: developers can build AI applications, creators can produce innovative and engaging content, and business owners can optimize workflows and efficiency.

We are excited to bring this technology to our customers and partners. If you have any questions or need clarification, we would love to hear from you. Check out our GitHub repository for example designs, join our discussions, or send us an email at amd_ai_mkt@amd.com.
