ESP32-S3 intelligent voice robot

plcpro · Published on 2022-10-11 09:27


1. Function Introduction

This project uses the ESP32-S3 development board Korvo-2 V3.1 to accept voice input, interpret it, and execute the requested command. All recognition is done offline.
Wake up the robot with the keyword "Hi, Espressif", then say the command you want it to carry out within the specified time. If the command waiting time is exceeded, the board leaves the command-receiving state and must be woken up again before it accepts another voice command. The initial project has 16 commands. Other voice commands can be set in the configuration file, up to a total of 200. When the development board is woken up, it reports that it is awake and waiting for commands. When a valid voice command is received within the specified time, the ID number of that command is displayed.
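The wake-up and command window behaviour described above can be pictured as a small state machine. The following is only an illustrative Python simulation, not the ESP-Skainet code: the class name, the method names, and the 6-second window are assumptions made up for this example, and I have assumed the board keeps listening after a command until the window runs out.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    WAITING_FOR_WAKE_WORD = auto()
    LISTENING_FOR_COMMAND = auto()


@dataclass
class VoiceRobot:
    """Toy model of the wake word / command window flow (hypothetical names)."""
    command_window_s: float = 6.0  # assumed timeout, not taken from the project
    state: State = State.WAITING_FOR_WAKE_WORD
    woke_at: float = 0.0
    log: list = field(default_factory=list)

    def on_wake_word(self, now: float) -> None:
        # Hearing the wake word opens (or reopens) the command window.
        self.state = State.LISTENING_FOR_COMMAND
        self.woke_at = now
        self.log.append("awake, waiting for command")

    def on_command(self, command_id: int, now: float) -> bool:
        self._expire(now)
        if self.state is not State.LISTENING_FOR_COMMAND:
            # Commands are ignored until the wake word is heard again.
            return False
        self.log.append(f"command id {command_id}")
        return True

    def _expire(self, now: float) -> None:
        # Fall back to standby once the command window has elapsed.
        if (self.state is State.LISTENING_FOR_COMMAND
                and now - self.woke_at > self.command_window_s):
            self.state = State.WAITING_FOR_WAKE_WORD
            self.log.append("timeout, back to standby")
```

For example, a command spoken before the wake word is rejected, a command inside the window is accepted, and a command after the window has expired is rejected again until the robot is woken up once more.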

2. System Block Diagram

Hardware: The system consists of the ESP32-S3-Korvo-2 development board provided by Digi-Key and an external three-watt speaker.

ESP32-S3-Korvo-2 is a multimedia development board based on the ESP32-S3 chip, equipped with a dual microphone array, supporting voice recognition and near/far-field voice wake-up. It also has peripherals such as LCD, camera, microSD card, etc., and can support JPEG-based video stream processing to meet users' needs for low-cost, low-power, networked audio and video product development.

Software: ESP-Skainet is an intelligent voice assistant launched by Espressif, which currently supports wake-up word recognition and command word recognition.

ESP-Skainet is designed to make it convenient to build wake-up word recognition and command word recognition applications on Espressif's ESP32 series chips.

ESP-Skainet supports the following functions:

Input Audio

The input audio stream can come from a microphone, or a wav/pcm audio file in a Flash/TF card.
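To illustrate feeding recorded audio instead of a live microphone, the sketch below reads a WAV file with Python's standard wave module and splits it into fixed-size frames. The 16 kHz, 16-bit, mono format and the 512-sample frame size are assumptions for the example, so check the ESP-Skainet documentation for the format the models actually expect. The helper names are made up.

```python
import io
import wave


def pcm_frames(wav_file, samples_per_frame=512):
    """Yield raw 16-bit mono PCM chunks from a WAV path or file-like object."""
    with wave.open(wav_file, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        while True:
            chunk = wav.readframes(samples_per_frame)
            if not chunk:
                break
            yield chunk


def make_silence_wav(num_samples, sample_rate=16000):
    """Build a small in-memory WAV of silence so the example is self-contained."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * num_samples)
    buffer.seek(0)
    return buffer
```

Calling list(pcm_frames(make_silence_wav(1200))) gives two full frames and one shorter final frame, which is the kind of frame-by-frame feed a recognizer would consume.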

Wake-up word recognition

The wake-up word model is dedicated to providing a high-performance model with low resource consumption, supporting the recognition of wake-up words such as "Alexa", "Tmall Genie", "Xiao Ai Tongxue", etc. Currently, Espressif has opened "Hi, Espressif" for free.

Voice command word recognition

The command word recognition model is dedicated to providing a flexible offline voice command recognition framework. Users can easily customize voice commands according to their needs without retraining the model.

Currently, the model supports recognition of Chinese command words such as "Turn on the air conditioner" and "Turn on the bedroom light" and English command words such as "Turn on/off the light". The maximum number of custom voice command words is 200.

Acoustic front-end algorithm

The acoustic front-end algorithm integrates acoustic echo cancellation (AEC), automatic gain control (AGC), noise suppression (NS), voice activity detection (VAD), and microphone array speech enhancement.
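The front-end algorithms ship with ESP-Skainet and are not reproduced here. To give a feel for what voice activity detection does, here is a deliberately simple energy-threshold detector in Python. It is a textbook-style stand-in, not Espressif's VAD, and the threshold of 500 is an arbitrary assumption.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """Root-mean-square level of a frame of little-endian 16-bit samples."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Flag a frame as speech when its RMS level exceeds the threshold."""
    return rms(frame) > threshold
```

A frame of silence stays below the threshold, while a frame of loud samples is flagged. Real detectors are far more robust against background noise than this energy check.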

3. Functional description of each part
Install and build the software environment. The development environment used here is Windows 10 + ESP-IDF + VS Code. First install ESP-IDF under Windows 10. Because of network problems, offline installation is recommended.
Download the ESP-IDF v4.4.2 offline installation package, which is over 900 MB, from https://dl.espressif.cn/dl/esp-idf/ and select the second download.

Download and run.

Select Complete Installation

Next, install. Wait for the installation to complete.

Select Allow, and then install and configure the Python environment.

After configuration is completed

Click Finish. The command prompt and PowerShell environments that set up the paths will then run. Click Yes.

Now the ESP-IDF software is installed. Next, install VS Code. Download it from the Visual Studio Code download page (Mac, Linux, Windows); Windows 10 is supported.

Double-click the installer (the latest version at the time of writing was 1.72.0) to install.

You can change the default installation path.

Check "Create desktop shortcut".

Open VS Code and go to the far right column. Select the item in the red circle to install the plugin. Type esp in the search box and you will see Espressif IDF. Click Install.

After installing, reopen VS Code and you will see the ESP icon. Installing the ESP-IDF plugin also installs the C and C++ plugins automatically.

When you open the ESP-IDF plugin for the first time, it will prompt you to configure the default path. Since ESP-IDF has already been installed, the plugin has already detected its installation path.

Just select the red column below and click Install.

When completed, the image is as follows

Close VS Code. Open the ESP-IDF command prompt from the shortcut on the desktop to enter the esp-idf installation path.

Run cd .. to return to the frameworks path, then pull the esp-skainet package:

git clone --recursive https://github.com/espressif/esp-skainet.git
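For anyone who prefers to script this step, the same clone can be driven from Python. This is only a sketch under assumptions: the helper names are made up, and the frameworks directory you pass in (for example C:\Espressif\frameworks, matching the example path used in this post) depends on where ESP-IDF was installed.

```python
import subprocess
from pathlib import PureWindowsPath

REPO_URL = "https://github.com/espressif/esp-skainet.git"


def clone_command(frameworks_dir: str) -> list:
    """Return the git command that clones esp-skainet with its submodules."""
    target = PureWindowsPath(frameworks_dir) / "esp-skainet"
    return ["git", "clone", "--recursive", REPO_URL, str(target)]


def clone(frameworks_dir: str) -> None:
    # check=True raises CalledProcessError if git exits with an error.
    subprocess.run(clone_command(frameworks_dir), check=True)
```

The --recursive flag matters because it also pulls the submodules that the repository depends on.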

Open VS Code and install the Chinese language pack. Then open the following example folder under esp-skainet:

C:\Espressif\frameworks\esp-skainet\examples\cn_speech_commands_recognition

When you open it for the first time, you will be prompted to trust it. Select Trust.

Click the red circle to set the compilation parameters. The first time, it has to pull information, which takes a little while. Change the chip to ESP32S3.

Select KORVO-2 as the board.

Select the wake-up word below and add the commands you want to recognize. Note that the commands must be entered in Pinyin.

Click Save.
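The command list entered in the configuration menu is essentially a table of IDs and Pinyin phrases. The Python sketch below shows one way to sanity-check such a list before typing it in. The 200-entry limit comes from the description earlier in this post. The rule that a phrase is lowercase Pinyin syllables separated by single spaces is my assumption, based on phrases such as "zao shang hao" used in this project, and the function name is hypothetical.

```python
import re

MAX_COMMANDS = 200
# Assumed format: lowercase syllables separated by single spaces.
PINYIN_PHRASE = re.compile(r"[a-z]+( [a-z]+)*")


def validate_commands(commands):
    """Check a {command_id: pinyin_phrase} mapping and return it sorted by ID."""
    if len(commands) > MAX_COMMANDS:
        raise ValueError(f"at most {MAX_COMMANDS} commands are supported")
    for command_id, phrase in commands.items():
        if not PINYIN_PHRASE.fullmatch(phrase):
            raise ValueError(f"ID {command_id} is not plain Pinyin: {phrase!r}")
    return dict(sorted(commands.items()))
```

For instance, validate_commands({17: "zao shang hao"}) passes, while a phrase containing Chinese characters, capital letters, or punctuation raises a ValueError.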

Then click the red mark to compile. Normally, the following result will appear, indicating successful compilation.

Connect the speaker, connect the development board, and turn on the power switch.

The computer will now detect the serial port on the development board. Open Device Manager to look up the serial port number.

In VS Code, click the position marked in red and select the serial port number of the development board that you just looked up, COM4 in this case.

After the change, it shows COM4.

Click the flash icon to flash the firmware.

After flashing is complete, click Monitor.
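The VS Code buttons used above wrap the idf.py command-line tool. As a sketch, the same set-target, build, flash and monitor sequence could be scripted from Python as below. It assumes the script is run with the ESP-IDF Python environment active, so that the IDF_PATH environment variable is set, and that the board is on COM4 as in this post. The helper names are made up for the example.

```python
import os
import subprocess
import sys


def idf_steps(idf_path: str, port: str, target: str = "esp32s3") -> list:
    """Return the idf.py invocations for set-target, build, and flash + monitor."""
    idf_py = os.path.join(idf_path, "tools", "idf.py")
    base = [sys.executable, idf_py]
    return [
        base + ["set-target", target],
        base + ["build"],
        base + ["-p", port, "flash", "monitor"],
    ]


def run_steps(project_dir: str, port: str) -> None:
    """Run each step in the project directory, stopping at the first failure."""
    for command in idf_steps(os.environ["IDF_PATH"], port):
        subprocess.run(command, cwd=project_dir, check=True)
```

The interactive configuration step (idf.py menuconfig) is left out on purpose. Selecting the board, the wake-up word and the commands still has to be done by hand after the target is set and before building.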

The board enters the standby state. You can now use the previously configured wake-up word "hai lexin" to wake up the development board and issue voice commands.

Say different command words, and the development board displays the ID number of each command. The red part is our newly added command, ID 17: "zao shang hao" (good morning).

4. Source code, installation software, and test videos
Link: https://pan.baidu.com/s/1Veovh9hMcZNeUg66sP4g3w?pwd=5wug
Extraction code: 5wug
5. Function demonstration video
Click to view >> Demo video
Or scan the icon below via WeChat.

6. Project summary
Espressif's ESP32-S3 is powerful, and Espressif has also published a lot of open source material. One small complaint is that some resources on GitHub are very inconvenient to download. I took a lot of detours while building the development environment this time, so I am using this report to record the whole setup process.
With Espressif's ESP-IDF integrated into VS Code, the workflow is more intuitive. Through this project, I have learned the basics of building a development environment for ESP32 products and of developing for the ESP32 series in VS Code. I hope to gain a deeper understanding of the development process in future work.
Finally, I wish this competition a complete success, and I wish all my friends a successful career! If you would like to discuss further, you can add me on WeChat (plcpro) so we can make progress together.

发呆二极管 · Published on 2022-10-12 09:13

I just bought an S3 development board yesterday to play with voice recognition. It's a good opportunity for me to learn. Thanks for sharing the tutorial. I'll try it after I get the development board.

wangerxian · Published on 2022-10-12 15:44

I have never been able to find this speech recognition routine. Thanks for sharing~

wangerxian · Published on 2022-10-12 17:26

I checked and there is no [esp-skainet] folder in my environment.

plcpro

esp-skainet needs to be pulled separately

HonestQiao

This work is very good!

Voice interaction is now a must-have feature for smart homes. If ESP32 can support this feature, it will be more useful to develop with ESP32 in the future.

飞扬自我 · Published on 2022-12-28 15:22

Haha, very interesting? But it's a bit of a rehash of the old stuff. I was able to implement this function N years ago, but now I'm just using the same old method to implement it again on a different platform.
