[ESP32-Korvo Review] Part 3: Smart Voice Assistant ESP-Skainet
This post was last edited by Digital Leaf on 2021-1-31 20:36
ESP-Skainet is an intelligent voice assistant developed by Espressif. It supports a wake word engine (WakeNet), an offline speech command recognition engine (MultiNet), and front-end acoustic algorithms. It is lightweight, secure, and low-latency, and it allows custom wake words and custom control commands. The main application scenarios include smart home, smart office, and companion devices. The functional block diagram of ESP-Skainet is as follows:
From the diagram I found that the input audio can come not only from a microphone but also from audio files in wav, pcm, and other formats.
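As a rough illustration of what feeding a file instead of a microphone involves, here is a small sketch using Python's standard `wave` module. It checks that a WAV file is 16 kHz, 16-bit, mono (the format I understand the Espressif speech models to expect, so treat that as an assumption) and splits it into fixed-length frames. The 30 ms frame length and the helper names `read_frames` and `make_silence` are hypothetical and are not part of ESP-Skainet.

```python
import io
import wave

# Assumed input format: 16 kHz, 16-bit, mono PCM.
EXPECTED_RATE = 16000
EXPECTED_WIDTH = 2      # bytes per sample
EXPECTED_CHANNELS = 1


def read_frames(wav_file, frame_ms=30):
    """Return a list of raw PCM frames, each frame_ms milliseconds long.

    wav_file may be a path or a binary file-like object. frame_ms is a
    hypothetical chunk length chosen only for illustration.
    """
    with wave.open(wav_file, "rb") as wav:
        fmt = (wav.getframerate(), wav.getsampwidth(), wav.getnchannels())
        if fmt != (EXPECTED_RATE, EXPECTED_WIDTH, EXPECTED_CHANNELS):
            raise ValueError(f"unexpected WAV format: {fmt}")
        samples_per_frame = EXPECTED_RATE * frame_ms // 1000
        frames = []
        while True:
            chunk = wav.readframes(samples_per_frame)
            if len(chunk) < samples_per_frame * EXPECTED_WIDTH:
                break  # drop the trailing partial frame
            frames.append(chunk)
        return frames


def make_silence(seconds=1, rate=EXPECTED_RATE):
    """Build an in-memory WAV of silence, used only to exercise read_frames."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(EXPECTED_CHANNELS)
        wav.setsampwidth(EXPECTED_WIDTH)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buf.seek(0)
    return buf
```

Calling `read_frames(make_silence())` splits one second of silence into whole 30 ms frames; a file recorded at a different sample rate is rejected with a `ValueError` instead of being silently misread.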
AEC (Acoustic Echo Cancellation), AGC (Automatic Gain Control), NS (Noise Suppression), VAD (Voice Activity Detection), and Mic Array Speech Enhancement are the front-end algorithms currently integrated in ESP-Skainet.
WakeNet is the wake word model. The wake words that Espressif has made publicly available so far include "Hi, Espressif", "Hello Xiaozhi", "Hello Xiaoxin", and "hi, Jeson". Any other wake word has to be customized, which requires a certain amount of sample data.
MultiNet is the command word model. Voice commands can be customized as needed without retraining the model.
The last part, TTS, is speech synthesis, followed by the application scenarios.
git clone --recursive https://github.com/espressif/esp-skainet.git
First clone the whole repository. The download is really long and slow.
After the download completed, I checked the contents and found that Skainet provides six examples: "chinese_tts", "garbage_classification", "get_started", "mic_array_speech_enhancement", "noise_suppression", and "wake_word_detection". "get_started" is the most suitable one to begin with, so I made a copy of "get_started" to work on.
I tried to compile the copy, but the build failed with an error message. All I could do was check the CMakeOutput.log file for the cause, but it gave no hint and only listed the checks that had passed...
I had to test things one by one and eventually found the problem.
The conclusion is that the get_started project cannot simply be copied elsewhere without modification. So I made a backup copy first and compiled the project in its original path. That build succeeded.
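One common reason a copied ESP-IDF example stops building is that its CMakeLists.txt pulls in shared components through a relative path in `EXTRA_COMPONENT_DIRS`. I am only assuming that is what happened here; the log did not say so. The sketch below is a hypothetical helper, not part of ESP-IDF or ESP-Skainet, that lists the relative entries that no longer resolve from a given project directory.

```python
import re
from pathlib import Path


def missing_component_dirs(project_dir):
    """List relative EXTRA_COMPONENT_DIRS entries that do not exist
    when resolved against project_dir."""
    project = Path(project_dir)
    text = (project / "CMakeLists.txt").read_text()
    missing = []
    # Match lines such as: set(EXTRA_COMPONENT_DIRS path1 path2 ...)
    pattern = r"set\(\s*EXTRA_COMPONENT_DIRS\s+([^)]*)\)"
    for match in re.finditer(pattern, text):
        for entry in match.group(1).split():
            entry = entry.strip('"')
            if "$" in entry or Path(entry).is_absolute():
                continue  # skip CMake variables and absolute paths
            if not (project / entry).exists():
                missing.append(entry)
    return missing
```

Run against the original example directory it should return an empty list; run against a copy placed somewhere else in the tree it returns the entries that would need to be adjusted before the copy can build.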
I flashed it to the board and tested it. get_started comes with 20 voice commands.
However, the test results were not ideal. Not only was the recognition rate low, but the commands reported back were all wrong. For example, 0 was recognized as 5, and 7 was recognized as 8. I tried changing various parameters, but the problem remained. Then, during one download, I noticed that the configured flash size was incorrect. I set the correct flash size and the wrong-command problem was solved.
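To catch this kind of mismatch earlier, a quick check of the project's sdkconfig can help. This is a minimal sketch, assuming the flash size is stored under the `CONFIG_ESPTOOLPY_FLASHSIZE` key as in the ESP-IDF versions I have seen. The function names are hypothetical, and the actual size has to be supplied by you, for example from the "Detected flash size" line printed by `esptool.py flash_id`.

```python
import re


def configured_flash_size(sdkconfig_text):
    """Return the CONFIG_ESPTOOLPY_FLASHSIZE value (e.g. "4MB"), or None."""
    match = re.search(
        r'^CONFIG_ESPTOOLPY_FLASHSIZE="([^"]+)"',
        sdkconfig_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


def flash_size_matches(sdkconfig_text, actual_size):
    """Compare the configured flash size with the size reported by the chip.

    actual_size is a string such as "4MB" that the caller provides.
    """
    configured = configured_flash_size(sdkconfig_text)
    if configured is None:
        return False  # key not found, so nothing to compare against
    return configured.upper() == actual_size.strip().upper()
```

Typical use would be `flash_size_matches(open("sdkconfig").read(), "4MB")`, replacing "4MB" with whatever your board actually reports.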
At last the commands were recognized correctly, but the recognition rate was still low.
I looked at the configuration parameters again and changed the "speech commands recognition mode after wake up" option.
After switching to single recognition, almost every spoken command was recognized correctly.
The recognition rate is now about the same as the demo that ships on the board. Finally, among the available wake words, I still think "Hello, Xiaoxin" is the catchiest.