A detailed introduction to embedded audio processing workflow development-EEWORLD

Preface

Audio is a digital representation of sound. It is used in many fields where the technology is already mature, such as communication, entertainment, medical treatment (ultrasound), and human-computer interaction. In the consumer embedded devices I have worked with, the most common application scenarios are:

Voice intercom

Audio and video recording

Voice detection and recognition

The development technologies involved mainly include:

Audio encoding and decoding

Audio format packaging and format conversion

Echo Cancellation

Sound detection and recognition

Although most audio application technologies are mature, embedded development still runs into many problems because hardware resources are limited. The field also involves a lot of knowledge and concepts, and it is easy to get confused without a professional background in audio and video.

The following content is a simple summary of the audio-related knowledge I have come into contact with in actual development work, for reference only.

(I) Introduction to audio processing flow

(1) Ideal processing flow

The ideal audio application processing flow is shown in the figure below:

The MIC converts the sound vibration into an electrical signal (digital or analog) and feeds it to the AI (audio input) module of the SOC

The AI module samples the input signal (ADC conversion) and outputs audio data in PCM format

The PCM audio data is compressed and converted into a format such as AAC or MP3

The compressed audio is packaged together with the video into an audio and video file, such as an MP4 file

(2) Actual processing flow

In embedded applications, limited system resources and differing application scenarios make actual use more complicated. The main constraint is that the system must support both local audio storage and network transmission.

PCM is the raw audio data. The audio encoder of a typical embedded chip can encode PCM data into G711, G726 and similar formats, but generally does not support AAC encoding, which may be mainly related to copyright and licensing issues. Ingenic and HiSilicon series SOCs do not support AAC encoding directly.

In terms of compression ratio, however, AAC is better than G711 and G726, which means that with the same amount of storage AAC can hold a longer stretch of audio. In addition, many video encapsulation libraries handle AAC more readily.

Given the above, several audio formats may coexist in the same system, as in the following figure:

In the figure above, the main application scenarios are audio network transmission and local audio storage.

Route 1:

The PCM collected by the AI module is directly transmitted to the IOT platform through the network

This method consumes few system resources, but occupies a large amount of network bandwidth.

Suitable for SOCs without an audio encoding module

Route 2:

Encode PCM format data into G711, G726 and other formats before transmitting them over the network

It consumes few system resources and little network bandwidth, making it the best option.

Suitable for SOCs with an audio encoding module

Route 3:

Encode PCM format data into AAC format through software encoding, and then encapsulate it into MP4, AVI and other formats

This method occupies CPU time, RAM, and Flash space (the AAC encoding library is relatively large)

Applicable to scenarios where AAC encoding is required

Route 4:

The main reason for this route is that the SOC supports only one audio output format at a time. For example, once it is configured to output PCM, it can no longer encode and output G711, G726 or other formats.

The G711 or G726 data output by the encoder is decoded back into PCM in software, compressed into AAC in software, and finally packaged into MP4 format

This method is suitable for scenarios where the AAC format must be used, but the SOC cannot output two audio formats at the same time.

This route consumes the most resources

(II) Audio format conversion

(1) PCM and G711A, G711U

PCM:

The device collects audio signals through a MIC. MICs fall into two categories, digital and analog. A digital MIC outputs an already converted digital signal, but analog MICs are more common in consumer devices.

PCM data is the binary sample sequence that the ADC produces from the analog audio signal of an analog MIC. It has no file header and no end marker, and it is uncompressed.

A PCM file can be opened in Audacity Beta (Unicode) through File->Import->Raw Data, and then played, edited, and inspected.

The main parameters are the number of channels, the sampling frequency, and the number of bits per sample.

The following figure opens a 2-channel, 48KHz sampling frequency, 16-bit PCM file.
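To get a feel for how much data these three parameters imply, the data rate can be computed directly. The following is only a minimal sketch; the function names pcm_bytes_per_second and pcm_size_bytes are made up for this illustration and do not come from any SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes produced per second by uncompressed PCM.
 * Assumption: bits_per_sample is a multiple of 8 (8, 16, 24, 32). */
uint32_t pcm_bytes_per_second(uint32_t sample_rate_hz,
                              uint32_t channels,
                              uint32_t bits_per_sample)
{
    assert(bits_per_sample % 8u == 0u);
    return sample_rate_hz * channels * (bits_per_sample / 8u);
}

/* Size in bytes of a raw PCM capture lasting the given number of seconds. */
uint64_t pcm_size_bytes(uint32_t sample_rate_hz, uint32_t channels,
                        uint32_t bits_per_sample, uint32_t seconds)
{
    return (uint64_t)pcm_bytes_per_second(sample_rate_hz, channels,
                                          bits_per_sample) * seconds;
}
```

For the 2-channel, 48 kHz, 16-bit case this gives 192000 bytes per second, so one minute of raw PCM is about 11.5 MB, which is why sending PCM directly over the network costs so much bandwidth.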

G711A and G711U

G711 comes in two variants, a-law and u-law. Both compress 16-bit PCM samples into 8 bits, typically by looking up a table.

G711 has a compression ratio of 2:1. A 1M PCM file is only 0.5M after being converted to G711 format.

The u-law in G711 is g711u, which is mainly used in North America and Japan.

The a-law in G711 is g711a, which is mainly used in Europe and other regions

If you want to play a G711 file directly, you can use the ffplay command on a Linux system.

ffplay -i test.pcm -f s16le -ac 2 -ar 48000
ffplay -i test.g711a -f alaw -ac 2 -ar 48000
ffplay -i test.g711u -f mulaw -ac 2 -ar 48000

-ac: number of audio channels
-ar: audio sampling rate
-f: file format

The conversion between G711 and PCM is relatively simple. The above is a simple project to convert a 48K 16bit 2-channel PCM to G711 format.
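As an illustration of how small the conversion is, here is a minimal sketch of a-law encoding and decoding for 16-bit samples. It is not the project mentioned above: it uses the common segment-search formulation instead of a lookup table, only a-law is shown, and the names pcm16_to_alaw, alaw_to_pcm16 and pcm16_buffer_to_alaw are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one 16-bit linear PCM sample as an 8-bit a-law code. */
uint8_t pcm16_to_alaw(int16_t sample)
{
    int32_t s = sample;
    uint8_t mask;

    if (s >= 0) {
        mask = 0xD5;          /* sign bit set, even bits inverted */
    } else {
        mask = 0x55;          /* sign bit clear, even bits inverted */
        s = ~s;               /* same as -s - 1, always non-negative */
    }
    int32_t mag = s >> 3;     /* keep the top 12 magnitude bits: 0..4095 */

    /* Find the segment: upper bounds are 0x1F, 0x3F, 0x7F, ... 0xFFF. */
    int seg = 0;
    int32_t limit = 0x1F;
    while (seg < 7 && mag > limit) {
        limit = (limit << 1) | 1;
        seg++;
    }

    uint8_t code = (uint8_t)(seg << 4);
    code |= (uint8_t)((mag >> (seg < 2 ? 1 : seg)) & 0x0F);
    return (uint8_t)(code ^ mask);
}

/* Decode one 8-bit a-law code back to 16-bit linear PCM. */
int16_t alaw_to_pcm16(uint8_t code)
{
    code ^= 0x55;
    int32_t t = (code & 0x0F) << 4;
    int seg = (code & 0x70) >> 4;

    if (seg == 0) {
        t += 8;
    } else {
        t += 0x108;
        t <<= (seg - 1);
    }
    return (int16_t)((code & 0x80) ? t : -t);
}

/* Convert n samples; the output buffer needs n bytes (half the input size). */
void pcm16_buffer_to_alaw(const int16_t *pcm, uint8_t *out, size_t n)
{
    assert(pcm != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = pcm16_to_alaw(pcm[i]);
}
```

For interleaved 2-channel data the samples can simply be converted in order, since each sample is encoded independently. The conversion is lossy: a sample of 1000, for example, comes back as 1008 after encoding and decoding.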

(III) AAC format and encoding

AAC is much more complicated than G711. AAC has many versions and encoders. The most commonly used one is FAAC (Freeware Advanced Audio Coder) because it is free.

(1) Various AAC formats

The file formats of AAC are:

ADIF (Audio Data Interchange Format) has audio header information only at the beginning of the file

The main feature of ADTS (Audio Data Transport Stream) is that each frame carries header information.

File format here refers to how the audio data is laid out when it is stored as a file.
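The two file formats can usually be told apart from the first bytes of the file. The sketch below assumes that an ADIF file begins with the ASCII characters "ADIF" and that an ADTS file begins directly with a frame header (12-bit sync word 0xFFF, layer bits 00). Files that carry a tag such as ID3 in front of the audio data are not handled, and the names used here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    AAC_FILE_UNKNOWN = 0,
    AAC_FILE_ADIF,
    AAC_FILE_ADTS
} aac_file_format;

/* Guess the AAC file format from the first bytes of the file. */
aac_file_format detect_aac_file_format(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);

    /* ADIF: a single header at the start, identified by the "ADIF" tag. */
    if (len >= 4 && memcmp(buf, "ADIF", 4) == 0)
        return AAC_FILE_ADIF;

    /* ADTS: every frame starts with 0xFFF; the layer field is always 00. */
    if (len >= 2 && buf[0] == 0xFF && (buf[1] & 0xF6) == 0xF0)
        return AAC_FILE_ADTS;

    return AAC_FILE_UNKNOWN;
}
```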

AAC stream format:

Stream format mainly refers to the format used for streaming media transmission, mainly including:

AAC_RAW refers to raw AAC data without encapsulation

AAC_ADTS is the same as the ADTS format in the file format

AAC_LATM (Low-Overhead Audio Transport Multiplex) is a transmission protocol for AAC audio.

The ADTS format is more commonly used because it can be used in both audio data file storage and streaming.

(2) ADTS format introduction

Let's look at the definition of the ADTS structure in fdk-aac

typedef struct {
  /* ADTS header fields */
  UCHAR mpeg_id;
  UCHAR layer;
  UCHAR protection_absent;
  UCHAR profile;
  UCHAR sample_freq_index;
  UCHAR private_bit;
  UCHAR channel_config;
  UCHAR original;
  UCHAR home;
  UCHAR copyright_id;
  UCHAR copyright_start;
  USHORT frame_length;
  USHORT adts_fullness;
  UCHAR num_raw_blocks;
  UCHAR num_pce_bits;
} STRUCT_ADTS_BS;

Only the header fields of the structure are listed here. There are 15 fields, and their sizes add up to 17 bytes.

The actual ADTS header structure has two lengths. The one with CRC checksum is 9 bytes long, and the one without CRC checksum is 7 bytes long. The function and actual length of each item can be seen in a definition on wiki: https://wiki.multimedia.cx/index.php/ADTS
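To make the 7-byte layout concrete, here is a minimal sketch that packs a header without CRC in front of one raw AAC frame, following the bit layout described on the wiki page. The function name adts_write_header is hypothetical. The sketch assumes MPEG-4 (ID bit 0), one raw data block per frame, and a buffer fullness of 0x7FF, which is commonly used for variable bitrate streams.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADTS_HEADER_SIZE 7u

/* Write a 7-byte ADTS header (no CRC) for one raw AAC frame.
 * profile:           audio object type minus 1 (1 = AAC LC)
 * sample_freq_index: index into the standard rate table (3 = 48000 Hz)
 * channel_config:    channel configuration (2 = stereo)
 * payload_len:       size of the raw AAC frame in bytes
 * Returns 0 on success, -1 if a value does not fit its bit field. */
int adts_write_header(uint8_t out[ADTS_HEADER_SIZE], unsigned profile,
                      unsigned sample_freq_index, unsigned channel_config,
                      size_t payload_len)
{
    assert(out != NULL);
    if (profile > 3u || sample_freq_index > 12u || channel_config > 7u ||
        payload_len > 0x1FFFu - ADTS_HEADER_SIZE)
        return -1;

    /* frame_length is 13 bits and counts the header as well. */
    unsigned frame_length = (unsigned)payload_len + ADTS_HEADER_SIZE;

    out[0] = 0xFF;                       /* sync word, upper 8 bits */
    out[1] = 0xF1;                       /* sync low 4 bits, ID=0, layer=00,
                                            protection_absent=1 */
    out[2] = (uint8_t)((profile << 6) | (sample_freq_index << 2) |
                       (channel_config >> 2));   /* private_bit = 0 */
    out[3] = (uint8_t)(((channel_config & 3u) << 6) |
                       (frame_length >> 11));    /* original/home/copyright = 0 */
    out[4] = (uint8_t)((frame_length >> 3) & 0xFFu);
    out[5] = (uint8_t)(((frame_length & 7u) << 5) | 0x1Fu); /* fullness high bits */
    out[6] = 0xFC;                       /* fullness low bits, 1 raw block */
    return 0;
}
```

For an AAC LC, 48 kHz, 2-channel frame with a 396-byte payload, this produces the bytes FF F1 4C 80 32 7F FC, which encodes a frame length of 403.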

We use the Elecard Stream Analyzer tool to open an AAC file in ADTS format for a clearer view:

Label 1 marks the fourth frame, picked at random; its offset address is 0x54a

Label 2 is the ADTS sync word (Syncword), 12 bits, 0xFFF

The upper right box shows the parsed ADTS parameters

Label 3 is the length of this frame (frame 4), 403 bytes

Label 4 is the offset address of the next frame, 0x6dd, which is exactly the offset of frame 4 plus its length: 0x54a + 403 = 0x6dd

If you need to parse an AAC file in ADTS format manually, you can follow the same steps. First find the sync word at the start of a frame header, then parse each field in turn, and finally use the frame length to jump to the next frame and continue parsing.
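The steps above can be sketched roughly as follows. This is only an illustration, not the analyzer's or fdk-aac's code: the adts_info type and the function names are hypothetical, only a subset of the header fields is extracted, and the CRC itself is not verified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  mpeg_id;            /* 0 = MPEG-4, 1 = MPEG-2 */
    uint8_t  protection_absent;  /* 1 = no CRC, header is 7 bytes */
    uint8_t  profile;            /* audio object type minus 1 */
    uint8_t  sample_freq_index;
    uint8_t  channel_config;
    uint16_t frame_length;       /* header + payload, in bytes */
    uint8_t  header_size;        /* 7 or 9 */
} adts_info;

/* Sampling rate for a sample_freq_index, or 0 if the index is reserved. */
uint32_t adts_sample_rate(unsigned index)
{
    static const uint32_t rates[13] = {
        96000, 88200, 64000, 48000, 44100, 32000, 24000,
        22050, 16000, 12000, 11025, 8000, 7350
    };
    return index < 13u ? rates[index] : 0u;
}

/* Parse one ADTS header at p. Returns 0 on success, -1 otherwise. */
int adts_parse_header(const uint8_t *p, size_t len, adts_info *out)
{
    assert(p != NULL && out != NULL);
    if (len < 7)
        return -1;
    if (p[0] != 0xFF || (p[1] & 0xF6) != 0xF0)   /* sync word, layer 00 */
        return -1;

    out->mpeg_id           = (uint8_t)((p[1] >> 3) & 1);
    out->protection_absent = (uint8_t)(p[1] & 1);
    out->profile           = (uint8_t)(p[2] >> 6);
    out->sample_freq_index = (uint8_t)((p[2] >> 2) & 0x0F);
    out->channel_config    = (uint8_t)(((p[2] & 1) << 2) | (p[3] >> 6));
    out->frame_length      = (uint16_t)(((p[3] & 3) << 11) | (p[4] << 3) |
                                        (p[5] >> 5));
    out->header_size       = out->protection_absent ? 7 : 9;

    return out->frame_length >= out->header_size ? 0 : -1;
}

/* Walk a buffer frame by frame, jumping by frame_length each time. */
size_t adts_count_frames(const uint8_t *buf, size_t len)
{
    size_t off = 0, count = 0;
    adts_info h;

    while (off < len && adts_parse_header(buf + off, len - off, &h) == 0) {
        if (h.frame_length > len - off)
            break;                       /* last frame is truncated */
        off += h.frame_length;           /* offset of the next frame */
        count++;
    }
    return count;
}
```

With a header that carries a frame length of 403, a frame at offset 0x54a is followed by the next frame at 0x54a + 403 = 0x6dd, matching what the analyzer shows.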

(3) AAC format encoding

The main AAC encoders are: FhG, Nero AAC, QuickTime/iTunes, FAAC, DivX AAC. FAAC is more commonly used in embedded systems.

Commonly used AAC coding tools and libraries include:

FFMPEG: can integrate multiple encoders

fdk-aac: the Fraunhofer FDK AAC library, which provides both an AAC encoder and decoder

faac: AAC encoding library

faad: AAC decoding library

The source code of the AAC libraries introduced above can be downloaded from GitHub:

https://github.com/mstorsjo/fdk-aac
https://github.com/knik0/faac
https://github.com/knik0/faad2

(4) Porting fdk-aac

Download the source code from github https://github.com/mstorsjo/fdk-aac

You can select different versions to download by tag. The ones in tags are generally more stable release versions.

If you want to port fdk-aac to Ingenic's T31 device, you can cross-compile using the following command:

mkdir _install_uclibc
./autogen.sh
CFLAGS+=-muclibc LDFLAGS+=-muclibc CPPFLAGS+=-muclibc CXXFLAGS+=-muclibc ./configure --prefix=$PWD/_install_uclibc --host=mips-linux-gnu
make -j4
make install

The cross-compiled files are placed in the _install_uclibc folder. To confirm which toolchain the library was built with, run the file command on it:

biao@ubuntu:~/test/fdk-aac-master/_install_uclibc/lib$ file libfdk-aac.so.2.0.2
libfdk-aac.so.2.0.2: ELF 32-bit LSB shared object, MIPS, MIPS32 rel2 version 1 (SYSV), dynamically linked, not stripped
