Preface
Audio is a digital representation of sound. It has many applications, and the technology is already mature in many fields, such as communication, entertainment, medical treatment (ultrasound), and human-computer interaction. In the consumer embedded devices I have worked with, the most common application scenarios are:
Voice intercom
Audio and video recording
Voice detection and recognition
The development technologies involved mainly include:
Audio encoding and decoding
Audio format packaging and format conversion
Echo cancellation
Sound detection and recognition
Although most audio application technologies are mature, embedded development still runs into many problems because hardware resources are limited. The field also involves a lot of knowledge and concepts, and it is easy to get confused if you do not have an audio/video background.
The following content is a simple summary of the audio-related knowledge I have come into contact with in actual development work, for reference only.
(I) Introduction to audio processing flow
(1) Ideal processing flow
The ideal audio application processing flow is shown in the figure below:
MIC converts the sound vibration signal into an electrical (digital/analog) signal and inputs it to the AI (audio input module) of the SOC
The AI module converts the input signal (ADC conversion sampling) and outputs audio data in PCM format
Compress, convert, and package PCM audio data into various formats, such as common AAC, MP3, etc.
Encapsulate (mux) the compressed audio stream together with the video stream into a single audio/video container file, such as an MP4 file
(2) Actual processing flow
In embedded applications, limited system resources and differing application scenarios make real-world use more complicated. The main requirement is that the system must support both local audio storage and network transmission.
PCM is the raw audio data. The audio encoders built into typical embedded chips can encode PCM data into G711, G726 and similar formats, but generally do not support AAC encoding, which is probably related to copyright and licensing issues. Ingenic and HiSilicon series SOCs, for example, cannot directly support AAC encoding.
However, AAC achieves a higher compression ratio than G711 and G726, which means that with the same storage space AAC can hold a longer recording. In addition, many video container (muxing) libraries support AAC well.
Based on the above situations, there may be several audio formats in the same system. For example, the following figure:
In the above picture, the main application scenarios are audio network transmission and audio local storage.
Route 1:
The PCM collected by the AI module is directly transmitted to the IOT platform through the network
This method consumes less resources, but occupies a large amount of network bandwidth.
Suitable for SOC without audio encoding module
Route 2:
Encode PCM format data into G711, G726 and other formats before transmitting them over the network
It consumes less resources and occupies less network bandwidth, making it the best option.
Suitable for SOC with audio coding
Route 3:
Encode PCM format data into AAC format through software encoding, and then encapsulate it into MP4, AVI and other formats
This method will occupy CPU resources, RAM, and Flash space (AAC encoding library is relatively large)
Applicable to scenarios where AAC encoding is required
Route 4:
The G711 or G726 output of the hardware encoder is decoded back into PCM by software, re-encoded into AAC by software, and finally packaged into MP4 format
The main reason for this usage is that the SOC supports only one audio output format at a time. For example, once it is configured to output PCM, it can no longer encode and output G711, G726 and other formats at the same time.
This method is suitable for scenarios where the AAC format must be used, but the SOC cannot output two audio formats at the same time.
This route consumes the most resources
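The bandwidth and storage trade-off between the routes can be estimated with simple arithmetic. The following Python sketch is only an illustration: the 8 kHz, 16-bit, mono capture setting and the 32 kbit/s AAC target are assumed example values, not figures from the article or from any particular SOC.

```python
def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of raw PCM: every sample of every channel is stored in full."""
    return sample_rate_hz * bits_per_sample * channels


def g711_bitrate_bps(sample_rate_hz, channels):
    """G711 stores each sample in 8 bits."""
    return sample_rate_hz * 8 * channels


def megabytes_per_hour(bitrate_bps):
    """Storage needed for one hour of audio, in MB (10^6 bytes)."""
    return bitrate_bps / 8 * 3600 / 1_000_000


# Assumed example: 8 kHz, 16-bit, mono voice capture.
pcm = pcm_bitrate_bps(8000, 16, 1)
g711 = g711_bitrate_bps(8000, 1)
aac = 32000  # assumed AAC target bit rate; the real value is an encoder setting

for name, rate in (("PCM", pcm), ("G711", g711), ("AAC", aac)):
    print("%-5s %7d bit/s  %5.1f MB/hour" % (name, rate, megabytes_per_hour(rate)))
```

With these assumed settings, route 1 (PCM) needs twice the bandwidth of route 2 (G711), and the AAC figure depends entirely on the bit rate chosen for the encoder.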
(II) Audio format conversion
(1) PCM and G711A, G711U
PCM:
The device collects audio signals through MIC. MIC is divided into two categories, digital MIC and analog MIC. Digital MIC outputs converted digital signals, but analog MIC is more commonly used in consumer devices.
PCM data is a binary sequence of digital signals converted by ADC from analog audio signals input by analog MIC. It has no file header and no end mark and is an uncompressed data format.
PCM files can be opened by Audacity Beta (Unicode) in File->Import->Raw Data mode, and can be played, edited, viewed, etc.
The main parameters are: number of channels, sampling frequency, and bit depth (bits per sample)
The following figure shows a 2-channel, 48 kHz, 16-bit PCM file opened in Audacity.
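Because a PCM file has no header, these three parameters must be supplied by whoever reads the data. The sketch below shows how they determine the duration and the channel layout of a raw buffer. It assumes signed 16-bit little-endian samples with interleaved channels (the s16le layout); the function names are made up for this illustration.

```python
import struct


def pcm_duration_seconds(num_bytes, sample_rate_hz, channels, bits_per_sample):
    """Duration of a raw PCM buffer, derived only from the three parameters."""
    bytes_per_frame = channels * bits_per_sample // 8  # one sample per channel
    return num_bytes / (bytes_per_frame * sample_rate_hz)


def split_channels_s16le(pcm_bytes, channels):
    """Split interleaved 16-bit little-endian PCM into one list per channel."""
    count = len(pcm_bytes) // 2
    samples = struct.unpack("<%dh" % count, pcm_bytes[:count * 2])
    return [list(samples[ch::channels]) for ch in range(channels)]


# One second of 2-channel, 48 kHz, 16-bit PCM occupies 192000 bytes.
print(pcm_duration_seconds(192000, 48000, 2, 16))

# Two stereo frames, interleaved as L0 R0 L1 R1.
demo = struct.pack("<4h", 1, -1, 2, -2)
print(split_channels_s16le(demo, 2))
```

If the wrong parameters are supplied, the data still loads but plays at the wrong speed or as noise, which is why tools such as Audacity ask for them on import.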
G711A and G711U
G711 comes in two variants, A-law and μ-law (u-law). Both compress each 16-bit PCM sample into 8 bits, typically with a lookup table.
G711 has a compression ratio of 2:1. A 1 MB 16-bit PCM file is only 0.5 MB after being converted to G711 format.
The u-law in G711 is g711u, which is mainly used in North America and Japan.
The a-law in G711 is g711a, which is mainly used in Europe and other regions
If you want to play a G711 audio file directly, you can use the ffplay command on a Linux system.
ffplay -i test.pcm -f s16le -ac 2 -ar 48000
ffplay -i test.g711a -f alaw -ac 2 -ar 48000
ffplay -i test.g711u -f mulaw -ac 2 -ar 48000
-ac: number of audio channels
-ar: audio sampling rate
-f: file format
The conversion between G711 and PCM is relatively simple. The above is a simple project that converts 48 kHz, 16-bit, 2-channel PCM to G711 format.
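To illustrate why the conversion is simple, here is a minimal Python sketch of μ-law encoding and decoding. It is not the project mentioned above. It follows the widely used segment-based reference algorithm and computes the segment directly instead of using a lookup table; the function names are made up for this illustration, and A-law works on the same principle with different constants.

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before locating the segment
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample):
    """Encode one signed 16-bit PCM sample as one mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # The segment number (0..7) is the position of the highest set bit.
    exponent = (magnitude >> 7).bit_length() - 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are stored with all bits inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code):
    """Decode one mu-law byte back to an approximate 16-bit PCM sample."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


def pcm16_to_ulaw(pcm_bytes):
    """Convert 16-bit little-endian PCM bytes to mu-law bytes (half the size)."""
    count = len(pcm_bytes) // 2
    samples = struct.unpack("<%dh" % count, pcm_bytes[:count * 2])
    return bytes(linear_to_ulaw(s) for s in samples)


pcm = struct.pack("<4h", 0, 1000, -1000, 32767)
encoded = pcm16_to_ulaw(pcm)
print(len(pcm), "->", len(encoded), "bytes")
print([ulaw_to_linear(b) for b in encoded])
```

The decoded values are close to, but not identical with, the input samples, because G711 is a lossy format: quantization steps grow with amplitude.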
(III) AAC format and encoding
AAC is much more complicated than G711. AAC has many versions and encoders. The most commonly used one is FAAC (Freeware Advanced Audio Coder) because it is free.
(1) Various AAC formats
The AAC file formats are:
ADIF (Audio Data Interchange Format): the audio header information appears only once, at the beginning of the file
ADTS (Audio Data Transport Stream): every frame carries its own header information
"File format" here means a format used mainly for storing audio data in files.
AAC stream format:
Stream format mainly refers to the format used for streaming media transmission, mainly including:
AAC_RAW refers to raw AAC data without encapsulation
AAC_ADTS is the same as the ADTS format in the file format
AAC_LATM (Low-overhead MPEG-4 Audio Transport Multiplex) is a transport format for AAC audio.
The ADTS format is more commonly used because it can be used in both audio data file storage and streaming.
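A quick way to tell the two file formats apart is to look at the first bytes: an ADIF file begins with the ASCII identifier "ADIF", while an ADTS stream begins with the 12-bit syncword 0xFFF followed by a layer field that is always 0. The following Python sketch is a heuristic under those assumptions; the function name is hypothetical, and raw AAC or LATM data cannot be recognized this way.

```python
def detect_aac_file_format(data):
    """Guess the AAC file format from the first bytes of a buffer."""
    if data[:4] == b"ADIF":
        return "ADIF"
    # Mask 0xF6 checks the low 4 syncword bits and the 2-bit layer field (must be 0),
    # while ignoring the MPEG version bit and the protection_absent bit.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xF6) == 0xF0:
        return "ADTS"
    return "unknown"


print(detect_aac_file_format(b"ADIF\x00\x00"))
print(detect_aac_file_format(bytes([0xFF, 0xF1, 0x4C, 0x80])))
```

Checking the layer bits helps to avoid mistaking an MP3 frame header, which starts with a similar sync pattern, for ADTS.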
(2) ADTS format introduction
Let's look at the definition of the ADTS structure in fdk-aac
typedef struct {
  /* ADTS header fields */
  UCHAR mpeg_id;
  UCHAR layer;
  UCHAR protection_absent;
  UCHAR profile;
  UCHAR sample_freq_index;
  UCHAR private_bit;
  UCHAR channel_config;
  UCHAR original;
  UCHAR home;
  UCHAR copyright_id;
  UCHAR copyright_start;
  USHORT frame_length;
  USHORT adts_fullness;
  UCHAR num_raw_blocks;
  UCHAR num_pce_bits;
} STRUCT_ADTS_BS;
Only the header fields of the structure are listed here. There are 15 fields, and their sizes add up to 17 bytes, because the structure stores each parsed field in its own variable instead of mirroring the packed bit layout of the stream.
The ADTS header in the actual stream has two possible lengths: 9 bytes with a CRC checksum and 7 bytes without. The function and bit length of each field can be found in this wiki definition: https://wiki.multimedia.cx/index.php/ADTS
We use the Elecard Stream Analyzer tool to open an AAC file in ADTS format for a clearer view:
Label 1 marks the fourth frame, chosen at random; its offset address is 0x54a
Label 2 is the ADTS synchronization word (syncword): 12 bits, 0xFFF
The upper right box shows the parsed ADTS parameters
Label 3 is the length of the current frame (frame 4): 403 bytes
Label 4 is the offset address of the next frame, 0x6dd, which is exactly the offset of frame 4 plus its length: 0x54a + 403 = 0x6dd (403 decimal is 0x193)
If you need to parse an AAC ADTS file manually, you can do it the same way. First find the syncword at the start of a frame header, then parse each field in turn, and finally use the frame length to jump to the next frame.
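The manual parsing procedure can be sketched in Python as follows. This is a minimal illustration based on the standard ADTS bit layout, not code from fdk-aac; the function names and dictionary keys are made up for this sketch, and it does not resynchronize if a frame is corrupted.

```python
# Sampling rates selected by the 4-bit sampling frequency index.
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                24000, 22050, 16000, 12000, 11025, 8000, 7350]


def parse_adts_header(buf, offset=0):
    """Parse the fixed 7-byte part of an ADTS header located at offset."""
    h = buf[offset:offset + 7]
    # 12-bit syncword 0xFFF, and the 2-bit layer field must be 0.
    if len(h) < 7 or h[0] != 0xFF or (h[1] & 0xF6) != 0xF0:
        raise ValueError("no ADTS syncword at offset %#x" % offset)
    protection_absent = h[1] & 0x01
    sf_index = (h[2] >> 2) & 0x0F
    return {
        "mpeg_id": (h[1] >> 3) & 0x01,
        "protection_absent": protection_absent,
        # A 2-byte CRC follows the header when protection_absent is 0.
        "header_length": 7 if protection_absent else 9,
        "profile": (h[2] >> 6) & 0x03,
        "sample_rate": SAMPLE_RATES[sf_index] if sf_index < len(SAMPLE_RATES) else None,
        "channel_config": ((h[2] & 0x01) << 2) | ((h[3] >> 6) & 0x03),
        # 13-bit frame length, header included, spread over bytes 3 to 5.
        "frame_length": ((h[3] & 0x03) << 11) | (h[4] << 3) | (h[5] >> 5),
        "num_raw_blocks": h[6] & 0x03,
    }


def iter_adts_frames(buf, offset=0):
    """Yield (offset, header) for each frame, jumping by frame_length."""
    while offset + 7 <= len(buf):
        header = parse_adts_header(buf, offset)
        if header["frame_length"] < header["header_length"]:
            raise ValueError("invalid frame length at offset %#x" % offset)
        yield offset, header
        offset += header["frame_length"]


# A header for a 403-byte frame: 48 kHz, 2 channels, no CRC.
example_header = bytes([0xFF, 0xF1, 0x4C, 0x80, 0x32, 0x7F, 0xFC])
print(parse_adts_header(example_header))
```

In a real file, `iter_adts_frames` would be called on the file contents, and each yielded offset should match the frame offsets shown by a stream analyzer.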
(3) AAC format encoding
The main AAC encoders are: FhG (Fraunhofer), Nero AAC, QuickTime/iTunes, FAAC, and DivX AAC. FAAC is more commonly used in embedded systems.
Commonly used AAC coding tools and libraries include:
FFMPEG: can integrate multiple encoders
fdk-aac: the Fraunhofer FDK AAC library, with both an encoder and a decoder (a separate implementation from FAAC)
faac: AAC encoding library
faad: AAC decoding library
The source code of the AAC libraries introduced above can be downloaded from GitHub:
https://github.com/mstorsjo/fdk-aac
https://github.com/knik0/faac
https://github.com/knik0/faad2
(4) Porting fdk-aac
Download the source code from github https://github.com/mstorsjo/fdk-aac
You can select different versions to download by tag. The ones in tags are generally more stable release versions.
If you want to port fdk-aac to Ingenic's T31 device, you can cross-compile using the following commands:
mkdir _install_uclibc
./autogen.sh
CFLAGS+=-muclibc LDFLAGS+=-muclibc CPPFLAGS+=-muclibc CXXFLAGS+=-muclibc ./configure --prefix=$PWD/_install_uclibc --host=mips-linux-gnu
make -j4
make install
The cross-compiled files are placed in the _install_uclibc folder. You can use the following command to check which tool chain the library was built with: file libfdk-aac.so.2.0.2
biao@ubuntu:~/test/fdk-aac-master/_install_uclibc/lib$ file libfdk-aac.so.2.0.2
libfdk-aac.so.2.0.2: ELF 32-bit LSB shared object, MIPS, MIPS32 rel2 version 1 (SYSV), dynamically linked, not stripped