Introduction
With the rapid development of microelectronics and computer technology, a wide range of embedded application systems has emerged. Among them, speech processing systems have developed continuously and are widely used across industries, such as voice station announcers, automatic interpreters, and interview recorders, bringing great convenience to production and daily life. Based on the SEP3203, a 32-bit embedded SoC from the National ASIC System Engineering Technology Research Center of Southeast University, this paper adopts the G.721-standard ADPCM algorithm to implement real-time software encoding and decoding of speech signals, providing an effective embedded solution for speech processing applications.
1. G.721 Standard Overview
In 1937, A. H. Reeves proposed pulse code modulation (PCM), which pioneered digital voice communication. In the early 1980s, the CCITT began to study non-PCM coding algorithms below 64 kb/s and successively formulated and adopted coding standards such as G.721, G.728, and G.729. Among them, the G.721 protocol, a representative ADPCM algorithm, offers voice quality nearly identical to PCM while having a simple algorithmic structure and excellent error resistance. It is widely used in satellite links, submarine cables, and portable digital voice equipment. A simplified block diagram of the G.721 algorithm is shown in Figure 1.
Coding process:
① Compute the difference E(k) = Sl(k) - Se(k) between the input signal Sl(k) and the adaptive predictor output Se(k);
② Quantize E(k) in the adaptive quantizer to obtain the ADPCM codeword I(k);
③ Pass I(k) through the adaptive inverse quantizer to obtain the quantized difference signal Dq(k);
④ Update the prediction filter coefficients from the reconstructed signal Sr(k) = Se(k) + Dq(k) and from Dq(k);
⑤ Use the new coefficients to compute Se(k+1), then repeat the steps above to compress the next speech sample.
Decoding process:
① Obtain Dq(k) and Se(k) through adaptive inverse quantization and adaptive prediction, and form the reconstructed speech signal Sr(k) = Se(k) + Dq(k);
② Convert the reconstructed signal Sr(k) into PCM format to obtain the PCM codeword Sp(k);
③ Update the prediction filter coefficients in the same way as the encoder;
④ Apply synchronous coding adjustment to Sp(k) to support two-way communication;
⑤ Repeat the steps above with the new filter coefficients to decode the next I(k).
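The encode/decode loops above can be sketched in C. The sketch below is illustrative only: it replaces G.721's adaptive quantizer and two-pole/six-zero adaptive predictor with a fixed 4-bit uniform quantizer and a trivial first-order predictor, and the names adpcm_encode/adpcm_decode are assumptions, not the standard's API.

```c
/* Toy ADPCM loop following the step structure above.
   G.721 itself uses an adaptive quantizer and adaptive predictor;
   here both are fixed to keep the control flow visible. */
typedef struct { int se; } State;          /* predictor output Se(k) */

/* Fixed 4-bit uniform quantizer with step 256 (not adaptive) */
static int quantize(int e)
{
    int i = e / 256;
    if (i > 7)  i = 7;
    if (i < -8) i = -8;
    return i & 0x0F;                       /* 4-bit codeword */
}

static int dequantize(int I)
{
    int i = (I & 0x08) ? I - 16 : I;       /* sign-extend 4 bits */
    return i * 256;
}

int adpcm_encode(State *st, int sl)        /* PCM sample -> codeword */
{
    int e  = sl - st->se;                  /* step 1: E(k) = Sl(k) - Se(k) */
    int I  = quantize(e);                  /* step 2: codeword I(k)        */
    int dq = dequantize(I);                /* step 3: Dq(k)                */
    st->se = st->se + dq;                  /* step 4: Sr(k) updates state  */
    return I;
}

int adpcm_decode(State *st, int I)         /* codeword -> PCM sample */
{
    int dq = dequantize(I);
    int sr = st->se + dq;                  /* reconstruction Sr(k)   */
    st->se = sr;                           /* same update as encoder */
    return sr;
}
```

The key structural point matches the standard: the encoder embeds a copy of the decoder (steps ③ and ④), so the encoder and decoder predictor states stay in lockstep as long as codewords arrive intact.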
2. Chip Introduction
The SEP3203 chip is the processing core of the system; its overall block diagram is shown in Figure 2. The chip is a 32-bit SoC based on the ARM7TDMI core, designed independently by the National ASIC System Engineering Technology Research Center of Southeast University. It adopts the AMBA 2.0 standard and a 0.25 μm CMOS process, and is aimed mainly at low-end embedded handheld devices. The chip provides an AC97 controller, an external memory interface (EMI), a 6-channel DMAC, TIMER, PMU, INTC, and other modules. The modules used in the voice system are: the EMI, which controls access to external memory; the on-chip memory eSRAM, used to hold time-critical core code; the AC97 controller, which provides a standard AC97 audio interface; and the DMAC, which implements DMA transfer of bulk data.
3. System design
3.1 Hardware System
The hardware system block diagram is shown in Figure 3. The dashed box encloses the on-chip modules; outside the box are the off-chip devices, including external memory (SDRAM/SRAM/Flash, etc.) and the CODEC. Philips' UCB1400 serves as the CODEC. The system operates as follows.
① Encoding. The CODEC samples the voice data and stores it temporarily in the AC97 input FIFO. The DMAC then transfers the data to the designated memory area, signalling completion by interrupt. Under the control of the ARM7TDMI, the G.721 encoding program compresses the voice PCM data into ADPCM code.
② Decoding. The G.721 decoding program decodes the ADPCM code in memory into PCM code. After a full frame has been decoded, the DMAC transfers the data to the AC97 output FIFO in interrupt mode, and the CODEC drives the playback device (headphones, speakers, etc.).
To meet the real-time requirements of speech, the sampling rate of the UCB1400 is set to 8 kHz. The chip represents each sample with 16 bits, so the raw PCM data rate is 128 kb/s. After encoding, each sample is represented by 4 bits, so the transmission rate drops to 32 kb/s.
3.2 Software System
The software flow is shown in Figure 4. Each frame of data consists of 64 samples: 128 bytes of 16-bit PCM code, which is encoded into 32 bytes of 4-bit ADPCM code.
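The rates and frame sizes quoted above can be checked with a few lines of arithmetic (the constant names below are illustrative, not from the original code):

```c
/* Data-rate and frame-size arithmetic for the voice system:
   8 kHz sampling, 16-bit PCM in, 4-bit ADPCM out, 64-sample frames. */
enum {
    FS          = 8000,                          /* sampling rate, samples/s */
    PCM_BITS    = 16,                            /* bits per PCM sample      */
    ADPCM_BITS  = 4,                             /* bits per ADPCM codeword  */
    FRAME_SMPLS = 64,                            /* samples per frame        */

    PCM_RATE    = FS * PCM_BITS,                 /* 128 000 b/s raw PCM      */
    ADPCM_RATE  = FS * ADPCM_BITS,               /*  32 000 b/s encoded      */
    PCM_FRAME   = FRAME_SMPLS * PCM_BITS / 8,    /* 128 bytes per PCM frame  */
    ADPCM_FRAME = FRAME_SMPLS * ADPCM_BITS / 8   /*  32 bytes per ADPCM frame*/
};
```

The 4:1 ratio between PCM_FRAME and ADPCM_FRAME is the same 4:1 compression seen in the stream rates (128 kb/s down to 32 kb/s).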
(1) Coding
First, the system is initialized: the AC97 controller, CODEC, DMAC, and other modules are configured, and the related state variables are initialized. The first frame of voice data is then sampled; when sampling completes, a DMA interrupt is raised. The interrupt handler reconfigures the DMAC to trigger a new sampling transfer and then encodes the newly sampled data. Since encoding is performed by the CPU core while sampling is carried out by the CODEC and DMA, the encoding of frame K and the sampling of frame K+1 proceed concurrently.
(2) Decoding
Decoding mirrors encoding: the system is initialized first, and then the first frame of audio data is decoded. After decoding, the DMAC is configured to transfer the data to the AC97 output FIFO, and the recording is played through the playback device. Likewise, the decoding of frame K+1 proceeds concurrently with the playback of frame K.
This design buffers data with a "double buffer" mechanism: two frame buffers, Buf[0] and Buf[1], are allocated, and a buffer flag Flg is initialized to 0. During encoding, the first frame is sampled and DMA transfers the data from the AC97 input FIFO to Buf[0]. When the transfer completes, Flg is set to 1 and the encoder reads data from Buf[0] for encoding; at the same time, DMA writes new data into Buf[1]. Thereafter, each time a frame has been sampled, Flg = !Flg is toggled; the encoder reads from Buf[!Flg] while the destination of the DMA sampling transfer is Buf[Flg]. Sampling of frame K+1 thus runs concurrently with encoding of frame K. As long as encoding is faster than sampling, no data is overwritten. The processing loop is as follows (decoding is similar):
Flg = 0;
Psmp = Buf[Flg];
Run_Sampler(Psmp);          /* sample the first frame of data          */
while (1) {
    Flg  = !Flg;
    Penc = Buf[!Flg];       /* encoding pointer -> just-filled buffer  */
    Psmp = Buf[Flg];        /* sampling pointer -> free buffer         */
    Run_Sampler(Psmp);      /* start the sampler and the encoder;      */
    Run_Encoder(Penc);      /* the two run concurrently                */
}
4. Performance optimization
Speech processing has stringent real-time requirements. If data processing cannot keep pace with the speech signal, newly sampled data will overwrite previously sampled but unprocessed data during recording, and playback will run slower than the actual speech. A sufficiently large buffer could avoid the recording problem, but not the playback problem; and since storage is precious in embedded systems, such a scheme has no practical value. The "double buffer" mechanism introduced above decouples sampling from encoding and decoding from playback so that they execute concurrently and are easy to control; but to meet the real-time requirement, the codec speed must also keep up with sampling and playback. The sampling rate is 8 k samples/s and each sample occupies 16 bits, so the codec speed cannot fall below 16 KB/s (that is, at least 16 KB of PCM code must be encoded per second, and at least 16 KB of PCM code must be produced by the decoder per second). Table 1 shows the time taken to process 512 KB of PCM code (corresponding to 128 KB of ADPCM code) on bare metal, without an operating system, before optimization. The test was conducted using the SoC's internal TIMER module; see reference [1]. The results show that before optimization the system did not meet the real-time requirements of speech.
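The 16 KB/s floor can be expressed as a simple throughput check. The function below is a sketch; its name and the use of plain seconds are illustrative, since the measurement in the article used the SoC's on-chip TIMER rather than a wall clock:

```c
/* Real-time check: at 8 000 samples/s and 2 bytes per sample, the
   codec must consume PCM data at no less than 16 000 bytes/s. */
int meets_realtime(double pcm_bytes, double seconds)
{
    const double floor_bps = 8000.0 * 2.0;   /* 16 000 bytes/s minimum */
    return pcm_bytes / seconds >= floor_bps;
}
```

For example, processing the article's 512 KB PCM test load in 25 s gives about 20.9 KB/s and passes, while 40 s gives about 13 KB/s and fails.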
So far, the system object code has been running in SDRAM. The SEP3203 provides a very useful module: on-chip high-speed memory (eSRAM). eSRAM access is very fast, allowing the core to reach 0.89 MIPS/MHz, so it has a large effect on system performance, while execution from SDRAM achieves only about one third of that. Table 2 compares the performance of SDRAM and eSRAM at a 50 MHz clock with 32-bit ARM instructions; the meaning of each indicator can be found in reference [1].
However, the SEP3203 has only 20 KB of eSRAM, so it is neither possible nor necessary to execute all code from it. The ARM integrated development tool provides a profiling function that statistically analyzes the whole program and reports the percentage of total execution time consumed by each part of the code (mainly at the granularity of standard C functions). Profiling the software system yields each codec library function's share of the total codec time; the main contributors are listed in Table 3.
These three functions account for nearly 80% of the total encoding and decoding time (Quan(), Fmult(), and Update() perform quantization table lookup, fixed-point floating-point multiplication, and state-variable update, respectively), so optimizing them significantly improves codec speed. Their code is gathered into the file rec_esram.c, and the remap.scf file is then loaded for memory mapping (*.scf files are the linker scatter-description files provided by the ARM ADS integrated development tool). The content of remap.scf is as follows:
FLASH 0x30002000 0x1000000
{
    FLASH 0x30002000            ; system initialization entry and other code
    {
        init_ice.o (INIT, +First)
        * (+RO, +RW, +ZI)
    }
    32bitRAM 0x00000000         ; interrupt vector table entry address
    {
        boot_gfd.o (BOOT, +First)
    }
    ESRAM 0x1fff0000 0x600      ; core library code, placed in eSRAM
    {
        rec_esram.o (+RO, +RW, +ZI)
    }
    ; stack setting part
}
After the memory image is built, the object code of rec_esram.c, rec_esram.o (about 1.5 KB), is loaded into eSRAM (starting at address 0x1fff0000) and executed there. Table 4 shows the codec speed measured after eSRAM optimization.
The performance of the voice system was also tested under an operating system, as listed in Table 5. The operating system is ASIXOS, developed by the Southeast University National ASIC System Engineering Technology Research Center for embedded applications. It supports a graphical user interface, networking, clocks, real-time interrupt management, and a clear application development interface. The voice system runs as an application in the OS environment, with its own user interface and underlying services; due to space limitations, this is not detailed here.
These tests show that after eSRAM optimization the codec speed meets the real-time needs of speech, both on bare metal and under the operating system, satisfying the design requirements.
Conclusion
Real-time performance is critical when designing embedded systems for multimedia applications. This paper presents a design for a voice processing system on a SoC based on the ARM7TDMI core and optimizes its performance by exploiting the SoC's eSRAM. Tests on the prototype show an encoding rate of 19.88 KB/s and a decoding rate of 22.68 KB/s at a 70 MHz main frequency under the operating system, meeting the real-time requirements of the voice system. Moreover, with voice processing as one subsystem of the prototype, the hardware design also supports MP3 playback and an LCD touch screen, reducing the system board area and lowering the cost of the whole machine; it is an efficient, low-cost design solution.
References
[1] Ling Ming. Low-cost handheld multimedia device processor based on ARM7TDMI. Nanjing: National ASIC Engineering Center of Southeast University, 2004.
[2] Gou Daju, Yang Qigang. Voice recording and playback system development platform based on ADPCM coding. Journal of Sichuan University (Natural Science Edition), 1998, 35(2): 178-182.
[3] Fu Qiuliang, Yuan Zongbao. Pure software implementation of the ADPCM voice compression algorithm. Telecommunications Science, 1994, 10(10): 21-24.
[4] Gibson Jerry D. Multimedia Digital Compression: Principles and Standards. Translated by Li Yuhui. Beijing: Publishing House of Electronics Industry, 2002.
[5] CCITT. Recommendation G.721: 32 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM), Red Book, 1984.
[6] CCITT. Recommendation G.711: General Aspects of Digital Transmission Systems and Terminal Equipments, Blue Book, 1988.