Design of SoC Voice Processing System Based on ARM7 TDMI

Publisher: 科技小巨人 | Last updated: 2011-07-14 | Keywords: ARM7TDMI

Introduction

With the rapid development of microelectronics and computer technology, many embedded application systems have emerged. Among them, speech processing systems have developed continuously and are now widely used across industries, in products such as voice station announcers, automatic interpreters, and interview recorders, bringing great convenience to production and daily life. This paper is based on the SEP3203, a 32-bit embedded SoC processor designed by the National ASIC System Engineering Technology Research Center of Southeast University, and adopts the G.721-standard ADPCM algorithm to realize real-time software encoding and decoding of speech signals, providing an effective embedded solution for speech processing applications.

1. G.721 Standard Overview

In 1937, A. H. Reeves proposed pulse code modulation (PCM), which pioneered digital voice communication. In the early 1980s, the CCITT began studying non-PCM coding algorithms below 64 kb/s and successively formulated and approved coding standards such as G.721, G.728, and G.729. Among them, the G.721 protocol, a typical ADPCM algorithm, offers voice quality almost identical to PCM together with a simple algorithm structure and excellent error resistance. It is widely used in satellite links, submarine cables, and portable digital voice equipment. A simplified block diagram of the G.721 algorithm is shown in Figure 1.

Coding process:

① Calculate the difference E(k) = Sl(k) - Se(k) between the linear PCM input Sl(k) and the adaptive predictor output Se(k);
② Quantize E(k) through the adaptive quantization module to obtain the ADPCM codeword I(k);
③ Pass I(k) through the adaptive inverse quantization module to obtain the quantized differential signal Dq(k);
④ Update the prediction filter coefficients according to Dq(k) and the reconstructed signal Sr(k) = Se(k) + Dq(k);
⑤ Use the new coefficients to calculate Se(k+1), then repeat the above steps to compress the next speech sample.
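As a rough illustration of the five steps above, the following sketch uses a hypothetical fixed-step 4-bit quantizer and a trivial predictor (Se(k+1) = Sr(k)); the real G.721 quantizer and predictor adapt their parameters and are considerably more elaborate:

```c
/* Minimal sketch of the encoding loop above. NOT the full G.721
 * quantizer/predictor: a hypothetical fixed-step 4-bit quantizer
 * and a trivial predictor are used purely for illustration. */

static int se;   /* predictor output Se(k), starts at 0 */

/* Compress one linear PCM sample Sl(k) into a 4-bit codeword I(k). */
static int encode_sample(int sl)
{
    int e = sl - se;      /* step 1: E(k) = Sl(k) - Se(k)            */
    int i = e / 256;      /* step 2: quantize E(k) (fixed step here) */
    if (i > 7)  i = 7;    /* clamp to 4-bit signed range             */
    if (i < -8) i = -8;
    int dq = i * 256;     /* step 3: inverse quantization -> Dq(k)   */
    int sr = se + dq;     /* step 4: reconstruction Sr(k)=Se(k)+Dq(k)*/
    se = sr;              /* step 5: trivial predictor Se(k+1)=Sr(k) */
    return i & 0x0F;      /* 4-bit ADPCM codeword                    */
}
```

With this toy quantizer, a constant input converges quickly: the first sample produces a large codeword, after which the prediction error shrinks and subsequent codewords approach zero.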

Decoding process:

① Obtain Dq(k) and Se(k) through adaptive inverse quantization and adaptive prediction, and form the speech reconstruction signal Sr(k) = Se(k) + Dq(k);
② Convert the reconstructed signal Sr(k) into PCM format to obtain the PCM codeword Sp(k);
③ Update the prediction filter coefficients using the same method as the encoder;
④ Synchronously adjust Sp(k) to support two-way communication;
⑤ Repeat the above steps with the new filter coefficients to decode the next codeword I(k).

Figure 1 Simplified block diagram of the G.721 encoder and decoder
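The decoding steps can likewise be sketched with a toy 4-bit inverse quantizer and a trivial predictor; the names and scaling below are illustrative assumptions, not the actual G.721 tables:

```c
/* Decoder sketch for the steps above. The inverse quantizer and
 * predictor update mirror a hypothetical fixed-step encoder, so
 * encoder and decoder predictor states stay in step; the real
 * G.721 adaptation logic is far more elaborate. */

static int se_dec;   /* decoder-side predictor state Se(k), starts at 0 */

/* Expand one 4-bit codeword I(k) back into a linear PCM sample. */
static int decode_sample(int code)
{
    /* sign-extend the 4-bit codeword */
    int i  = (code & 0x08) ? (code | ~0x0F) : code;
    int dq = i * 256;        /* step 1: inverse quantization -> Dq(k)      */
    int sr = se_dec + dq;    /*         reconstruction Sr(k) = Se(k)+Dq(k) */
    se_dec = sr;             /* step 3: same predictor update as encoder   */
    return sr;               /* step 2: Sr(k) emitted as the PCM output    */
}
```

Because both sides apply the identical inverse quantization and predictor update, the decoder reconstructs the same Sr(k) sequence the encoder used internally, which is what keeps the two prediction loops synchronized.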

2. Chip Introduction

The SEP3203 chip is the processing core of the system; its overall structure is shown in Figure 2. The chip is a 32-bit SoC based on the ARM7TDMI, designed independently by the National ASIC System Engineering Technology Research Center of Southeast University. It adopts the AMBA 2.0 standard and a 0.25 μm CMOS process, and is aimed mainly at low-end embedded handheld devices. The chip provides an AC97 controller, an external memory interface (EMI), a 6-channel DMAC, TIMER, PMU, INTC, and other modules. The modules used in the voice system are: the EMI, which controls access to external memory; the on-chip memory eSRAM, used to hold time-critical core code; the AC97 controller, which provides an AC97-standard audio interface; and the DMAC, which implements DMA transfer of bulk data.


Figure 2 SEP3203 chip structure diagram

3. System design

3.1 Hardware System

The hardware system block diagram is shown in Figure 3. Modules inside the dotted box are on-chip; devices outside it are off-chip, including external memory (SDRAM/SRAM/Flash, etc.) and the CODEC, for which a Philips UCB1400 is used. The system works as follows.


Figure 3 Speech processing hardware system block diagram

① Encoding. The CODEC samples the voice data and stores it temporarily in the AC97 input FIFO. The DMAC then transfers the data to a designated storage area in interrupt mode. Under the control of the ARM7TDMI, the G.721 encoding program compresses the voice PCM data into ADPCM code.

② Decoding. The G.721 decoding program decodes the ADPCM code in memory back into PCM code. After a full frame has been decoded, the DMAC transfers the data to the AC97 output FIFO in interrupt mode, and the CODEC drives the playback device (headphones, speakers, etc.).

To meet the real-time requirements of speech, the sampling rate of the UCB1400 is set to 8 kHz. The chip represents each sample with 16 bits, so the PCM data rate is 128 kb/s. After encoding, each sample is represented by 4 bits, so the transmission rate drops to 32 kb/s.
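The rate arithmetic above can be checked with a small helper; the function name is illustrative, not from the original source:

```c
/* Rate arithmetic for the figures above: 8 kHz sampling, 16-bit PCM,
 * 4-bit ADPCM. bitrate() is an illustrative helper. */
static long bitrate(long sample_rate_hz, int bits_per_sample)
{
    return sample_rate_hz * bits_per_sample;   /* bits per second */
}
```

At 8000 samples/s, 16-bit PCM gives 128 000 b/s and 4-bit ADPCM gives 32 000 b/s, a fixed 4:1 compression ratio.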

3.2 Software System

The software flow is shown in Figure 4. Each frame consists of 64 sampling points of 16-bit PCM code, 128 bytes in total, which is encoded into 32 bytes of 4-bit ADPCM code.
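The frame layout can be sketched as follows; the packing order (low nibble first) and the names are assumptions for illustration, not taken from the original source:

```c
/* Frame layout sketch: 64 PCM samples (16-bit, 128 bytes) compress
 * to 64 four-bit ADPCM codewords packed two per byte, i.e. 32 bytes
 * per ADPCM frame. Low-nibble-first packing is an assumption. */

#define FRAME_SAMPLES 64

/* Pack 4-bit codes[0..63] into out[0..31]; returns bytes written. */
static int pack_frame(const unsigned char codes[FRAME_SAMPLES],
                      unsigned char out[FRAME_SAMPLES / 2])
{
    int i;
    for (i = 0; i < FRAME_SAMPLES; i += 2)
        out[i / 2] = (unsigned char)((codes[i] & 0x0F) |
                                     ((codes[i + 1] & 0x0F) << 4));
    return FRAME_SAMPLES / 2;   /* 32 bytes per frame */
}
```
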


Figure 4: Encoding and decoding software flow

(1) Coding

First, the system is initialized: the AC97, CODEC, DMAC, and other modules are configured, and the related state variables are initialized. The first frame of voice data is then sampled; when sampling completes, a DMA interrupt is raised. The interrupt handler reconfigures the DMAC to trigger a new sampling transfer and then encodes the newly sampled data. Since encoding is done by the CPU core while sampling is handled by the CODEC and DMA, the encoding of frame K proceeds concurrently with the sampling of frame K+1.

(2) Decoding

Similar to encoding, the system is initialized first, and then the first frame of audio data is decoded. After decoding, the DMAC is configured to transfer the data to the AC97 output FIFO, and the recording is played through the playback device. Likewise, the decoding of frame K+1 proceeds concurrently with the playback of frame K.

This design buffers data with a "double buffer" mechanism: two frame buffers, Buf[0] and Buf[1], are allocated, and the buffer flag Flg is initialized to 0. During encoding, the first frame is sampled and DMA transfers data from the AC97 input FIFO into Buf[0]. When the transfer completes, Flg is set to 1 and the encoder takes its data from Buf[0]; at the same time, DMA writes new samples into Buf[1]. Thereafter, each time a frame has been sampled, Flg = !Flg is executed: the encoder reads from Buf[!Flg] while DMA writes newly sampled data to Buf[Flg]. This makes the sampling of frame K+1 concurrent with the encoding of frame K, and as long as encoding is faster than sampling, no data is overwritten. The processing is as follows (decoding is similar):

Flg = 0;
Psmp = Buf[Flg];
Run_Sampler(Psmp);     // sample the first frame of data
while (1) {
    Flg = !Flg;
    Penc = Buf[!Flg];  // encoding pointer -> just-filled buffer Buf[!Flg]
    Psmp = Buf[Flg];   // sampling pointer -> the other buffer Buf[Flg]
    Run_Sampler(Psmp); Run_Encoder(Penc);
    // start the sampler and encoder; the two run concurrently
}

4. Performance optimization

Speech processing has stringent real-time requirements. If data processing cannot keep pace with the incoming speech, newly sampled data will overwrite previously sampled but not yet processed data during recording, and playback will lag behind the actual speech. A sufficiently large buffer could avoid the problem during recording, but not during playback; and since storage is a scarce resource in embedded systems, that approach has no practical value.

The "double buffer" mechanism introduced above decouples sampling from encoding and decoding from playback, letting them execute concurrently under simple control. To meet the real-time requirement, however, the codec itself must keep up with sampling and playback. The sampling rate is 8 kHz and each sample is 16 bits, so the encoder must consume at least 16 KB of PCM data per second, and the decoder must produce at least 16 KB of PCM data per second. Table 1 lists the time taken, before optimization, to process 512 KB of PCM code (corresponding to 128 KB of ADPCM code) on bare metal without an operating system; the measurement uses the SoC's internal TIMER module (see reference [1]). The results show that the unoptimized system does not meet the real-time requirement.


Table 1 Encoding and decoding speed before optimization
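The real-time criterion described above (at least 16 KB of PCM per second) can be expressed as a small check. The timer readout is abstracted into an elapsed-milliseconds argument, since the SEP3203 TIMER API is not reproduced here:

```c
/* Real-time check sketch: given the measured time to process a block
 * of PCM data, decide whether the codec keeps up with the 16 KB/s
 * stream (8000 samples/s x 2 bytes per sample). The elapsed_ms value
 * would come from the SoC TIMER; it is a plain parameter here. */

#define PCM_RATE_BYTES_PER_S (8000UL * 2UL)   /* 16 KB of PCM per second */

/* Returns 1 if processing `bytes` of PCM in `elapsed_ms` milliseconds
 * meets the real-time requirement, 0 otherwise. */
static int meets_realtime(unsigned long bytes, unsigned long elapsed_ms)
{
    /* achieved throughput in bytes per second, integer arithmetic */
    unsigned long rate = (elapsed_ms > 0) ? (bytes * 1000UL) / elapsed_ms : 0;
    return rate >= PCM_RATE_BYTES_PER_S;
}
```

For example, processing the 512 KB test block in 20 s (about 26 KB/s) passes the check, while taking 40 s (about 13 KB/s) fails it.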

So far, the system object code has been running from SDRAM. The SEP3203 provides a very useful resource: the on-chip high-speed memory eSRAM. Code executing from eSRAM can reach 0.89 MIPS/MHz, whereas code running from SDRAM achieves only about one third of that performance, so moving critical code into eSRAM has a large effect on system performance. Table 2 compares SDRAM and eSRAM performance at a 50 MHz clock with 32-bit ARM instructions; the meaning of each indicator can be found in reference [1].


Table 2 Performance comparison between eSRAM and SDRAM

However, the SEP3203 has only 20 KB of eSRAM, so it is neither possible nor necessary to execute all of the code from it. The ARM integrated development tools provide a profiling function that statistically analyzes the whole program and reports the percentage of total execution time consumed by each part of the code (mainly at the granularity of standard C functions). Profiling the software system yields each codec library function's share of the total codec time; the main ones are listed in Table 3.


Table 3 Most time-consuming library functions

The three functions above account for nearly 80% of the total encoding and decoding time (Quan(), Fmult(), and Update() perform quantization table lookup, fixed-point floating-point multiplication, and state-variable update, respectively), so optimizing them significantly improves codec speed. These functions are gathered into the file rec_esram.c, and the scatter file remap.scf is then used for memory mapping (*.scf is the link scatter file format of the ARM ADS integrated development tool). The content of remap.scf is:
FLASH 0x30002000 0x1000000
{
    FLASH 0x30002000            // system initialization entry and other code
    {
        init_ice.o (INIT, +First)
        * (+RO, +RW, +ZI)
    }
    32bitRAM 0x00000000         // interrupt vector table entry address
    {
        boot_gfd.o (BOOT, +First)
    }
    ESRAM 0x1fff0000 0x600      // core library code, placed in eSRAM
    {
        rec_esram.o (+RO, +RW, +ZI)
    }
    /* stack settings omitted */
}

After the memory image is created, rec_esram.o (about 1.5 KB), the object code of rec_esram.c, is loaded into eSRAM (starting at address 0x1fff0000) and executed from there. Table 4 shows the encoding and decoding speed after eSRAM optimization.


Table 4 Encoding and decoding speed after eSRAM optimization

The performance of the voice system was also tested under an operating system, as listed in Table 5. The operating system is ASIXOS, developed for embedded applications by the National ASIC System Engineering Technology Research Center of Southeast University. It provides a graphical user interface, networking, clock and real-time interrupt management, and a clear application development interface. The voice system runs as an application in the OS environment, with its own user interface and underlying services; due to space limitations, the details are not covered here.

The tests above show that, after eSRAM optimization, the encoding and decoding speed meets the real-time needs of speech both on bare metal and under the operating system, satisfying the design requirements.


Table 5 Encoding and decoding speed after eSRAM optimization (with operating system)

Conclusion

Real-time performance is critical when designing an embedded system for multimedia applications. This paper has presented a design for a voice processing system on an ARM7TDMI-based SoC and optimized its performance using the SoC's on-chip eSRAM. Tests of the prototype show an encoding rate of 19.88 KB/s and a decoding rate of 22.68 KB/s at a 70 MHz main frequency under the operating system, which meets the real-time requirements of the voice system. Moreover, with voice processing as one subsystem of the prototype, the hardware design also supports MP3 playback and an LCD touch screen, reducing system board area and whole-machine cost: an efficient and low-cost design solution.

References
[1] Ling Ming. Low-Cost Handheld Multimedia Device Processor Based on ARM7TDMI. Nanjing: National ASIC Engineering Center, Southeast University, 2004.
[2] Gou Daju, Yang Qigang. Voice Recording and Playback System Development Platform Based on ADPCM Coding. Journal of Sichuan University (Natural Science Edition), 1998, 35(2): 178-182.
[3] Fu Qiuliang, Yuan Zongbao. Pure Software Implementation of the ADPCM Voice Compression Algorithm. Telecommunications Science, 1994, 10(10): 21-24.
[4] Gibson J D. Principles and Standards of Multimedia Digital Compression. Translated by Li Yuhui. Beijing: Electronic Industry Press, 2002.
[5] CCITT. Recommendation G.721: 32 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM). Red Book, 1984.
[6] CCITT. Recommendation G.711: General Aspects of Digital Transmission Systems and Terminal Equipment. Blue Book, 1988.
