Development of high-efficiency multi-channel voice recording system-EEWORLD

Abstract: This paper introduces a voice processing card that uses a TMS320C542 to compress 4 channels of telephone speech. The hardware structure of the card, the RPE-LTP algorithm and its DSP implementation are described. Finally, a voice recording system based on the processing card is proposed.

Keywords: Voice recorder, voice codec, digital signal processor (DSP), regular pulse excitation-long-term linear prediction (RPE-LTP), ADPCM algorithm

Multi-channel voice recording and playback systems have a wide range of applications, mainly in voice recording, voice archives, and digital voice storage. As the cost performance of computers keeps improving and their use becomes more widespread, computer-managed voice systems are receiving more and more attention.

In the past, because of limits on DSP computing power and price, traditional multi-channel voice recording systems generally used the ADPCM algorithm of ITU-T Recommendation G.726 for voice compression. This algorithm requires little computation and has a minimum bit rate of 16 kb/s, but its speech quality is poor at low bit rates. In recent years the cost performance of DSPs has improved steadily, which makes it possible to apply better-performing speech compression algorithms to multi-channel speech compression systems. This project uses the TMS320C542PGE-40 DSP from Texas Instruments (TI) to implement the regular pulse excitation-long-term linear prediction (RPE-LTP) algorithm used in the GSM system, and so realizes a highly efficient multi-channel voice recording system.

This system has an application-oriented configuration and flexible expansion interfaces. Depending on the configuration, it can support 4 to 8 channels of voice processing. The system interfaces with the PC over the ISA bus. To make it easier to add services later, an external interface is reserved on the card for function expansion. Without the extension part, the card supports 4 channels of voice processing.

1 Hardware structure

The hardware block diagram is shown in Figure 1. The system hardware includes five parts: 'C542, extended RAM, ISA bus interface, analog interface, and extended interface (the part in the dotted box). Among them, the analog interface (ADC) is TMS320AC01 and the extended RAM is TC55B8016, both of which are TI products. The FIFO is IDT7201 from IDT.

The 'C542 is the core component of the system. Its function is to manage the underlying hardware and compress voice data. The 'C542 is a new-generation product for mobile communications launched by TI in 1995. It adopts an improved Harvard architecture and integrates many parallel processing units, making it particularly suitable for high-complexity algorithms. The 'C542 used in this system has a computing power of 40 MIPS. The 'C542 also has several efficient on-chip peripherals: a 64K×16 parallel port, 2 synchronous serial ports, an 8-bit host port interface (HPI) and a timer. It can respond to 5 interrupts and has a bus hold function. The rich on-chip resources and peripherals of the 'C542 greatly reduce the need for off-chip circuitry.

The analog interface consists of four 'AC01s connected in a master-slave arrangement. All four are connected to serial port 0 of the 'C542, as shown in Figure 2. Because the four input circuits are identical, only one is drawn in full. In this configuration the synchronous serial port of the 'C542 works in triggered mode, that is, each data transfer is initiated by an 'AC01. The four 'AC01s share the serial port by time-division multiplexing, passing the FSD signal from one to the next like a baton. First, the master 'AC01 issues a frame synchronization pulse and transmits one sample. Once the 16 bits have been shifted out, the master 'AC01 uses the FSD signal to tell the first slave 'AC01 to start transmitting, which triggers frame synchronization at the same time. After the first slave 'AC01 has transmitted its sample, it notifies the second slave 'AC01, and so on. When all four 'AC01s have transmitted their first sample, the master 'AC01 transmits its second sample, and the cycle repeats. In each 16-bit word received by the 'C542, the upper 14 bits are the sample value in two's complement form. The lower 2 bits are 00 when the word comes from the master 'AC01 and 01 when it comes from a slave 'AC01. The DSP program uses these bits to separate the voice data of the individual channels. Figure 2 also shows that the receive and transmit frame synchronization signals (FSR, FSX) of the 'C542 serial port are tied together. Therefore, whenever the 'C542 receives a word, it simultaneously sends a word to the 'AC01 that is transmitting at that moment. The 'AC01 converts the received data to an analog signal through its D/A section.
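The channel separation described above can be illustrated with a short Python sketch. It is only a model of the idea, not the DSP code used on the card: the function names are hypothetical, and it assumes that a word tagged 00 always starts a new round and that the three slaves follow in a fixed order.

```python
from typing import List, Optional, Tuple

MASTER_TAG = 0b00   # low 2 bits of a word sent by the master 'AC01 (from the text)
SLAVE_TAG = 0b01    # low 2 bits of a word sent by a slave 'AC01 (from the text)
NUM_CHANNELS = 4


def decode_word(word: int) -> Tuple[int, int]:
    """Split a 16-bit serial word into (signed 14-bit sample, 2-bit tag)."""
    word &= 0xFFFF
    tag = word & 0x3
    sample = word >> 2          # upper 14 bits, still unsigned
    if sample & 0x2000:         # bit 13 is the sign bit of the 14-bit value
        sample -= 0x4000        # convert from two's complement
    return sample, tag


def demultiplex(words: List[int]) -> List[List[int]]:
    """Sort a stream of serial words into one sample list per channel.

    Assumption: a master word marks channel 0 and each following slave
    word belongs to the next channel. Words seen before the first master
    word, or words with an unexpected tag, are dropped until the stream
    resynchronizes on the next master word.
    """
    channels: List[List[int]] = [[] for _ in range(NUM_CHANNELS)]
    channel: Optional[int] = None
    for word in words:
        sample, tag = decode_word(word)
        if tag == MASTER_TAG:
            channel = 0
        elif tag == SLAVE_TAG and channel is not None and channel + 1 < NUM_CHANNELS:
            channel += 1
        else:
            channel = None      # out of sync: wait for the next master word
            continue
        channels[channel].append(sample)
    return channels
```

Because the two tag values only distinguish master from slave, the position of a slave word after the most recent master word is what identifies its channel in this sketch.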

The circuit at the front end of the 'AC01 is an input conditioning circuit. Vi first passes through a low-pass filter and a voltage follower, and an arithmetic circuit then splits it into two differential components, in+ and in-, where in+ = Vm + Vi/2 and in- = Vm - Vi/2. The 'AC01 runs from +5 V, and Vm is the mid-supply voltage it provides: Vm = 2.5 V. Vi is limited to the range of ±5 V at the input front end.
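As a quick numeric check of these relations, the following sketch evaluates in+ and in- at the ends of the input range. The function name is hypothetical; the formulas and the values Vm = 2.5 V and ±5 V come from the paragraph above.

```python
from typing import Tuple

V_MID = 2.5      # Vm, the mid-supply voltage provided by the 'AC01 (from the text)
VI_LIMIT = 5.0   # Vi is limited to +/-5 V at the front end (from the text)


def differential_inputs(vi: float) -> Tuple[float, float]:
    """Return (in+, in-) for an input voltage Vi, per in+/- = Vm +/- Vi/2."""
    if not -VI_LIMIT <= vi <= VI_LIMIT:
        raise ValueError("Vi must lie within +/-5 V")
    return V_MID + vi / 2, V_MID - vi / 2
```

With Vi at its limits, both components stay between 0 V and 5 V, which matches the +5 V supply of the 'AC01, and the difference in+ minus in- is always Vi.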

In the overall system, the PC serves as the data store and the human-machine interface. Its role is to store data and manage the operation of the plug-in card. During system initialization, the PC also loads the program into the DSP. The interface between the PC and the 'C542 consists of the ISA bus and the host port interface (HPI) of the 'C542. The HPI is an on-chip peripheral of the 'C542. It provides a window onto a 2K-word RAM inside the 'C542, and the host can access this RAM directly through the window. The 2K-word RAM therefore plays the same role as the external dual-port RAM used in traditional designs. Because the HPI RAM is inside the DSP chip, it offers higher read and write efficiency. The host interface can also be used by the 'C542 to interrupt the host and to accept interrupts from the host. The 'C542 also has an HPI bootload mode. In this system, the host first loads the program into the 'C542 using the HPI bootload method and starts it running. It then uses 160 words of the HPI RAM as a buffer for exchanging data with the 'C542.

With the three basic parts above, the system can already record and play 4 channels of voice at the same time. At this point the potential of the 'C542 has not been fully exploited. In fact, running the GSM codec (encoding and decoding together, a case that rarely arises in practice) on all 4 channels uses only about 20 MIPS, half of the available computing power. In addition, on-chip peripherals such as the parallel port, serial port 1 and the timer are not used. The part in the dotted box in Figure 1 is there to make use of these spare resources.

The extended RAM is 16K words of SRAM. It sits directly on the address and data buses of the 'C542 and supports zero-wait-state reads and writes. The expansion interface gives off-board circuits access to on-board resources (including the parallel port, serial port 1, the timer, interrupts, etc.). Using these two components further increases the voice processing capacity of the card: it can compress 8 channels of voice at the same time, or carry out functions such as data transfer, dumping, and analysis.

2 RPE-LTP algorithm

The speech coding algorithm we use is RPE-LTP, as specified in ETSI GSM 06.10. It requires a sampling rate of 8 kHz and a sampling precision of 13 bits, so the uncompressed bit rate is 104 kb/s; the encoded bit rate is 13 kb/s, the MOS score is 3.6, and the encoding and decoding delay is [value missing in source]. Compared with the ADPCM algorithm used in traditional multi-channel voice recording systems, this algorithm gives very good playback quality. With ITU-T G.726 ADPCM, the MOS score is 4 at a bit rate of 32 kb/s, 3.2 at 24 kb/s, and only 2 at 16 kb/s. In return, RPE-LTP is more complex than ADPCM. It is a hybrid coder: it uses the correlation of the speech signal for parametric coding and also uses the amplitude characteristics of the excitation signal for waveform coding. In addition, it exploits the auditory characteristics of the human ear to remove further subjective redundancy from the speech signal. In our implementation, its computational complexity is 4.7 MIPS (encoder plus decoder), and the program and data storage spaces used are 2K and 1.2K respectively. Judged on computing power alone, a single 'C542 can compress nearly 8 channels of voice. It can therefore be predicted that, as the cost performance of DSPs continues to improve, compression algorithms with low bit rates and good voice quality will gradually replace ADPCM in the design of multi-channel voice recording systems.
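The figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check: the constants are taken from the text (the 20 ms frame length is the one given in the software design section), and the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 8000          # from the text
BITS_PER_SAMPLE = 13           # from the text
CODED_RATE_BPS = 13000         # RPE-LTP output bit rate, from the text
FRAME_MS = 20                  # frame length, from the software design section
DSP_MIPS = 40.0                # 'C542 computing power, from the text
CODEC_MIPS_PER_CHANNEL = 4.7   # encoder plus decoder, from the text


def codec_budget(channels: int = 4) -> dict:
    """Derive bit-rate and processor-load figures from the constants above."""
    raw_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
    return {
        "raw_rate_bps": raw_rate,
        "compression_ratio": raw_rate / CODED_RATE_BPS,
        "samples_per_frame": SAMPLE_RATE_HZ * FRAME_MS // 1000,
        "coded_bits_per_frame": CODED_RATE_BPS * FRAME_MS // 1000,
        "load_mips": channels * CODEC_MIPS_PER_CHANNEL,
        # Upper bound from raw MIPS only; ignores I/O and management overhead.
        "max_channels_by_mips": int(DSP_MIPS // CODEC_MIPS_PER_CHANNEL),
    }
```

For 4 channels this gives a load of about 18.8 MIPS, consistent with the "about 20 MIPS" quoted for the basic card, and 40 / 4.7 is roughly 8.5, consistent with the estimate of nearly 8 channels once overhead is considered.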

The flow of RPE-LTP algorithm is shown in Figure 3.

3 Software design

The software part includes DSP programming and PC programming. The main tasks of DSP programming are initialization, management of resources on the board and completion of voice encoding and decoding algorithms. PC programming focuses on managing DSP operations and writing application layer software. The software interface between DSP and PC is a set of customized "communication protocols".

The DSP program first initializes the 'C542 and the analog interface, so that the four 'AC01s transmit four channels of voice samples in a time-division multiplexed manner. In the interrupt raised after each 20 ms speech frame, the service routine first separates the 4 channels of speech data, stores them in four buffers, and then calls the encoder on each of them in turn. The encoder keeps a buffer for each voice channel that holds its intermediate results for use in the next frame. When encoding is complete, the program writes the compressed code stream into the HPI RAM and interrupts the host. The host reads the code stream and stores it. Decoding is relatively simple. The host writes the code stream into the HPI RAM frame by frame. After decoding a frame, the 'C542 puts the result into the output buffer, and the synchronous serial port sends the samples of the frame, one after another at the sampling rate, to the 'AC01 for playback. Once a frame has been sent, the interrupt service routine of the 'C542 asks the host for the next frame of the code stream.
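The per-frame encoding step can be modeled as follows. This is a sketch under stated assumptions, not the card's firmware: the function names, the encoder signature, and the interleaved buffer layout (master sample first, then the three slaves) are hypothetical, and `toy_encode` is a placeholder that stands in for the real RPE-LTP encoder. Only the 4 channels, the 160 samples per 20 ms frame, and the 160-word HPI exchange buffer come from the text.

```python
from typing import Callable, Dict, List, Tuple

NUM_CHANNELS = 4
SAMPLES_PER_FRAME = 160   # 20 ms at 8 kHz
HPI_BUFFER_WORDS = 160    # size of the exchange buffer in HPI RAM (from the text)

# Hypothetical encoder signature: (samples, state) -> (code words, new state)
Encoder = Callable[[List[int], Dict], Tuple[List[int], Dict]]


def service_frame(interleaved: List[int], states: List[Dict], encode: Encoder) -> List[int]:
    """Separate one interleaved frame into channels and encode each in turn.

    `states` holds one entry per channel so that each encoder call can use
    the intermediate results saved from that channel's previous frame.
    """
    if len(interleaved) != NUM_CHANNELS * SAMPLES_PER_FRAME:
        raise ValueError("expected one full frame for every channel")
    hpi_buffer: List[int] = []
    for ch in range(NUM_CHANNELS):
        samples = interleaved[ch::NUM_CHANNELS]       # every 4th word belongs to ch
        code, states[ch] = encode(samples, states[ch])
        hpi_buffer.extend(code)
    if len(hpi_buffer) > HPI_BUFFER_WORDS:
        raise OverflowError("code stream does not fit in the HPI buffer")
    return hpi_buffer


def toy_encode(samples: List[int], state: Dict) -> Tuple[List[int], Dict]:
    """Placeholder encoder: emits a frame counter and a 16-bit checksum."""
    frames = state.get("frames", 0) + 1
    return [frames, sum(samples) & 0xFFFF], {"frames": frames}
```

Calling `service_frame(frame, states, toy_encode)` twice with `states = [{} for _ in range(4)]` shows the frame counter in each channel's state advancing independently, which is the role played by the per-channel buffers described above.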

Voice is present on the individual channels at random times, so a voice presence detection unit is placed ahead of the compression algorithm. There are two options. One is the manual method: the program checks a hardware status bit that is set by a manual action (such as going off-hook). The other is to detect voice activity in software, distinguishing speech from noise on the basis of their different characteristics; the specific algorithm is specified in ETSI GSM 06.32. The header of each frame of samples and of each code-stream frame indicates whether speech is present in that frame.
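To make the software option concrete, here is a deliberately simplified detector based on short-term frame energy. It is not the algorithm specified in GSM 06.32, which is considerably more elaborate; the function names, the noise estimate argument, and the threshold ratio are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def frame_energy(samples: List[int]) -> float:
    """Mean squared amplitude of one frame."""
    if not samples:
        raise ValueError("frame must contain at least one sample")
    return sum(s * s for s in samples) / len(samples)


def is_voice_active(samples: List[int], noise_energy: float, ratio: float = 4.0) -> bool:
    """Declare speech when frame energy exceeds `ratio` times the noise energy.

    `noise_energy` is an externally supplied estimate of the background
    level and `ratio` is an arbitrary illustrative threshold.
    """
    return frame_energy(samples) > ratio * noise_energy


def tag_frame(samples: List[int], noise_energy: float) -> Tuple[int, List[int]]:
    """Return (presence flag, samples), mimicking a header flag per frame."""
    flag = 1 if is_voice_active(samples, noise_energy) else 0
    return flag, samples
```

Frames tagged 0 can then be skipped by the encoder, which saves both processing time and storage on idle channels.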

The programming on the PC side includes the DSP interface part and the application layer part. When the PC program starts, the DSP interface code first calls an initialization function to download the DSP program to the DSP. Because the HPI RAM holds only 2K words and the DSP program is larger than that, the initialization routine must first load a small bootloader into the DSP and then load the full program through the bootloader piece by piece. After initialization, the DSP interface code reads DSP result frames or DSP request frames from the designated location according to the customized "communication protocol" and hands them to the upper layer (the application) for processing. The application also issues its commands to the DSP through the DSP interface code.
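The piece-by-piece loading can be sketched as below. Only the 2K-word window size comes from the text; the block layout (a two-word header carrying target address and length), the function names, and the dictionary used as a stand-in for DSP memory are hypothetical.

```python
from typing import Dict, List

HPI_RAM_WORDS = 2048   # size of the HPI window (from the text)
HEADER_WORDS = 2       # hypothetical header: [target address, payload length]


def make_blocks(program: List[int], dest_addr: int,
                window: int = HPI_RAM_WORDS) -> List[List[int]]:
    """Split a program image into blocks that each fit in the HPI window."""
    payload_max = window - HEADER_WORDS
    if payload_max <= 0:
        raise ValueError("window too small for the block header")
    blocks = []
    for offset in range(0, len(program), payload_max):
        payload = program[offset:offset + payload_max]
        blocks.append([dest_addr + offset, len(payload)] + payload)
    return blocks


def apply_blocks(blocks: List[List[int]], memory: Dict[int, int]) -> None:
    """Stand-in for the DSP-side bootloader: copy each payload to its target."""
    for block in blocks:
        addr, length = block[0], block[1]
        for i, word in enumerate(block[2:2 + length]):
            memory[addr + i] = word
```

On the real card the host would write one block into the HPI RAM, signal the bootloader, and wait for it to finish copying before writing the next block; that handshake is part of the customized protocol and is not modeled here.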

The upper-layer application is the interface for users to use the system. It provides two functions: voice database management and system management. Voice database management includes voice entry, classification, monitoring, playback, etc. Due to limited hard disk space, the voice database must be backed up and cleaned regularly. This part of the function is the focus of the system, which should enable users to manage information in the most convenient way. System management includes setting and reading system status. Its goal is to allow users to effectively control the operation of the system and obtain the operating status of the system in a timely manner.

4 Voice recording system based on this processing card

In certain key positions, such as the command rooms of factories and mines, the command rooms of ocean-going ships, and audio service desks, in order to track accidents and improve the sense of responsibility of staff, this card can be used to record their voices. Utilizing the remaining processing power of the 'C542 in the card can provide functions such as speaker recognition and data analysis. In addition, the expansion interface of the card can also be used to complete the data migration function, such as storing the global positioning data (GPS) of ocean ships or field headquarters into a microcomputer for processing.

Figure 4 shows a practical processing card. The card can record four telephone channels together with the incoming and outgoing phone numbers. The FIFO serves as the buffer during speech playback. The RS232 circuit is used to transmit local information or to receive serial signals.

To sum up, this processing card is a voice recording and management system built on a PC. The system design adopts a modular approach, and users can customize the system according to their needs. Since the DSP program is downloaded from the host, the system is easy to upgrade and add functions. Therefore, it has broad application prospects.
