FPGA implementation of sound source localization system based on microphone array

Publisher: 安静的夜晚    Last updated: 2011-05-30    Source: 电子科技

Abstract: This paper discusses the basic principles of sound source localization based on microphone arrays and presents a design that implements each module of the system on an FPGA. The principle and circuit implementation of each module are described in detail. Experimental results show that the design exploits the strengths of the FPGA, simplifies the system design, shortens the design cycle, and meets the design requirements.
Keywords: sound source localization; time delay estimation; FFT; CORDIC

Sound source localization, that is, determining the position of one or more sound sources in space, is a research topic with a wide range of application backgrounds. Sound source localization technology based on microphone arrays has important application value in video conferencing, sound detection, and speech enhancement.


There are currently three main types of sound source localization algorithms. The first is based on beamforming; it can locate multiple sound sources, but it requires prior knowledge of the source and background noise and is sensitive to initial values. The second is based on high-resolution spectral estimation; in theory it can estimate the source direction effectively, but its computational load is large and it is poorly suited to broadband signals such as speech. The third is based on the time difference of arrival (TDOA); because this method is simple in principle, computationally light, and easy to implement, it is widely used in sound source localization systems. For these reasons, this paper adopts the third approach, localization based on the time difference of arrival.


The sound source localization algorithm based on arrival time difference includes two steps:
1) Time delay estimation: obtain the arrival-time delay of the sound between corresponding element pairs in the microphone array. Common methods include least-mean-square (LMS) adaptive filtering, the cross-power spectrum phase method, and the generalized cross-correlation function method.


2) Direction estimation from the time delays: the main methods include the angle-distance positioning method, the spherical interpolation method, the linear interpolation method, and the objective-function space-search positioning method. Compared with the alternatives, the generalized cross-correlation approach has a small computational load and high efficiency, so it is used here for time delay estimation; for direction estimation, the angle-distance positioning method is adopted for its moderate accuracy and ease of implementation.
FPGAs offer high-speed processing, flexible development, and easy in-system upgrades, and can greatly shorten the system development cycle. For these reasons, an FPGA device from Altera Corporation is used to implement this system.

1 Basic principles and flow chart of the system
The structure and flow of the algorithm are shown in Figure 1. First, the speaker's voice signal is picked up by microphones 1 and 2; after A/D sampling and low-pass filtering, the input voice signals to be processed are obtained, denoted x1(n) and x2(n).

Figure 1 Structural flow of the sound source localization algorithm


After passing through the FIR bandpass filter, x1(n) and x2(n) are weighted with a semi-overlapping Hamming window to obtain x1w(n) and x2w(n), whose FFTs are X1w(k) and X2w(k). The cross-power spectrum of the two signals is then

P12(k) = X1w(k)·X2w*(k)    (1)

To further highlight the peak, the cross-power spectrum between the microphone signals is weighted in the frequency domain and smoothed by accumulation over consecutive frames:

P̄12(k) = Σ(i=1..m) ψ(k)·P12(i)(k)    (2)

where m is the number of frames accumulated in the smoothing, ψ(k) is the frequency-domain weighting function, and P12(i)(k) is the cross-power spectrum of the i-th frame. Taking the inverse Fourier transform then gives the generalized cross-correlation function between microphones 1 and 2:

R12(n) = IFFT{P̄12(k)}    (3)

The position of its peak is the time delay between microphones 1 and 2.
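The delay-estimation chain described above (FFT, weighted cross-power spectrum, inverse FFT, peak search) can be sketched as a software model in Python. This is an illustrative model only, not the Verilog implementation; PHAT weighting is assumed as the frequency-domain weighting function ψ(k), and the per-frame Hamming windowing is omitted for brevity.

```python
import numpy as np

def gcc_delay(x1, x2):
    """Estimate the delay of x2 relative to x1 (in samples) by the
    generalized cross-correlation method."""
    n = len(x1)
    # (in the hardware, each frame is weighted with a semi-overlapping
    #  Hamming window before the FFT; omitted here for brevity)
    X1 = np.fft.fft(x1)
    X2 = np.fft.fft(x2)
    P12 = X1 * np.conj(X2)               # cross-power spectrum
    P12 = P12 / (np.abs(P12) + 1e-12)    # PHAT weighting (one possible psi(k))
    r12 = np.fft.ifft(P12)               # generalized cross-correlation
    lag = int(np.argmax(np.abs(r12)))    # peak position
    if lag > n // 2:
        lag -= n                         # map to a signed lag
    return -lag                          # positive: x2 lags x1

# test with broadband noise delayed (circularly) by 5 samples
rng = np.random.default_rng(0)
x1 = rng.standard_normal(256)
x2 = np.roll(x1, 5)
print(gcc_delay(x1, x2))   # 5
```

In the real system the peak index is converted to a time delay by dividing by the 10 kHz sampling rate.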
After the time delays between several pairs of microphones are obtained, the sound source position can be determined by the angle-distance positioning method.

2 Design and implementation of each module




2.1 FIR bandpass filter module
In order to eliminate the influence of noise and echo interference, filtering is required first. The bandwidth of the speech signal is 0.3~3.4 kHz, so a bandpass filter needs to be designed to filter out the noise outside the bandwidth of the speech signal. In order to keep the phase of the processed signal unchanged, that is, to maintain the linear phase, an FIR filter is required.


Here, the filter coefficients are obtained with the MATLAB filter design tool using the Chebyshev approximation method. They are multiplied by 1024 for quantization and converted into CSD (canonical signed digit) coding to improve operating efficiency. The filter is then implemented in Verilog.
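The coefficient-quantization step can be sketched as follows. The taps below are illustrative placeholders, not the actual MATLAB bandpass design from the paper, and the subsequent CSD re-coding done in hardware is not shown.

```python
def quantize_taps(taps, scale=1024):
    """Quantize floating-point FIR taps by x1024 (10 fractional bits)."""
    return [round(c * scale) for c in taps]

def fir_filter(x, q_taps, scale=1024):
    """FIR with integer taps; divide by the scale to undo the x1024 gain
    (in hardware this is a 10-bit right shift)."""
    return [sum(q_taps[k] * x[n - k] for k in range(len(q_taps)) if n >= k) / scale
            for n in range(len(x))]

# illustrative low-order taps (NOT the paper's actual bandpass design)
taps = [-0.0127, 0.0646, 0.2387, 0.3348, 0.2387, 0.0646, -0.0127]
print(quantize_taps(taps))   # [-13, 66, 244, 343, 244, 66, -13]
```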
2.2 Semi-overlapping Hamming window module


To exploit the short-time stationarity of speech, the frame length is chosen between 10 and 30 ms. The sampling frequency is 10 kHz and, to facilitate FFT processing, 25.6 ms is selected, i.e. a frame length of 256 points. To preserve the continuity of statistical features and obtain better speech processing results, consecutive frames overlap by 50%, i.e. only 12.8 ms of data is updated each time. Within one frame, the signal can then be considered approximately stationary.

Figure 2 Semi-overlapping framing of the speech signal


Framing is implemented by weighting with a movable finite-length window: the window function w(n) multiplies the signal s(n) to form the windowed speech signal sw(n) = s(n)·w(n), with the window values stored in internal memory resources. Commonly used window functions include the Hamming window and the rectangular window; the Hamming window has a better smoothing effect than the rectangular window, so it is selected here. Its expression is given in formula (5):

w(n) = 0.54 − 0.46·cos(2πn/(N − 1)),  0 ≤ n ≤ N − 1    (5)

where N is the frame length.
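The frame-length and overlap bookkeeping above can be modeled directly; the precomputed window values here play the role of the coefficients stored in the internal memory. A minimal sketch, illustrative only:

```python
import math

N = 256          # frame length: 25.6 ms at fs = 10 kHz
HOP = N // 2     # 50% overlap: only 12.8 ms of new data per frame

# Hamming window, formula (5): w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))
window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def frames(signal):
    """Yield semi-overlapping, Hamming-windowed frames sw(n) = s(n)*w(n)."""
    for start in range(0, len(signal) - N + 1, HOP):
        yield [signal[start + n] * window[n] for n in range(N)]

# a 1024-sample signal produces (1024 - 256)/128 + 1 = 7 frames
print(len(list(frames([0.0] * 1024))))   # 7
```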

2.3 FFT operation module
Since the voice signal is continuously sampled in real time, in order to process the incoming voice signal continuously, a ping-pong structure is used here, that is, two dual-port RAMs that can store one frame of data are used. When the first RAM stores new data, the second RAM performs FFT operation and stores its results. Then, the first RAM performs FFT operation and stores its results, and the second RAM stores new data, thus ensuring the continuity of signal processing.
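The ping-pong discipline can be captured by a small behavioral model, a software sketch of the scheme with Python lists standing in for the two dual-port RAMs:

```python
class PingPong:
    """Model of the ping-pong scheme: while one RAM collects a new frame,
    the other frame is handed to the FFT stage, so input is never stalled."""
    def __init__(self, frame_len):
        self.ram = [[], []]       # two dual-port RAMs, one frame each
        self.write_sel = 0        # which RAM is currently being filled
        self.frame_len = frame_len

    def push(self, sample):
        """Store a sample; return a full frame for the FFT when a RAM fills."""
        buf = self.ram[self.write_sel]
        buf.append(sample)
        if len(buf) == self.frame_len:
            self.write_sel ^= 1               # swap the roles of the two RAMs
            self.ram[self.write_sel] = []
            return buf                        # ready for the FFT stage
        return None

pp = PingPong(4)
out = [f for f in (pp.push(s) for s in range(8)) if f is not None]
print(out)   # [[0, 1, 2, 3], [4, 5, 6, 7]]
```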

Figure 3 Ping-pong storage structure of the FFT module


During ping-pong storage, the bit-reversal address module generates bit-reversed write addresses, so that the data are stored in RAM in bit-reversed order in preparation for the FFT operation. To speed up the computation, the butterfly twiddle factors are generated in advance with MATLAB, quantized to 12-bit signed numbers, and stored in internal ROM.
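Both preparation steps, bit-reversed addressing and the 12-bit twiddle ROM, are easy to model in software. The scale factor 2^11 − 1 is an assumption: it keeps +1.0 representable in a 12-bit signed word.

```python
import math

def bit_reverse(addr, bits):
    """Bit-reversed write address, as produced by the reversal module."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (addr & 1)
        addr >>= 1
    return out

print([bit_reverse(a, 3) for a in range(8)])   # [0, 4, 2, 6, 1, 5, 3, 7]

# Twiddle factors W_N^k = e^{-j*2*pi*k/N}, quantized to 12-bit signed
# integers for the internal ROM (scale 2^11 - 1 is an assumed choice).
N, SCALE = 256, 2**11 - 1
rom = [(round(SCALE * math.cos(2 * math.pi * k / N)),
        round(-SCALE * math.sin(2 * math.pi * k / N))) for k in range(N // 2)]
print(rom[0])   # (2047, 0)
```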


The entire FFT operation unit is controlled by a state machine with five states: S1 outputs the address of the first operand; S2 fetches the first operand and outputs the address of the second operand; S3 fetches the second operand and computes the first result; S4 stores the first result and computes the second result; S5 stores the second result and generates the address for the next stage of the operation.


2.4 Cross-power spectrum module of this frame
The cross-power spectrum of this frame is obtained by multiplying the FFT result of the first signal by the conjugate of the FFT result of the second signal.
If the first signal is r1 + j·i1 and the second signal is r2 + j·i2, the conjugate of the second signal is r2 − j·i2. The multiplication can be carried out as in formulas (6) and (7), which share one common product and therefore save one multiplication, reducing internal resource usage:

R = r2(r1 + i1) − i1(r2 − i2)    (6)
I = r2(r1 + i1) − r1(r2 + i2)    (7)

where R and I are the real and imaginary parts of the cross-power spectrum of this frame; r1 and r2 are the real parts of the two FFT results; i1 and i2 are the imaginary parts. The common product r2(r1 + i1) is computed only once.
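The saving can be checked numerically. The factoring below, built around the shared product r2(r1 + i1), is one standard three-multiplication form of the conjugate product; it reproduces the direct four-multiplication result exactly.

```python
def conj_mult_3(r1, i1, r2, i2):
    """(r1 + j*i1) * (r2 - j*i2) using three real multiplications."""
    t = r2 * (r1 + i1)           # shared product, computed once
    real = t - i1 * (r2 - i2)    # = r1*r2 + i1*i2
    imag = t - r1 * (r2 + i2)    # = i1*r2 - r1*i2
    return real, imag

# compare against the direct four-multiplication computation
r1, i1, r2, i2 = 3, -2, 5, 7
assert conj_mult_3(r1, i1, r2, i2) == (r1*r2 + i1*i2, i1*r2 - r1*i2)
print(conj_mult_3(r1, i1, r2, i2))   # (1, -31)
```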


2.5 Frequency Domain Weighting Module
The cross power spectrum of this frame is multiplied by the weighting function stored in ROM to make the peak of the cross correlation function more prominent. This can be done by calling the internal multiplier module.


2.6 Power spectrum smoothing module
For the weighted module results, the cross-power spectrum is smoothed by accumulating several consecutive frames to make the peak easier to detect. This can be done by calling the internal adder module.


2.7 Inverse FFT module
An inverse FFT is applied to the smoothed result to obtain the cross-correlation function. By the FFT principle, the inverse FFT can be computed with the FFT module itself: take the conjugate (negate the angle) of the twiddle factors used in the forward FFT, and multiply the final output by 1/N. To prevent overflow during the computation, the 1/N factor can be distributed over the butterfly stages: since 1/N = (1/2)^M with N = 2^M, each butterfly output branch of each stage carries a factor of 1/2, i.e. a right shift by one bit.
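The FFT-reuse trick can be demonstrated with a toy radix-2 FFT: conjugating the input and output of the forward transform and scaling by 1/N is equivalent to negating the twiddle angles. A minimal floating-point sketch (the hardware instead distributes the 1/N as a one-bit shift per stage):

```python
import cmath

def fft(x):
    """Minimal recursive radix-2 FFT (for illustration only)."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + \
           [even[k] - tw[k] for k in range(n // 2)]

def ifft(x):
    """Inverse FFT reusing the forward FFT: conjugating input and output
    is equivalent to conjugating the twiddles; scale by 1/N at the end."""
    n = len(x)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in x])]

x = [1, 2, 3, 4]
back = ifft(fft(x))
print([round(v.real) for v in back])   # [1, 2, 3, 4]
```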
2.8 Peak detection module
Peak detection takes the modulus of the inverse FFT result, |R12(n)| = √(R(n)² + I(n)²), and searches for its maximum; the position of the peak corresponds to the time delay of the voice signal.
2.9 Positioning algorithm module
According to the angle-distance positioning method, the horizontal (azimuth) angle θazimuth of the sound source relative to the origin is

θazimuth = arccos(d/a)    (8)

where a is the distance between the microphones and d is the difference in the distances from the sound source to the two microphones of the horizontal pair.
The elevation angle φelevation of the sound source relative to the origin is

φelevation = arccos(d/a)    (9)

where a is the distance between the microphones and d is the corresponding distance difference for the vertical microphone pair.
From the above, the arccosine function must be evaluated to determine the corresponding angle. The arccosine is a transcendental function; a Taylor series could approximate it, but that is cumbersome and not very accurate. The CORDIC algorithm consists only of shifts and additions/subtractions, so it is well suited to FPGA implementation, offering high speed and good iteration accuracy. This system implements the CORDIC algorithm with a high-speed 9-stage pipeline. The iteration relations are as follows:
x0 = xin,  y0 = yin,  z0 = 0    (10)
di = +1 if yi < 0;  di = −1 if yi ≥ 0    (11)
xi+1 = xi − di·yi·2^(−i)    (12)
yi+1 = yi + di·xi·2^(−i)    (13)
zi+1 = zi − di·arctan(2^(−i))    (14)

Formula (10) is the initial condition of the iteration, formula (11) determines the direction of the next iteration from the current coordinate value, and formulas (12) to (14) are the iteration equations. As the iterations proceed, yi is driven toward zero and zi converges to the desired angle θ. In practice, 9 iteration stages yield about 7 bits of accuracy, with a minimum angle resolution of 0.111905°.
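The pipeline can be modeled in floating point as below. The arccosine is assumed here to be obtained through the arctangent relation arccos(t) = arctan(√(1 − t²)/t) for t > 0; the paper does not spell out the hardware's input mapping, so the square root is a software stand-in.

```python
import math

STAGES = 9                                             # 9-stage pipeline
ATAN = [math.atan(2.0 ** -i) for i in range(STAGES)]   # ROM constants

def cordic_vectoring(x, y):
    """Vectoring-mode CORDIC, formulas (10)-(14): drive y toward 0 while
    z accumulates atan(y/x). Floating-point model of the fixed-point pipeline."""
    z = 0.0                                  # formula (10): z0 = 0
    for i in range(STAGES):
        d = 1.0 if y < 0 else -1.0           # formula (11)
        x, y, z = (x - d * y * 2.0 ** -i,    # formula (12)
                   y + d * x * 2.0 ** -i,    # formula (13) (the tuple is
                   z - d * ATAN[i])          #  built first, so old x is used)
    return z

def cordic_acos(t):
    """arccos(t) = atan(sqrt(1 - t^2) / t), for 0 < t <= 1 (assumed mapping)."""
    return cordic_vectoring(t, math.sqrt(1.0 - t * t))

err = abs(cordic_acos(0.5) - math.acos(0.5))
print(err < 0.01)   # True: residual bounded by atan(2^-8) ~ 0.0039 rad
```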

3 Module simulation and synthesis report
Quartus II is PLD design software developed by Altera Corporation. With its embedded synthesizer and simulator, it covers the complete PLD design flow from design entry to hardware configuration; it is also fast, consistent in interface, and easy to learn and use.


This design uses Quartus II 8.0 to simulate and verify each module. Simulation confirms that every module works correctly and that the complete design function is achieved. The EP2C35F484C8 device in Altera's Cyclone II series was selected for timing simulation of the entire design. Its main resource consumption is: 3,740 of 33,216 logic elements, 74,240 of 483,840 memory bits, 387 of 475 pins, and 16 of 35 embedded multipliers. The simulation results show that the implementation described in this paper is feasible and performs well; the maximum clock frequency reaches 87.3 MHz, which fully meets the system requirements.

4 Conclusion
This design implements the entire system on an FPGA, making full use of the high speed, large capacity, and flexible, convenient development of Altera's FPGA products, together with the library resources provided by the Quartus II development environment. Exploiting these advantages simplifies the system design and shortens the design cycle.
