0 Introduction
Broadband radar signal processing faces several problems: a high echo sampling rate, a heavy pulse compression (matched filtering) workload, a complex processing flow, and the difficulty of detecting targets at high resolution in real time. A general-purpose computer platform cannot meet these computation and real-time requirements, so dedicated digital signal processors (DSPs) are needed for high-speed calculation. Although DSPs have reached a high level of performance, a single DSP chip still cannot meet the requirements of broadband radar, so parallel processing must be introduced. In this design, four DSP chips form a parallel processing system. To exploit both the strength of DSP chips in complex algorithms and the strength of the FPGA in low-level operations on large data volumes, a multi-DSP parallel processing system under FPGA control is designed.
1 System Design
The schematic diagram of the multi-DSP parallel processing system based on FPGA control is shown in Figure 1.
The radar signal processing system as a whole is built on a high-reliability CPCI industrial computer that houses signal processing boards with different functions, and data is exchanged between boards through the CPCI interface. Within this task allocation, the system described here processes the intermediate-frequency digital signals. Depending on the output of the front-end signal acquisition board, data reaches this system in either serial or parallel mode.
Serial signals are transmitted in differential form through the CPCI J3 port directly to DSP2, after which the four DSP chips allocate tasks and process the data in parallel according to a predetermined algorithm. When processing is complete, DSP4 writes the results into the output FIFO, which consists of two FIFO chips expanded to a 32-bit output width. The FPGA then reads the data directly from the FIFO, converts the timing to match the CPCI interface chip PCI9656, and sends the data to the PCI9656, which passes it over the CPCI bus through the J1 and J2 ports to the other functional modules of the radar system.
Parallel signals, 32 bits wide, are first sent through the J3 port to an internal register of the FPGA. The FPGA writes the received data into the input buffer and, once a full frame has been written, raises an interrupt to the DSP. On detecting the interrupt, the DSP reads the data from the input buffer, processes it, and writes the result to the output buffer. The FPGA then transfers the data to the other functional modules of the radar system through the J1 and J2 ports of the CPCI interface, in the same way as for serial signals.
2 DSP chip selection
Based on the performance requirements of the system, several high-performance DSP processors were compared, with emphasis on raw performance and on how easily each can be built into a parallel processing system. The TS201S from the Analog Devices ADSP TigerSHARC series was chosen for the multi-DSP parallel system. Processors in this series provide the on-chip bus arbitration control and the dedicated link ports needed for interconnection in a parallel system, so the DSPs can be interconnected in various topologies to meet the requirements of heavy computation and flexible inter-chip communication. Choosing the ADSP TigerSHARC also reduces the complexity of the peripheral design and improves system stability.
The main performance indicators of the TS201S chip (600 MHz) are as follows:
(1) Running speed: 1.67 ns instruction cycle; 4 instructions can be executed per cycle;
(2) There are 2 operation modules inside the DSP, and the supported operation types are: 32 b and 40 b floating-point operations; 8 b, 16 b, 32 b and 64 b fixed-point operations;
(3) It can execute 12×10^9 16 b fixed-point operations or 3.6×10^9 floating-point operations per second;
(4) Using the single instruction multiple data (SIMD) mode, it can provide 4.8×10^9 40 b multiplication and addition operations per second;
(5) External bus DMA transfer rate 1.2 GB/s (bidirectional);
(6) 4 link ports, each link port provides a maximum transfer rate of 1.2 GB/s, and DMA transfers can be performed simultaneously;
(7) Multi-processor capability, with on-chip arbitration logic that supports seamless connection of multiple processors. The processors share a unified address space, and a multi-processor system can be formed conveniently through the cluster bus or the link ports;
(8) On-chip SDRAM controller, on-chip DMA controller (providing 14 DMA channels).
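The headline figures in the list above can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the numbers quoted in the list; the function names are illustrative, and the per-cycle values it produces are derived from those numbers rather than taken from a datasheet.

```python
CORE_CLOCK_HZ = 600e6  # core clock quoted in the text


def cycle_time_ns(clock_hz):
    """Instruction cycle time in nanoseconds for a given core clock."""
    return 1e9 / clock_hz


def ops_per_cycle(ops_per_second, clock_hz):
    """Operations per clock cycle implied by a quoted peak rate."""
    return ops_per_second / clock_hz


if __name__ == "__main__":
    print(f"cycle time: {cycle_time_ns(CORE_CLOCK_HZ):.2f} ns")
    quoted_rates = [
        ("16 b fixed-point operations", 12e9),
        ("floating-point operations", 3.6e9),
        ("40 b multiply-add operations", 4.8e9),
    ]
    for label, rate in quoted_rates:
        per_cycle = ops_per_cycle(rate, CORE_CLOCK_HZ)
        print(f"{label}: {per_cycle:.0f} per cycle")
```

A 600 MHz clock gives the quoted 1.67 ns cycle, and the quoted peak rates correspond to 20, 6, and 8 operations per cycle respectively.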
3 Design of DSP parallel processing structure
Two kinds of data transmission channel are available between ADSP-TS201S chips: the high-speed link ports (LINK) and the high-speed external bus port (cluster bus). From the perspective of data transmission, a parallel processing system built from several ADSP-TS201S chips can therefore follow one of three models: link port (LINK) coupling, external bus (cluster bus) coupling, or a hybrid of link port and external bus coupling.
3.1 Multi-DSP parallel processing system based on link port
In this connection mode, the DSPs are connected through their LINK ports for communication control and data exchange. The resulting structure is simple, needs few connections, and scales well. A DSP with several LINK ports can flexibly form linear, star, ring, mesh, or hypercube topologies. The ADSP-TS201S has four full-duplex link ports. Each direction of a link port carries 4 data bits plus clock and handshake signals, 12 lines in total, so bidirectional communication requires 24 lines. With a 600 MHz core clock, the data rate reaches up to 600 MB/s in one direction and 1.2 GB/s in both directions. Because link port communication is point-to-point, transmission is highly reliable, but data cannot be shared as easily as on a bus.
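The quoted link port rates follow from the port width and clock. The sketch below reproduces them under two assumptions that the text does not state: that the link clock equals the 600 MHz core clock, and that data is transferred on both clock edges. It also counts the ports each DSP needs for a fully connected point-to-point topology. All function names are illustrative.

```python
def link_rate_bytes_per_s(clock_hz, data_bits=4, edges_per_clock=2):
    """One-direction link rate in bytes per second.

    edges_per_clock=2 models double-data-rate transfer (an assumption).
    """
    return clock_hz * data_bits * edges_per_clock / 8


def ports_for_full_mesh(n_dsps):
    """Link ports each DSP needs to reach every other DSP directly."""
    return n_dsps - 1


if __name__ == "__main__":
    one_way = link_rate_bytes_per_s(600e6)
    print(f"one direction:   {one_way / 1e6:.0f} MB/s")
    print(f"both directions: {2 * one_way / 1e9:.1f} GB/s")
    print(f"ports per DSP for 4 DSPs: {ports_for_full_mesh(4)}")
```

With four link ports per chip, a fully connected group of four DSPs uses three ports on each chip and leaves one free, for example for a link to the FPGA.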
3.2 Multi-DSP parallel processing system based on shared bus
In the shared bus model, the external buses (address, data, and access control) of all DSPs in the system are connected directly together, and every DSP can access the internal memory and registers of the other DSPs, as well as the external memory and peripherals attached to the bus, as shared resources. The external address bus of the ADSP-TS201S is 32 bits wide, and the data bus can be configured as 32 or 64 bits. The external port runs at up to 125 MHz, giving a data throughput of up to 1 GB/s. To interface with different external devices, the external port supports fast (pipelined), slow, and SDRAM protocols, and it supports DMA transfers. The most distinctive feature of the ADSP-TS201S parallel bus is its seamless connection capability: whether the attached device is SRAM, SDRAM, or another processor, connecting the corresponding pins is enough to form a multi-processor system of up to 8 DSPs that fully shares the internal resources of the 8 DSPs and external resources such as EPROM, SRAM, and SDRAM.
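The 1 GB/s figure corresponds to the 64-bit data bus configuration at the 125 MHz external port clock. A minimal sketch of that calculation, assuming one transfer per bus clock (the function name is illustrative):

```python
def bus_throughput_bytes_per_s(width_bits, clock_hz):
    """Peak bus throughput, assuming one full-width transfer per clock."""
    return width_bits / 8 * clock_hz


if __name__ == "__main__":
    for width in (32, 64):
        rate = bus_throughput_bytes_per_s(width, 125e6)
        print(f"{width}-bit bus at 125 MHz: {rate / 1e6:.0f} MB/s")
```

The 32-bit configuration halves the peak rate, which is one reason to attach the SDRAM as 64-bit pairs when bus throughput matters.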
3.3 Multi-DSP parallel processing system based on external bus sharing and link port hybrid coupling
To balance data rate, resource sharing, ease of control, and flexible communication between DSPs, this design adopts the hybrid coupling model. The bus ports of the four ADSP-TS201S chips are connected together, and their high-speed link ports are also interconnected, which provides both point-to-point DSP-to-DSP channels and a working block in which the DSPs share resources. The four SDRAMs are combined in pairs, each pair expanded to 64 bits and attached to the 64-bit data bus. The two FLASH chips are also accessed through the bus. The control bus is connected to the FPGA, which uniformly controls data transmission among the four DSPs and between the DSPs and external memory. The connection of the four-DSP working block is shown in Figure 2.
4 FPGA and Peripheral Interface Design
4.1 FPGA Selection
The field-programmable gate array (FPGA) was developed from the application-specific integrated circuit (ASIC) and overcomes the ASIC's lack of flexibility: its internal logic functions can be configured as needed, which makes the circuit easy to modify and maintain. FPGA capacity now exceeds one million gates, making the FPGA an important option for system-level design and a powerful solution for many digital signal processing applications. Because a programmable solution is flexible, the DSP system design can adapt to changing standards, protocols, and performance requirements. At the time of this design, the Virtex-5 series was the newest and most powerful FPGA family on the market. It is manufactured in a 65 nm process and offers a high-performance architecture well suited to such applications. Its main features are as follows:
(1) Powerful clock management capability;
(2) On-chip 36 Kb block RAM/FIFO memory resources;
(3) High-performance parallel SelectIO technology and advanced DSP48E slices;
(4) Flexible loading and configuration schemes and system monitoring capabilities on all devices;
(5) Integrated RocketIO GTP transceivers (100 Mb/s to 3.75 Gb/s) and RocketIO GTX transceivers (150 Mb/s to 6.5 Gb/s);
(6) Powerful on-chip PowerPC 440 microprocessor.
Taking into account the functional requirements of the processing board, performance analysis, system compatibility, and I/O pin requirements, the Xilinx Virtex-5 series XC5VSX50TFF1136 chip was selected as the FPGA.
4.2 FPGA Design
According to the system functional requirements, the tasks of the FPGA are mainly divided into four parts.
(1) Control data transmission logic in the system
In the design, all signals of the control bus in Figure 2 are connected to the FPGA, which uniformly schedules data transmission between DSPs and between the DSPs and external memory. This keeps the processing algorithms as simple as possible when radar signal tasks are allocated for parallel and pipelined processing, and lets the DSPs devote their computing power to complex algorithms.
(2) Control the writing and reading of data in the data buffer (FIFO), and control the data transmission between DSP and FPGA through the external interrupt IRQ
The four external FIFOs are combined in pairs, each pair expanded to a 32-bit output/input width, so data moves between the FPGA and each FIFO pair in one direction only, and it is transferred in blocks. The handshake signal is connected to an IRQx pin of the DSP to generate an interrupt, or to a FLAGx pin. The FPGA writes the data received from the external processing board into the input buffer and, after a full frame has been written, raises an interrupt to the DSP. After the DSP has read the frame from the FIFO, it tells the FPGA through the handshake signal that the next frame can be transmitted.
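The frame-level handshake described above can be modelled in a few lines. This is a behavioural sketch only, not the FPGA logic of the design: the class name, the frame length, and the two boolean flags standing in for the IRQx interrupt and the DSP-to-FPGA handshake line are all assumptions made for illustration.

```python
from collections import deque


class FrameFifo:
    """Toy model of the input FIFO between the FPGA and a DSP."""

    def __init__(self, frame_words):
        self.frame_words = frame_words
        self.words = deque()
        self.irq = False    # models the IRQx line raised by the FPGA
        self.ready = True   # models the DSP handshake: next frame may be sent

    def fpga_write_frame(self, frame):
        """FPGA side: write one full frame, then raise the interrupt."""
        if not self.ready:
            return False  # DSP has not yet released the buffer
        if len(frame) != self.frame_words:
            raise ValueError("frame length does not match the configured size")
        self.words.extend(frame)
        self.ready = False
        self.irq = True
        return True

    def dsp_read_frame(self):
        """DSP side: on interrupt, read the frame and signal the FPGA."""
        if not self.irq:
            return None  # no frame pending
        frame = [self.words.popleft() for _ in range(self.frame_words)]
        self.irq = False
        self.ready = True
        return frame


if __name__ == "__main__":
    fifo = FrameFifo(frame_words=4)
    print(fifo.fpga_write_frame([1, 2, 3, 4]))  # accepted
    print(fifo.fpga_write_frame([5, 6, 7, 8]))  # refused until the DSP reads
    print(fifo.dsp_read_frame())
```

The point of the model is the ordering constraint: the FPGA may not start a new frame until the DSP has acknowledged the previous one, so a frame is never overwritten while it is being read.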
(3) Control the communication between the LINK port and the DSP
Link port communication has its own protocol, so the FPGA circuit only needs to be designed to follow it. The link port of the ADSP-TS201S uses independent transmit and receive channels, so the FPGA likewise uses separate receive and transmit circuits. The FPGA logic that receives from or transmits to a DSP link port consists of two parts: a receive/transmit module and a receive/transmit buffer. The receive module interfaces with the transmit channel of the DSP link port and unpacks data, while the transmit module interfaces with the receive channel of the DSP link port and packs data. The buffers work with these modules as data buffers and pass data to and from other interfaces in the system or other modules in the FPGA.
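The packing and unpacking roles of the transmit and receive modules can be illustrated with 32-bit words and the 4-bit link data path. The sketch below is a software analogy rather than the actual link port protocol: the nibble order (least significant first) and the 32-bit word granularity are assumptions, and the function names are hypothetical.

```python
NIBBLES_PER_WORD = 8  # a 32-bit word carried over a 4-bit data path


def link_pack(words):
    """Transmit side: split 32-bit words into 4-bit transfers.

    Nibble order (least significant first) is an assumption.
    """
    nibbles = []
    for word in words:
        if not 0 <= word <= 0xFFFFFFFF:
            raise ValueError("word does not fit in 32 bits")
        for shift in range(0, 32, 4):
            nibbles.append((word >> shift) & 0xF)
    return nibbles


def link_unpack(nibbles):
    """Receive side: rebuild 32-bit words from 4-bit transfers."""
    if len(nibbles) % NIBBLES_PER_WORD:
        raise ValueError("incomplete word in nibble stream")
    words = []
    for start in range(0, len(nibbles), NIBBLES_PER_WORD):
        word = 0
        for k, nibble in enumerate(nibbles[start:start + NIBBLES_PER_WORD]):
            word |= (nibble & 0xF) << (4 * k)
        words.append(word)
    return words


if __name__ == "__main__":
    stream = link_pack([0x12345678, 0xDEADBEEF])
    print([hex(w) for w in link_unpack(stream)])
```

In hardware the two functions would be shift registers clocked by the link clock, with the buffers on either side absorbing the rate difference between the link and the rest of the FPGA.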
(4) Control the data transmission between the CPCI interface module and the CPCI bus
The CPCI interface module is built around the PCI9656, and an independent functional module in the FPGA acts as the interface controller that implements the CPCI bus protocol. The controller consists mainly of FIFO control logic that handles data transmission between the local board and the CPCI bus. Its main functions are to cooperate with the PCI9656 so that the CPCI bus can read and write the target device, to buffer the data passing between the CPCI bus and the FIFO, and to control FIFO reads and writes. Local accesses to the CPCI bus therefore only need to read and write the FIFO.
4.3 CPCI transmission interface design
To ensure the data transmission rate and efficiency between this system and the other processing systems on the backplane, the design uses the PCI9656 as the CPCI interface chip. The PCI9656 is a dedicated I/O accelerator that supports CPCI transfers. Its clock frequency is up to 66 MHz and its data bus is 64 bits wide, for a peak transfer rate of 528 MB/s. As the system block diagram shows, the design uses four CPCI connectors: J1, J2, J3, and J4. According to the CPCI specification, J1 and J2 carry the 64-bit PCI bus, while J3 and J4 are user-defined. In this design, J3 is the data interface between the processing board and the backplane, and J4 is the data interface between the upper and lower processing boards.
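The 528 MB/s peak follows from the 64-bit width and 66 MHz clock, and it bounds how quickly a processed frame can leave the board. The sketch below computes the peak rate and a best-case transfer time; the 1 MiB frame size is purely hypothetical, and real throughput will be lower because of bus protocol overhead.

```python
def pci_peak_bytes_per_s(width_bits, clock_hz):
    """Peak PCI transfer rate: one full-width data phase per clock."""
    return width_bits * clock_hz / 8


def transfer_time_s(n_bytes, rate_bytes_per_s):
    """Best-case time to move n_bytes at the given sustained rate."""
    return n_bytes / rate_bytes_per_s


if __name__ == "__main__":
    peak = pci_peak_bytes_per_s(64, 66e6)
    print(f"peak rate: {peak / 1e6:.0f} MB/s")
    frame_bytes = 1024 * 1024  # hypothetical frame size, not from the source
    print(f"best-case frame time: {transfer_time_s(frame_bytes, peak) * 1e3:.2f} ms")
```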
4.4 External device interface design
The memory resources connected to this system through the common bus include 4 SDRAMs used in width-expanded pairs, 2 FLASH chips, 2 pairs of width-expanded FIFOs, and the on-chip memory of the DSPs. All memory resources are distinguished by a unified address space mapping. The 32-bit address bus of the ADSP-TS201S provides up to 4 GB of addressing space, which is divided into 4 parts:
(1) Host addressing space. The address mapping range is 0X80000000~0XFFFFFFFF, which is used for the address mapping space of the off-chip host interface.
(2) External memory block space. The address mapping range is 0X30000000~0X7FFFFFFF, which is used for the processor peripheral device memory interface address space mapping, including general memory devices and SDRAM memory. The design mainly divides this space to allocate a separate and unique address space for the external memory.
(3) Multi-processor space. The address mapping range is 0X0C000000~0X2FFFFFFF, which is mainly used for the internal memory space mapping shared by each processor in a multi-processor system.
(4) On-chip storage space. The address mapping range is 0X00000000~0X03FFFFFF, which defines the internal memory space mapping.
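The four-part division above can be expressed as a small lookup table. The sketch below uses the ranges as listed, reading the start of the multi-processor space as 0x0C000000; addresses that fall between the listed ranges are reported as unmapped, since the text does not describe them. The names are illustrative.

```python
# (first address, last address, region name), ranges as listed in the text
ADDRESS_REGIONS = [
    (0x00000000, 0x03FFFFFF, "on-chip memory"),
    (0x0C000000, 0x2FFFFFFF, "multi-processor space"),
    (0x30000000, 0x7FFFFFFF, "external memory"),
    (0x80000000, 0xFFFFFFFF, "host space"),
]


def classify_address(addr):
    """Return the name of the region a 32-bit address falls in."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("address does not fit in 32 bits")
    for first, last, name in ADDRESS_REGIONS:
        if first <= addr <= last:
            return name
    return "unmapped"  # not covered by any range listed in the text


if __name__ == "__main__":
    for addr in (0x00001000, 0x0C000000, 0x40000000, 0x80000000, 0x04000000):
        print(f"0x{addr:08X}: {classify_address(addr)}")
```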
The external memory can be divided into SDRAM addressing space and external general storage space. In the design, the extended SDRAM will be allocated to occupy the SDRAM addressing space, while the external FLASH and FIFO will be allocated to occupy the general storage space.
The SDRAMs are connected in pairs, each pair expanded to 64 bits. The design uses MSSD0 and MSSD1 as the common chip select signals of the two pairs; the corresponding SDRAM addressing spaces are 0X40000000~0X44000000 and 0X50000000~0X54000000, each providing a 128 MB addressing range that meets the SDRAM addressing requirements.
The addressing spaces of the two external FLASH chips are selected by two groups of chip select signals, MS0_AB with BMS_AB and MS0_CD with BMS_CD, and are allocated to 0X30000000~0X34000000 and 0X34000000~0X38000000; each addressing space is 128 MB.
The four external FIFO chips are combined in pairs, each pair expanded to a 32-bit output/input width. For address mapping they can share a single address space, with the read and write signals distinguishing them. The MS1 signal serves as the FIFO enable and yields the allocated addressing space 0X38000000~0X40000000. To simplify logic control, the MS1 pin is connected to the FPGA, and FIFO addressing is controlled through logic decoding in the FPGA.
In addition, the upper eight address lines of the ADSP-TS201S are also connected to the FPGA, so a finer address division can be obtained through logic decoding, which adds flexibility to the design while preserving its reliability.
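Because every window boundary in the external memory map is aligned to the upper eight address bits, the device selection can be decoded from those bits alone, which is the kind of decoding the FPGA can perform. The sketch below is a software model under stated assumptions: each window is treated as half-open (start included, end excluded), the second FLASH window is taken to end at 0x38000000 where the FIFO window begins, and the device labels are illustrative.

```python
# (first top byte, end top byte (exclusive), device label)
# Window bounds follow the text; the end of the FLASH1 window is an assumption.
DEVICE_WINDOWS = [
    (0x30, 0x34, "FLASH0"),  # MS0_AB / BMS_AB
    (0x34, 0x38, "FLASH1"),  # MS0_CD / BMS_CD (upper bound assumed)
    (0x38, 0x40, "FIFO"),    # MS1, decoded further inside the FPGA
    (0x40, 0x44, "SDRAM0"),  # MSSD0
    (0x50, 0x54, "SDRAM1"),  # MSSD1
]


def select_device(addr):
    """Decode the external device from the upper eight address bits."""
    top_byte = (addr >> 24) & 0xFF
    for first, end, label in DEVICE_WINDOWS:
        if first <= top_byte < end:
            return label
    return None  # no external device mapped at this address


if __name__ == "__main__":
    for addr in (0x30000000, 0x38000010, 0x40000000, 0x44000000, 0x50000000):
        print(f"0x{addr:08X}: {select_device(addr)}")
```

Decoding on eight bits keeps the FPGA logic small; a finer division, for example separating the two FIFO pairs, would combine this result with the read and write strobes.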
5 System software design
Since the system hardware is based on the DSP+FPGA structure, the corresponding software is also divided into two functional modules. The FPGA mainly completes the data transmission logic control of the entire system, so the specific processing flow of the FPGA is nested in the signal processing flow of the DSP. The four DSPs mainly complete the signal processing, and the general system design process is shown in Figure 3.
When the four DSPs work in parallel, the bus arbitration strategy designates DSP1 as the main processor. DSP1 performs system initialization, loads data and programs, communicates with the CPCI industrial computer host, and also takes part in the computation. When the system receives data, it first determines the transmission mode of the signal.
For a parallel signal, the FPGA processes the data, writes it into the FIFO, and notifies DSP1 through an external interrupt. DSP1 then issues a request, bus arbitration grants it control of the bus, and it reads the data from the FIFO and transfers it to the shared storage area. DSP1 next communicates with the other DSPs through the LINK ports to assign tasks; the other DSPs obtain control of the bus in turn, read their data, process it, and store the results back in the storage area. Finally, DSP4 writes the data from the storage area into the FIFO and notifies the FPGA, which reads it, completes the timing conversion, and passes it to the PCI9656; the PCI9656 puts the data on the CPCI bus, which completes the processing of the frame.
For a serial signal received from port J3, DSP2 first issues a request, bus arbitration grants DSP2 control of the bus, and DSP2 transfers the received data to the shared storage area. Task allocation and processing then proceed among the four DSPs in the same way, DSP4 writes the results to the FIFO, and the FPGA and PCI9656 together transfer the data to the CPCI bus, completing the processing of the serial signal.
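The flow above (load a frame into shared storage, distribute work, process in turn, collect results) can be sketched in software. The source does not say how a frame is divided among the DSPs, so the equal contiguous split below is an assumption, as are the function names and the `kernel` callback standing in for the radar algorithm. The loop is sequential and only mirrors the order in which the DSPs take the bus; it does not model true parallel execution.

```python
def split_frame(frame, n_dsps=4):
    """Split a frame into n_dsps contiguous chunks of near-equal size."""
    base, extra = divmod(len(frame), n_dsps)
    chunks, start = [], 0
    for i in range(n_dsps):
        size = base + (1 if i < extra else 0)
        chunks.append(frame[start:start + size])
        start += size
    return chunks


def process_frame(frame, kernel, n_dsps=4):
    """Model one frame passing through the four-DSP working block."""
    chunks = split_frame(frame, n_dsps)  # task assignment by the main DSP
    shared_storage = {}                  # models the shared storage area
    for dsp_id in range(1, n_dsps + 1):  # DSPs take the bus in turn
        chunk = chunks[dsp_id - 1]
        shared_storage[dsp_id] = [kernel(sample) for sample in chunk]
    output = []                          # last DSP gathers results for the FIFO
    for dsp_id in range(1, n_dsps + 1):
        output.extend(shared_storage[dsp_id])
    return output


if __name__ == "__main__":
    frame = list(range(10))
    print(split_frame(frame))
    print(process_frame(frame, kernel=lambda x: 2 * x))
```

Algorithms with dependencies across chunk boundaries, such as pulse compression over a long echo, would need overlapping chunks or a pipelined split instead of this simple partition.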
6 Conclusion
This paper presents the design of a multi-DSP parallel processing system based on an FPGA, focusing on the DSP parallel structure and also covering the FPGA design and the external device interface design. In practical application to broadband radar signal processing, the system meets the specifications of the task, can complete additional functions beyond the original design, and is easy to control, stable, and reliable. The design scheme can serve as a reference for other researchers working on broadband radar signal processing.