Abstract:
This paper presents a design method for a general-purpose embedded image processing system. FIFOs implemented in an FPGA provide high-speed data transmission between an ARM processor and multiple DSPs. Experimental results show that the resulting real-time system, in which multiple DSPs work together, operates stably and has strong data processing capability, making it suitable for high-end radar signal processing, electronic countermeasures, ultrasonic image processing, and similar applications.
Keywords:
ARM11; TMS320C6416T; FIFO; FPGA; multi-DSP embedded system
Real-time image processing and high-speed computing require a system with fast data processing, high data throughput, and multi-task processing capability. At present, most solutions combine an ARM and a DSP over HPI data transmission to perform part of the image processing. The DSP handles only simple tasks such as image acquisition, compression, and encoding [1], which cannot meet the requirements of real-time intelligent recognition or of video processing with large amounts of data. The limited processing speed also restricts the range of applications.
For example, a fetal gender masking project must detect and mask the gender regions of a real-time video. With a single DSP, frames are dropped or the video stalls. Likewise, a single DSP cannot track a fast-moving object in real time: the motion detection of Hanwang Technology and of Hikvision, for example, does not run in real time, and even when it detects motion there are missed detections and video stalls. When processing 4CIF or larger images, the processing capability of a single DSP falls further. Although the image can be downscaled before processing, downscaling loses important image information, which reduces the accuracy of intelligent recognition.
In view of the above situation, it is necessary to design a real-time image processing system that can realize fast signal processing and data exchange.
1 System Structure
1.1 Structure
System functions: the S3C6410 handles data integration, task scheduling, and human-computer interaction; the TMS320C6416 DSPs perform the algorithm computation; each DSP is seamlessly connected to the FPGA. FIFOs implemented in the FPGA provide high-speed data transmission between the DSPs and the ARM, which schedules tasks across the multiple DSPs.
The system structure is shown in Figure 1. The system is an ARM + multi-DSP embedded image processing system in which one ARM11 processor, the S3C6410 (master processor), and four TMS320C6416 DSPs (720 MHz, slave processors) are interconnected through an FPGA (EP2C70-896C7). All DSPs are seamlessly connected to the FPGA through the external memory interface (EMIF), and data transmission between DSPs is carried out over the FIFO interconnection network inside the FPGA.
Figure 2 shows the interconnected FIFO network that forms the high-speed data transmission structure. The master processor accesses the dual-port FIFOs in the FPGA by DMA and, through these FIFOs, communicates with all slave DSPs connected to the FPGA. All FIFOs are bidirectional, and the FIFOs and their read and write control logic are implemented inside the FPGA.
The FIFO read and write status control in the FPGA, the synchronous handshake signals for communication between the slave DSPs, the data request of the S3C6410, and the other logic signals are all implemented by connecting some of the GPIO pins of each DSP to I/O pins of the EP2C70.
1.2 Features
The system structure is reconfigurable. With the hardware platform unchanged, the structure can be changed completely, simply by changing the FPGA program code, to suit different algorithm structures. As shown in Figure 2, masking the communication among DSP1~DSP4 forms a master-slave parallel pipeline structure; for a serial pipeline structure, only one of DSP1~DSP4 needs to communicate with the S3C6410; a more complex serial-parallel hybrid structure can likewise be obtained simply by changing the FPGA code.
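The reconfiguration described above can be pictured as enabling or masking FIFO channels. The following is a minimal sketch that treats each channel as an undirected link between two nodes. It assumes that the FPGA can provide a channel between any two processors, which the text does not state explicitly, and the function names are hypothetical.

```python
# Illustrative model only: each bidirectional FIFO channel is a link
# between two nodes. Node names follow the text; the functions are
# hypothetical helpers, not part of the original design.
DSPS = ["DSP1", "DSP2", "DSP3", "DSP4"]


def master_slave_parallel():
    """ARM talks to every DSP; all DSP-to-DSP channels are masked."""
    return {frozenset(("ARM", dsp)) for dsp in DSPS}


def serial_pipeline(entry="DSP1"):
    """Only `entry` talks to the ARM; the DSPs form a chain behind it."""
    order = [entry] + [dsp for dsp in DSPS if dsp != entry]
    links = {frozenset(("ARM", entry))}
    links |= {frozenset(pair) for pair in zip(order, order[1:])}
    return links


def arm_links(links):
    """Count the channels that involve the ARM master."""
    return sum(1 for link in links if "ARM" in link)


if __name__ == "__main__":
    for name, topo in (("parallel", master_slave_parallel()),
                       ("serial", serial_pipeline())):
        print(name, sorted(sorted(link) for link in topo))
```

In this picture, switching between structures only changes which links are enabled, which corresponds to changing the FPGA code while the board stays the same.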
2 Implementation of DMA of S3C6410 and Soft FIFO Interface of FPGA
2.1 Introduction of S3C6410
The S3C6410 is produced by Samsung. It uses the ARM1176JZF-S core, with 16 KB instruction and data caches and 16 KB instruction and data TCMs. The ARM core runs at 533 MHz with a core voltage of 1.1 V and at 667 MHz with 1.2 V. It connects to external modules through a 64/32-bit internal bus composed of AXI, AHB, and APB. SROM Controller: 6 chip selects, supports SRAM, ROM, and NOR Flash with 8/16-bit width, and each chip select supports 128 MB. JPEG Codec: supports JPEG encoding and decoding, with a maximum size of 4096×4096. 2D GRAPHICS: 2D acceleration, supports point/line drawing, bitblt, and Color Expansion. 3D GRAPHICS: 3D acceleration.
The S3C6410 has 4 DMA controllers for data exchange within the system bus or between the system bus and the peripheral bus. Each controller contains 8 channels and supports 8/16/32-bit transfers. The external DMA request is taken here as an example to briefly introduce the DMA working process. Figure 3 shows the basic working sequence of DMA.
When a DMA operation is required, the external DMA request pin XnXDREQ is driven low. The DMA controller then requests the bus from the CPU. When the bus request succeeds, the XnXDACK pin goes low, indicating that the CPU has handed the bus over to the DMA controller and that data transmission can proceed. When the data transmission is complete, the acknowledge signal XnXDACK returns high to notify the CPU that one DMA operation has finished.
S3C6410 provides three different DMA operation modes: single service command mode, single service handshake mode and full service handshake mode. Before using DMA for data transmission, its related registers must be set, including the source address register, the destination address register and their respective control registers, as well as the control register for configuring the DMA mode.
2.2 FPGA and its FIFO implementation [2]
When an FPGA implements a circuit with multiple clocks, the data rates of the different clock domains must be matched. An asynchronous FIFO generated inside the FPGA handles this. The asynchronous FIFO consists mainly of a dual-port RAM, a write address generation module, a read address generation module, and a full/empty flag generation module. The dual-port RAM is built from the FPGA's Block RAM. The FPGA is Altera's EP2C70-896C7, whose Block RAM read and write clock frequency can reach 216.73 MHz, so using Block RAM as the storage is both fast and simple to design. In the design, one port is configured as the write port and the other as the read port, and the Block RAM pins are connected to the corresponding control signals. The read and write addresses are generated by binary counters built from the carry logic inside the FPGA, which count under the read/write clock with Read_En/Write_En as the enable signal. The empty and full flags are derived from the relative positions of the read and write addresses. This system uses two FIFOs to form one bidirectional data transmission channel. The design of the bidirectional FIFO is shown in Figure 4.
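The address counters and flag logic described above can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the authors' HDL: it ignores the two clock domains entirely, requires a power-of-two depth, and uses the common trick of one extra wrap bit on each pointer to tell full from empty. The class name `SoftFifo` is hypothetical.

```python
class SoftFifo:
    """Behavioural model of a FIFO built on a dual-port RAM.

    Assumption: pointers carry one extra wrap bit beyond the RAM address,
    so equal pointers mean empty and a difference of `depth` means full.
    """

    def __init__(self, depth):
        if depth <= 0 or depth & (depth - 1):
            raise ValueError("depth must be a power of two")
        self.depth = depth
        self.ram = [0] * depth      # stands in for the Block RAM
        self.wr_ptr = 0             # write address counter
        self.rd_ptr = 0             # read address counter

    @property
    def empty(self):
        return self.wr_ptr == self.rd_ptr

    @property
    def full(self):
        return (self.wr_ptr - self.rd_ptr) % (2 * self.depth) == self.depth

    def write(self, value, write_en=True):
        """Store one word if Write_En is set and the FIFO is not full."""
        if not write_en or self.full:
            return False
        self.ram[self.wr_ptr % self.depth] = value
        self.wr_ptr = (self.wr_ptr + 1) % (2 * self.depth)
        return True

    def read(self, read_en=True):
        """Return one word if Read_En is set and the FIFO is not empty."""
        if not read_en or self.empty:
            return None
        value = self.ram[self.rd_ptr % self.depth]
        self.rd_ptr = (self.rd_ptr + 1) % (2 * self.depth)
        return value
```

A bidirectional channel as in Figure 4 would then be two such objects, one per direction. In real hardware the pointers must also be synchronized across clock domains before comparison, which this model leaves out.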
2.3 Implementation of DMA and soft FIFO interface of S3C6410
According to the DMA interface principle, the interface diagram between S3C6410 and FPGA is designed as shown in Figure 5.
The read clock is provided by the ARM's clock output pin CLKOUT0, which can output different clock frequencies depending on the settings of the S3C6410's internal registers. The FIFO output data passes through a buffer, selected by nGCS4, before reaching the S3C6410 data bus. nGCS4 is the chip select signal for BANK4 of the S3C6410's memory space: it is low while the S3C6410 reads or writes the memory space corresponding to this signal and high at all other times.
The write request signal of FIFO is controlled by S3C6410 and the full state of FIFO. When S3C6410 sends a START signal and FIFO is not full, the write request signal is high level, and FIFO writes data under the control of the write clock; when the START signal is canceled or the FIFO is full, the write request signal becomes low level and the write operation is stopped.
The read operation of FIFO is carried out in coordination with the DMA operation of S3C6410. The system adopts DMA operation in single service command mode, and transfers one byte of data each time. When the DREQ0 signal becomes low, the DMA operation starts. After each byte is transferred, a DACK0 response signal is generated. As long as DREQ0 is low, the DMA operation continues until the counter in the DMA control register is 0, generating a DMA interrupt. According to the above timing characteristics, the FIFO empty signal is used as the DMA request signal DREQ0. When the data output by the CCD is written into the FIFO, the empty signal jumps to a low level to start the DMA operation, and the DACK0 signal is used as the FIFO read request. After each DMA transfer is completed, the response signal moves the FIFO read pointer by one position to achieve fast and accurate data acquisition.
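The signal relationships in this subsection can be summarized in a short behavioural sketch. It assumes that DREQ0 is active low and driven directly by the FIFO empty flag, as the text describes. The function names and the use of a `deque` to stand in for the FIFO are illustrative choices, not part of the original design.

```python
from collections import deque


def write_request(start, fifo_full):
    """Write request is high only while START is asserted and FIFO is not full."""
    return start and not fifo_full


def dreq0_level(fifo_empty):
    """FIFO empty flag wired to DREQ0 (active low): 0 means data is waiting."""
    return 1 if fifo_empty else 0


def dma_drain(fifo, transfer_count):
    """Model of the DMA read loop.

    Each loop iteration is one transfer: DACK0 acts as the FIFO read
    request and advances the read pointer by one position. The loop stops
    when the transfer counter reaches 0 (DMA interrupt) or when the FIFO
    runs empty and DREQ0 goes high (DMA stalls until more data arrives).
    """
    received = []
    while transfer_count > 0 and dreq0_level(len(fifo) == 0) == 0:
        received.append(fifo.popleft())
        transfer_count -= 1
    interrupt = transfer_count == 0
    return received, interrupt


if __name__ == "__main__":
    fifo = deque([0x10, 0x20, 0x30])
    print(dma_drain(fifo, 2))   # counter reaches 0 first
    print(dma_drain(fifo, 5))   # FIFO empties first
```

Tying DREQ0 to the empty flag means the DMA engine paces itself to the data source without software polling, which is the property the text relies on.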
3 Image acquisition module
The programmable video input processor SAA7113H is used to process video signals. SAA7113H integrates powerful image chrominance and brightness processing functions and multiple output modes [3]; it has 32 working registers, which must be initialized through the I2C bus when the system is reset. This system uses grayscale images and does not use chrominance signals, so the data line is 8 bits. The interface between SAA7113H and FPGA is shown in Figure 6.
In this system, control logic designed inside the FPGA acquires the image data, and an asynchronous FIFO carries out the data transmission. This resolves the mismatch between the CCD output data rate and the rates of the DSP and ARM.
4 DSP's EMIFA and FPGA-implemented soft FIFO interface
4.1 DSP's EMIFA interface [4-5]
The DSPs (TMS320C6416T) communicate through the external memory interface (EMIFA), the interface used to access off-chip memory. EMIFA consists of 64 data lines D[63:0], 20 address lines A[22:3], 8 byte enable lines BE[7:0], 4 address space chip select lines /CE3~/CE0, and the read/write control signals for the various memory types. Each /CEx space of the TMS320C6416T has 256 MB of address space and can be configured to interface with memory types such as SRAM, SDRAM, ZBT SRAM, Flash, and FIFO. The clock that EMIFA uses to read and write these memories can be configured by software as the EMIF's AECLKIN, CPU/4, or CPU/6. This design uses AECLKIN at 133 MHz.
4.2 EMIF and soft FIFO interface
The DSP communicates through its EMIF port with the asynchronous FIFO implemented in the FPGA. Each read/write cycle of the EMIF asynchronous interface is divided into three stages: setup time (SETUP), strobe time (STROBE), and hold time (HOLD). The duration of each stage is programmable to suit different read and write speeds. The timing diagrams of the DSP reading and writing the asynchronous FIFO are shown in Figure 7 and Figure 8 respectively [6]. The DSP read and write FIFO control signals are generated by the FPGA, and their logical relationship is as follows:
Write FIFO signal: writ_clk = AECLKOUT
writ_req = ! (/CE+/AWE)
Read FIFO signal: read_clk = AECLKOUT
read_req = ! (/CE+/ARE)
In addition, the DSP that writes to the FIFO must respond to the full status flag, and the DSP that reads the FIFO must respond to the half-full status flag.
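The request equations above can be checked with a small truth-table sketch. It assumes that `+` in the equations denotes logical OR and that /CE, /AWE, and /ARE are active low (0 means asserted), which is the usual reading of the slash prefix. The function name is hypothetical.

```python
def fifo_request(ce_n, strobe_n):
    """req = !(/CE + /AxE), with '+' read as logical OR.

    ce_n     : level of /CE (0 = chip select asserted)
    strobe_n : level of /AWE for writ_req, or /ARE for read_req
    Returns 1 only when both active-low inputs are asserted.
    """
    return int(not (ce_n or strobe_n))


if __name__ == "__main__":
    print("/CE /AxE -> req")
    for ce_n in (0, 1):
        for strobe_n in (0, 1):
            print(f"  {ce_n}    {strobe_n}   ->  {fifo_request(ce_n, strobe_n)}")
```

The request is therefore high only while the DSP selects the FIFO's /CE space and drives the matching strobe, so accesses to other address spaces never move the FIFO pointers.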
5 Data communication between DSPs [7]
In order to design a highly versatile image processing platform, the data transmission between processors must be universal, so that for applications in different systems, only the image processing algorithm code needs to be modified without modifying the communication between processors. The specific design is divided into the following two parts:
(1) Data communication protocol description (x=0,1,2,3)
Se/Re (Send/Receive)[0]: Through the FPGA, ARM requests DSPx to receive (bit = 1) or send (bit = 0) data.
ARM[1:3]: The number of the DSP that is sending a request to the FPGA.
DSPx[4:6]: The number of the DSP that the ARM processor requests to respond to the FPGA.
Da_Le (Data_Leng)[7:18]: The length of the data that ARM requests DSPx to receive or send.
Da_Un (Data_Unit)[19]: The unit of the data length. If it is 1, the length of the received or sent data is Data_Leng*K (1K = 1024 bit); if it is 0, the length is Data_Leng.
Da_Bl (Data_Block)[20:27]: The number of data blocks, each of length Data_Leng or Data_Leng*K, that ARM requests DSPx to receive or send.
Da_Ch (Data_Result)[7:18]: ARM requests DSPx to receive or send an intermediate or final result of the algorithm code. This data segment is shared with Data_Leng.
In_Pr (Interrupt_Priority)[27:30]: Sets the interrupt priority of the DSP.
Ot_Use (DSP_State)[31:34]: DSP status flag information.
Ot_Use (Other_Use)[36:47]: User-defined data segment.
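As an illustration of the field layout listed above, the following sketch unpacks a request word into named fields. Several points are assumptions: bit 0 is taken as the least significant bit, the word is treated as 48 bits wide, and the snake_case field names are hypothetical. The bit ranges are copied as given in the text, including the shared bit 27 between Data_Block and Interrupt_Priority. Parity_Check[35] comes from the description of the communication process.

```python
# Bit ranges (low, high) exactly as listed in the text; bit 0 = LSB is an
# assumption. Data_Result shares bits 7..18 with Data_Leng, so it is not
# listed separately.
FIELDS = {
    "send_receive": (0, 0),
    "arm": (1, 3),
    "dspx": (4, 6),
    "data_leng": (7, 18),
    "data_unit": (19, 19),
    "data_block": (20, 27),
    "interrupt_priority": (27, 30),   # overlaps bit 27 as in the source
    "dsp_state": (31, 34),
    "parity_check": (35, 35),
    "other_use": (36, 47),
}


def decode(word):
    """Split a request word into a dict of field values."""
    return {
        name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
        for name, (lo, hi) in FIELDS.items()
    }


def payload_length(fields):
    """Total transfer length: Data_Block * Data_Leng, times 1024 if Data_Unit = 1."""
    unit = 1024 if fields["data_unit"] else 1
    return fields["data_block"] * fields["data_leng"] * unit


if __name__ == "__main__":
    # Receive request for DSP 2: 3 blocks of 100 K each.
    word = 1 | (2 << 4) | (100 << 7) | (1 << 19) | (3 << 20)
    fields = decode(word)
    print(fields)
    print(payload_length(fields))
```

Keeping the layout in one table means a change to the protocol touches only `FIELDS`, in line with the goal of leaving inter-processor communication untouched when the algorithm code changes.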
(2) Main process of data communication
First, the FPGA receives the request signal from the ARM processor (ARM[1:3]), calculates the check data SUM from Data[0:34], and compares it with Parity_Check[35]. If they are not equal, the FPGA sends a resend request to the ARM processor. If they are equal and DSPx is idle, the FPGA sends a receive or send data request to DSPx through Send/Receive, transmits the collected image data to DSPx, and enables the corresponding FIFO data channel at the same time. DSPx also calculates the check data from the received information. If it equals Parity_Check, DSPx uses EDMA to receive or send Data_Block*Data_Leng (or Data_Block*Data_Leng*K) data through the EMIF port according to the Send/Receive flag bit. If the FPGA receives data transmission requests from two or more DSPs at the same time, it determines the execution order from the Interrupt_Priority field.
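The check and arbitration steps can be sketched as follows. The text does not say how SUM is computed. Because Parity_Check occupies a single bit, this sketch assumes SUM is the XOR of bits 0 to 34 (even parity). It also assumes that bit 0 is the least significant bit and that a larger Interrupt_Priority value wins, with ties broken by DSP number. All function names are hypothetical.

```python
DATA_MASK = (1 << 35) - 1      # bits 0..34
PARITY_BIT = 35


def parity_sum(word):
    """XOR of data bits 0..34 (assumed meaning of SUM)."""
    return bin(word & DATA_MASK).count("1") & 1


def header_ok(word):
    """True when the computed SUM equals Parity_Check[35]."""
    return parity_sum(word) == (word >> PARITY_BIT) & 1


def with_parity(word):
    """Return the word with Parity_Check[35] set to match its data bits."""
    cleared = word & ~(1 << PARITY_BIT)
    return cleared | (parity_sum(cleared) << PARITY_BIT)


def arbitrate(requests):
    """Order simultaneous requests.

    requests maps DSP number -> Interrupt_Priority. Higher priority is
    served first (assumption); equal priorities fall back to DSP number.
    """
    return sorted(requests, key=lambda dsp: (-requests[dsp], dsp))


if __name__ == "__main__":
    word = with_parity(0b1011)
    print(header_ok(word), header_ok(word ^ 1))
    print(arbitrate({1: 2, 3: 7, 2: 7}))
```

A single parity bit detects any odd number of flipped bits in the header but not an even number, so a design that needs stronger protection would have to widen the check field.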
6 System Performance Analysis[7]
The main factors affecting system performance are: the response speed of the ARM processor's coordinated work, the speed of DSP processing data, and the speed of data transmission between multiple processors. The first two factors are mainly determined by the processor's main frequency and processing power, so they are not tested. The data transmission speed between processors is one of the main parts of this design, and data transmission bandwidth and data transmission delay are important indicators for measuring data transmission speed.
If the bandwidth of the DSP read and write FIFO in the system is B (the amount of data transmitted between DSPs per unit time), then:
Table 1 lists the average delay time measured when the ARM processor transmits data of different sizes to DSP1~DSP4 respectively, and Figure 9 shows the actual bandwidth Bf curve drawn from the test data. As the amount of data transmitted increases, Bf gradually approaches the theoretical value of B, 266 MB/s.
This paper designs a real-time image signal processing system based on ARM, FPGA, and multiple DSPs. A high-speed data transmission interconnection network implemented in the FPGA matches the system's data communication capability to the computing capability of the DSPs, and the data transmission control bus makes data transmission very flexible. The S3C6410 schedules image data transmission, allocates image processing tasks, and saves, displays, and transmits images over the network. Four TMS320C6416T DSPs process the images. In testing, the algorithm code takes less than 0.2 s on a single-DSP platform (TMS320C6416T, 1 GHz) but less than 40 ms on this platform, which meets the real-time requirement. The system can also be widely used in other fields such as image processing, electronic countermeasures, and radar signal processing.
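One simple way to see why the measured bandwidth rises toward the theoretical value is to assume each transfer pays a fixed overhead t0 (request, check, and handshake) before data moves at the peak rate B, so that Bf = N / (t0 + N / B). This model is an assumption made for illustration, not the formula from the text. The value of t0 below is hypothetical and is not a measurement, and MB is taken as 10^6 bytes.

```python
def effective_bandwidth(n_bytes, peak_bytes_per_s, overhead_s):
    """Bf = N / (t0 + N / B) under a fixed-overhead transfer model."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    return n_bytes / (overhead_s + n_bytes / peak_bytes_per_s)


B = 266e6     # theoretical bandwidth quoted in the text, in bytes per second
T0 = 10e-6    # hypothetical fixed overhead per transfer, for illustration only

if __name__ == "__main__":
    for size in (1 << 10, 1 << 14, 1 << 18, 1 << 22):
        bf = effective_bandwidth(size, B, T0)
        print(f"{size:>8} bytes: {bf / 1e6:7.1f} MB/s")
```

Under this model Bf is always below B and increases with N, which matches the qualitative shape described for Figure 9. Fitting t0 would require the actual delay values of Table 1.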