Application of DMA in real-time image processing

Publisher: 柳絮轻风  Latest update time: 2007-03-09

  Introduction

  Real-time image processing systems must complete a large number of data operations within a limited time. With its Harvard bus structure and parallel memory block structure, a DSP handles multiplication and addition in a unified way and can complete in one instruction cycle work that takes a general-purpose processor several operations. Its instruction system also uses a multi-stage pipeline, which helps the system meet its real-time requirements. DSPs are therefore widely used in real-time image processing systems. The most prominent characteristic of an image processing system is the large amount of data it computes on. In most cases the amount of data is much larger than the on-chip memory capacity, so data must be exchanged during the calculation. Proper use of DMA improves data transfer efficiency and achieves twice the result with half the effort. This article takes the TMS320C6701 (C6701 for short) as an example to introduce several typical applications of DMA in image processing.

  1 Characteristics of image processing systems and the necessity of using DMA

  As mentioned above, the most prominent characteristic of image processing systems is the large amount of data to be computed, which often exceeds the on-chip memory capacity. Moreover, the intermediate data generated during an operation is often as large as the source data, which further limits the use of the on-chip high-speed storage area. To increase processing speed, however, the computation on source data and intermediate data must take place in the on-chip high-speed storage area as far as possible. DMA must therefore be used to exchange data between the on-chip high-speed storage area and the off-chip low-speed storage area. In addition, the arrangement of the data often does not match what the program requires, so the data must be rearranged; DMA can perform this rearrangement. Like data rearrangement, many operations in image processing are based on operations over multi-dimensional arrays, that is, matrix operations. Operations used frequently in image processing, such as transposition and sub-image extraction, can also be completed by DMA. Of course, these operations can be implemented in C. However, if the implementation is a multiply nested loop, it does not lend itself to software pipelining, and the clock cycles consumed grow in proportion to the amount of data. Parallel assembly can reduce the cycle count, but this still does not meet the real-time requirements of the system. Rearranging the data through DMA achieves the same result easily, and the CPU spends only one clock cycle on the process. With careful program arrangement, the data transfer can run safely in the background of the CPU, so that the DMA activity goes entirely unnoticed.

  2 Introduction to C6x series DMA

  The TMS320C6701 is a high-speed floating-point digital signal processor of the TMS320C6000 series, the DSP generation that TI introduced in the late 1990s. The C6701 has 4 self-loading DMA channels for DMA data transfers; in addition, 1 auxiliary DMA channel is responsible for communication with the host. A DMA channel can transfer data within the mapped memory space without CPU involvement. Data can be transferred between on-chip memory, on-chip peripherals and external devices.

  2.1 DMA control register

  For the DMA of the C6x series, the following registers must be set before any DMA channel is used for data transfer. Each register and its function are as follows:

  * Primary control register: controls the DMA status and transfer type;
  * Secondary control register: enables CPU interrupts and monitors the DMA channel status;
  * Transfer control register: records the number of data units to transfer (also referred to below as the transfer count register);
  * Source control register: the starting address of the transfer;
  * Destination control register: the destination address of the transfer.

  In addition, a DMA channel can use the following global DMA registers to complete more complex transfers:

  * Global address register group (global address registers A, B, C and D);
  * Global index register group (global index registers A and B).

  The global address register group has four 32-bit registers, which serve as split addresses or address reload values. The global index register group has two 32-bit registers. Each register contains 2 control fields: the upper 16 bits are the frame index field (FRAME INDEX), whose value is the address offset between frames, that is, the address adjustment after one frame has been transferred; the lower 16 bits are the element index field (ELEMENT INDEX), whose value is the address offset within a frame, that is, the address adjustment after each data unit is transferred. The global count reload register has the same structure as the global index register and is used to reload the transfer count register of a DMA channel. The global DMA registers can be used by any DMA channel, and the same register can be used by more than one DMA channel at the same time.

  2.2 Introduction to DMA working process

  DMA is a very complex system. Due to space limitations, only a brief introduction to its working process is given here. In the C6000 series DMA, a certain number of transferred data units (ELEMENT) is called a frame (FRAME). The size of a frame is specified by the lower 16 bits of the transfer count register, the element count field (ELEMENT COUNT). The number of frames is specified by the upper 16 bits of the transfer count register, the frame count field (FRAME COUNT). Each time a DMA read operation completes, ELEMENT COUNT is automatically decremented by 1. When the read of the last data unit of a frame completes, FRAME COUNT is automatically decremented by 1, and ELEMENT COUNT is reloaded from the ELEMENT COUNT field of the global count reload register. When the read of the last frame completes, the transfer count register is reloaded with the value of the global count reload register.

  The DMA controller performs the address calculation for the read and write transfers of each channel. There are two ways to compute the transfer address: basic adjustment and adjustment using a global index register. In basic adjustment, the control fields SRC DIR and DST DIR set the address to increase or decrease by the data unit size (controlled by ESIZE), or to remain unchanged. Adjustment using a global index register differs from basic adjustment: the address is adjusted according to whether the data unit just transferred is the last one of the current frame, and the adjustment value is taken from the global index register. The global index register contains 2 control fields. The upper 16 bits are the frame index field (FRAME INDEX), whose value is the address offset between frames, that is, the address adjustment after one frame has been transferred. The lower 16 bits are the element index field (ELEMENT INDEX), whose value is the address offset within a frame, that is, the address adjustment after each data unit other than the last one of a frame.

  3 Several typical DMA operations and their applications

  3.1 Block movement

  Block movement transfers a contiguous block of data from one address to another. It is usually used to move data or programs from external memory to internal memory, and it is the simplest and most common DMA operation. For example, a contiguous block of 1K 32-bit data units is moved from external memory (0x02000000) to internal memory (0x80000000), as shown in Figure 1.

  Values of the related registers:

  * Primary control register = 0x03000050
  * Transfer control register = 0x00000400
  * Source control register = 0x02000000
  * Destination control register = 0x80000000

  The settings and meanings of the control fields of the primary control register are as follows:

  * DST RELOAD = 00: no destination address reload
  * SRC RELOAD = 00: no source address reload
  * EMOD = 0
  * FS = 0: no frame synchronization
  * TCINT = 1: interrupt allowed
  * PRI = 1: DMA priority
  * WSYNC = 00000: no write synchronization
  * RSYNC = 00000: no read synchronization
  * INDEX = 0: global index register A
  * CNT RELOAD = 0: global count reload register A
  * SPLIT = 00: no split address
  * ESIZE = 00: data unit of 4 bytes
  * DST DIR = 01: address increment
  * SRC DIR = 01: address increment
  * STATUS = 00: this field is read only
  * START = 00: DMA stopped

  In the transfer control register, FRAME COUNT = 0x0000 and ELEMENT COUNT = 0x0400. Writing 01b to the START field of the primary control register starts the DMA transfer.

  3.2 Data rearrangement

  The format of the data often does not meet the requirements of the operation. In this case, the data can be rearranged through DMA. Data rearrangement mainly uses the DMA frame transfer method. The critical step is setting the global registers, so the following discussion focuses on those settings.

  3.2.1 Matrix transposition

  Figure 2 shows a contiguous area of 16-bit data in external memory, starting at address 0x02000000, before and after it is rearranged and moved to the on-chip storage area starting at address 0x80000000.

  In data rearrangement, the main task is to set the global index register correctly. Here, a frame can be regarded as one row of an array, and a data unit as an element of the array. Assume an F×E matrix, that is, F frames of data with E data units per frame and S bytes per data unit, which is to be rearranged into an E×F matrix. In this case, the source address is incremented and the destination address is adjusted by the values in the global index register. Between adjacent data units within a frame, the destination address offset is F×S, so by the last data unit of a frame the destination address has advanced by (E-1)×F data units. The address of the first data unit of the next frame is therefore the current address minus ((E-1)×F-1)×S. That is to say:

  * FRAME INDEX should be set to -((E-1)×F-1)×S
  * ELEMENT INDEX should be set to F×S

  For the example above (F = 4, E = 2, S = 2), the values are:

  * FRAME INDEX = -((2-1)×4-1)×2 = -6 = 0xFFFA
  * ELEMENT INDEX = 4×2 = 8

  Therefore, the register settings are as follows:

  * Primary control register = 0x030001D0
  * Transfer control register = 0x00040002
  * Source control register = 0x02000000
  * Destination control register = 0x80000000
  * Global index register A = 0xFFFA0008
  * Global count reload A = 0x00000002

  3.2.2 Acquisition of image sub-images

  In image processing, it is often necessary to extract a sub-image of a certain size from an image and then process the sub-image. For large images, the size often exceeds the on-chip memory of the DSP system, and this extraction becomes an essential step. It can be done using the global index register. For example, a 2×4 sub-image is extracted from an 8×4 image, as shown in Figure 3, where each data unit is 1 byte.

  This can be described as follows. The image has F1 frames of data with E1 data units per frame, each data unit being S bytes; the extracted part has F2 frames of data with E2 data units per frame, each data unit again being S bytes. Because the destination storage area holds contiguous data once the transfer is complete, the destination address is incremented, while the source address is adjusted by the values in the global index register. Between adjacent data units within a frame, the source address offset is S. After the last data unit of a frame is read, the source address pointer skips (E1-E2) data units, so the address adjustment between frames is ((E1-E2)+1)×S. The global register settings for the example (E1 = 8, E2 = 4, F2 = 2, S = 1) are:

  * FRAME INDEX = ((8-4)+1)×1 = 5
  * ELEMENT INDEX = 1
  * FRAME COUNT = 2
  * ELEMENT COUNT = 4

  Therefore, the register settings are as follows:

  * Primary control register = 0x03000270
  * Transfer control register = 0x00020004
  * Source control register = 0x02000000
  * Destination control register = 0x80000000
  * Global index register A = 0x00050001
  * Global count reload register A = 0x00000004

  Conclusion

  For a real-time system, selecting a reasonable and effective core algorithm is crucial. At the same time, the choice of an effective data transfer method cannot be ignored. In our actual work, we found that in most cases the time spent on data transfer exceeds the time spent on data processing and becomes the bottleneck of a real-time image processing system. Rational use of DMA to improve data transfer efficiency is therefore of great practical value.
