Image compression system based on TMS320C5409-EEWORLD


　　Introduction

　　With the development of multimedia and network technology, the sheer volume of data in digital images places ever higher demands on image compression technology, and dedicated high-speed digital information processing has become the direction of development. On the hardware side, the C5000 series DSPs launched by TI have raised the processing capability of digital signal processors to a new level, returning the focus of signal processing system research to software algorithms. On the algorithm side, DCT, wavelet and other transforms are becoming more and more popular because of their high reliability and efficiency.

　　System hardware design

　　Feasibility analysis of TMS320C5409 as the main processor

　　The TMS320C5409 has a clock frequency of 100MHz and is extremely cost-effective. It uses an improved Harvard architecture built around one program bus, three data buses and four address buses, so addressing and data reads can be performed simultaneously. An independent hardware multiplier helps with the large number of repeated multiplications found in algorithms such as convolution, digital filtering, FFT, and matrix operations. Special addressing modes such as circular addressing and bit reversal greatly speed up the addressing, sorting and calculation steps of operations such as FFT and convolution. One or more independent DMA buses work in parallel with the CPU's program and data buses.

　　In this system, TMS320C5409 serves as the main processor, and its task is to implement JPEG compression encoding.

　　Analysis shows that, for an image with a frame size of 640×480, the time required for JPEG compression encoding is T = 62 × 10 ns × 640 × 480 ≈ 0.19 s. When the resolution of the processed image is lower, each frame takes less time to compress, which is entirely feasible for applications that do not have strict real-time requirements.
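
　　The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name and parameter names are illustrative, and the factor 62 is read here as instruction cycles per pixel, which is how the formula in the text applies it.

```python
def jpeg_frame_time_s(width, height, cycles_per_pixel=62, cycle_ns=10):
    """Estimated encoding time in seconds for one frame.

    cycle_ns=10 corresponds to the 100MHz clock of the TMS320C5409.
    cycles_per_pixel=62 is the factor used in the article's formula.
    """
    total_ns = cycles_per_pixel * cycle_ns * width * height
    return total_ns / 1e9


if __name__ == "__main__":
    for size in [(640, 480), (320, 240)]:
        print(size, round(jpeg_frame_time_s(*size), 4), "s")
```

　　Halving both dimensions divides the estimate by four, which is why lower-resolution frames are easier to handle under the same clock.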

　　Hardware design block diagram

　　Figure 1 is the structure diagram of the image processing system based on the TMS320C5409. The C5409 is the central processing unit, the SRAM is the DSP's off-chip extended data memory, and the EEPROM is the program memory used when working offline; it stores the boot program and the other applications of the system. The A/D conversion part converts images into digital signals, which are stored in frame memory. The address decoding and image acquisition control circuit generates the address decoding signals for each part of the system, maps the parts to different address areas, and controls the ADC during image acquisition. This part is implemented in a CPLD; the registers of the image acquisition chip are configured by a 51-series microcontroller.

　　Storage space expansion plan

　　The original image data after A/D conversion is very large. The TMS320C5409 has only 32KB of RAM and 16KB of ROM on chip, which cannot meet the needs, so the memory must be expanded to store the original image data and the applications. This article uses 64KB of external RAM and 512KB of external Flash. The RAM is Cypress's CY7C1021V33, and the Flash is SST's SST39VF512. Since the data space of the C5409 is only 64KB, memory paging is used: the expansion output ports 1Q and 2Q of the C5409 serve as page selection signals for the expansion memory. The A15 pin and the XF pin of the C5409 drive a 3-to-8 decoder that generates the chip select signals for the extended memory. When A15=0, the on-chip RAM is selected; when A15=1 and XF=0, the off-chip SRAM is selected; when A15=1 and XF=1, the off-chip Flash is selected. The memory expansion is shown in Figure 2. Of the 64KB of external RAM, 48KB is used to store original image data, and 16KB is used to store compressed images, programs and temporary data.
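
　　The chip select rule described above can be modeled in a few lines. This is a behavioral sketch only, not the decoder or CPLD logic of the actual board; the function name `select_memory` and the returned labels are illustrative.

```python
def select_memory(address, xf):
    """Return which memory a 16-bit data address selects.

    address: 16-bit address on the DSP bus; only bit A15 matters here.
    xf:      state of the XF pin (0 or 1).
    """
    a15 = (address >> 15) & 1
    if a15 == 0:
        return "on-chip RAM"          # A15 = 0
    if xf == 0:
        return "off-chip SRAM"        # A15 = 1, XF = 0
    return "off-chip Flash"           # A15 = 1, XF = 1


if __name__ == "__main__":
    for addr, xf in [(0x1234, 0), (0x8000, 0), (0xFFFF, 1)]:
        print(hex(addr), xf, "->", select_memory(addr, xf))
```

　　Because A15 splits the 64KB data space in half, each external device appears in the upper 32K addresses, and the page selection signals decide which part of the device is visible there.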

　　DSP chip power supply circuit design

　　The main issues to be considered in power supply design are power and heat dissipation. Regarding power requirements, current consumption depends mainly on how active the device is, that is, on how active the CPU is. Peripheral power consumption depends mainly on which peripherals are running and at what speed, and it is small compared with that of the CPU. For the TMS320C5409, the maximum supply current is drawn when performing FFT operations. Because the peak current is larger than the typical current, the power supply should be designed with a margin of at least 20% between the available supply current and the current actually required.

　　The C5409 uses a dual power supply with operating voltages of 3.3V and 1.8V. The 1.8V rail powers the internal logic of the DSP, including the CPU and all other peripheral logic, while the external interface pins use 3.3V. This system is powered by TI's TPS73HD318, a dual-output voltage regulator that provides 3.3V on one channel and 1.8V on the other, with a maximum output current of 750mA per channel.

　　JPEG Image Compression Algorithm

　　Optimization of JPEG Algorithm

　　Although the basic JPEG system can compress images with low compression ratio, DCT and IDCT are the most time-consuming operations in the process of software implementation. Moreover, since the spectral characteristics of the image itself are not considered, JPEG quantization tables are not necessarily optimal for all image compression. The use of fast DCT algorithm can improve the speed of the software and enhance the real-time performance of the software. At the same time, the quantization table recommended by JPEG is adaptively improved according to the spectral characteristics of the image itself.

　　Fast DCT Algorithm

　　If an image is divided into many 8×8 blocks and each block is transformed directly with a 2D DCT, the amount of calculation is very large. It is therefore necessary to decompose the 8×8 two-dimensional DCT into two passes of 8-point one-dimensional DCTs. The specific method is to first perform a DCT along the column direction of each 8×8 block to obtain an intermediate matrix, and then perform a DCT on each row of that matrix. The 2D DCT of an 8×8 matrix is thus converted into 16 one-dimensional 8-point DCTs: 8 column transforms followed by 8 row transforms.
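
　　The row-column decomposition can be illustrated as follows. This is a minimal floating-point sketch, not the DSP implementation: `dct_1d` uses the direct definition of the 8-point DCT rather than a fast algorithm, and both function names are illustrative.

```python
import math


def dct_1d(x):
    """Direct (slow) orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


def dct_2d_separable(block):
    """2D DCT of a square block as column DCTs followed by row DCTs.

    Returns (coefficients, number_of_1d_transforms).
    """
    size = len(block)
    calls = 0
    # Pass 1: transform every column to get the intermediate matrix.
    columns = []
    for c in range(size):
        columns.append(dct_1d([block[r][c] for r in range(size)]))
        calls += 1
    intermediate = [[columns[c][r] for c in range(size)] for r in range(size)]
    # Pass 2: transform every row of the intermediate matrix.
    result = []
    for r in range(size):
        result.append(dct_1d(intermediate[r]))
        calls += 1
    return result, calls


if __name__ == "__main__":
    flat = [[1.0] * 8 for _ in range(8)]
    coeffs, calls = dct_2d_separable(flat)
    print("1D transforms used:", calls)
    print("DC coefficient:", round(coeffs[0][0], 6))
```

　　For a constant block all the energy ends up in the DC coefficient, which gives a quick sanity check of the two passes.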

　　Currently, many fast DCT algorithms for one-dimensional DCT operations have been proposed. Among them, Loeffler's algorithm requires the smallest amount of calculations. Loeffler's algorithm divides the 8-point one-dimensional DCT operation into 4 levels. Due to the dependency between the input/output of each level, the 4-level operations must be performed serially, while the operations within each level can be processed in parallel.

　　There are three types of operation factors in the flow chart: butterfly factor, rotation factor and multiplication factor, as shown in a, b and c in Figure 3 respectively. The operational relationship of butterfly factor is:

O₀ = I₀ + I₁

O₁ = I₀ - I₁

　　It requires two additions to complete. The multiplication factor simply scales its input by a constant, so only one multiplication is required. The rotation factor applies a scaled plane rotation to its two inputs.

　　Evaluated directly, the rotation requires 4 multiplications and 2 additions. Its input/output relationship can, however, be rearranged so that the two outputs share one product.

　　The rearranged form requires only 3 multiplications and 3 additions. The sum and difference of the rotation coefficients that it uses are known constants and can be obtained from a lookup table.
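
　　The saving can be checked numerically. The sketch below assumes the commonly used form of the rotation, O₀ = a·I₀ + b·I₁ and O₁ = -b·I₀ + a·I₁ with a = k·cos θ and b = k·sin θ; the sign convention, the angle and the function names are assumptions made for illustration, since the original formulas are not reproduced in the text.

```python
import math


def rotate_direct(i0, i1, a, b):
    """Rotation with 4 multiplications and 2 additions."""
    o0 = a * i0 + b * i1
    o1 = -b * i0 + a * i1
    return o0, o1


def rotate_3mul(i0, i1, a, b_minus_a, a_plus_b):
    """Same rotation with 3 multiplications and 3 additions.

    b_minus_a and a_plus_b are constants that would come from a table.
    """
    t = a * (i0 + i1)            # 1 addition, 1 multiplication (shared)
    o0 = t + b_minus_a * i1      # a*i0 + b*i1
    o1 = t - a_plus_b * i0       # -b*i0 + a*i1
    return o0, o1


if __name__ == "__main__":
    k, theta = math.sqrt(2.0), 6 * math.pi / 16   # example constants
    a, b = k * math.cos(theta), k * math.sin(theta)
    print(rotate_direct(3.0, -5.0, a, b))
    print(rotate_3mul(3.0, -5.0, a, b - a, a + b))
```

　　Both calls print the same pair up to rounding, because the shared product a·(I₀ + I₁) contains the terms that the two correction products then add or cancel.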

　　From this calculation, it can be seen that an 8-point DCT using Loeffler's algorithm requires a total of 11 multiplications and 29 additions. From the perspective of DSP assembly language programming, an algebraic operation comprises three steps: operand fetch, operation, and result storage. The 40 operations therefore require approximately 40 × 3 = 120 instructions. The C5409 has strong computing power: it supports single-cycle addition/subtraction and single-cycle multiplication, and can complete two 16-bit additions/subtractions in a single cycle. In addition, the DSP has 3 data buses, so it can use long (32-bit) operands to perform long-word operations. In long-word instructions, the given address always accesses the upper 16-bit operand, so only 5 long-word instructions are needed to calculate 2 butterfly operations. With other optimization measures included, it takes about 90 instructions to complete the Loeffler algorithm.

　　Although the Loeffler algorithm has the smallest amount of calculations, it is not optimal when applied to this system. Because this algorithm is designed for high-level languages, it does not take advantage of the characteristics of assembly language and DSP hardware. This article proposes a fast DCT algorithm based on DSP multiply-accumulate unit.

　　The multiply-accumulate unit of DSP can complete one multiplication and one accumulation operation in a single cycle. If assembly instructions are used in DCT operations, the complexity of the program will be greatly simplified and the calculation time will be reduced. The specific algorithm is as follows, using butterfly operation:

　　As can be seen from the above expression, y(0)-y(7) are all multiply-accumulate operations, and s0-s7 can be obtained from x(0)-x(7) through butterfly operations. The DCT algorithm is therefore reduced from the original four-level operation to two levels: a first-level butterfly operation and a second-level multiply-accumulate operation. The first level requires a total of 10+4=14 instructions (10 calculation operations and 4 auxiliary operations). In the second level, each output requires 4+1+1=6 instructions (4 multiply-accumulate operations, 1 read operation and 1 store operation), for a total of 48 instructions. Computing an 8-point DCT therefore requires 14+48=62 instructions, which greatly reduces the calculation time, improves CPU efficiency, and enhances the real-time performance of the system.
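
　　A two-level structure of this kind can be sketched in Python. The sketch assumes the usual even/odd split of the 8-point DCT, in which s0-s3 are the sums x(n)+x(7-n) and s4-s7 are the differences x(n)-x(7-n); this matches the description of four multiply-accumulates per output, but the exact expressions and coefficient layout of the article are not reproduced here, and the names `dct8_mac` and `COEF` are illustrative. Floating point is used instead of the fixed-point arithmetic a C5409 program would use.

```python
import math

N = 8


def scale(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


# Coefficient table, computed once. Four entries per output.
COEF = [[scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
         for n in range(4)] for k in range(N)]


def dct8_mac(x):
    """8-point DCT as one butterfly level plus one multiply-accumulate level."""
    # Level 1: butterflies produce the sums s0..s3 and differences s4..s7.
    s = [x[n] + x[7 - n] for n in range(4)] + \
        [x[n] - x[7 - n] for n in range(4)]
    # Level 2: even outputs use the sums, odd outputs use the differences.
    y = []
    for k in range(N):
        half = s[0:4] if k % 2 == 0 else s[4:8]
        acc = 0.0
        for n in range(4):
            acc += COEF[k][n] * half[n]   # one multiply-accumulate
        y.append(acc)
    return y


def dct8_direct(x):
    """Reference: direct 8-term sum for every output."""
    return [scale(k) * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                           for n in range(N)) for k in range(N)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print([round(v, 4) for v in dct8_mac(x)])
    print([round(v, 4) for v in dct8_direct(x)])
```

　　The inner loop of the second level maps naturally onto a single-cycle multiply-accumulate instruction, which is the property the instruction count above relies on.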

　　Quantization operation optimization

　　This article proposes an adaptive quantization method that adjusts to the actual image, using a two-pass calculation in the quantization stage. The algorithm has two main steps: (1) adaptive processing of the transformed image coefficients; (2) construction of a new quantization table. The specific method is as follows:

　　First, for the luminance component and the two chrominance components, find the average value P(u, v) of the absolute values of the 63 AC coefficients over all 8×8 sub-blocks in the frequency domain, where u, v = 0, ..., 7 give the position. Next, find the maximum value among the 63 average AC coefficients, Z1(u, v) = MAX[P1(u, v)]. Finally, normalize the 63 average AC coefficients and incorporate the frequency position information to obtain the correction coefficients of the 63 AC components in the luminance and chrominance quantization tables, respectively. The calculation process is:

　　From this, the correction formula for the quantization table, Qpl(u, v) = Q1(u, v)/X1(u, v), is obtained and used to correct the JPEG quantization table.
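
　　The statistics and the table correction can be sketched as follows. The sketch covers only what the text states: the averaging of absolute AC values, the maximum, the normalization, and the division Q/X. How the frequency position information enters the correction coefficients X is not recoverable from the text, so `correct_table` takes X as an input supplied by the caller. The function names, and the rounding and clamping of table entries to the range 1 to 255, are assumptions made for illustration.

```python
def average_ac_magnitudes(blocks):
    """P(u, v): mean absolute value of each AC coefficient over all blocks."""
    count = len(blocks)
    p = [[0.0] * 8 for _ in range(8)]
    for blk in blocks:
        for u in range(8):
            for v in range(8):
                if (u, v) != (0, 0):            # skip the DC coefficient
                    p[u][v] += abs(blk[u][v]) / count
    return p


def normalize_by_max(p):
    """Return (P / Z, Z) where Z is the largest of the 63 AC averages."""
    z = max(p[u][v] for u in range(8) for v in range(8) if (u, v) != (0, 0))
    if z == 0:
        return [row[:] for row in p], 0.0
    return [[p[u][v] / z for v in range(8)] for u in range(8)], z


def correct_table(q, x):
    """Apply Qp(u, v) = Q(u, v) / X(u, v) to the 63 AC entries.

    x must be positive for every AC position. Rounding and clamping to
    1..255 are assumptions of this sketch, not taken from the article.
    """
    out = [row[:] for row in q]
    for u in range(8):
        for v in range(8):
            if (u, v) != (0, 0):
                out[u][v] = min(255, max(1, round(q[u][v] / x[u][v])))
    return out
```

　　A caller would compute the averages once per image component, derive X from them, and then pass the corrected table to an otherwise standard JPEG encoder.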

　　The corrected quantization table is used as the final quantization table, and standard JPEG compression is performed on the image to form a compressed file that fully conforms to the JPEG format. The decoding process of this algorithm is exactly the same as the standard JPEG decoding process, that is, the reverse of the standard JPEG encoding process.

　　Experimental results

　　Fast DCT operation

　　The algorithm proposed in this article, the DSP-optimized Loeffler algorithm and the plain Loeffler algorithm were each tested. The results are shown in Table 1. The algorithm in this article saves about 1/4 of the time compared with the DSP-optimized Loeffler algorithm and about half of the time compared with the plain Loeffler algorithm.

　　Adaptive Quantization

　　An adaptive quantizer was simulated. Standard images of medium complexity were used as test images to compare performance with the basic JPEG system, using peak signal-to-noise ratio (PSNR) as the measure. Simply replacing the quantization table of the standard JPEG method with the modified quantization table improves the quality of the restored image at the same compression ratio. Table 2 shows the PSNR obtained with the JPEG quantization table and with the adaptive quantization table at different compression ratios. The comparison of compression ratio and PSNR shows that the compression ratio of the adaptive quantization JPEG method is slightly higher than that of the standard JPEG method.
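
　　For reference, PSNR for 8-bit images is conventionally computed as 10·log10(255²/MSE), where MSE is the mean squared error between the original and the restored image. The sketch below shows that calculation on plain nested lists; the function name `psnr` is illustrative, and the code is not the test program used for Table 2.

```python
import math


def psnr(original, restored, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(original, restored):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf              # identical images
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    ref = [[10, 20], [30, 40]]
    out = [[11, 19], [31, 39]]       # every pixel off by one, so MSE = 1
    print(round(psnr(ref, out), 2), "dB")
```

　　A higher value means the restored image is closer to the original, so two quantization tables can be compared by encoding the same image to the same compression ratio and comparing the resulting PSNR.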

　　Conclusion

　　This article uses TI's TMS320C5409 as the development platform to implement a new JPEG image compression system. The advantages of this system are that it improves the running speed of JPEG, enhances the compression rate and quality of images, and is easy to implement in hardware. This solution can be applied to most situations where real-time collection, compression and storage of video images are required.
