Design of Image Compression System Based on DSP-EEWORLD

With the development of multimedia and network technology, the large volume of data in digital images places ever higher demands on image compression technology, and dedicated high-speed digital signal processing has become the direction of development. The C5000 series DSP launched by TI has brought the research focus of signal processing systems back to software algorithms. In compression algorithm research, transforms such as the DCT and the wavelet transform are increasingly popular because of their reliability and efficiency.

System hardware design

Feasibility Analysis of TMS320C5409 as Main Processor

The TMS320C5409 runs at a clock frequency of 100MHz and is very cost-effective. It adopts an improved Harvard structure built around 1 set of program buses, 3 sets of data buses and 4 sets of address buses, so addressing and reading can be performed simultaneously. An independent hardware multiplier helps optimize the large number of repeated multiplications in algorithms such as convolution, digital filtering, FFT and matrix operations. Special instructions such as circular addressing and bit reversal greatly improve the addressing, sorting and calculation speed of operations such as FFT and convolution. One or more independent DMA buses work in parallel with the program and data buses of the CPU.

In this system, TMS320C5409 is used as the main processor, and its task is to implement JPEG compression encoding.

Analysis shows that, for a frame of 640×480 pixels, the time required for JPEG compression encoding is T=62×10(ns)×640×480≈0.19s, where 10ns is one cycle at 100MHz. The smaller the resolution of the processed image, the less time is spent compressing each frame. This is entirely feasible for applications whose real-time requirements are not very high.
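The estimate above can be reproduced with a few lines of arithmetic. The sketch below simply evaluates the article's product (62 instructions, a 10 ns cycle, one term per pixel). The function name and the smaller resolutions are illustrative assumptions, not values from the article.

```python
# Rough per-frame time estimate, following the formula
# T = instructions x cycle time x width x height.
CYCLE_NS = 10        # one cycle at 100 MHz
INSTRUCTIONS = 62    # instruction count used in the article's estimate


def frame_time_s(width, height, instructions=INSTRUCTIONS, cycle_ns=CYCLE_NS):
    """Return the estimated encoding time in seconds for one frame."""
    return instructions * cycle_ns * 1e-9 * width * height


# 320x240 and 160x120 are hypothetical sizes, added only for comparison.
for w, h in [(640, 480), (320, 240), (160, 120)]:
    print(f"{w}x{h}: {frame_time_s(w, h):.4f} s")
```

Halving each dimension cuts the estimate by a factor of four, which is why smaller frames leave more headroom.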

Figure 1 is a structural diagram of the image processing system based on TMS320C5409. The C5409 is the central processor, the SRAM is the DSP's off-chip extended data memory, and the EEPROM is the program memory for offline operation, storing the system's boot program and other applications. The A/D conversion part stores the image, once converted into a digital signal, in the frame memory. The address decoding and image acquisition control circuit, implemented in a CPLD, generates the address decoding signals for each part of the system, maps them to different address areas, and controls the ADC for image acquisition. The register control of the image acquisition chip is handled by a 51-series single-chip microcontroller.

Storage space expansion plan

The raw image data after A/D conversion is very large. The internal RAM and ROM of TMS320C5409 are only 32KB and 16KB, which cannot meet the needs, so the memory must be expanded to store the raw image data and application programs. This design adds 64KB of external RAM and 512KB of external Flash. The RAM is Cypress's CY7C1021V33 and the Flash is SST's SST39VF512. Since the data space of C5409 is only 64KB, memory paging is adopted. The expansion output ports 1Q and 2Q of C5409 serve as the page selection signals of the extended memory. The A15 pin and XF pin of C5409 drive a 3-to-8 decoder that generates the chip select signals of the extended memory. When A15=0, the on-chip RAM is selected; when A15=1 and XF=0, the off-chip SRAM is selected; when A15=1 and XF=1, the off-chip Flash is selected. The memory expansion is shown in Figure 2. Of the 64KB of external RAM, 48KB stores original image data and 16KB stores compressed images, programs and temporary data.
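The chip select rule described above is a small truth table. The sketch below models it in software for illustration only: in the actual system the decoding is done by hardware, and the function name and return strings are assumptions.

```python
def select_memory(a15, xf):
    """Model of the chip select rule: A15 and XF choose the memory device.

    a15 and xf are the logic levels (0 or 1) of the A15 address pin and
    the XF pin. The returned labels are illustrative names only.
    """
    if a15 == 0:
        return "on-chip RAM"
    # A15 = 1: the XF pin distinguishes the two off-chip devices.
    return "off-chip Flash" if xf == 1 else "off-chip SRAM"


# Print the full truth table.
for a15 in (0, 1):
    for xf in (0, 1):
        print(f"A15={a15} XF={xf} -> {select_memory(a15, xf)}")
```

Note that XF is a don't-care input whenever A15=0.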

DSP chip power supply circuit design

The main issues to consider in power supply design are power and heat dissipation. Power requirements: the current consumption depends mainly on how active the device is, that is, on CPU activity. The power consumption of the peripherals depends mainly on which peripherals are working and at what speed; compared with the CPU, it is relatively small. For the TMS320C5409, the supply current is largest when performing FFT calculations. Because the peak current is larger than the typical current, the power supply should be designed with a margin of at least 20% over the actual required current.

The C5409 uses a dual power supply, with operating voltages of 3.3V and 1.8V. The 1.8V rail powers the internal logic of the DSP, including the CPU and all other peripheral logic, while the external interface pins use 3.3V. The power supply of this system uses TI's TPS73HD318, a dual-output voltage regulator that provides 3.3V on one output and 1.8V on the other. The maximum output current of each output is 750mA.

JPEG Image Compression Algorithm

Optimization of JPEG algorithm

Although the JPEG baseline system can compress images at a low compression ratio, DCT and IDCT are the most time-consuming operations in a software implementation. Moreover, since the spectral characteristics of the image itself are not considered, the JPEG quantization table is not necessarily optimal for every image. Using a fast DCT algorithm increases the speed of the software and improves its real-time performance. At the same time, the quantization table recommended by JPEG is adaptively improved according to the spectral characteristics of the image itself.

Fast DCT algorithm

If an image is divided into many 8×8 blocks and each block is transformed directly with a 2D DCT, the amount of computation is very large. It is therefore necessary to decompose the 8×8 two-dimensional DCT into two passes of 8-point one-dimensional DCTs. The specific method is to first perform a DCT along the column direction of each 8×8 block to obtain an intermediate matrix, and then perform a DCT on each row of that matrix. The 2D DCT of an 8×8 matrix is thus converted into 16 one-dimensional 8-point DCTs.
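The row-column decomposition can be checked numerically. The following is a minimal sketch, assuming the orthonormal DCT-II definition used by JPEG. It applies 8 column transforms and 8 row transforms and compares the result with the direct double sum. All function names and the sample block are illustrative.

```python
import math

N = 8


def scale(u):
    """Orthonormal DCT-II scale factor for output index u."""
    return math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)


def dct_1d(x):
    """8-point one-dimensional DCT-II of a list of 8 numbers."""
    return [scale(u) * sum(x[n] * math.cos((2 * n + 1) * u * math.pi / (2 * N))
                           for n in range(N))
            for u in range(N)]


def dct_2d_separable(block):
    """2D DCT as 8 column DCTs followed by 8 row DCTs (16 in total)."""
    # Column pass: cols[c][u] is the DCT of column c.
    cols = [dct_1d([block[r][c] for r in range(N)]) for c in range(N)]
    # Intermediate matrix, indexed [u][c].
    inter = [[cols[c][u] for c in range(N)] for u in range(N)]
    # Row pass over the intermediate matrix.
    return [dct_1d(row) for row in inter]


def dct_2d_direct(block):
    """Direct 2D DCT by the double sum, used here only as a reference."""
    return [[scale(u) * scale(v) * sum(
                block[r][c]
                * math.cos((2 * r + 1) * u * math.pi / (2 * N))
                * math.cos((2 * c + 1) * v * math.pi / (2 * N))
                for r in range(N) for c in range(N))
             for v in range(N)]
            for u in range(N)]


# Arbitrary test block (values are made up for the check).
sample = [[(3 * r + 5 * c) % 17 - 8 for c in range(N)] for r in range(N)]
a = dct_2d_separable(sample)
b = dct_2d_direct(sample)
max_err = max(abs(a[u][v] - b[u][v]) for u in range(N) for v in range(N))
print("max difference:", max_err)
```

The direct form evaluates 64 products for each of the 64 outputs, while the separable form evaluates 8 products for each of the 128 intermediate and final outputs, before any fast algorithm is applied.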

At present, many fast DCT algorithms for one-dimensional DCT operations have been proposed. Among them, the Loeffler algorithm requires the least amount of calculation. The Loeffler algorithm divides the 8-point one-dimensional DCT operation into 4 levels. Due to the dependency between the input/output of each level, the 4-level operation must be performed serially, while the operations within each level can be processed in parallel.

There are three types of operation factors in the flowchart: the butterfly factor, the rotation factor and the multiplication factor, shown as a, b and c in Figure 3 respectively. The operation relationship of the butterfly factor is:

O0=I0+I1

O1=I0-I1

It takes two additions to complete. The input/output relationship of the multiplication factor is simpler, O=k×I with k a constant, and requires only one multiplication. In the standard Loeffler formulation, the operation relationship of the rotation factor, with constant k and rotation index n, is:

O0=I0×k×cos(nπ/16)+I1×k×sin(nπ/16)

O1=-I0×k×sin(nπ/16)+I1×k×cos(nπ/16)

It requires 4 multiplications and 2 additions. If the input/output relationship is transformed as follows:

O0=[k×sin(nπ/16)-k×cos(nπ/16)]×I1+k×cos(nπ/16)×(I0+I1)

O1=-[k×cos(nπ/16)+k×sin(nπ/16)]×I0+k×cos(nπ/16)×(I0+I1)

Only 3 multiplications and 3 additions are needed.

The sum and difference of the sine and cosine coefficients are known constants and can be obtained from a lookup table.
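The saving from 4 multiplications to 3 can be verified numerically. The sketch below assumes the standard Loeffler rotation, with scale k and angle nπ/16, and compares the direct form with the rearranged form in which the sum and difference of the coefficients are precomputed. The function names and test values are illustrative.

```python
import math


def rotate_4mul(i0, i1, k, n):
    """Direct rotation: 4 multiplications and 2 additions."""
    c = k * math.cos(n * math.pi / 16)
    s = k * math.sin(n * math.pi / 16)
    return i0 * c + i1 * s, -i0 * s + i1 * c


def rotate_3mul(i0, i1, k, n):
    """Rearranged rotation: 3 multiplications and 3 additions.

    c, s - c and c + s are constants; on a DSP they would come from a
    lookup table, so computing them here does not count as run-time work.
    """
    c = k * math.cos(n * math.pi / 16)
    s = k * math.sin(n * math.pi / 16)
    diff = s - c                  # table constant
    summ = c + s                  # table constant
    shared = c * (i0 + i1)        # 1 multiplication, 1 addition
    o0 = diff * i1 + shared       # 1 multiplication, 1 addition
    o1 = shared - summ * i0       # 1 multiplication, 1 addition
    return o0, o1


# Arbitrary inputs, using sqrt(2) and n = 6 as an example rotation.
print(rotate_4mul(3.0, -5.0, math.sqrt(2), 6))
print(rotate_3mul(3.0, -5.0, math.sqrt(2), 6))
```

The shared product c×(I0+I1) is what removes one multiplication, at the cost of one extra addition.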

From this calculation, the Loeffler algorithm for an 8-point DCT requires a total of 11 multiplications and 29 additions. In DSP assembly language programming, an algebraic operation involves three steps: fetching the operands, calculating, and storing the result. The algorithm therefore requires about 120 instructions. The C5409 has strong computing power: it supports single-cycle addition/subtraction and single-cycle multiplication, and can complete two 16-bit additions/subtractions in a single cycle. In addition, because the DSP has three sets of data buses, long (32-bit) operands can be used for long-word operations. In long-word instructions, the given address always accesses the high 16-bit operand, so only 5 long-word instructions are needed to calculate 2 butterfly operations. With this and other optimization measures, the Loeffler algorithm takes about 90 instructions.

Although the Loeffler algorithm has the smallest amount of computation, it is not optimal for the system in this paper, because the algorithm is designed for high-level languages and does not take advantage of the characteristics of assembly language and DSP hardware. This paper proposes a fast DCT algorithm based on the DSP's multiply-accumulate (MAC) unit.

The MAC unit of the DSP can complete one multiplication and one accumulation in a single cycle. Using the corresponding assembly instruction for the DCT greatly simplifies the program and reduces the calculation time. The specific algorithm is as follows, using butterfly operation:

From the above expression, we can see that y(0) to y(7) are all multiply-accumulate operations, and s0 to s7 can be obtained from x(0) to x(7) by butterfly operations. The DCT algorithm is therefore reduced from the original four levels to two: a first-level butterfly operation and a second-level multiply-accumulate operation. The first level requires 10+4=14 instructions (10 calculation operations and 4 auxiliary operations). In the second level, each output requires 4+1+1=6 instructions (4 multiply-accumulate operations, 1 read operation and 1 store operation), for a total of 48 instructions. An 8-point DCT thus takes 62 instructions, which greatly reduces the operation time, improves CPU efficiency and enhances the real-time performance of the system.
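The two-level structure can be sketched in software. The version below is an assumption about the general shape of the method, not the article's exact equations: it uses the standard even/odd symmetry of the DCT-II, so the first level forms sums and differences of mirrored inputs and the second level computes each output as 4 multiply-accumulates against a coefficient table. The names and the exact definition of the intermediate values are illustrative.

```python
import math

N = 8


def scale(u):
    """Orthonormal DCT-II scale factor for output index u."""
    return math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)


# Coefficient table: COEFF[u][i] multiplies the i-th butterfly output when
# computing y(u). On a DSP this would be a constant table in memory.
COEFF = [[scale(u) * math.cos((2 * i + 1) * u * math.pi / (2 * N))
          for i in range(4)]
         for u in range(N)]


def dct8_two_stage(x):
    """8-point DCT as one butterfly level plus one multiply-accumulate level."""
    # Level 1: butterflies on mirrored input pairs (assumed definition).
    sums = [x[i] + x[7 - i] for i in range(4)]
    diffs = [x[i] - x[7 - i] for i in range(4)]
    # Level 2: each output is 4 multiply-accumulates.
    y = []
    for u in range(N):
        src = sums if u % 2 == 0 else diffs  # even outputs use the sums
        acc = 0.0
        for i in range(4):
            acc += COEFF[u][i] * src[i]      # one MAC per term
        y.append(acc)
    return y


def dct8_direct(x):
    """Direct 8-point DCT-II, used here only as a reference."""
    return [scale(u) * sum(x[n] * math.cos((2 * n + 1) * u * math.pi / (2 * N))
                           for n in range(N))
            for u in range(N)]


# Instruction budget quoted in the text: 14 for level 1, 6 per output.
TOTAL_INSTRUCTIONS = (10 + 4) + 8 * (4 + 1 + 1)

x = [12, -7, 3, 40, 5, 0, -22, 9]  # made-up input samples
err = max(abs(a - b) for a, b in zip(dct8_two_stage(x), dct8_direct(x)))
print("max difference:", err, "instructions:", TOTAL_INSTRUCTIONS)
```

This form does more multiplications than Loeffler's (32 against 11), but each one is fused with an addition in a single MAC cycle, which is the trade the text describes.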

Quantization Operation Optimization

This paper proposes an adaptive quantization method based on actual conditions, that is, the quantization stage uses a secondary calculation method. The algorithm is mainly divided into two steps: (1) adaptively processing the transformed image coefficients; (2) constructing a new quantization table. The specific method is as follows:

First, find the average value P(u,v) of the absolute values of the 63 AC coefficients over all 8×8 sub-blocks of the luminance component and the two chrominance components in the frequency domain, where u,v=0…7 is the position information. Next, find the maximum of the 63 AC coefficient averages, Z1(u,v)=MAX[P1(u,v)]. Finally, normalize the 63 AC coefficient averages, add the frequency position information, and obtain the correction coefficients of the 63 AC components in the luminance and chrominance quantization tables respectively. The calculation process is:

Thus, we can get the correction form of the quantization table Qpl(u,v)=Q1(u,v)/X1(u,v) to correct the JPEG quantization table.

The corrected quantization table is used as the final quantization table to perform standard JPEG compression on the image to form a compressed file that fully complies with the JPEG format. The decoding process of this algorithm is exactly the same as the standard JPEG decoding process, and it can be seen that it is also the inverse process of the standard JPEG encoding process.
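The statistics step and the table correction can be sketched as follows. This is a minimal illustration under stated assumptions: the correction coefficients X(u,v) are taken as an input, because their exact form depends on the normalization and position weighting; rounding and clamping to 1..255 are added here so the result stays a valid 8-bit JPEG table; and all function names are hypothetical.

```python
def mean_abs_coefficients(blocks):
    """P(u,v): mean absolute value at each position over all 8x8 blocks.

    blocks is a list of 8x8 lists of DCT coefficients.
    """
    count = len(blocks)
    return [[sum(abs(b[u][v]) for b in blocks) / count for v in range(8)]
            for u in range(8)]


def max_ac_mean(p):
    """Largest of the 63 AC averages (the DC position (0,0) is excluded)."""
    return max(p[u][v] for u in range(8) for v in range(8) if (u, v) != (0, 0))


def corrected_table(q, x):
    """Apply Qp(u,v) = Q(u,v) / X(u,v) to the 63 AC entries.

    Assumptions of this sketch: the DC entry is left unchanged, entries with
    a non-positive correction coefficient are left unchanged, and results
    are rounded and clamped to the 1..255 range of an 8-bit table.
    """
    out = [row[:] for row in q]
    for u in range(8):
        for v in range(8):
            if (u, v) == (0, 0) or x[u][v] <= 0:
                continue
            value = int(q[u][v] / x[u][v] + 0.5)
            out[u][v] = min(255, max(1, value))
    return out


# Tiny made-up example: two constant blocks and a flat table.
blocks = [[[1] * 8 for _ in range(8)], [[-3] * 8 for _ in range(8)]]
p = mean_abs_coefficients(blocks)
q = [[16] * 8 for _ in range(8)]
x = [[1.0] * 8 for _ in range(8)]
x[0][1] = 2.0  # hypothetical correction coefficient
print(max_ac_mean(p), corrected_table(q, x)[0][:3])
```

A coefficient X(u,v) above 1 shrinks the quantization step at that position and so preserves more detail there, while a value below 1 does the opposite.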

Experimental Results

Fast DCT operation

The algorithm proposed in this paper, Loeffler's DSP optimization algorithm and pure Loeffler algorithm were tested respectively. The results are shown in Table 1. It can be seen that the algorithm proposed in this paper saves about 1/4 of the time compared with Loeffler's DSP optimization algorithm and about half of the time compared with the pure Loeffler algorithm. The effect is very obvious.

Adaptive Quantization

The adaptive quantizer is simulated. This paper uses a standard image of medium complexity as a test image to compare the performance with the basic JPEG system (based on peak signal-to-noise ratio (PSNR)). By simply changing the quantization table in the JPEG standard method to a modified quantization table, the quality of the restored image can be improved at the same compression ratio. Table 2 shows the peak signal-to-noise ratio of the two methods using the JPEG quantization table and the adaptive quantization table at different compression ratios. From the comparison results of the compression ratio and the peak signal-to-noise ratio, it can be seen that the compression ratio of the adaptive quantization JPEG method is slightly higher than that of the standard JPEG method.
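For reference, the comparison metric can be computed as below. This sketch uses the standard PSNR definition for 8-bit images, 10·log10(255²/MSE). The function name and the tiny test images are illustrative and are not data from the experiment.

```python
import math


def psnr(original, restored, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are lists of rows of pixel values. Identical images give
    infinity because the mean squared error is zero.
    """
    total = 0
    count = 0
    for row_a, row_b in zip(original, restored):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak * peak / mse)


# Made-up 2x2 images that differ by 1 at every pixel (MSE = 1).
img_a = [[10, 20], [30, 40]]
img_b = [[11, 19], [31, 39]]
print(f"{psnr(img_a, img_b):.2f} dB")
```

A higher PSNR at the same compression ratio means the restored image is closer to the original.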

Conclusion

The advantages of this system are that it improves the running speed of JPEG, enhances the compression rate and quality of images, and is easy to implement in hardware. This solution can be applied to most occasions where real-time acquisition, compression and storage of video images are required.

Keywords: DSP
