With the development of multimedia and network technology, the sheer volume of data in digital images places ever higher demands on image compression. Dedicated high-speed digital signal processing has therefore become the direction of development. The C5000 series DSPs from TI have brought the research focus of signal processing systems back to software algorithms. In compression research, transform-based methods such as the DCT and the wavelet transform are increasingly popular because of their reliability and efficiency.
System hardware design
Feasibility Analysis of TMS320C5409 as Main Processor
The TMS320C5409 runs at a clock frequency of 100 MHz and is very cost-effective. It uses an improved Harvard architecture built around one program bus, three data buses and four address buses, so instruction fetches and data accesses can proceed simultaneously. An independent hardware multiplier speeds up the large number of repeated multiplications in algorithms such as convolution, digital filtering, FFT and matrix operations. Special addressing modes such as circular addressing and bit-reversed addressing greatly improve addressing, sorting and calculation speed in operations such as FFT and convolution. One or more independent DMA buses work in parallel with the CPU's program and data buses.
In this system, TMS320C5409 is used as the main processor, and its task is to implement JPEG compression encoding.
A simple estimate shows that, for a 640×480 frame, the time required for JPEG compression encoding is T = 62 × 10 ns × 640 × 480 ≈ 0.19 s, where 10 ns is one instruction cycle at 100 MHz. Lower-resolution images take proportionally less time per frame. This is entirely feasible for applications without strict real-time requirements.
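The estimate above can be checked with a few lines of Python. This is only a sketch: reading the factor 62 as an instruction count per pixel is an assumption, and the function and constant names are illustrative rather than taken from the original design.

```python
CYCLE_NS = 10          # one instruction cycle at 100 MHz
INSTR_PER_PIXEL = 62   # factor used in the estimate (assumed to be per pixel)

def frame_time_s(width, height, instr_per_pixel=INSTR_PER_PIXEL, cycle_ns=CYCLE_NS):
    """Estimated compression time for one frame, in seconds."""
    return instr_per_pixel * cycle_ns * 1e-9 * width * height

print(round(frame_time_s(640, 480), 4))   # 0.1905
print(round(frame_time_s(320, 240), 4))   # 0.0476
```

Halving both dimensions cuts the estimate to a quarter, which matches the remark that smaller images take less time per frame.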
Figure 1 shows the structure of the image processing system based on the TMS320C5409. The C5409 is the central processor. The SRAM is the DSP's off-chip extended data memory. The EEPROM is the program memory for standalone operation and stores the system's boot program and other applications. The A/D conversion stage stores the digitized image in the frame memory. The address decoding and image acquisition control circuit, implemented in the CPLD, generates the address decoding signals for each part of the system, maps them to different address regions, and controls the ADC during image acquisition. The registers of the image acquisition chip are configured by an 8051 microcontroller.
Storage space expansion plan
The raw image data after A/D conversion is very large. The internal RAM and ROM of the TMS320C5409 are only 32KB and 16KB, which is not enough, so the memory must be expanded to hold the raw image data and the application programs. This paper uses 64KB of external RAM and 512KB of external Flash: the RAM is Cypress's CY7C1021V33 and the Flash is SST's SST39VF512. Since the data space of the C5409 is only 64KB, memory paging is used. The expansion output ports 1Q and 2Q of the C5409 serve as the page selection signals of the extended memory. The A15 pin and the XF pin of the C5409 drive a 3-to-8 decoder that generates the chip select signals of the extended memory. When A15=0, the on-chip RAM is selected; when A15=1 and XF=0, the off-chip SRAM is selected; when A15=1 and XF=1, the off-chip Flash is selected. The memory expansion is shown in Figure 2. Of the 64KB of external RAM, 48KB stores the original image data and 16KB stores compressed images, programs and temporary data.
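The chip select rule described above can be modelled in software as a quick sanity check. The following Python sketch is a behavioural model only, not the decoder's actual logic equations; the function name and the returned labels are illustrative.

```python
def select_memory(address, xf):
    """Return which memory a 16-bit data address selects for a given XF pin level."""
    a15 = (address >> 15) & 1          # top address bit
    if a15 == 0:
        return "on-chip RAM"           # A15 = 0
    # A15 = 1: XF chooses between the two external devices
    return "off-chip Flash" if xf else "off-chip SRAM"

print(select_memory(0x1234, 0))   # on-chip RAM
print(select_memory(0x8000, 0))   # off-chip SRAM
print(select_memory(0xFFFF, 1))   # off-chip Flash
```

Note that with A15=0 the XF level does not matter, so the lower half of the address range always maps to on-chip RAM.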
DSP chip power supply circuit design
The main issues in power supply design are power and heat dissipation. Power requirements: current consumption depends mainly on device activity, that is, on how busy the CPU is. The power consumed by peripherals depends mainly on which peripherals are active and how fast they run, and it is small compared with that of the CPU. Taking the TMS320C5409 as an example, the supply current is largest when performing FFT calculations. Because the peak current exceeds the typical current, the power supply must be rated with a margin of at least 20% above the current actually required.
The C5409 uses a dual power supply, with operating voltages of 3.3V and 1.8V. The 1.8V rail powers the internal logic of the DSP, including the CPU and all other peripheral logic, while the external interface pins use 3.3V. This system uses TI's TPS73HD318 dual-output voltage regulator, which provides 3.3V on one output and 1.8V on the other. The maximum output current of each output is 750mA.
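As a rough illustration of the 20% margin rule applied to a 750mA regulator output, the check below tests whether a given peak load leaves enough headroom. It is a sketch under the assumption that the margin is applied on top of the required current; the load figures in the usage lines are made-up examples, not measurements from this system.

```python
def has_margin(load_ma, rating_ma=750, margin=0.20):
    """True if the regulator rating covers the load plus the given margin."""
    return rating_ma >= load_ma * (1 + margin)

print(has_margin(600))   # True  (600 mA needs a 720 mA rating)
print(has_margin(700))   # False (700 mA needs an 840 mA rating)
```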
JPEG Image Compression Algorithm
Optimization of JPEG algorithm
Although the baseline JPEG system can compress images at a low compression ratio, the DCT and IDCT are the most time-consuming operations in a software implementation. Moreover, because the spectral characteristics of the image itself are not taken into account, the standard JPEG quantization table is not necessarily optimal for every image. A fast DCT algorithm increases the speed of the software and improves its real-time performance. In addition, the quantization table recommended by JPEG is adapted to the spectral characteristics of the image.
Fast DCT algorithm
If an image is divided into many 8×8 blocks and each block is transformed directly with a 2D DCT, the amount of computation is very large. The 8×8 two-dimensional DCT is therefore decomposed into two passes of 8-point one-dimensional DCTs. For each 8×8 block, a DCT is first applied along the column direction to obtain an intermediate matrix, and a DCT is then applied to each row of that matrix. The 2D DCT of an 8×8 block is thus reduced to 16 one-dimensional 8-point DCTs (8 columns plus 8 rows).
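The column-then-row decomposition can be illustrated with a short Python sketch. It uses a direct (non-fast) 8-point DCT-II with orthonormal scaling as the 1D building block; the scaling convention and all function names are assumptions made for the example, and `dct_2d_direct` is included only to confirm that both routes give the same result.

```python
import math

N = 8

def scale(k):
    """Orthonormal DCT-II scale factor for frequency index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct_1d(v):
    """Direct 8-point DCT-II (reference implementation, not a fast algorithm)."""
    return [scale(k) * sum(v[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                           for n in range(N))
            for k in range(N)]

def dct_2d_separable(block):
    """8x8 2D DCT as 8 column transforms followed by 8 row transforms."""
    # Column pass: transform every column, then store the results so that
    # inter[u][c] holds vertical frequency u of column c.
    cols = [dct_1d([block[r][c] for r in range(N)]) for c in range(N)]
    inter = [[cols[c][u] for c in range(N)] for u in range(N)]
    # Row pass: transform every row of the intermediate matrix.
    return [dct_1d(row) for row in inter]

def dct_2d_direct(block):
    """Direct double-sum 2D DCT, used here only as a cross-check."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out
```

The separable version calls `dct_1d` exactly 16 times per block, whereas the direct version evaluates a 64-term sum for each of the 64 outputs.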
At present, many fast DCT algorithms for one-dimensional DCT operations have been proposed. Among them, the Loeffler algorithm requires the least amount of calculation. The Loeffler algorithm divides the 8-point one-dimensional DCT operation into 4 levels. Due to the dependency between the input/output of each level, the 4-level operation must be performed serially, while the operations within each level can be processed in parallel.
The flowchart contains three types of operation factors: the butterfly factor, the rotation factor and the multiplication factor, shown as a, b and c in Figure 3 respectively. The butterfly factor computes:
O0=I0+I1
O1=I0-I1
It takes two additions to complete. The multiplication factor is simpler: the output is the input scaled by a constant, so only one multiplication is required. The rotation factor, with coefficients a=k·cosθ and b=k·sinθ for scale factor k and rotation angle θ, computes:
O0=a·I0+b·I1
O1=-b·I0+a·I1
It requires 4 multiplications and 2 additions. If the input/output relationship is transformed as follows:
O0=(b-a)·I1+a·(I0+I1)
O1=-(a+b)·I0+a·(I0+I1)
only 3 multiplications and 3 additions are needed, because the product a·(I0+I1) is shared by both outputs.
Reference address of this article: http://www.eepw.com.cn/article/247104.htm
The sum and the difference of the two rotation coefficients are known constants and can be read from a lookup table.
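The saving of one multiplication in the rotation can be sketched as follows. The sketch assumes the usual form of a scaled rotation, with coefficients a = k·cos θ and b = k·sin θ; the symbols a and b, the function names and the example angle are illustrative choices. The constants b-a and a+b are passed in as precomputed values, standing in for the lookup table.

```python
import math

def rotate_4mul(i0, i1, a, b):
    """Rotation written directly: 4 multiplications, 2 additions."""
    return a * i0 + b * i1, -b * i0 + a * i1

def rotate_3mul(i0, i1, a, b_minus_a, a_plus_b):
    """Same rotation with a shared product: 3 multiplications, 3 additions."""
    t = a * (i0 + i1)             # 1 multiplication, 1 addition (shared term)
    o0 = t + b_minus_a * i1       # 1 multiplication, 1 addition
    o1 = t - a_plus_b * i0        # 1 multiplication, 1 addition
    return o0, o1

# Example with k = 1 and an angle of 3*pi/16 (values chosen for illustration).
a = math.cos(3 * math.pi / 16)
b = math.sin(3 * math.pi / 16)
print(rotate_4mul(5.0, -2.0, a, b))
print(rotate_3mul(5.0, -2.0, a, b - a, a + b))
```

Both calls print the same pair up to floating-point rounding. The trade is one multiplication for one addition, which pays off only when a multiplication costs more than an addition on the target hardware.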
From this count, the Loeffler algorithm for an 8-point DCT requires a total of 11 multiplications and 29 additions. In DSP assembly language, each arithmetic operation involves three steps: fetching the operands, calculating, and storing the result. The 40 operations therefore take about 120 instructions. The C5409 has strong computing power: it supports single-cycle addition/subtraction and single-cycle multiplication, and it can complete two 16-bit additions/subtractions in a single cycle. In addition, the DSP has three data buses, so long (32-bit) operands can be used for long-word operations. In long-word instructions, the given address always accesses the high 16-bit operand, so only 5 long-word instructions are needed to calculate 2 butterfly operations. With this and other optimizations, the Loeffler algorithm takes about 90 instructions.
Although the Loeffler algorithm has the smallest operation count, it is not optimal for the system in this paper, because it was designed for high-level languages and does not exploit the characteristics of assembly language and DSP hardware. This paper proposes a fast DCT algorithm based on the DSP's multiply-accumulate (MAC) unit.
The MAC unit of the DSP completes one multiplication and one accumulation in a single cycle. Using the corresponding assembly instruction for the DCT greatly simplifies the program and reduces the calculation time. The specific algorithm is as follows, using butterfly operation:
From the above expression, we can see that y(0)-y(7) are all multiplication-accumulation operations, and s0-s7 can be obtained by butterfly operation from x(0) to x(7). Therefore, the DCT algorithm is changed from the original four-level operation to two-level, namely the first-level butterfly operation and the second-level multiplication-accumulation operation. The first-level butterfly operation requires a total of 10+4=14 (10 calculation operations and 4 auxiliary operations) instructions. In the second-level operation, each output requires 4+1+1=6 instructions (4 multiplication-accumulation operations, 1 read operation and 1 storage operation), a total of 48 instructions. In this way, it takes 62 instructions to calculate an 8-point DCT, which greatly reduces the operation time, improves the CPU efficiency, and enhances the real-time performance of the system.
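The two-level structure can be sketched in Python as follows. Since the expression itself is not reproduced in this text, the sketch assumes the standard even/odd decomposition of the 8-point DCT-II: s0 to s3 are sums x(i)+x(7-i), s4 to s7 are differences x(i)-x(7-i), even outputs accumulate over the sums and odd outputs over the differences. The ordering of s0 to s7, the orthonormal scaling and all names are assumptions, and the inner loop stands in for four MAC instructions.

```python
import math

N = 8

def scale(k):
    """Orthonormal DCT-II scale factor for frequency index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

# Precomputed coefficient table: 4 coefficients per output y(k).
COEF = [[scale(k) * math.cos((2 * i + 1) * k * math.pi / (2 * N)) for i in range(4)]
        for k in range(N)]

def dct8_mac(x):
    """8-point DCT as one butterfly level followed by one multiply-accumulate level."""
    # Level 1: butterflies produce four sums (s0..s3) and four differences (s4..s7).
    s = [x[i] + x[7 - i] for i in range(4)] + [x[i] - x[7 - i] for i in range(4)]
    # Level 2: each output is a 4-term multiply-accumulate.
    y = []
    for k in range(N):
        half = s[:4] if k % 2 == 0 else s[4:]   # even k uses sums, odd k differences
        acc = 0.0
        for i in range(4):
            acc += COEF[k][i] * half[i]          # one MAC per term
        y.append(acc)
    return y

def dct8_direct(x):
    """Direct 8-term DCT-II, used only to cross-check dct8_mac."""
    return [scale(k) * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                           for n in range(N))
            for k in range(N)]
```

The split works because cos((2(7-i)+1)kπ/16) equals (-1)^k·cos((2i+1)kπ/16), so the eight terms of each output fold into four.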
Quantization Operation Optimization
This paper proposes an adaptive quantization method based on actual conditions, that is, the quantization stage uses a secondary calculation method. The algorithm is mainly divided into two steps: (1) adaptively processing the transformed image coefficients; (2) constructing a new quantization table. The specific method is as follows:
First, find the average value P(u,v) of the absolute values of the 63 AC coefficients over all 8×8 sub-blocks of the luminance component and the two chrominance components in the frequency domain, where u,v=0…7 is the position information. Next, find the maximum value of the 63 AC coefficient averages, Z1(u,v)=MAX[P1(u,v)]. Finally, normalize the 63 AC coefficient averages and add the frequency position information to obtain the correction coefficients of the 63 AC components in the luminance and chrominance quantization tables respectively. The calculation process is:
Thus, we can get the correction form of the quantization table Qpl(u,v)=Q1(u,v)/X1(u,v) to correct the JPEG quantization table.
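The parts of this procedure that are fully specified in the text can be sketched in Python: averaging the absolute AC coefficients, normalizing by their maximum, and dividing the quantization table by a correction coefficient. How the correction coefficient X is derived from the normalized averages and the frequency position is not reproduced here, so `correct_table` takes X as an input. Rounding and clamping the result to 1..255 (the range of 8-bit JPEG table entries) are added assumptions, and all function names are illustrative.

```python
def ac_mean_abs(blocks):
    """P(u,v): mean absolute value of each coefficient over all 8x8 DCT blocks."""
    n = len(blocks)
    return [[sum(abs(b[u][v]) for b in blocks) / n for v in range(8)]
            for u in range(8)]

def normalize_ac(p):
    """Divide the 63 AC averages by their maximum; the DC slot is left as None."""
    z = max(p[u][v] for u in range(8) for v in range(8) if (u, v) != (0, 0))
    if z == 0:
        raise ValueError("all AC averages are zero")
    return [[None if (u, v) == (0, 0) else p[u][v] / z for v in range(8)]
            for u in range(8)]

def correct_table(q, x):
    """Q'(u,v) = Q(u,v) / X(u,v) for the AC entries; the DC entry is unchanged."""
    out = [row[:] for row in q]
    for u in range(8):
        for v in range(8):
            if (u, v) != (0, 0):
                # Assumption: round and clamp to the 8-bit table range 1..255.
                out[u][v] = min(255, max(1, round(q[u][v] / x[u][v])))
    return out
```

A correction coefficient above 1 lowers the quantization step at that frequency and so preserves more detail there, while a coefficient below 1 raises the step.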
The corrected quantization table is used as the final quantization table to perform standard JPEG compression on the image to form a compressed file that fully complies with the JPEG format. The decoding process of this algorithm is exactly the same as the standard JPEG decoding process, and it can be seen that it is also the inverse process of the standard JPEG encoding process.
Experimental Results
Fast DCT operation
The algorithm proposed in this paper, Loeffler's DSP optimization algorithm and pure Loeffler algorithm were tested respectively. The results are shown in Table 1. It can be seen that the algorithm proposed in this paper saves about 1/4 of the time compared with Loeffler's DSP optimization algorithm and about half of the time compared with the pure Loeffler algorithm. The effect is very obvious.
Adaptive Quantization
The adaptive quantizer is simulated. This paper uses a standard image of medium complexity as a test image to compare the performance with the basic JPEG system (based on peak signal-to-noise ratio (PSNR)). By simply changing the quantization table in the JPEG standard method to a modified quantization table, the quality of the restored image can be improved at the same compression ratio. Table 2 shows the peak signal-to-noise ratio of the two methods using the JPEG quantization table and the adaptive quantization table at different compression ratios. From the comparison results of the compression ratio and the peak signal-to-noise ratio, it can be seen that the compression ratio of the adaptive quantization JPEG method is slightly higher than that of the standard JPEG method.
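For reference, the PSNR used in this comparison follows the usual definition for 8-bit images, 10·log10(255²/MSE). The sketch below computes it for two grayscale images stored as lists of rows; the function name and the list-of-rows representation are assumptions made for the example.

```python
import math

def psnr(original, restored, peak=255):
    """PSNR in dB between two equal-sized grayscale images given as lists of rows."""
    count = sum(len(row) for row in original)
    mse = sum((a - b) ** 2
              for row_o, row_r in zip(original, restored)
              for a, b in zip(row_o, row_r)) / count
    if mse == 0:
        return math.inf                    # identical images
    return 10 * math.log10(peak ** 2 / mse)

img = [[10, 20], [30, 40]]
print(psnr(img, img))                      # inf
print(round(psnr(img, [[11, 21], [31, 41]]), 2))   # 48.13
```

A higher PSNR at the same compression ratio means the restored image is closer to the original.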
Conclusion
The advantages of this system are that it improves the running speed of JPEG, enhances the compression rate and quality of images, and is easy to implement in hardware. This solution can be applied to most occasions where real-time acquisition, compression and storage of video images are required.