0 Introduction
H.264/AVC is the latest international video coding standard, jointly developed by ITU-T VCEG and ISO/IEC MPEG, and is an active topic in image communication research. Its video coding layer (VCL) adopts many new techniques that greatly improve coding performance. Compared with earlier video coding standards, H.264 delivers better image quality at the same bit rate, which makes it well suited to low-bit-rate applications such as wireless communication and network transmission. This gain comes at the cost of increased complexity, so H.264 faces considerable challenges in real-time video coding and transmission. Implementing a real-time H.264 encoder on a high-performance digital signal processor (DSP) is a fast and effective approach that helps promote adoption of the standard. The ADSP-BF561 processor offers high performance, a 600 MHz core clock, and an integrated set of general-purpose digital imaging peripheral interfaces, providing a system-on-chip solution for multimedia and imaging applications. Targeting low-bit-rate video transmission, this paper studies and implements a real-time video encoding system based on the H.264 standard, and discusses how to implement and optimize an H.264 software encoder on the DSP.
1 Introduction to H.264 encoding algorithm and ADSP-BF561
During development, the encoder was optimized extensively for both the algorithmic characteristics of H.264 and the architecture of the ADSP-BF561 dual-core processor, greatly improving encoding speed while preserving encoding accuracy. The H.264 video encoding algorithm and the ADSP-BF561 dual-core processor are briefly introduced below.
1.1 H.264 encoding algorithm
H.264 is a new-generation video coding standard jointly developed by ISO/IEC and ITU-T, offering a high compression ratio and good robustness. Its overall framework is shown in Figure 1.
Building on earlier video coding standards, H.264 introduces many improvements. For intra-frame prediction it provides nine modes for 4×4 luma blocks and four modes for 16×16 luma blocks. Intra prediction is used together with transform coding to remove spatial redundancy, which greatly improves coding efficiency. In inter-frame mode, H.264 supports variable-block-size motion estimation and compensation. The inter-prediction block size is not fixed at 8×8 but ranges from 4×4 to 16×16, including non-square blocks (seven partition sizes in total: 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4), and multiple reference frames are supported, so prediction performance improves considerably. In addition, H.264 uses an integer DCT-like transform to reduce computation, adaptive arithmetic coding to improve coding efficiency, and a deblocking filter to remove the blocking artifacts caused by coarse quantization at low bit rates. Overall, the coding efficiency of H.264 is about 50% higher than that of earlier coding standards.
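The integer transform mentioned above can be illustrated with a short sketch in C. It computes the well-known 4×4 forward core transform Y = C·X·Cᵀ, with C = [1 1 1 1; 2 1 -1 -2; 1 -1 -1 1; 1 -2 2 -1], using only additions, subtractions, and doublings. The function name and the flat row-major array layout are assumptions made for illustration and are not taken from the encoder described in this paper. The scaling step that a real encoder folds into quantization is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch only: 4x4 forward core transform of H.264.
 * in  : 16 residual samples, row-major (hypothetical layout)
 * out : 16 unscaled coefficients, row-major
 * The post-scaling normally merged into quantization is not applied. */
void forward_transform_4x4(const int16_t *in, int32_t *out)
{
    int32_t tmp[16];

    assert(in != 0 && out != 0);

    /* Horizontal pass: apply the 1-D transform to each row. */
    for (int i = 0; i < 4; i++) {
        int32_t s03 = in[4 * i + 0] + in[4 * i + 3];
        int32_t d03 = in[4 * i + 0] - in[4 * i + 3];
        int32_t s12 = in[4 * i + 1] + in[4 * i + 2];
        int32_t d12 = in[4 * i + 1] - in[4 * i + 2];

        tmp[4 * i + 0] = s03 + s12;       /* row [ 1  1  1  1] */
        tmp[4 * i + 1] = 2 * d03 + d12;   /* row [ 2  1 -1 -2] */
        tmp[4 * i + 2] = s03 - s12;       /* row [ 1 -1 -1  1] */
        tmp[4 * i + 3] = d03 - 2 * d12;   /* row [ 1 -2  2 -1] */
    }

    /* Vertical pass: apply the same 1-D transform to each column. */
    for (int j = 0; j < 4; j++) {
        int32_t s03 = tmp[j] + tmp[12 + j];
        int32_t d03 = tmp[j] - tmp[12 + j];
        int32_t s12 = tmp[4 + j] + tmp[8 + j];
        int32_t d12 = tmp[4 + j] - tmp[8 + j];

        out[j]      = s03 + s12;
        out[4 + j]  = 2 * d03 + d12;
        out[8 + j]  = s03 - s12;
        out[12 + j] = d03 - 2 * d12;
    }
}
```

For a flat block in which every sample is 1, this sketch yields a DC coefficient of 16 and zeros everywhere else, which shows how the transform concentrates the energy of smooth regions into a single coefficient.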
1.2 ADSP-BF561 chip structure
The ADSP-BF561 is a dual-core 750 MHz processor with a symmetric multiprocessing (SMP) architecture, which gives users higher performance and greater design flexibility in integrating and partitioning signal processing and control functions. Its system structure is shown in Figure 2. It contains two cores, core A and core B, each of which can run at up to 750 MHz. Each core has its own 32 KB of L1 instruction memory (16 KB configurable as cache or SRAM) and 64 KB of L1 data memory (32 KB configurable as cache or SRAM), and the two cores share 128 KB of L2 memory. Access speed differs significantly between memory levels: L1 is the fastest, followed by L2, while off-chip memory and devices are the slowest.
Because of these differences in access speed, data is best exchanged between the two cores directly in L1 memory, which requires the IMDMA controller. The main function of this DMA controller is to move data between the L1 memories of the two cores. Compared with accessing slower off-chip memory or operating on data in L2, using the IMDMA controller raises the data processing rate and thus improves coding efficiency.
2 Optimization and implementation of H.264 video coding algorithm
Encoder optimization focuses on two areas: the P-frame encoding flow and the use of the ADSP-BF561 dual-core processing system. A well-designed flow keeps the modules independent and complete and makes it easier to optimize or upgrade an individual module later. Coordinated processing on the two cores of the ADSP-BF561 further increases encoding speed.
2.1 Optimization of P frame encoding process
Because the H.264 encoding algorithm is large, optimizing program details alone does not yield significant speedups, so the program flow itself must be restructured. In the JM 8.6 version of the H.264 encoder, I frames and P frames are encoded by the same module, which leads to many repeated intra/inter macroblock checks and limits encoding speed. The micro_h264 encoder software model addresses this shortcoming by separating I-frame and P-frame encoding. However, micro_h264 encodes the macroblocks of a frame one by one in raster-scan order, without taking into account that macroblocks at different positions in a frame have different characteristics. Encoding all of them through one unified path introduces many conditional checks, which hinders both DSP pipelining and module-level optimization. This paper optimizes the P-frame encoding flow of micro_h264 to address this shortcoming.
Macroblocks at different positions in a frame can be encoded independently. Likewise, sub-blocks can be encoded independently according to their positions within a macroblock.
When a frame is divided into macroblocks, macroblocks at different positions have different characteristics. Macroblocks can therefore be grouped by position so that those with the same coding characteristics fall into the same category. In this way, the macroblocks of a frame are divided into five categories, as shown in the macroblock classification diagram in Figure 3.
With macroblocks classified, a dedicated function can be called to encode each category independently, which removes many unnecessary checks. This avoids stalling the DSP pipeline, increases speed, and allows more targeted optimization.
This encoder uses only one reference frame when encoding a P frame. It also replaces the exhaustive traversal of macroblock coding modes used by the micro_h264 software model with a fast macroblock mode selection algorithm. The P-frame encoding flowchart is shown in Figure 4.
The software structure should be adapted to the characteristics of the target platform. In this lower-complexity encoder, macroblocks of different types are processed separately, which eliminates many repeated intermediate checks. This both increases encoding speed and makes the program structure clearer. Because the modules are relatively independent, the program is also easier to extend. Although code size increases somewhat, encoding speed improves effectively.
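The idea of position-based classification can be sketched in C as follows. Figure 3 is not reproduced here, so the five classes below are an assumption based on which neighboring macroblocks are available for prediction. The names `mb_class_t`, `classify_mb`, and `count_classes` are hypothetical and do not come from micro_h264 or from the encoder described in this paper.

```c
#include <assert.h>

/* Hypothetical position classes (assumed, not taken from Figure 3). */
typedef enum {
    MB_FIRST = 0,   /* top-left macroblock: no neighbours available     */
    MB_TOP_ROW,     /* rest of the first row: only the left neighbour   */
    MB_LEFT_COL,    /* first column, below row 0: no left neighbour     */
    MB_RIGHT_COL,   /* last column, below row 0: no top-right neighbour */
    MB_INTERIOR,    /* everything else: all neighbours available        */
    MB_CLASS_COUNT
} mb_class_t;

/* Classify one macroblock from its coordinates (in macroblock units). */
mb_class_t classify_mb(int mb_x, int mb_y, int mb_w)
{
    assert(mb_w > 1 && mb_x >= 0 && mb_x < mb_w && mb_y >= 0);

    if (mb_y == 0)
        return (mb_x == 0) ? MB_FIRST : MB_TOP_ROW;
    if (mb_x == 0)
        return MB_LEFT_COL;
    if (mb_x == mb_w - 1)
        return MB_RIGHT_COL;
    return MB_INTERIOR;
}

/* Count how many macroblocks of a frame fall into each class. */
void count_classes(int mb_w, int mb_h, int counts[MB_CLASS_COUNT])
{
    for (int c = 0; c < MB_CLASS_COUNT; c++)
        counts[c] = 0;
    for (int y = 0; y < mb_h; y++)
        for (int x = 0; x < mb_w; x++)
            counts[classify_mb(x, y, mb_w)]++;
}
```

Under this assumed classification, a CIF frame (22×18 macroblocks) has 340 interior macroblocks out of 396, so the common case can run through a function with no boundary checks. In an actual encoder the class would not be tested per macroblock as in `count_classes`; the frame loop would instead be split into separate loops, one per class, each calling its own specialized routine.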
2.2 Optimization of ADSP-BF561 Dual-Core Processing System
To ensure stable operation of the encoder, this paper sets the core clock to 600 MHz. If real-time 4CIF encoding can be achieved at 600 MHz, then raising the core clock can support higher-quality 4CIF encoding. To encode 25 frames per second in real time, the cycle budget per frame is 600 MHz / 25 = 24 million clock cycles; that is, each 4CIF frame must be encoded within 24 million cycles. Since a 4CIF frame contains four times as many pixels as a CIF frame, this is roughly equivalent to encoding one CIF frame within 6 million cycles. Real-time encoding is clearly difficult to achieve with a single core. Unlike most dual-core systems, in which one core runs the operating system and the other runs application software, this design runs the encoder on both cores simultaneously.
When implementing the encoding algorithm on the ADSP-BF561 development board, the main difficulty is communication and coordination between the two cores. When both cores run the video encoding program at the same time, they must share and exchange data. Exchanging macroblock data through off-chip memory or shared L2 memory is relatively simple and requires no data copying, but the large number of accesses to slower memory greatly reduces execution speed and therefore the efficiency of the encoder. For this reason, shared memory is not used to exchange macroblock data. Instead, this paper uses IMDMA to exchange data directly between the L1 data memories of the two cores, transferring data while encoding proceeds, which avoids many accesses to slow memory and shortens execution time. Messages between the cores involve only a small amount of data, so they are exchanged through shared memory, using the relatively fast L2 memory. The author has implemented this encoding scheme on the BF561 development board through optimized programming. The main flow of dual-core encoding is shown in Figure 5.
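The cycle budget above can be checked with a few lines of C. This is only a back-of-the-envelope sketch: the function names are hypothetical, and the dual-core figure assumes the work splits perfectly between the two cores with no communication overhead, which a real system will not achieve.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 16x16 macroblocks in a frame whose sides are multiples of 16. */
uint32_t macroblocks_per_frame(uint32_t width, uint32_t height)
{
    assert(width % 16 == 0 && height % 16 == 0);
    return (width / 16) * (height / 16);
}

/* Clock cycles available on one core during one frame period. */
uint32_t cycles_per_frame(uint32_t core_hz, uint32_t fps)
{
    assert(fps > 0);
    return core_hz / fps;
}

/* Average cycle budget per macroblock, assuming an ideal split over
 * `cores` cores (an optimistic assumption made for illustration). */
uint32_t cycles_per_macroblock(uint32_t core_hz, uint32_t fps,
                               uint32_t width, uint32_t height,
                               uint32_t cores)
{
    uint32_t mbs = macroblocks_per_frame(width, height);

    assert(cores > 0 && mbs > 0);
    return (cycles_per_frame(core_hz, fps) * cores) / mbs;
}
```

For a 704×576 (4CIF) frame at 600 MHz and 25 f/s, this gives 24,000,000 cycles per frame and 1584 macroblocks, or roughly 15,000 cycles per macroblock on one core and roughly 30,000 when both cores share the work evenly. All motion estimation, transform, quantization, and entropy coding for a macroblock must fit within that budget.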
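Exchanging data while encoding proceeds is commonly done with double (ping-pong) buffering. The following portable C sketch illustrates the pattern only. `memcpy` stands in for an IMDMA transfer, which on real hardware would run in the background and require a completion check. The buffer layout, the names `pingpong_t`, `start_transfer`, `process_mb`, and `process_all`, and the checksum used as placeholder work are all assumptions, not the author's implementation and not BF561 register-level code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MB_BYTES 384  /* one 4:2:0 macroblock: 16x16 luma + two 8x8 chroma */

typedef struct {
    uint8_t buf[2][MB_BYTES];  /* two buffers, imagined to sit in L1 data memory */
    int     fill;              /* index of the buffer the next transfer fills    */
} pingpong_t;

/* Stand-in for starting a DMA transfer into the current fill buffer.
 * Here the copy is synchronous; real DMA would proceed in the background. */
static void start_transfer(pingpong_t *pp, const uint8_t *src)
{
    memcpy(pp->buf[pp->fill], src, MB_BYTES);
}

/* Placeholder for per-macroblock encoding work: a simple byte sum. */
static uint32_t process_mb(const uint8_t *mb)
{
    uint32_t sum = 0;
    for (int i = 0; i < MB_BYTES; i++)
        sum += mb[i];
    return sum;
}

/* Process mb_count macroblocks, fetching macroblock i+1 into one buffer
 * while macroblock i is processed from the other. */
uint32_t process_all(const uint8_t *src, int mb_count)
{
    pingpong_t pp;
    uint32_t total = 0;

    assert(src != 0 && mb_count > 0);

    pp.fill = 0;
    start_transfer(&pp, src);                 /* prefetch macroblock 0 */

    for (int i = 0; i < mb_count; i++) {
        int ready = pp.fill;                  /* buffer that was just filled */
        pp.fill ^= 1;                         /* switch to the other buffer  */

        if (i + 1 < mb_count)                 /* fetch the next macroblock   */
            start_transfer(&pp, src + (size_t)(i + 1) * MB_BYTES);

        total += process_mb(pp.buf[ready]);   /* work on data already local  */
    }
    return total;
}
```

The point of the pattern is that the core only ever touches data that is already in its fast local buffer, while the next block is being moved in. With real DMA the transfer time is hidden behind the processing time as long as each transfer finishes before the buffer is needed.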
3 Experimental results and data analysis
After optimization, H.264 encoding performance improves greatly, and real-time encoding of 4CIF video is achieved on the BF561. The author also compared the original encoder and the dual-core encoder in the VisualDSP++ 5.0 environment; the results are listed in Table 1 (optimization results for different sequences, 25 f/s, CIF format). Encoding speed depends mainly on the amount of motion in the image and on how rich its colors are, and the data in Table 1 show that it differs from sequence to sequence. The Claire sequence encodes very quickly because its background is static and only the head and shoulders move, so there is less data to encode. Likewise, simpler images encode faster, which saves encoding time.
The experimental results show that the proposed optimization method saves considerable processing time in H.264 video encoding and meets the requirements of real-time 4CIF encoding well. Even for very complex images, real-time 4CIF encoding can be achieved under certain quantization parameters.
4 Conclusion
This paper focuses on optimizing and implementing the H.264 video encoding algorithm on the ADSP-BF561 dual-core processor. The algorithm flow of the key encoding stages is adjusted to suit the architecture of the ADSP-BF561, and through data exchange and coordination between the two cores, real-time encoding of 4CIF video is achieved. Practice shows that the 25 f/s H.264 4CIF video encoding system, implemented on the ADSP-BF561 development board using the VisualDSP++ 5.0 software, can meet the demands of video transmission.