Introduction
H.264 is a video compression standard developed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). Compared with other video compression algorithms, it achieves a high compression ratio at the cost of a complex algorithm. Because of this complexity, a decoding system faces strict requirements on decoding speed and power consumption, so the decoder was designed as a dedicated H.264 decoding chip. A large design project generally follows the top-down design method, in which the system is divided into functional modules and sub-modules. The video controller module is the data interface between the chip and the display platform and plays an important role in verifying that the chip design works, so it is treated as a separate sub-module. To improve the design's chance of success, FPGA-based prototype verification is used in the early design stage. The FPGA prototype verification platform of the entire system is shown in Figure 1. The platform is divided into two parts: the hardware design and software decoding on a RISC CPU. The two parts work together to verify the software and hardware decoding results and to accelerate the entire decoding process.
Figure 1 FPGA prototype verification platform for H.264 decoding chip
Figure 2 Structural diagram of output video control module
Design and implementation of video control module
Video control module principle block diagram and functional analysis
The structural block diagram of the output video control module is shown in Figure 2. This module has two clock domains: the system clock domain and the display clock domain. The system clock frequency is fixed at 166 MHz by the selected SDRAM type. For a high-definition TV with a resolution of 1280×720, the display clock domain can run at about 70 MHz.
The system clock domain contains two external interfaces. The system interface mainly carries instructions issued by the upper-layer system and feedback information from the output control module. The DRAM interface carries the signals that the dedicated data bus provides for the output control module and is used to request image data for display from the DRAM.
The display input control submodule (Disp In Ctrl) in the system clock domain has four tasks. First, it receives the StartDisp and EndDisp signals from the system to start or stop the output display of video data, and it sends the frame display completion signal (FrameDone) to notify the system to update the address (ImageAddress) of the next image. Second, it issues requests to the DRAM, through a dedicated data channel, to read the image data to be displayed. Third, it controls the input multiplexing module (Input MUX) to write that data to the on-chip SRAM. Finally, it exchanges information with the display clock domain: it sends the display enable signal (DispEn Sys) to the clock domain synchronization module (Clk Domain Sync) to turn the image display on and off. The other sub-module of the system clock domain, the input multiplexing module, selects the on-chip dual-port SRAM according to certain rules, controls the memory address, and writes the display image data to the memory.
The display clock domain contains an external display device interface, which mainly carries the display control signals and the converted image data. The display clock domain contains two sub-modules. The first is the output multiplexing sub-module (Output MUX), which selects the dual-port SRAM and controls its address, reads the image data to be displayed according to certain rules, and packs the data. The second is the display output control module (Disp Out Ctrl), which controls the TV encoder, converts YUV signals to RGB signals, and scales the digital image. Its output signals include the display clock, horizontal synchronization, frame synchronization, and RGB image data. It also controls the output multiplexing module to read the display data, and it interacts with the system clock domain to coordinate the transfer of data between the two clock domains.
Special technology used in video control modules
The clock domain synchronization module is the focus of the output control module design. It is mainly responsible for transmitting control signals between the two clock domains. Because transmitting signals across clock domains is difficult to design correctly, the signals to be transmitted are divided into two categories: data signals and control signals. Only the control signals pass through the clock domain synchronization module, which reduces the number of signals that must cross clock domains. In the final solution, only two signals are needed. The WrDone signal is sent by the system clock domain to notify the display clock domain that the data in one dual-port SRAM has been updated and can be read and displayed. The RdDone signal is sent by the display clock domain to notify the system clock domain that the data in a given dual-port SRAM has been displayed and can be overwritten. Signals that pass between different clock domains must be treated to eliminate metastability: each signal is latched through two cascaded registers before it is used, as shown in Figure 3.
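The two-register latching described above can be illustrated with a small behavioral model. This is only a sketch in Python, not the article's RTL: the class name `TwoFlopSynchronizer` and the method `tick` are hypothetical, and real metastability cannot be simulated this way. The model only shows the latency that the two stages add to a level signal such as WrDone.

```python
class TwoFlopSynchronizer:
    """Behavioral model of a two-stage register synchronizer for a 1-bit level signal.

    Hypothetical illustration; names are not taken from the article.
    """

    def __init__(self, reset_value=0):
        self.stage1 = reset_value  # first register: the one that may go metastable in hardware
        self.stage2 = reset_value  # second register: the value the destination domain uses

    def tick(self, async_in):
        """Model one rising edge of the destination clock and return the synchronized output."""
        # Both registers update on the same edge, so stage2 takes the OLD stage1 value.
        self.stage2, self.stage1 = self.stage1, async_in
        return self.stage2


sync = TwoFlopSynchronizer()
# The level signal as seen at six consecutive destination clock edges.
wr_done_samples = [0, 1, 1, 1, 0, 0]
outputs = [sync.tick(v) for v in wr_done_samples]
print(outputs)
```

Because the output lags the input by one extra clock edge, a short pulse from the faster domain could be missed entirely, which is one reason to transmit level signals rather than pulses.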
Figure 3 Cross-clock domain signal metastability elimination circuit
Figure 4 Hardware implementation block diagram of video output sub-module
Two points in the design are worth noting. First, the clock domain synchronization circuit should be placed in an independent module, which helps the synthesis tool optimize the design, keeps timing analysis correct, and makes the circuit easier to analyze and debug. Second, so that the destination clock domain can reliably capture signal changes, the control signals transmitted in the design are represented as level signals.
The other kind of signal transmitted between the clock domains is data. Because the data signals are numerous and change rapidly, they are transferred through dual-port RAM (DPRAM). A DPRAM requires that read and write accesses to the same storage address be separated by a certain time interval; otherwise data transmission errors occur and the hardware circuit may even be damaged. To avoid read and write conflicts in the DPRAM, the design therefore uses "ping-pong" buffering. Two DPRAMs alternately hold the decoded luminance or chrominance data for display: while the display side reads the data in one DPRAM, the system writes the data to be displayed next into the other DPRAM. When the read completes, the two DPRAMs swap roles. This part uses four DPRAM blocks in total: two for luminance data and two for chrominance data.
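The ping-pong scheme can be sketched as a small software model. This is an illustration under assumptions rather than the article's hardware: the class `PingPongBuffer` and its method names are hypothetical, and the per-bank `full` flag stands in for the WrDone/RdDone handshake between the two clock domains.

```python
class PingPongBuffer:
    """Two memory banks: the writer fills one while the reader drains the other.

    Hypothetical model; 'full' is set on WrDone and cleared on RdDone.
    """

    def __init__(self, depth):
        self.depth = depth
        self.banks = [[0] * depth, [0] * depth]
        self.full = [False, False]
        self.write_sel = 0  # bank the system side writes next
        self.read_sel = 0   # bank the display side reads next

    def write_block(self, data):
        """System side: fill the current write bank. Returns False if it is still in use."""
        if len(data) != self.depth:
            raise ValueError("block length must equal the bank depth")
        if self.full[self.write_sel]:
            return False                      # not yet displayed: wait for RdDone
        self.banks[self.write_sel] = list(data)
        self.full[self.write_sel] = True      # WrDone
        self.write_sel ^= 1                   # switch to the other bank
        return True

    def read_block(self):
        """Display side: read the current read bank. Returns None if nothing is ready."""
        if not self.full[self.read_sel]:
            return None                       # nothing written yet: wait for WrDone
        data = list(self.banks[self.read_sel])
        self.full[self.read_sel] = False      # RdDone
        self.read_sel ^= 1                    # switch to the other bank
        return data


buf = PingPongBuffer(depth=4)
buf.write_block([1, 2, 3, 4])
buf.write_block([5, 6, 7, 8])
print(buf.read_block())
```

The reader and the writer never touch the same bank at the same time, so the same-address access restriction of the DPRAM is met by construction.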
The following sections analyze the format conversion algorithm and the image scaling algorithm used in the display output sub-module of the video controller, together with their hardware implementation.
Display data format conversion analysis
Based on the datasheet of the Sil 164 DVI signal encoding chip and the YUV → RGB conversion given in the H.264 video coding standard, the design uses the following fixed conversion algorithm:
The equation above is converted to fixed point and rewritten using only shifts and additions, as shown in the following equation:
In the hardware design, YUV and RGB signals are represented by 8-bit unsigned numbers, and intermediate variables are 12 bits wide to preserve accuracy. Finally, the calculated RGB result is clipped to the range 0~255. Multiplications and divisions by powers of two in the formula are all implemented with shifts.
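A shift-and-add conversion of this kind can be sketched as follows. The coefficients used in the design are not reproduced in this text, so the sketch assumes the common full-range BT.601 coefficients and approximates them with 6 fractional bits. The function names are hypothetical, and Python integers are unbounded, so the 12-bit intermediate width is not modeled.

```python
def clip8(x):
    """Clamp an integer to the 8-bit unsigned range 0..255."""
    return max(0, min(255, x))


def yuv_to_rgb_shift_add(y, u, v):
    """Convert one 8-bit YUV pixel to 8-bit RGB using only shifts and additions.

    ASSUMED coefficients (full-range BT.601, not taken from the article):
        R = Y + 1.402 (V - 128)
        G = Y - 0.344 (U - 128) - 0.714 (V - 128)
        B = Y + 1.772 (U - 128)
    Each coefficient is approximated as k/64 and the sum is shifted right by 6.
    """
    d = u - 128
    e = v - 128
    # 1.402 ~ 90/64, and 90 = 64 + 16 + 8 + 2
    r = y + (((e << 6) + (e << 4) + (e << 3) + (e << 1)) >> 6)
    # 0.344 ~ 22/64 (16 + 4 + 2), 0.714 ~ 46/64 (32 + 8 + 4 + 2)
    g_u = (d << 4) + (d << 2) + (d << 1)
    g_v = (e << 5) + (e << 3) + (e << 2) + (e << 1)
    g = y - ((g_u + g_v) >> 6)
    # 1.772 ~ 113/64, and 113 = 64 + 32 + 16 + 1
    b = y + (((d << 6) + (d << 5) + (d << 4) + d) >> 6)
    return clip8(r), clip8(g), clip8(b)


print(yuv_to_rgb_shift_add(128, 128, 128))  # mid gray stays (128, 128, 128)
```

Summing the shifted terms first and shifting right once at the end loses less precision than shifting each term separately, at the cost of wider intermediate values.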
Algorithmic Analysis of Digital Image Scaling
For an original image with a resolution of M×N, the YUV values of all sampling points can be expressed as an M×N order matrix:
Pixels are represented by f(m,n), where 0≤m<M and 0≤n<N. Scaling a digital image is essentially resampling it. Assume that the scaling factors for the height and width of the original digital image are S1 and S2, respectively. Then, according to the Nyquist sampling theorem, the original digital image is resampled with new horizontal and vertical sampling periods set by these scaling factors to obtain the scaled digital image f′(m′,n′):
The formula above shows that each reconstructed pixel f′(m′,n′) in the scaled digital image is a weighted sum of all pixels of the original digital image. Implementing this formula directly in hardware would require a very large amount of computation. To simplify the design and reduce chip cost, the formula can be simplified without significantly affecting image quality. The reconstructed pixel value depends mainly on the product of the two sampling functions. In practice, only the point at which this product equals 1 is used. As a further simplification, the coordinates of that point are rounded to integers, which gives the simplified expression: f′(m′,n′)=f(m,n).
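The simplified expression amounts to nearest-neighbor resampling, which can be sketched as follows. The exact rounding rule is not recoverable from this text, so the sketch assumes m = floor(m′/S1) and n = floor(n′/S2). The function name `scale_nearest` is hypothetical.

```python
def scale_nearest(image, s1, s2):
    """Scale a 2-D list of pixels by s1 (height) and s2 (width).

    Implements f'(m', n') = f(m, n) with the ASSUMED mapping
    m = floor(m' / s1), n = floor(n' / s2).
    """
    src_h = len(image)
    src_w = len(image[0])
    dst_h = int(src_h * s1)
    dst_w = int(src_w * s2)
    out = []
    for m_dst in range(dst_h):
        # Clamp so rounding can never index past the last source row.
        m = min(int(m_dst / s1), src_h - 1)
        row = []
        for n_dst in range(dst_w):
            n = min(int(n_dst / s2), src_w - 1)
            row.append(image[m][n])
        out.append(row)
    return out


small = [[1, 2],
         [3, 4]]
for line in scale_nearest(small, 2, 2):
    print(line)
```

Each output pixel copies exactly one source pixel, so no multipliers or weighted sums are needed, which is the hardware saving the simplification aims for.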
Hardware implementation of digital image format conversion and scaling
In this project, the display device is a high-definition TV with a resolution of 1280×720, and the image is center-aligned on output. When the decoded digital image is displayed without scaling, it is placed in the middle of the screen and the remaining area is filled with black. When the image is scaled, the rules above are followed. First, the front end of the video controller output module arranges the incoming data in progressive-scan order and converts its format. It then stores the pixel data whose RGB value is non-zero (that is, not black), frame by frame and in progressive-scan order, alternately in two on-chip cache RAM blocks of the same size, as shown in Figure 4.
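Center alignment with black fill can be sketched as a lookup from a display coordinate to either an image pixel or black. This is an illustrative model only; the function names and the `BLACK` constant are hypothetical, and the image is assumed to be no larger than the 1280×720 display.

```python
DISPLAY_W, DISPLAY_H = 1280, 720
BLACK = (0, 0, 0)


def center_offsets(img_w, img_h):
    """Return the (x, y) position of the image's top-left corner on the display."""
    if img_w > DISPLAY_W or img_h > DISPLAY_H:
        raise ValueError("image must fit on the display")
    return (DISPLAY_W - img_w) // 2, (DISPLAY_H - img_h) // 2


def display_pixel(x, y, image):
    """Return the RGB value shown at display position (x, y).

    'image' is a list of rows of RGB tuples; positions outside the
    centered image are filled with black.
    """
    img_h = len(image)
    img_w = len(image[0])
    x_off, y_off = center_offsets(img_w, img_h)
    col = x - x_off
    row = y - y_off
    if 0 <= row < img_h and 0 <= col < img_w:
        return image[row][col]
    return BLACK


# A CIF-sized (352x288) picture starts at column 464, row 216.
print(center_offsets(352, 288))
```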
These two RAMs work in the same way as the DPRAMs described earlier. When data is read from RAM1 or RAM2, the address decoder converts its address into the row and column of the pixel, that is, the values of m and n. The m and n values are sent to the image scaling processing unit, which produces the new image data and its new coordinates. The write address decoder then converts these into the address in the output RAM3, in progressive-scan order, at which the format-converted data is stored. Finally, the RGB data required for display is output directly from RAM3.
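The two address decoders can be sketched as a pair of conversions between a linear RAM address and a (row, column) pair. The sketch assumes pixels are stored one per address in progressive-scan (row-major) order; the function names and the `width` parameter are hypothetical.

```python
def decode_address(addr, width):
    """Read-side decoder: linear address -> (m, n), assuming row-major storage."""
    return divmod(addr, width)


def encode_address(m, n, width):
    """Write-side decoder: (m, n) -> linear address in a RAM holding 'width' pixels per row."""
    return m * width + n


# Address 353 in a 352-pixel-wide image is row 1, column 1.
m, n = decode_address(353, 352)
print(m, n, encode_address(m, n, 352))
```

In hardware, a width that is a power of two reduces both conversions to bit slicing; for other widths, row and column counters are a common alternative to a divider.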
Conclusion
After the design was completed, the video controller module was synthesized with Synplify 7.6 and reached an operating frequency of 80.3 MHz. Together with the front-end decoding module, it was downloaded to a Xilinx Virtex-II 6000 FPGA and integrated into the H.264 video decoding verification platform. The integrated system runs at up to 34 MHz and plays images well on a high-definition TV.