Design of a video controller in an H.264 video decoding chip - EEWORLD


　　Introduction

　　H.264 is a video compression standard developed jointly by the ITU-T VCEG and ISO/IEC MPEG groups. Compared with earlier video compression algorithms, it achieves a higher compression ratio at the cost of a much more complex algorithm. Because of this complexity, a decoding system faces very strict requirements on decoding speed and power consumption, so a dedicated H.264 decoding chip was designed. For a large project like this, a top-down design method is generally used, dividing the design into functional modules and each module into sub-modules. The video controller module is the data interface between the chip and the display platform. Since the displayed output is how the success of the chip design is verified, the module plays an important role and is best treated as a separate sub-module. To improve the chance of a successful design, FPGA-based prototype verification was used early in the design. The FPGA prototype verification platform for the whole system is shown in Figure 1. The platform is divided into two parts: the hardware design, and software decoding running on a RISC CPU. The two parts work together to cross-check the software and hardware decoding results and to accelerate the overall decoding process.

　　Figure 1 FPGA prototype verification platform for H.264 decoding chip

　　Figure 2 Structural diagram of output video control module

　　Design and implementation of the video control module

　　Block diagram and functional analysis of the video control module

　　The block diagram of the output video control module is shown in Figure 2. The module has two clock domains: the system clock domain and the display clock domain. The system clock frequency is fixed at 166 MHz by the SDRAM type selected; for a high-definition TV with a resolution of 1280×720, the display clock domain can run at about 70 MHz.
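　　The display clock figure can be sanity-checked with simple arithmetic: the pixel clock is the total raster size (active picture plus blanking) times the refresh rate. The sketch below assumes a 60 Hz refresh rate and the standard 720p60 raster of 1650×750 total samples (CEA-861); neither value is stated in the article, and the function name is illustrative only.

```python
def pixel_clock_hz(h_total, v_total, refresh_hz):
    """Pixel clock = samples per line x lines per frame x frames per second,
    where both totals include the blanking intervals."""
    return h_total * v_total * refresh_hz


# Active picture only (no blanking): a hard lower bound for the display clock.
active_only = pixel_clock_hz(1280, 720, 60)     # 55_296_000 Hz
# Assumed 720p60 raster including blanking (CEA-861): 1650 x 750 total samples.
with_blanking = pixel_clock_hz(1650, 750, 60)   # 74_250_000 Hz
# System-clock cycles that elapse per displayed pixel at 166 MHz (about 2.24).
sys_cycles_per_pixel = 166_000_000 / with_blanking
```

　　The standardized 74.25 MHz rate is in the same range as the roughly 70 MHz quoted above, and a little over two system-clock cycles elapse for every displayed pixel.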

　　The system clock domain has two external interfaces. The system interface carries the commands issued by the upper-layer system and the feedback returned by the output control module. The DRAM interface carries the signals that the dedicated data bus provides to the output control module, and is used to request the image data to be displayed from the DRAM.

　　The display input control sub-module (Disp In Ctrl) in the system clock domain has several tasks. First, it receives the StartDisp and EndDisp signals from the system, which start and stop the display output of video data, and it sends a frame-done signal (FrameDone) when a frame has been displayed, telling the system to update the address (ImageAddress) of the next image. Second, it issues requests to the DRAM over a dedicated data channel to read the image data to be displayed, and it controls the input multiplexing module (Input MUX) so that this data is written into the on-chip SRAM. Finally, it exchanges information with the display clock domain: it sends the display enable signal (DispEn Sys) to the clock domain synchronization module (Clk Domain Sync) to switch the image display on and off. The other sub-module in the system clock domain, the input multiplexing module, selects one of the on-chip dual-port SRAMs according to the ping-pong rule described later, generates the memory address, and writes the display image data into that memory.

　　The display clock domain has one external interface, to the display device, which carries the display control signals and the converted pixel data. It contains two sub-modules. The first, the output multiplexing sub-module (Output MUX), selects the dual-port SRAM and generates its read address, reads out the image data to be displayed according to the same ping-pong rule, and packs the data. The second, the display output control module (Disp Out Ctrl), controls the TV encoder, converts YUV signals to RGB, and scales the digital image; its output signals include the display clock, horizontal sync, frame sync and RGB image data. It also directs the output multiplexing module to read the display data and, finally, interacts with the system clock domain to coordinate the transfer of data between the two clock domains.

　　Special techniques used in the video control module

　　The clock domain synchronization module is the focus of the output control module design. It is responsible for passing control signals between the two clock domains. Because signal transfer across clock domains is difficult to get right, the signals to be transferred are divided into two categories: data signals and control signals. Only the control signals pass through the clock domain synchronization module, and the number of signals crossing the domains is kept to a minimum; the final solution needs only two. WrDone is sent by the system clock domain to notify the display clock domain that the data in one of the dual-port SRAMs has been updated and can be read out for display. RdDone is sent by the display clock domain to notify the system clock domain that the data in one of the dual-port SRAMs has been displayed, so its contents can be overwritten. Every signal crossing between clock domains needs measures against metastability; here each one is latched through two cascaded registers before it is used, as shown in Figure 3.

　　Figure 3 Cross-clock domain signal metastability elimination circuit
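　　To make the two-register scheme of Figure 3 concrete, here is a cycle-level behavioral model of a two-flop synchronizer carrying a level signal such as WrDone into the display clock domain. It is an illustrative sketch with a hypothetical class name (TwoFlopSync); a digital model can show the added latency but not metastability itself, which is an analog effect.

```python
class TwoFlopSync:
    """Behavioral model of a two-register synchronizer clocked by the
    destination clock. In silicon, ff1 may go metastable; ff2 gives it a
    full destination-clock period to resolve before the signal is used."""

    def __init__(self):
        self.ff1 = 0
        self.ff2 = 0

    def clock(self, async_in):
        """Apply one destination-clock edge and return the synchronized output."""
        # Both registers update on the same edge, so ff2 captures ff1's old value.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


sync = TwoFlopSync()
# WrDone level as seen at four successive display-clock edges.
wr_done_seen = [sync.clock(level) for level in (0, 1, 1, 1)]
# -> [0, 0, 1, 1]: each value reaches the output one edge after ff1 samples it.
```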

　　Figure 4 Hardware implementation block diagram of video output sub-module

　　Two points in the design deserve attention. First, the clock domain synchronization circuit should be placed in an independent module, so that the synthesis tool optimizes it correctly, timing analysis is correct, and the circuit is easier to analyze and debug. Second, so that the destination clock domain is guaranteed to capture every signal change, the control signals crossing the domains are represented as level signals.
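　　The reason for level signals can be seen from the clock rates: the 166 MHz system clock is faster than the roughly 70 MHz display clock, so a pulse lasting one system cycle is shorter than one display period and may never be sampled. The timing sketch below uses those two frequencies rounded to whole picoseconds (an assumption about exact values), ignores metastability, and counts how many display-clock edges see each kind of signal; the names are illustrative.

```python
SYS_PERIOD_PS = 6024    # ~166 MHz system clock (1 / 166 MHz = 6.024 ns)
DISP_PERIOD_PS = 14286  # ~70 MHz display clock (1 / 70 MHz = 14.286 ns)


def display_edges_seeing_high(t_rise_ps, t_fall_ps, n_edges=100):
    """Count display-clock edges (at k * DISP_PERIOD_PS) that sample the signal high."""
    return sum(1 for k in range(n_edges)
               if t_rise_ps <= k * DISP_PERIOD_PS < t_fall_ps)


# A one-system-cycle pulse can fall entirely between two display edges...
missed = display_edges_seeing_high(1000, 1000 + SYS_PERIOD_PS)
# ...or be caught, depending only on the phase relationship of the two clocks.
caught = display_edges_seeing_high(14000, 14000 + SYS_PERIOD_PS)
# A level that stays high until the next transfer is always seen.
level = display_edges_seeing_high(1000, float("inf"))
```

　　Here `missed` is 0 and `caught` is 1, so a pulse-based handshake would fail intermittently, whereas the level signal is sampled on every display edge after it rises.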

　　The other category of signals crossing between the clock domains is data. Because there are many data signals and they change rapidly, they are transferred through dual-port RAM (DPRAM). A DPRAM requires that its read and write ports not operate on the same storage address at the same time (that is, within a certain minimum time interval); otherwise data transfer errors occur, and the hardware circuit may even be damaged. To avoid read/write conflicts in the DPRAM, the design uses "ping-pong" buffering: two DPRAMs alternately hold the decoded luminance or chrominance data for display. While the display side reads the data in one DPRAM, the system side writes the next data to be displayed into the other; when the read is finished, the two DPRAMs swap roles. This part uses four DPRAM blocks in total: two for the luminance signal and two for the chrominance signals.
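　　The ping-pong rule can be captured in a small behavioral model. This is a sketch under stated assumptions, not the article's RTL: the class name PingPong is hypothetical, each buffer stands for one DPRAM, and the `full` flags stand in for the WrDone/RdDone handshake, which in hardware crosses the clock domains through the synchronization module described earlier.

```python
class PingPong:
    """Behavioral sketch of one ping-pong pair of buffers (hypothetical name).

    full[i] is set when the writer finishes buffer i (the WrDone event) and
    cleared when the reader finishes it (the RdDone event), so the two sides
    never work on the same buffer at the same time.
    """

    def __init__(self):
        self.bufs = [None, None]
        self.full = [False, False]
        self.wr_sel = 0  # buffer the system side writes next
        self.rd_sel = 0  # buffer the display side reads next

    def write(self, data):
        """System side: fill the write buffer; return False if it is still unread."""
        if self.full[self.wr_sel]:
            return False                # writer must wait for RdDone
        self.bufs[self.wr_sel] = data
        self.full[self.wr_sel] = True   # WrDone
        self.wr_sel ^= 1                # switch to the other buffer
        return True

    def read(self):
        """Display side: drain the read buffer; return None if nothing is ready."""
        if not self.full[self.rd_sel]:
            return None                 # reader must wait for WrDone
        data = self.bufs[self.rd_sel]
        self.full[self.rd_sel] = False  # RdDone
        self.rd_sel ^= 1
        return data


# One pair for luminance and one for chrominance gives the four DPRAM blocks.
luma, chroma = PingPong(), PingPong()
```

　　With both buffers full, a third `write` returns False until the display side reads one buffer, which is exactly the conflict the ping-pong arrangement prevents.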

　　The following sections analyze the format conversion and image scaling algorithms used in the display output sub-module of the video controller, together with their hardware implementation.

　　Display data format conversion analysis

　　Based on the data sheet of the SiI 164 DVI encoder chip, and referring to the YUV → RGB conversion given in the H.264 video coding standard, the design uses the following conversion formula:

　　This formula has been converted to fixed point and rewritten using shifts and additions, as follows:

　　In the hardware, the YUV and RGB signals are represented as 8-bit unsigned numbers, and intermediate variables are 12 bits wide to preserve accuracy. The computed RGB values must finally be clipped to the range 0 to 255. Multiplications and divisions by powers of two in the formula are all implemented with shifts.
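　　The article's own fixed-point coefficients are not reproduced here, so the sketch below only illustrates the shift-and-add style. It assumes the widely used ITU-R BT.601 studio-range coefficients scaled by 256 (298, 409, 100, 208, 516), which may differ from the design's formula, and it uses Python's unbounded integers rather than the 12-bit intermediates described above. Function names are illustrative.

```python
def times(x, shifts):
    """Multiply x by a constant given as a sum of powers of two, using only
    shifts and adds as the hardware would (Python's << is defined for x < 0)."""
    return sum(x << s for s in shifts)


def clip8(acc):
    """Round, drop the 8 fractional bits, and clip to the 0..255 range."""
    v = (acc + 128) >> 8
    return max(0, min(255, v))


def yuv_to_rgb(y, u, v):
    """Assumed BT.601 studio-range conversion (Y 16..235, U/V centred on 128)."""
    c, d, e = y - 16, u - 128, v - 128
    c298 = times(c, (8, 5, 3, 1))                          # 298 = 256 + 32 + 8 + 2
    r = c298 + times(e, (8, 7, 4, 3, 0))                   # 409 = 256 + 128 + 16 + 8 + 1
    g = c298 - times(d, (6, 5, 2)) - times(e, (7, 6, 4))   # 100 = 64+32+4, 208 = 128+64+16
    b = c298 + times(d, (9, 2))                            # 516 = 512 + 4
    return clip8(r), clip8(g), clip8(b)
```

　　For example, studio-range black (16, 128, 128) maps to (0, 0, 0) and studio-range white (235, 128, 128) to (255, 255, 255); saturated colours exercise the final clipping step.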

　　Analysis of the digital image scaling algorithm

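　　For an original image with a resolution of M×N, the YUV values of all sampling points can be expressed as an M×N matrix:

$$
F = \begin{bmatrix}
f(0,0) & f(0,1) & \cdots & f(0,N-1) \\
f(1,0) & f(1,1) & \cdots & f(1,N-1) \\
\vdots & \vdots & \ddots & \vdots \\
f(M-1,0) & f(M-1,1) & \cdots & f(M-1,N-1)
\end{bmatrix}
$$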

　　Each pixel is denoted f(m,n), where 0 ≤ m ≤ M−1 and 0 ≤ n ≤ N−1. Scaling a digital image is, in essence, resampling it. Let the scaling factors for the height and width of the original image be S1 and S2 respectively. Then, following the Nyquist sampling theorem, the original image should be resampled with new vertical and horizontal sampling periods of 1/S1 and 1/S2 (in units of the original pixel spacing), which gives the scaled image f′(m′,n′):

$$
f'(m',n') = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} f(m,n)\,\operatorname{sinc}\!\left(\frac{m'}{S_1}-m\right)\operatorname{sinc}\!\left(\frac{n'}{S_2}-n\right),
\qquad \operatorname{sinc}(x)=\frac{\sin(\pi x)}{\pi x}
$$

　　The formula above shows that each reconstructed pixel f′(m′,n′) of the scaled image is a weighted sum of all pixels of the original image. Implementing it directly in hardware would require a very large amount of computation. To reduce design difficulty and chip cost, the formula can be simplified without a significant effect on image quality. The value of a reconstructed pixel depends mainly on the product of the two sampling functions, so in practice only the term for which that product equals 1 is kept, that is, the point satisfying m = m′/S1 and n = n′/S2. As a further simplification, m′/S1 and n′/S2 are rounded to integers, giving the simplified expression f′(m′,n′) = f(m,n) with m = round(m′/S1) and n = round(n′/S2), which is nearest-neighbor interpolation.
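　　A software reference model of the simplified formula is useful for checking the hardware scaler's output. The sketch below uses a hypothetical function name, treats the image as a list of rows, and assumes round-half-up for the rounding step and index clamping at the image border; the article specifies neither tie handling nor border handling.

```python
def scale_nearest(img, s1, s2):
    """Simplified scaler f'(m', n') = f(m, n) with m = round(m'/s1), n = round(n'/s2).

    img is a list of rows; s1 scales the height (m) and s2 the width (n).
    Rounding is round-half-up and indices are clamped at the image border
    (both assumptions; the article does not specify them).
    """
    rows, cols = len(img), len(img[0])
    out_rows, out_cols = int(rows * s1), int(cols * s2)
    out = []
    for m_p in range(out_rows):
        m = min(int(m_p / s1 + 0.5), rows - 1)
        out.append([img[m][min(int(n_p / s2 + 0.5), cols - 1)]
                    for n_p in range(out_cols)])
    return out
```

　　Each output pixel costs one index computation and one memory read, with no multiply-accumulate over neighbouring pixels, which is what makes the simplification attractive in hardware.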

　　Hardware implementation of digital image format conversion and scaling

　　In this project the display device is a high-definition TV with a resolution of 1280×720, and images are displayed center-aligned on it. Without scaling, the decoded image is placed in the middle of the screen and the rest of the screen is filled with black; with scaling, the rules above are followed. First, the front end of the video controller output module performs format conversion on the incoming data, which arrives in progressive-scan order. It then writes the pixel data whose RGB value is non-zero (that is, not black), frame by frame in progressive-scan order, alternately into two equally sized blocks of on-chip cache RAM, as shown in Figure 4.
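　　The center-alignment rule reduces to computing an offset and showing black everywhere outside the image. The sketch below covers the unscaled case and assumes a decoded image no larger than the screen, stored as a list of rows of RGB tuples; the function names are hypothetical, and the 352×288 (CIF) size in the tests is only an example, not a size given in the article.

```python
DISPLAY_W, DISPLAY_H = 1280, 720
BLACK = (0, 0, 0)


def center_offsets(img_w, img_h):
    """Top-left display coordinate at which a centered image starts."""
    return (DISPLAY_W - img_w) // 2, (DISPLAY_H - img_h) // 2


def display_pixel(img, x, y):
    """RGB value shown at display position (x, y): the centered image pixel, or black."""
    h, w = len(img), len(img[0])
    x0, y0 = center_offsets(w, h)
    if x0 <= x < x0 + w and y0 <= y < y0 + h:
        return img[y - y0][x - x0]
    return BLACK
```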

　　These two RAMs work in the same ping-pong fashion as the DPRAMs described earlier. When a pixel is read from RAM1 or RAM2, an address decoder converts its address into the pixel's row and column, that is, the values of m and n. These are sent to the image scaling unit, which produces the new pixel value and its new address; a write-address decoder then maps this to an address in the output RAM (RAM3) in progressive-scan order. RAM3 stores the converted data, and the RGB data required for display is output directly from it.
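　　For a RAM that stores pixels in progressive-scan order, the read and write address decoders are a division and a multiply-add. The sketch assumes row-major storage with a fixed line width and no padding between lines, which the article does not state; the function names are hypothetical.

```python
def decode_address(addr, width):
    """Read-address decoder: linear progressive-scan address -> (m, n) = (row, column)."""
    return divmod(addr, width)


def encode_address(m, n, width):
    """Write-address decoder: (row, column) -> linear progressive-scan address."""
    return m * width + n


# Example: with an assumed line width of 1280 pixels, address 1283 is row 1, column 3.
m, n = decode_address(1283, 1280)
```

　　When the line width is a power of two, the division and multiplication reduce to bit slicing and concatenation of the address, which is one reason hardware buffers are often sized that way.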

　　Conclusion

　　After the design was completed, the video controller module was synthesized with Synplify 7.6, reaching an operating frequency of 80.3 MHz. Together with the front-end decoding module, it was downloaded to a Xilinx Virtex-II 6000 FPGA and integrated into the H.264 video decoding verification platform, where the working frequency reaches 34 MHz and images played on the high-definition TV display well.
