Basic knowledge of H.264 video encoding

Publisher: 平安幸福 · Last updated: 2011-04-23

1. Development History of Video Coding Technology

Video coding technology has developed mainly along two series of international standards: MPEG-x, developed by ISO/IEC, and H.26x, developed by ITU-T. From the H.261 recommendation through H.262/H.263 and MPEG-1/2/4, all pursue a common goal: the best possible image quality at the lowest possible bit rate (or storage capacity). Moreover, as market demand for image transmission has grown, the problem of adapting to different channel transmission characteristics has become increasingly pressing. To address these problems, the two major international standardization organizations, ISO/IEC and ITU-T, jointly developed a new video standard, H.264.
H.261 is the earliest video coding recommendation; its purpose was to standardize video coding for conference television and videophone applications on ISDN networks. Its algorithm is a hybrid coding scheme combining inter-frame prediction, which reduces temporal redundancy, with the DCT, which reduces spatial redundancy. Matched to the ISDN channel, its output bit rate is p×64 kbit/s. When p is small, only low-definition images can be transmitted, suitable for face-to-face videotelephony; when p is large (e.g. p>6), conference video of better definition can be transmitted. H.263 is a low-bit-rate image compression recommendation that technically improves and extends H.261 and supports applications below 64 kbit/s. In essence, however, H.263 and the subsequent H.263+ and H.263++ have grown into recommendations supporting full-bit-rate applications, as can be seen from the wide range of image formats they support, such as Sub-QCIF, QCIF, CIF, 4CIF and even 16CIF.

The MPEG-1 standard, at a bit rate of about 1.2 Mbit/s, can provide 30 frames per second of CIF (352×288) quality images; it was designed for video storage and playback on CD-ROM discs. The basic algorithm of the MPEG-1 video coding part is similar to that of H.261/H.263, also adopting motion-compensated inter-frame prediction, the two-dimensional DCT, and VLC run-length coding. In addition, it introduced the concepts of intra-coded (I), predicted (P), bidirectionally predicted (B) and DC (D) frames to further improve coding efficiency. Building on MPEG-1, the MPEG-2 standard made improvements in image resolution and in compatibility with digital television. For example, its motion vector accuracy is half a pixel; it distinguishes "frames" from "fields" in coding operations (such as motion estimation and the DCT); and it introduced scalable coding technologies, such as spatial scalability, temporal scalability and signal-to-noise-ratio scalability. The more recent MPEG-4 standard introduced coding based on audio-visual objects (AVOs), which greatly improves the interactivity and coding efficiency of video communication. MPEG-4 also uses new techniques such as shape coding, adaptive DCT, and arbitrarily-shaped video object coding. However, the basic video encoder of MPEG-4 is still a hybrid encoder of the same type as H.263.

In short, H.261 is the classic video coding recommendation, and H.263 is its successor, gradually replacing it in practice, mainly in communications; however, H.263's numerous options often leave users at a loss. The MPEG series of standards developed from storage-media applications toward transmission-media applications, and the basic framework of its core video coding is consistent with H.261. The eye-catching "object-based coding" part of MPEG-4 is still hard to apply widely because of technical obstacles. The new H.264 recommendation, developed on this foundation, overcomes the weaknesses of both camps: it introduces new coding methods within the hybrid coding framework, improves coding efficiency, and is oriented toward practical applications. Since it was jointly formulated by the two major international standardization organizations, its application prospects are self-evident.


2. Introduction to H.264
H.264 is a new digital video coding standard developed by the Joint Video Team (JVT), formed from ITU-T's VCEG (Video Coding Experts Group) and ISO/IEC's MPEG (Moving Picture Experts Group). It is simultaneously ITU-T Recommendation H.264 and ISO/IEC MPEG-4 Part 10. Proposals were solicited in January 1998, the first draft was completed in September 1999, the test model TML-8 was produced in May 2001, the FCD (Final Committee Draft) of H.264 was approved at the 5th JVT meeting in June 2002, and the standard was officially released in March 2003.

Like previous standards, H.264 uses a hybrid coding framework of DPCM plus transform coding. However, it adopts a simple, "back to basics" design without numerous options, and so achieves much better compression performance than H.263++. It strengthens adaptability to diverse channels with a "network-friendly" structure and syntax that aids the handling of bit errors and packet loss; it targets a wider range of applications, covering different bit rates, resolutions, and transmission (or storage) scenarios; and its basic system is open and does not require royalty payments.

Technically, the H.264 standard has many highlights, such as unified VLC symbol coding; high-precision, multi-mode motion estimation; an integer transform based on 4×4 blocks; and a layered coding syntax. These measures give the H.264 algorithm very high coding efficiency: at the same reconstructed image quality, it saves about 50% of the bit rate compared with H.263. The H.264 bitstream structure also has strong network adaptability and increased error-resilience, and suits IP and wireless network applications well.

3. Technical highlights of H.264

1. Layered design

The H.264 algorithm can be conceptually divided into two layers: the video coding layer (VCL) is responsible for efficient video content representation, and the network abstraction layer (NAL) is responsible for packaging and transmitting data in an appropriate manner required by the network. A packet-based interface is defined between VCL and NAL, and packaging and corresponding signaling are part of NAL. In this way, the tasks of high coding efficiency and network friendliness are completed by VCL and NAL respectively.

The VCL layer includes block-based motion-compensated hybrid coding and some new features. As with previous video coding standards, H.264 does not specify pre-processing and post-processing functions in the draft, which increases the standard's flexibility.

NAL is responsible for encapsulating data using the segmentation format of the underlying network, including framing, signaling of logical channels, use of timing information or sequence end signals, etc. For example, NAL supports the transmission format of video on circuit switching channels and the format of video transmission on the Internet using RTP/UDP/IP. NAL includes its own header information, segment structure information and actual payload information, that is, the upper layer VCL data. (If data segmentation technology is used, the data may consist of several parts).
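To illustrate how a NAL unit carries its own header information, here is a minimal sketch (not from the article) that splits the one-byte NAL unit header of the published standard into its three fields; the field layout is taken from the final specification:

```python
def parse_nal_header(first_byte):
    """Split the one-byte H.264 NAL unit header into its three fields."""
    forbidden_zero_bit = (first_byte >> 7) & 0x1  # must be 0 in a valid stream
    nal_ref_idc = (first_byte >> 5) & 0x3         # 0 = disposable, >0 = used as a reference
    nal_unit_type = first_byte & 0x1F             # e.g. 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type

# 0x67 = 0110 0111: ref_idc 3, type 7 (sequence parameter set)
print(parse_nal_header(0x67))  # (0, 3, 7)
```

The same byte layout is what RTP payloads and file containers inspect to decide how to packetize or prioritize each unit.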

2. High-precision, multi-modal motion estimation

H.264 supports motion vectors with 1/4 or 1/8 pixel accuracy. At 1/4 pixel accuracy, a 6-tap filter can be used to reduce high-frequency noise. For motion vectors with 1/8 pixel accuracy, a more complex 8-tap filter can be used. When performing motion estimation, the encoder can also select an "enhanced" interpolation filter to improve the prediction effect.
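As a concrete sketch of sub-pixel interpolation, the half-pel filter that the final standard adopted is the 6-tap (1, −5, 20, 20, −5, 1) kernel; the snippet below applies it to one row of integer-pel samples (the tap values are from the published standard, not this article):

```python
def half_pel(p):
    """Interpolate the half-pel sample between p[2] and p[3] from six
    integer-pel neighbours using the 6-tap filter (1, -5, 20, 20, -5, 1),
    with rounding and clipping to the 8-bit sample range."""
    b = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5]
    return min(255, max(0, (b + 16) >> 5))

# A flat row of pixels interpolates to the same value
print(half_pel([100] * 6))  # 100
```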

In H.264 motion prediction, a macroblock (MB) can be divided into different sub-blocks as shown in Figure 2, forming 7 different modes of block size. This multi-mode flexible and detailed division is more in line with the shape of the actual moving object in the image, greatly improving the accuracy of motion estimation. In this way, each macroblock can contain 1, 2, 4, 8 or 16 motion vectors.
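The relationship between partition size and motion-vector count can be sketched directly (a simple illustration, not code from any codec): splitting a 16×16 macroblock uniformly into blocks of a given size yields the 1, 2, 4, 8 or 16 vectors mentioned above.

```python
def mv_count(w, h):
    """Motion vectors per 16x16 macroblock when it is uniformly split
    into w x h partitions."""
    return (16 * 16) // (w * h)

# The 7 partition sizes of H.264 inter prediction
for mode in ["16x16", "16x8", "8x16", "8x8", "8x4", "4x8", "4x4"]:
    w, h = map(int, mode.split("x"))
    print(mode, mv_count(w, h))
# 16x16 1, 16x8 2, 8x16 2, 8x8 4, 8x4 8, 4x8 8, 4x4 16
```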

In H.264, the encoder is allowed to use more than one previously coded frame for motion estimation, which is called multi-frame reference technology. For example, with 2 or 3 just-encoded reference frames available, the encoder selects, for each target macroblock, the frame that gives the better prediction, and signals which frame was used.

3. Integer transform of 4×4 blocks

H.264, like previous standards, uses block-based transform coding for the residual, but the transform is an integer operation rather than a real-valued one, and its process closely approximates the DCT. The advantage of this approach is that the transform and inverse transform have identical precision in the encoder and the decoder, allowing simple fixed-point arithmetic; in other words, there is no "inverse transform mismatch". The transform unit is a 4×4 block rather than the 8×8 block commonly used before. Because the transform block is smaller, moving objects are partitioned more accurately: not only is the transform computation relatively small, but artifacts at the edges of moving objects are also greatly reduced. To prevent the small-block transform from producing grayscale discontinuities between blocks in large smooth areas of the image, the DC coefficients of the 16 4×4 luma blocks of an intra macroblock (one per small block, 16 in total) can undergo a second 4×4 transform, and the DC coefficients of the 4 4×4 chroma blocks (one per small block, 4 in total) can undergo a 2×2 transform.
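The 4×4 core transform can be written as Y = Cf·X·CfT with a small integer matrix; the sketch below uses the matrix from the final standard (scaling is folded into quantization in the real codec and is omitted here):

```python
# Forward 4x4 integer core transform of H.264: Y = Cf * X * Cf^T.
Cf = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def forward_transform(block):
    ct = [list(row) for row in zip(*Cf)]  # Cf transposed
    return matmul(matmul(Cf, block), ct)

# A constant residual block concentrates all energy in the DC coefficient.
x = [[10] * 4 for _ in range(4)]
y = forward_transform(x)
print(y[0][0])  # 160 (= 16 * 10; every other coefficient is 0)
```

Because every entry of Cf is a small integer, encoder and decoder compute bit-identical results, which is exactly the "no inverse-transform error" property described above.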

To improve bit-rate control capability, H.264 changes the quantization step size in compounding increments of about 12.5%, rather than by a constant amount. The normalization of transform-coefficient amplitudes is moved into the inverse-quantization process to reduce computational complexity. To preserve color fidelity, a smaller quantization step size is used for the chroma coefficients.
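Concretely, the step size grows by a factor of 2^(1/6) per quantization-parameter (QP) increment, i.e. roughly 12% per step and a doubling every 6 steps; the base step size of 0.625 at QP 0 below is the conventional value, used here only for illustration:

```python
# Quantization step size as a function of QP: each step multiplies the
# step size by 2**(1/6), so it roughly adds 12.5% and doubles every 6 QP.
def qstep(qp, base=0.625):
    """Approximate quantization step size for quantization parameter qp."""
    return base * 2 ** (qp / 6)

print(round(qstep(1) / qstep(0) - 1, 4))  # 0.1225 (about 12% per step)
print(qstep(6) / qstep(0))                # 2.0 (doubles every 6 steps)
```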

4. Unified VLC

There are two entropy coding methods in H.264. One applies a unified VLC (UVLC: Universal VLC) to all symbols to be coded; the other is context-adaptive binary arithmetic coding (CABAC: Context-Adaptive Binary Arithmetic Coding). CABAC is optional; its coding performance is slightly better than UVLC's, but its computational complexity is also higher. UVLC uses a codeword set of unbounded length with a very regular structure, so different objects can be coded with the same code table. This method makes it easy to generate a codeword and easy for the decoder to recognize a codeword's prefix, and UVLC regains synchronization quickly after a bit error.
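The regular UVLC structure is that of Exp-Golomb codes: a run of leading zeros, a marker 1 bit, then as many information bits as there were zeros. A minimal encoder for unsigned values (the code family used by the final standard) can be sketched as:

```python
def ue_golomb(v):
    """Unsigned Exp-Golomb codeword for v, as a bit string:
    (leading zeros)(1)(info bits). The self-delimiting prefix is what
    lets a decoder resynchronise easily after an error."""
    bits = bin(v + 1)[2:]                  # binary of v+1, e.g. v=4 -> '101'
    return "0" * (len(bits) - 1) + bits    # prefix with len-1 zeros

for v in range(5):
    print(v, ue_golomb(v))
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
```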

5. Intra-frame prediction

In the previous H.26x series and MPEG-x series standards, prediction was performed only between frames. In H.264, intra-frame prediction can be used when encoding Intra images. For each 4×4 block (except for specially handled edge blocks), each pixel is predicted as a weighted sum (some weights may be 0) of the 17 closest previously coded pixels, namely the pixels above and to the upper left of the block containing the pixel. This intra prediction is clearly not predictive coding in time but in the spatial domain; it removes spatial redundancy between adjacent blocks and achieves more effective compression.

As shown in Figure 4, a, b, ..., p in a 4×4 block are the 16 pixels to be predicted, and A, B, ..., P are already-coded pixels. For example, the value of point m can be predicted by (J+2K+L+2)/4, or by (A+B+C+D+I+J+K+L)/8, etc. Depending on the prediction reference points selected, there are 9 different modes for luma, but only 1 mode for chroma intra prediction.
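The two example predictors for point m can be evaluated directly; the neighbour values A..L below are hypothetical, chosen only to show the integer arithmetic with rounding:

```python
# Illustration of the article's two example predictors for pixel m,
# using assumed values for the already-coded neighbour pixels.
A, B, C, D = 100, 102, 104, 106   # row above the 4x4 block (assumed)
I, J, K, L = 98, 99, 101, 103     # column to the left (assumed)

pred_m_directional = (J + 2 * K + L + 2) // 4             # one directional mode
pred_m_average = (A + B + C + D + I + J + K + L) // 8     # an averaging mode

print(pred_m_directional, pred_m_average)  # 101 101
```

The encoder tries the candidate modes, keeps the one whose residual costs the fewest bits, and signals the chosen mode to the decoder.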

6. For IP and wireless environments

The H.264 draft includes tools for error resilience, which improve the robustness of compressed video transmitted in environments prone to bit errors and packet loss, such as mobile channels or IP channels.

In order to resist transmission errors, time synchronization in H.264 video streams can be achieved by using intra-frame image refresh, and spatial synchronization is supported by slice structured coding. At the same time, in order to facilitate resynchronization after bit errors, certain resynchronization points are provided in the video data of an image. In addition, intra-frame macroblock refresh and multiple reference macroblocks allow the encoder to consider not only coding efficiency but also the characteristics of the transmission channel when deciding the macroblock mode.

In addition to using changes in the quantization step size to adapt to the channel bit rate, H.264 often uses data partitioning to cope with changes in the channel bit rate. Generally speaking, the concept of data partitioning is to generate video data with different priorities in the encoder to support the quality of service (QoS) in the network. For example, the syntax-based data partitioning method is used to divide each frame of data into several parts according to its importance, which allows less important information to be discarded when the buffer overflows. A similar temporal data partitioning method can also be used to achieve this by using multiple reference frames in P frames and B frames.

In wireless communication applications, we can support large bit rate changes in wireless channels by changing the quantization accuracy or spatial/temporal resolution of each frame. However, in the case of multicast, it is impossible to require the encoder to respond to various bit rates. Therefore, unlike the fine granular scalability (FGS) method used in MPEG-4 (which is relatively inefficient), H.264 uses stream switching SP frames instead of hierarchical coding.

4. H.264 Performance Comparison

TML-8 is the test model of H.264, used to compare and test its video coding efficiency. The PSNR figures from the test results clearly show that H.264 significantly outperforms MPEG-4 (ASP: Advanced Simple Profile) and H.263++ (HLP: High Latency Profile).

In a comparison across six rates, the PSNR of H.264 was on average 2 dB higher than that of MPEG-4 (ASP) and 3 dB higher than that of H.263++ (HLP). The six test rates and their conditions were: 32 kbit/s at 10 f/s in QCIF; 64 kbit/s at 15 f/s in QCIF; 128 kbit/s at 15 f/s in CIF; 256 kbit/s at 15 f/s in QCIF; 512 kbit/s at 30 f/s in CIF; and 1024 kbit/s at 30 f/s in CIF.
