A brief introduction to the international video coding standard MPEG and key technologies of AVS video

Since the 1990s, ITU-T and ISO have formulated a series of audio and video coding standards (source coding standards) and recommendations, which have greatly promoted the practical application and industrialization of multimedia technology. In terms of technical progress, the first-generation source coding standards MPEG-1 and MPEG-2, completed by 1994, achieve compression ratios of 50 to 75 times. Since the beginning of the new century, second-generation source coding standards have been introduced one after another, with compression ratios of 100 to 150 times. These second-generation standards will reshuffle the international digital television and digital audio and video industry landscape that has only just taken shape.
There are two major series of audio and video coding standards in the world: the MPEG series formulated by ISO/IEC JTC1, which is the series used by digital television; and the H.26x series of video coding standards and the G.7xx series of audio coding standards formulated by ITU for multimedia communications.
CCITT (International Telegraph and Telephone Consultative Committee, now part of the International Telecommunication Union, ITU) has proposed a series of audio coding algorithms and international standards since 1984. In 1984, CCITT Study Group 15 set up an expert group to study videophone coding. After more than five years of work, CCITT Recommendation H.261 was completed and approved in December 1990. Building on H.261, ITU-T completed the H.263 coding standard in 1996. With little increase in algorithm complexity, H.263 provides better image quality at a lower bit rate. At present, H.263 is the most widely used coding method in IP video communication. H.263+, released by ITU-T in 1998, is the second version of the H.263 recommendation; it adds 12 new negotiable modes and other features that further improve compression performance.
MPEG is the abbreviation of the Moving Picture Experts Group, established in 1988 under the first joint technical committee of the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC JTC1). Its formal designation is ISO/IEC JTC1/SC29/WG11, and it is responsible for formulating international technical standards for the compression, decompression, processing and representation of digital video, audio and other media. Since 1988, the MPEG expert group has held about four international meetings each year, mainly to formulate, revise and develop the MPEG series of multimedia standards: the audio and video coding standards MPEG-1 (1992) and MPEG-2 (1994), the multimedia coding standard based on audiovisual media objects MPEG-4 (1999), the multimedia content description standard MPEG-7 (2001), and the multimedia framework standard MPEG-21. The MPEG series has become the most influential set of multimedia technical standards and has had a profound impact on important information industry products such as digital television, audio-visual consumer electronics, and multimedia communications.
The CCITT H.261 standard was started in 1984 and was essentially completed in 1989. It is the forerunner of MPEG. MPEG-1 and H.261 have common data structures, coding tools and syntax elements. However, the two are not completely backward compatible. MPEG-1 can be regarded as an extension of H.261. The development of MPEG-1 started in 1988 and was essentially completed in 1992. MPEG-2 can be regarded as an extension of MPEG-1. It started in 1990 and was essentially completed in 1994. H.263 started in 1992 and the first version was completed in 1995. MPEG-4 (whose video part is based on MPEG-2 and H.263) started in 1993 and the first version was essentially completed in 1998.
The standards that the MPEG expert group has developed and is developing include:
(1) MPEG-1: It officially became an international standard in November 1992 under the name "Compressed Coding of Moving Pictures and Accompanying Sound for Digital Storage Media at a Rate of 1.5 Mb/s". The video parameters supported by MPEG-1 are 352 x 240 pixels at 30 frames/second, or equivalent.
(2) MPEG-2: It became an international standard (ISO/IEC 13818) in November 1994. It is a widely applicable coding scheme for moving pictures and sound. The initial goal was to compress video and its accompanying audio to 10 Mb/s; experiments showed that it can be applied to bit rates of 1.5 to 60 Mb/s, or even higher. MPEG-2 can be used for compression coding in digital communication, storage, broadcasting, high-definition television and similar applications. DVD and digital television broadcasting both use MPEG-2. The standard has continued to be extended and revised since 1994.

Video coding and decoding technology in MPEG standards
MPEG standards are mainly based on three coding tools. Adaptive block transform coding eliminates spatial redundancy, and motion-compensated differential pulse code modulation (DPCM) eliminates temporal redundancy; the two are combined into hybrid coding. Entropy coding then eliminates the statistical redundancy remaining in the output of the hybrid encoder. Some auxiliary tools supplement these main tools, either to remove the remaining redundancy in particular parts of the encoded data or to adapt the coding to specific applications. Some coding tools also support formatting the data into specific bit streams for storage and transmission.
Modern entropy coding was created in the late 1940s, was applied to video coding in the late 1960s, and has been continuously improved since then. In the mid-1980s, two-dimensional variable length coding (2D VLC) and arithmetic coding methods were introduced.
DPCM was created in 1952 and was first applied to video coding in the same year. It was originally developed as a spatial coding technique, and began to be used for temporal coding in the mid-1970s. DPCM served as a complete video coding scheme until the early 1980s. From the early-to-mid 1970s, key elements of DPCM were merged with transform coding to form hybrid coding, which developed into the prototype of MPEG in the early 1980s.
Transform coding was first used for video in the late 1960s and developed substantially in the first half of the 1970s. It is generally considered the most effective tool for spatial coding. In hybrid coding, transform coding is used to eliminate spatial redundancy and DPCM is used to eliminate temporal redundancy. Motion-compensated prediction greatly improves the performance of DPCM in the temporal domain. It was first proposed in 1969 and developed into the basic form used by MPEG in the early 1980s. In the early 1980s, interpolative coding was introduced, in which a prediction is formed by interpolating between multiple frames and the intermediate frames are predicted using scaled motion vectors. Bi-directional prediction appeared only in the late 1980s, bringing the technique to its final form. In recent developments (H.264), prediction quality has improved, so the prediction residual is less correlated. Less transform processing is therefore needed, and H.264 uses a simplified 4x4 transform.
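The DPCM principle discussed in this section can be sketched in a few lines. This is only a minimal illustration, assuming a one-dimensional signal, prediction from the previous reconstructed sample, and a uniform quantizer; the function names and the step size are hypothetical and do not come from any MPEG standard.

```python
def dpcm_encode(samples, step=4):
    """Quantize the difference between each sample and its prediction."""
    indices = []
    prediction = 0  # the encoder tracks the decoder's reconstruction
    for s in samples:
        residual = s - prediction
        q = round(residual / step)   # quantized residual index
        indices.append(q)
        prediction += q * step       # reconstructed value predicts the next sample
    return indices


def dpcm_decode(indices, step=4):
    """Rebuild the signal by accumulating the dequantized residuals."""
    out = []
    prediction = 0
    for q in indices:
        prediction += q * step
        out.append(prediction)
    return out
```

Because the encoder predicts from the reconstructed value rather than from the original sample, quantization errors do not accumulate: each reconstructed sample stays within half a quantizer step of the original. In a video codec the same loop runs over frames instead of samples, with motion compensation supplying the prediction.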

The time correspondence between the AVS standard and related international standards and the work that the AVS working group has carried out are shown in the figure below.

Basic principles of video compression
The fundamental reason why video can be compressed is that video data is highly redundant. Compression means eliminating this redundancy, and it relies mainly on two kinds of techniques: statistical and psychovisual.
Statistical redundancy exists because video is digitized by regular sampling in time and space. The video picture is digitized into a regular array of pixels whose density is sufficient to represent the highest spatial frequency at every point, while most frames contain very little or no detail at this highest frequency. Similarly, the chosen frame rate can represent the fastest movement in the scene, whereas an ideal compression system only needs to describe the movement actually present at each instant. In short, an ideal compression system adapts dynamically to the temporal and spatial changes of the video, and the amount of data it requires is much lower than the raw data produced by digital sampling.
Psychovisual techniques exploit the limits of the human visual system. Human vision is limited in contrast bandwidth, spatial bandwidth (especially for color), and temporal bandwidth. These limits are not independent of each other, and the visual system as a whole has an upper limit. For example, the human eye cannot perceive high resolution in time and space simultaneously. There is no need to represent information that cannot be perceived; in other words, a certain degree of compression loss goes unnoticed by the human visual system.
A video coding standard is not a single algorithm but a set of coding tools that together achieve the overall compression effect. The history of video compression can be traced back to the early 1950s. Over the following 30 years, the main compression techniques and tools gradually developed, and by the early 1980s video coding technology had taken shape. Initially, each major tool was proposed as a complete video coding solution in its own right. The main technical lines developed in parallel, and eventually the best-performing tools were combined into a complete solution. The main contributors to this integration were the standardization organizations: experts from many countries and organizations jointly completed the work, so in that sense the standard coding scheme was created by the standards committees. In addition, although some techniques were proposed many years ago, they were not put into practical use at the time because of their high implementation cost. Only with the development of semiconductor technology in recent years have the requirements of real-time video processing been met.

Figure 2 Development of coding tools and standards (Cliff, 2002)
(3) MPEG-4: In view of the needs of low-bandwidth applications and the rapid development of interactive graphics applications (synthetic content such as games) and interactive multimedia (content distribution and access technologies such as the WWW), the MPEG expert group set up the MPEG-4 working group to promote the integration of these three areas. In early 1999, the first edition of MPEG-4, which defined the framework of the standard, became an international standard (ISO/IEC 14496-1). The second edition, which provided a variety of algorithms and tools, became an international standard (ISO/IEC 14496-2) at the end of 1999. The third, fourth and fifth editions are still being developed.

The second generation video coding standard
MPEG-2, formulated in 1994, and H.263 are milestones in the field of international audio and video standards and are the basic standards followed by the audio and video industry. Over the past decade, both audio and video coding technology itself and its industrial application background have changed significantly. ITU-T proposed a long-term video standardization project, H.26L, in 1997 and released the first version of its test model in August 1999. In response to ISO/IEC MPEG's demand for advanced video coding technology, ISO and ITU formed the Joint Video Team (JVT), made up of ISO/IEC MPEG and ITU-T VCEG, in 2001 to develop a new video coding standard based on H.26L, namely the JVT standard.
The JVT standard is a wide-area standard that takes into account broadcasting and telecommunications, covering from low-bitrate communications to high-definition television. In ISO/IEC, the official name of the standard is MPEG-4 AVC (Advanced Video Coding) standard; in ITU-T, the official name is H.264 standard. In the second half of 2003, ISO/IEC officially released this standard under the name of MPEG-4 Part 10 (ISO/IEC 14496-10).
Although MPEG-4 AVC/H.264 is an important representative of the second-generation standards, it is far from enjoying the dominance that MPEG-2 once had, and it faces strong technical competition from enterprises and other standards organizations. The enterprise competitors are represented by WMV9, proposed by Microsoft. SMPTE (Society of Motion Picture and Television Engineers) is a standards development organization for the video, television and film industries accredited by the American National Standards Institute (ANSI), with rich experience in turning proprietary specifications into standards. In September 2003, SMPTE accepted the compression technology used by WMV9 as a candidate video codec format standard. The draft was titled "SMPTE Standard for Television: VC-9 Compressed Video Bitstream Format and Decoding Process", and the format was later renamed VC-1. In April 2006, SMPTE officially released VC-1.
The AVS standard is a digital audio and video coding standard developed by the China Digital Audio and Video Coding Technology Standard Working Group (AVS Working Group). The AVS Working Group was established in 2002. Its members include more than 100 institutions and enterprises, in China and abroad, engaged in the research and development of digital audio and video coding technologies and products. The mission of the AVS Working Group is to organize the formulation of industry and national source coding standards in response to the needs of China's information industry. The official name of the AVS national standard is "Information Technology Advanced Audio and Video Coding", numbered GB/T 20090. It comprises 9 parts, of which "Part 2 Video" (AVS Video for short) was promulgated in February 2006 and came into force in March. The AVS video standard is mainly aimed at high-definition and high-quality digital television broadcasting, digital storage media and related applications. It has four major characteristics: (1) high performance, with coding efficiency more than twice that of MPEG-2 and comparable to that of H.264; (2) low complexity, with lower algorithm complexity than H.264; (3) low implementation cost, with both software and hardware implementation costs lower than H.264; (4) a simple patent licensing model, with costs significantly lower than those of similar standards.

Thanks to advances in chip and other technologies, MPEG-4 AVC/H.264, completed in 2003, uses more complex techniques than earlier video coding standards. Its new technical modules, such as intra-frame and inter-frame coding with multiple block sizes, multi-directional spatial prediction, a 4x4 integer orthogonal transform, and an in-loop deblocking filter, achieve a higher compression ratio. Through data partitioning, the JVT standard also offers stronger error resilience.

AVS standard and its core technology
AVS is the second generation audio and video coding technology standard independently formulated by China. The characteristic core technologies of AVS video include: 8x8 integer transform, quantization, intra-frame prediction, 1/4 precision pixel interpolation, special inter-frame prediction motion compensation, two-dimensional entropy coding, deblocking loop filtering, etc.
1) Transformation and quantization
AVS's 8x8 transform and quantization can be implemented exactly on a 16-bit processor, which overcomes the inherent mismatch (distortion) problem of the 8x8 DCT used in all international video coding standards before MPEG-4 AVC/H.264. MPEG-4 AVC/H.264 uses a 4x4 integer transform, but on high-resolution video its decorrelation performance is not as good as that of an 8x8 transform. AVS uses 64-level quantization, which fully accommodates the bit rate and quality requirements of different applications and services. With the 16-bit implementation problem solved, the 8x8 transform and quantization scheme currently used by AVS suits both fast implementation on 16-bit DSPs or in software and optimized implementation in ASICs.
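The idea of an integer transform can be illustrated with a short sketch. The AVS 8x8 matrix is not given in this article, so the example below uses the well-known 4x4 forward core transform matrix of H.264 as a stand-in; the helper names are hypothetical. Because every entry is a small integer, encoder and decoder can compute bit-identical results, which is what removes the mismatch of a floating-point DCT.

```python
# 4x4 forward core transform matrix of H.264 (integer entries only).
C4 = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform(block):
    """Y = C * X * C^T, computed entirely in integer arithmetic."""
    return matmul(matmul(C4, block), transpose(C4))
```

The rows of `C4` are mutually orthogonal but not of equal length (squared norms 4, 10, 4, 10), so the normalization that a true DCT would apply is folded into the quantization step instead. A flat block concentrates all of its energy in the single DC coefficient `Y[0][0]`.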
2) Intra-frame prediction
AVS intra-frame prediction follows the approach of MPEG-4 AVC/H.264: it predicts the current block from the pixels of adjacent blocks and uses multiple prediction modes that represent texture directions in the spatial domain. However, AVS intra-frame prediction for both luma and chroma is based on 8x8 blocks. Luma blocks use 5 prediction modes and chroma blocks use 4, and 3 of these 4 chroma modes are the same as luma modes. At equivalent coding quality, AVS uses fewer prediction modes, which makes the scheme simpler and greatly reduces implementation complexity.
3) Multi-mode inter-frame prediction
Inter-frame motion compensation is one of the most important parts of the hybrid coding framework. To better characterize object motion and improve the accuracy of motion search, the AVS standard uses 16×16, 16×8, 8×16 and 8×8 block modes for motion compensation, and drops the 8×4, 4×8 and 4×4 block modes of MPEG-4 AVC/H.264. Experiments show that for high-resolution video, the block modes selected by AVS already describe object motion in sufficient detail. Fewer block modes reduce the overhead of transmitting motion vectors and block modes, which improves compression efficiency and reduces codec implementation complexity.

4) 1/4 pixel motion compensation
AVS and MPEG-4 AVC/H.264 both use motion compensation with 1/4 pixel precision. MPEG-4 AVC/H.264 uses a 6-tap filter for half-pixel interpolation and a bilinear filter for 1/4 pixel interpolation. AVS uses different 4-tap filters for half-pixel and 1/4 pixel interpolation. This reduces the number of reference pixels needed for interpolation without reducing performance, and so lowers the data access bandwidth requirement, which is a significant benefit in high-resolution video compression.
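The difference between a 6-tap and a 4-tap interpolation filter can be sketched as follows. The 6-tap coefficients (1, -5, 20, 20, -5, 1) and the rounded-average quarter-pixel step are those of H.264. The 4-tap set (-1, 5, 5, -1) is an assumed example of a short half-pixel filter and has not been checked against the AVS text; the function names are hypothetical, and only one-dimensional interpolation along a row is shown.

```python
H264_HALF_TAPS = (1, -5, 20, 20, -5, 1)  # H.264 half-pixel filter, taps sum to 32
ASSUMED_4TAP = (-1, 5, 5, -1)            # assumed 4-tap example, taps sum to 8


def clip8(v):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, v))


def half_pel(row, x, taps):
    """Half-sample value between row[x] and row[x + 1]."""
    n = len(taps)
    total = sum(taps)                 # a power of two for both tap sets above
    shift = total.bit_length() - 1
    start = x - (n // 2 - 1)          # leftmost integer sample touched
    acc = sum(t * row[start + i] for i, t in enumerate(taps))
    return clip8((acc + (total >> 1)) >> shift)


def quarter_pel_bilinear(a, b):
    """Rounded average of two neighbouring samples (bilinear step, as in H.264)."""
    return (a + b + 1) >> 1
```

For the position between `row[x]` and `row[x + 1]`, the 6-tap filter reads samples `x - 2` to `x + 3`, while a 4-tap filter reads only `x - 1` to `x + 2`. That smaller footprint is the source of the reduced memory bandwidth described above.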
5) Reference frame
In traditional video coding standards (the MPEG-x and H.26x series), a bidirectionally predicted B frame has only one forward reference frame and one backward reference frame, and a forward-predicted P frame has only one forward reference frame. The more recent MPEG-4 AVC/H.264 makes fuller use of the temporal correlation between pictures by allowing P frames and B frames to use multiple reference frames, up to 16 reference frames (32 reference fields). Multi-frame reference improves compression efficiency but greatly increases storage space and data access overhead. In AVS, a P frame can use up to 2 forward reference frames, while a B frame uses one forward and one backward reference frame. P frames and B frames (counting the backward reference frame) therefore use the same number of reference frames, so the reference frame storage and data access overhead are no larger than in traditional video coding standards, while the buffers that must be reserved anyway are fully used.
6) B-frame prediction modes
The bidirectional prediction of B frames in AVS uses direct mode, symmetric mode and skip mode. When using the symmetric mode, the code stream only needs to transmit the forward motion vector, and the backward motion vector can be derived from the forward motion vector, thereby saving the encoding overhead of the backward motion vector. For the direct mode, the forward and backward motion vectors of the current block are derived from the motion vector of the corresponding position block of the backward reference image, and there is no need to transmit the motion vector, so the encoding overhead of the motion vector can also be saved. The method of deriving the motion vector of the skip mode is the same as that of the direct mode. The residual of the motion compensation of the block encoded in the skip mode is also zero, that is, in this mode, the macroblock only needs to transmit the mode signal, and does not need to transmit additional information such as the motion vector and the compensation residual.
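The symmetric-mode derivation described above can be sketched as a simple scaling of the forward vector. This is a simplified illustration only: it assumes the backward vector is the forward vector mirrored through the current picture and scaled by the ratio of temporal distances, and it uses ordinary rounding, whereas a real codec specifies exact fixed-point arithmetic. The function and parameter names are hypothetical.

```python
def derive_backward_mv(mv_fw, dist_fw, dist_bw):
    """Derive a backward motion vector from the transmitted forward one.

    mv_fw   -- (x, y) forward motion vector
    dist_fw -- temporal distance from the current picture to the forward reference
    dist_bw -- temporal distance from the current picture to the backward reference
    """
    mvx, mvy = mv_fw
    # Opposite direction, scaled by the ratio of temporal distances.
    return (-round(mvx * dist_bw / dist_fw),
            -round(mvy * dist_bw / dist_fw))
```

For example, with a forward vector of (8, -4) pointing two pictures back and a backward reference one picture ahead, the sketch yields (-4, 2). Only the forward vector has to be written to the bit stream.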
7) Entropy Coding
AVS entropy coding uses adaptive variable length coding technology. In the AVS entropy coding process, all syntax elements and residual data are mapped into binary bit streams in the form of exponential Golomb codes. The advantages of using exponential Golomb codes are: on the one hand, its hardware complexity is relatively low, and the codewords can be parsed according to the closed formula without looking up the table; on the other hand, it can flexibly determine the k-order exponential Golomb code encoding according to the probability distribution of the coding element. If k is selected appropriately, the coding efficiency can approach the information entropy.
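The closed-form structure of exponential Golomb codes can be shown with a short sketch. It implements the standard k-th order exp-Golomb code for unsigned integers using bit strings for readability; the function names are hypothetical, and the mapping from AVS syntax elements to code numbers is not covered.

```python
def exp_golomb_encode(n, k=0):
    """Return the order-k exp-Golomb codeword of unsigned integer n as a bit string."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    x = n + (1 << k)
    prefix_len = x.bit_length() - 1 - k   # number of leading zeros
    return "0" * prefix_len + format(x, "b")


def exp_golomb_decode(bits, k=0):
    """Decode one codeword from the front of bits; return (value, bits_consumed)."""
    zeros = 0
    while bits[zeros] == "0":             # count the zero prefix
        zeros += 1
    length = zeros + k + 1                # size of the binary part that follows
    x = int(bits[zeros:zeros + length], 2)
    return x - (1 << k), zeros + length
```

The decoder needs no lookup table: it counts leading zeros, reads a fixed number of further bits, and subtracts a constant. A larger `k` shortens the codewords for large values at the cost of longer codewords for small ones, which is why the order can be chosen to match the probability distribution of the element being coded.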
The block transform coefficients of the prediction residual are scanned to form a (level, run) pair string. Level and run are not independent events, but have a strong correlation. In AVS, level and run are two-dimensionally jointly encoded, and the order of the exponential Golomb code is adaptively changed according to the different probability distribution trends of the current level and run.
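Forming the (level, run) pairs can be sketched as follows. This is a generic illustration that assumes the coefficients have already been scanned into a one-dimensional list and that run counts the zeros preceding each non-zero level; the function name is hypothetical, and the actual scan order and the table that maps each pair to a code number are defined by the standard and not reproduced here.

```python
def to_level_run_pairs(scanned):
    """Convert scanned coefficients into (level, run) pairs.

    run is the number of zero coefficients preceding each non-zero level.
    """
    pairs = []
    run = 0
    for c in scanned:
        if c == 0:
            run += 1
        else:
            pairs.append((c, run))
            run = 0
    # Trailing zeros are dropped; an end-of-block code would signal them.
    return pairs
```

For the scanned sequence 7, 0, 0, -2, 1, 0, 0, 0 this gives the pairs (7, 0), (-2, 2), (1, 0). Coding each pair jointly, instead of coding level and run separately, is what lets the entropy coder exploit the correlation between the two.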
AVS video currently defines one profile, the baseline (Jizhun) profile. This profile is divided into 6 levels, corresponding to high-definition, standard-definition and CIF (1/4 standard definition, equivalent to VHS or VCD quality) applications. Compared with the baseline profile of MPEG-4 AVC/H.264, AVS video adds B frames, interlaced coding and other techniques, so its compression efficiency is significantly higher. Compared with the main profile of MPEG-4 AVC/H.264, it leaves out techniques that are difficult to implement, such as CABAC, which makes it easier to implement.
The main features of AVS video are clear application goals and targeted technologies. Therefore, in high-resolution applications, its compression efficiency is significantly higher than that of MPEG-2 video, which is currently commonly used in digital television and optical storage media. Under the premise of comparable compression efficiency, its implementation complexity is greatly reduced compared to the main profile of MPEG-4 AVC/H.264.
(4) MPEG-7 and MPEG-21: MPEG-7 is a content description standard for searching, filtering, managing and processing multimedia information; it became an international standard in July 2001. MPEG-21, which is still being developed, focuses on a multimedia framework that provides a common basis for all existing and emerging standards related to multimedia content delivery.
