A flexible strategy for implementing HD video transcoding standards on multi-core media processors

Publisher: TranquilBreeze | Last updated: 2011-03-01

There are few jokes in the telecommunications industry that are truly funny, but there is one that makes people smile with its ironic humor: The best thing about standards is that there are so many of them to choose from. This applies not only to video, but also to communications, transmission systems, and technical interfaces.

This article will first briefly introduce some of the most commonly used video standards, and then discuss the multi-core, media processor-based flexible approach taken by chip manufacturers such as LSI. LSI has extensive industry experience in developing products for the voice/video media gateway market, including a scalable product line for any-to-any video communication and real-time collaboration applications with the new generation of media gateways.

Laying the foundation

Video has been growing in importance over the past decade. The first and perhaps most important driver is the shift in how advertising money is spent. Television ratings have fallen to an all-time low and the effectiveness and impact of television advertising have declined, yet advertising costs remain high. Advertisers are therefore looking for new places to put their budgets, and one of the most popular is the booming market for online video on demand.

The reasons for this change are obvious. Because information can be transmitted in extremely fine detail and at a fraction of the cost of traditional methods, it is no surprise that video has become so popular as a network-based application (via the Internet). Video’s popularity has also been aided by the ubiquity of broadband, the power of modern personal computers, and the incredible breadth, depth, and richness of multimedia content.

Any emerging technology that is popular in the market will also promote innovation, which in turn translates into product and service differentiation, the pursuit of being the first, and the reduction of consumer costs. However, innovation and the pursuit of uniqueness often lead to a fragmented approach to meet market needs, and inevitably lead to product incompatibility. This incompatibility will slow down market development because users worry that the product or service they choose will end up taking the wrong technical route.

Standardization bodies have a responsibility to coordinate the different approaches adopted by developers. The key to solving the problem is balance. Standardization organizations must develop recommendations that "provide a unified approach to product development" while leaving enough room for implementation interpretation to avoid innovation becoming rigid and siloed.

Most of the current video standards are developed by the International Telecommunication Union (ITU) and the Moving Picture Experts Group (MPEG). ITU develops standards from the perspective of the network that transmits the video stream, while MPEG develops standards from the perspective of the product being transmitted. Both have been widely used and have good compatibility.

ITU Standards

The video standards published by the ITU belong to the H-series of Recommendations and include H.261, H.263, and H.264. This section introduces each of them in turn.

H.261

H.261 was originally developed as a video coding standard for the limited data rates of the ISDN era (specifically, multiples of 64 kbps). In some documents the standard is also referred to as P×64, where P is any number between 1 and 30 (30 being the maximum number of 64 kbps channels that an ISDN primary rate line or E-1 circuit can provide).
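
As a back-of-the-envelope illustration of the P×64 rate structure, the short sketch below simply enumerates the bit rates implied by the values quoted above; it adds nothing beyond that arithmetic:

```c
#include <stdio.h>

/* H.261 "P x 64" bit rates: P channels of 64 kbps each,
 * from P = 1 (64 kbps) up to P = 30 (1920 kbps, a full E-1). */
int main(void)
{
    for (int p = 1; p <= 30; p++)
        printf("P = %2d  ->  %4d kbps\n", p, p * 64);
    return 0;
}
```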

H.263

H.263 is a video coding standard designed for medium-quality video conferencing and video telephony applications. Originally developed for low-bandwidth video transmission at around 20 kbps, H.263 is based on the H.261 design but needs only about half the bandwidth of H.261 to achieve the same quality. As a result, H.263 has largely replaced H.261 in new implementations. Like H.261, H.263 relies on the Real-time Transport Protocol (RTP) to carry video over IP networks.

H.261 supports only two resolutions, whereas H.263 supports five: in addition to CIF and QCIF, H.263 also supports SQCIF, 4CIF, and 16CIF.
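
For reference, the five picture formats and their luma dimensions can be written down as a small table in code; the dimensions below come from the standard CIF-family definitions rather than from this article:

```c
/* The five picture formats supported by H.263 (luma resolution). */
struct picture_format {
    const char *name;
    int width;
    int height;
};

static const struct picture_format h263_formats[] = {
    { "SQCIF",  128,   96 },
    { "QCIF",   176,  144 },   /* also supported by H.261 */
    { "CIF",    352,  288 },   /* also supported by H.261 */
    { "4CIF",   704,  576 },
    { "16CIF", 1408, 1152 },
};
```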

H.264

H.264 is the new-generation standard in the ITU series, developed jointly by the ITU and the International Organization for Standardization (ISO) and also known as MPEG-4 Part 10. H.264/MPEG-4, also called the Advanced Video Coding (AVC) standard, is designed to support high-end video applications such as video conferencing and video telephony, as well as digitally compressed video such as low-bit-rate Internet streaming, HDTV broadcasting, and digital cinema.

H.264 includes efficient video coding tools that further improve coding efficiency. Compared with previous standards, it offers a significant rate-distortion advantage (depending on the application, the average gain can be as high as 50%). It defines several profiles for specific application needs: the Baseline profile includes tools optimized for video conferencing and mobile applications, the Extended profile targets streaming media, and the Main and High profiles target broadcast and storage applications.

Conceptually, H.264 is split into two layers: the video coding layer (VCL) represents the compressed video content, while the network abstraction layer (NAL) packages the compressed data for transport according to the capabilities of the network, adding the header information needed by transport protocols such as RTP and by storage systems.
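
To make the VCL/NAL split concrete, here is a minimal sketch of how the one-byte H.264 NAL unit header can be parsed; the bit layout follows the H.264 specification, while the surrounding types and function are illustrative only:

```c
#include <stdint.h>

/* H.264 NAL unit header (1 byte):
 *   forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits)
 * The NAL wraps VCL payloads (coded slices) so they can be carried over
 * RTP, MPEG-2 transport streams, or file storage without change. */
typedef struct {
    uint8_t forbidden_zero_bit;
    uint8_t nal_ref_idc;    /* reference importance: 0 = disposable */
    uint8_t nal_unit_type;  /* e.g. 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS */
} nal_header_t;

static nal_header_t parse_nal_header(uint8_t byte0)
{
    nal_header_t h;
    h.forbidden_zero_bit = (byte0 >> 7) & 0x01;
    h.nal_ref_idc        = (byte0 >> 5) & 0x03;
    h.nal_unit_type      =  byte0       & 0x1F;
    return h;
}
```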

The Scalable Video Coding (SVC) standard is the latest extension to H.264, intended for transmitting coded streams that are scalable in time, space, and video quality. The SVC extension introduces a concept not present in the original H.264: the division of the video stream into layers. The base layer encodes the most basic temporal, spatial, and quality representation of the video stream. The enhancement layers use the base layer as a starting point and encode additional information that the decoder can use to reconstruct a higher-quality, higher-resolution, or higher-frame-rate version of the video. By decoding the base layer and as many enhancement layers as needed, the decoder can produce a video stream with the desired characteristics. The coded video stream can also be truncated to limit bandwidth usage or reduce decoding workload; truncation simply extracts the required layers from the coded stream without any other processing of the video itself. The best quality of experience (QoE) can therefore be delivered for whatever the endpoint decoder supports (display size, computational resources, and so on).
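
The truncation step can be pictured as simple filtering of NAL units by their layer identifiers. The sketch below assumes each NAL unit has already been tagged with the dependency (spatial), temporal, and quality IDs carried in the SVC extension header; the data structures are illustrative, not taken from any particular API:

```c
#include <stddef.h>

/* Layer identifiers conceptually attached to each SVC NAL unit. */
typedef struct {
    int dependency_id;  /* spatial / coarse-grain layer */
    int temporal_id;    /* frame-rate layer */
    int quality_id;     /* fidelity (SNR) layer */
    /* ... a pointer to the actual NAL payload would go here ... */
} svc_nal_t;

/* Keep only the NAL units needed for the target operating point.
 * Returns the number of NAL units retained (compacted in place). */
static size_t truncate_stream(svc_nal_t *nals, size_t count,
                              int max_dep, int max_temp, int max_qual)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (nals[i].dependency_id <= max_dep &&
            nals[i].temporal_id   <= max_temp &&
            nals[i].quality_id    <= max_qual) {
            nals[kept++] = nals[i];   /* the base layer has IDs (0,0,0) and is always kept */
        }
    }
    return kept;
}
```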

MPEG Standards

The MPEG video series includes four main standards: MPEG-1, MPEG-2, MPEG-3, and MPEG-4.

MPEG-1

MPEG-1 is a lossy video and audio compression standard developed by the Moving Picture Experts Group, a joint working group of ISO and IEC. It was designed to compress digital video and CD-quality audio down to about 1.5 Mb/s, with compression ratios of roughly 26:1 for video and 6:1 for audio. With this standard, highly compressed video and audio can be transmitted without excessive loss of signal quality.

MPEG-2

MPEG-2, derived from the MPEG-1 standard, supports lossy audio and video compression. MPEG-2 is the most common standard for digital television transmission in broadcast, cable and direct satellite television systems. It is also used to format movies for DVD distribution. MPEG-2 is an international standard, and its individual parts (Parts 1 and 2) were developed in conjunction with the ITU. Although MPEG-2 is widely used in television and DVD systems, it does not make comprehensive provisions for such environments. The standard leaves a lot of room for local interpretations.

MPEG-3

MPEG-3 is often assumed to be the same thing as MP3, the popular music coding format, but the two are unrelated: MP3 is actually MPEG-1 Audio Layer III. MPEG-3 was intended to specify a set of audio and video coding standards for transmitting HDTV signals at rates of 20-40 Mb/s. When HDTV work began, the existing MPEG standards looked inadequate, so MPEG-3 was launched as an interim solution. In 1992, however, HDTV support was added to MPEG-2 as a dedicated profile, and the MPEG-3 effort was folded into MPEG-2.

MPEG-4

As multimedia applications became increasingly popular in the late 1990s and early 2000s, there was a growing need for a compression standard that could meet the special requirements of such applications, and MPEG-4 was born.

MPEG-4 was introduced in 1998 and quickly became a standard for web-based streaming media, CD distribution, voice, and broadcast television. The standard provides many of the features already specified by MPEG-1 and MPEG-2, but adds new provisions for the special requirements of rendering digital graphics, including support for the Virtual Reality Modeling Language (VRML) for 3D graphics and for digital rights management (DRM). Many parts of the standard are well designed and widely used: MPEG-4 Part 2 is widely adopted by DivX®, Xvid®, Nero Digital®, and QuickTime®, while MPEG-4 Part 10, the Advanced Video Coding (AVC) standard shared with H.264, is used on HD DVD and Blu-ray Disc™.

Several peripheral standards

Newer video standards such as VC-1 and Flash Video deserve a mention as they gain a foothold in the technology space. VC-1 is a codec similar in design to those used in many of the standards discussed above. It was drafted with input from a number of companies in the industry, but is generally regarded as a Microsoft development and is positioned as an alternative to H.264. VC-1 is optimized for interlaced video content, making it well suited to the broadcast and video industries. Although VC-1 is relatively new, its use on Blu-ray and HD DVD, together with the VC-1 decoder support built into Windows Vista, gives it a strong position in the market.

Flash videos are played using Adobe Flash Player and can support various codec formats such as H.264 video and AAC audio. This format is widely used for Internet video distribution and has been adopted by major websites such as YouTube and Yahoo!

The joke at the beginning of this article is no joke: there seem to be as many video formatting and transmission standards as there are kinds of video to transmit. The good news is that the various standards organizations have begun to work closely together, which has slowed the proliferation of new standards and made the new ones more closely related. Manufacturers, however, still face the same challenges. With so many standards, how can they get products to market quickly and efficiently? How can they determine which standard will prevail and which to design their products around? They can, of course, design products that comply with multiple standards, but this requires some kind of media gateway to ensure full interoperability.

It is important to note that even with the best standards, real products are still constrained by cost and time to market, so compromises are often made, such as hard-wired logic that can handle the complex requirements of an HD video codec but lacks flexibility. As video codec standards keep growing in complexity to reach higher compression levels, more flexibility is needed to handle partially compliant implementations. One approach is to work at the lowest level of the technology hierarchy.

Clearly, video standards have evolved over the past few years to keep pace with the changing landscape of video and its users. Video originally lived in the broadcast or cable TV space, but it is now making its way onto the Internet in a variety of formats aimed at a wide range of devices. The challenge, of course, is to make all of this content available and playable across many devices and platforms, which is no easy task. Part of the solution has come from several commercially successful operating systems that can handle this complexity. While their flexibility and functionality are valuable, these operating systems pay for it with high power consumption and low channel density; if low-density applications are what you need and power is not an issue, the trade-off is worth it.

Multimedia is the new trend, and the situation has changed with the advent of broadband mobile phones that support Internet access, games, video, and TV. Other applications, such as user-generated content and social networking, also place huge demands on networks and networked devices. Semiconductor companies that have been successful in the telecommunications field have therefore developed video processing architectures that support a range of video resolutions (such as QCIF, CIF, and HD) and are flexible enough to support multiple standards.

To process high-resolution video successfully, designers must treat power consumption as a key design factor. Video is one of the most power-hungry applications because of the extensive processing needed to meet QoS requirements and the output quality viewers expect. Achieving the best performance per watt requires low-power design techniques. Multi-core media processors with efficient pipeline designs allow video applications to meet both the programmability and the low-power requirements.

Programmable multi-core media processor

As multiple video and audio formats emerge, the equipment that supports them and their applications is becoming increasingly complex, resulting in more expensive and complex semiconductor designs. Video is a very demanding application, and the high quality of the signal presented requires special processing, so a range of functions must be considered when deploying video-specific products. Programmable multi-core solutions are flexible and have low power consumption.

A large portion of the real-time cycles in video encoding is actually spent on control and data handling. For example, it is ideal to process control-sensitive data for transforms and filtering in a single operation. Better solutions can be achieved through enhanced control and flow-modification instructions, such as efficient hardware loops, parallel conditional computation, and efficient prefetching.

Additionally, parsing, motion vector prediction, interpolation, motion compensation, CABAC operations, and other computationally intensive tasks should be handled in the most efficient way possible. Each element of event processing or management requires a slightly different form of computation, and such considerations determine the overall architecture of the system. Using a single processor core for all tasks results in an inefficient overall architecture: compensating with a higher clock frequency or more cores means a more expensive, more power-hungry device. By giving each process a dedicated programmable core, computational tasks can be optimized and overall system efficiency improved. Media processors that offer multiple cores, advanced processing, and high data throughput enable solutions with the combination of performance and flexibility needed to meet current and future multimedia needs.

Media-rich applications require high-density DSP functionality across multiple channels. Fortunately, multicore multimedia processors that meet the needs of such applications are now available. There are a number of important features that should be considered when selecting a multicore media processor for the next generation of media applications. Table 1 lists these features and their related important notes.

Table 1. Important features of multimedia-specific multi-core DSPs.

Multi-layer encoder architecture

A typical video transcoder implementation requires an HD decoder (SD, 720p, or 1080p), possibly resizing the image, distributing the YUV output to other cores or devices, and encoding the video at CIF, SD, 720p, or 1080p resolution. This section focuses on the complete decode/encode transcoding technology, and the same operating principles also apply to efficient transcoders, where decoder parameters (such as motion vectors) are often used in the encoder to reduce encoder complexity.
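
In outline, the per-frame flow just described might look like the sketch below; all types and function names (decode_frame, resize_yuv, dispatch_to_cores, encode_frame) are hypothetical placeholders rather than calls from any specific codec SDK:

```c
/* Opaque coded-stream handle and a simple YUV frame type; these, like the
 * helper functions below, are hypothetical placeholders standing in for
 * whatever codec/SDK primitives are actually used. */
typedef struct bitstream bitstream_t;
typedef struct { int width, height; unsigned char *y, *u, *v; } yuv_frame_t;

int  decode_frame(bitstream_t *in, yuv_frame_t *out);        /* returns 0 at end of stream */
void resize_yuv(const yuv_frame_t *src, yuv_frame_t *dst, int w, int h);
void dispatch_to_cores(const yuv_frame_t *frame);            /* hand YUV to encoder cores, locally or over sRIO */
void encode_frame(const yuv_frame_t *frame, bitstream_t *out);

/* One transcoding channel: HD decode -> optional resize ->
 * distribute YUV -> encode at the target resolution. */
void transcode_channel(bitstream_t *in, bitstream_t *out, int target_w, int target_h)
{
    yuv_frame_t decoded, scaled;

    while (decode_frame(in, &decoded)) {
        resize_yuv(&decoded, &scaled, target_w, target_h);
        dispatch_to_cores(&scaled);
        encode_frame(&scaled, out);
    }
}
```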

High-definition video encoding at 1080p (1920x1080) and 720p (1280x720) resolutions is a demanding task that requires multiple media processing cores to reach 30-60 frames per second in real time; the task can even span multiple multi-core DSP devices. This article focuses on H.264, but the same principles apply to H.263 and MPEG-4 encoders. There are two ways to partition the task between DSP cores.
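
A rough estimate of the workload makes the need for multiple cores clear. With 16x16 macroblocks, a 1080p frame is coded as 120 x 68 = 8,160 macroblocks (the 1080 lines are padded up to 1088), so real-time operation means processing hundreds of thousands of macroblocks per second. A sketch of that arithmetic:

```c
#include <stdio.h>

/* Macroblocks per frame and per second for common HD resolutions.
 * Heights are rounded up to a multiple of 16, as the codec does. */
static void mb_load(const char *name, int w, int h, int fps)
{
    int mbs = ((w + 15) / 16) * ((h + 15) / 16);
    printf("%-6s %4dx%-4d: %5d MB/frame, %7d MB/s at %d fps\n",
           name, w, h, mbs, mbs * fps, fps);
}

int main(void)
{
    mb_load("720p",  1280,  720, 30);  /*  3,600 MB/frame,  108,000 MB/s */
    mb_load("1080p", 1920, 1080, 30);  /*  8,160 MB/frame,  244,800 MB/s */
    mb_load("1080p", 1920, 1080, 60);  /*  8,160 MB/frame,  489,600 MB/s */
    return 0;
}
```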

Figure 1. H.264 encoder block diagram.

One option is to distribute the functionality among the DSP cores, spreading the computational load as evenly as possible. For example, as shown in Figure 1, one core handles inter-frame and intra-frame prediction, another handles transform and quantization, and a third runs the deblocking filter and entropy coding. In practice, functional partitioning has several drawbacks. It requires a high level of communication and coordination between the cores, and balancing the computational load is difficult because each functional block is itself highly complex. Moreover, the functional partitioning architecture does not scale as the image resolution grows from CIF to HD.

Another video encoder implementation that can overcome the above problems is a multi-layer architecture that is suitable for scalable multi-core devices. This solution is also suitable for multi-device architectures where multiple multi-core devices are connected through high-speed interconnect buses such as sRIO and PCIe.

In this architecture, the encoder implementation is distributed across multiple DSP cores, with each slice of macroblocks assigned to a DSP core. Each of these cores provides specific functions, such as rate control and intra-frame processing. In practice, bits cannot simply be pre-budgeted and allocated to each slice, because different slices may have very different image complexity, and using very different Qp values between image partitions causes visible artifacts at the slice boundaries in the composite image. In H.264 a slice NAL unit can contain any number of macroblocks, so unlike H.263 the shape of an image partition does not need to follow GOB boundaries. The header of each slice (the slice header) carries the index of the first macroblock coded in that slice's data.
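
A minimal sketch of how a frame might be split into per-core slices, with each slice header carrying the index of its first macroblock (the first_mb_in_slice field in H.264 syntax); the partitioning policy shown here, equal macroblock rows per core, is an assumption for illustration:

```c
/* Split a frame of mb_w x mb_h macroblocks into one slice per core,
 * assigning whole macroblock rows to each slice. first_mb is the value
 * that goes into the slice header's first_mb_in_slice field. */
typedef struct {
    int first_mb;   /* index of the first macroblock in this slice */
    int num_mbs;    /* number of macroblocks in this slice */
} slice_t;

static void partition_frame(int mb_w, int mb_h, int num_cores, slice_t *slices)
{
    int rows_done = 0;
    for (int core = 0; core < num_cores; core++) {
        int rows = (mb_h - rows_done) / (num_cores - core);  /* spread any remainder */
        slices[core].first_mb = rows_done * mb_w;
        slices[core].num_mbs  = rows * mb_w;
        rows_done += rows;
    }
}

/* For 1080p (120 x 68 macroblocks) on 4 cores this yields slices of
 * 17 rows each, with first_mb = 0, 2040, 4080, and 6120. */
```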

Figure 2. HD encoder multi-layer processing.

In the multi-slice architecture shown in Figure 2, the DSP cores receive raw video in YUV format over the sRIO interface from another multi-core media processor, which implements the H.264 decoder. This multiprocessor architecture exploits the flexibility of sRIO to dynamically distribute the slices decoded on one multi-core media processor to the DSP cores of another multi-core media processor for further processing.

sRIO is a point-to-point technology that provides the flexibility to connect multiple devices to transfer data or to work on common data sets. Each device autonomously writes into the I/O space of the other devices, and each sRIO link supports up to 10 Gbps of throughput in each direction. Combining sRIO with efficient DMA channels enables:

* Parallel video processing and data transmission

* Coordinated execution

Data sharing is achieved through shared memory (if the DSP cores are located in the same device) or through the high-speed sRIO interface (if the DSP cores are located in different devices).
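
The overlap of transfer and processing typically takes the form of ping-pong (double) buffering: while the DMA or sRIO engine fills one buffer, the core processes the other. The sketch below is generic; dma_start_receive, dma_wait, and process_slice are hypothetical names, not functions from any specific driver:

```c
#define SLICE_BYTES (64 * 1024)   /* illustrative transfer size */
#define NUM_BUFS 2

/* Hypothetical placeholders for the actual DMA/sRIO driver and video kernel. */
void dma_start_receive(unsigned char *dst);
void dma_wait(const unsigned char *dst);
void process_slice(const unsigned char *data);

/* Ping-pong buffering: while the DMA/sRIO engine fills one buffer,
 * the core processes the other, overlapping transfer with computation. */
void process_stream(int num_blocks)
{
    static unsigned char buf[NUM_BUFS][SLICE_BYTES];

    if (num_blocks > 0)
        dma_start_receive(buf[0]);             /* prime the first transfer  */

    for (int n = 0; n < num_blocks; n++) {
        int cur  = n % NUM_BUFS;
        int next = (n + 1) % NUM_BUFS;

        dma_wait(buf[cur]);                    /* block n has arrived       */
        if (n + 1 < num_blocks)
            dma_start_receive(buf[next]);      /* start fetching block n+1  */
        process_slice(buf[cur]);               /* compute while DMA runs    */
    }
}
```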

Figure 3. Example of a high-throughput, low-latency multicore device interconnect.

Figure 3 illustrates the potential use of high-speed serial I/O for complex video processing tasks. The figure shows a connection scheme for extending the processing power of multicore devices across multiple devices, which can enable more complex video processing operations or support more video transcoding channels. Using sRIO switches allows for more flexible communication between devices, but it is not necessary if the processing flow is between adjacent devices. Compared with PCIe switches, sRIO generally has lower cost, higher performance, and lower latency due to its lower packet overhead.

Multi-core decoder architecture

The video decoder implementation generally needs to be independent of the encoder; that is, the decoder structure must be generic enough to handle different encoder schemes, such as single-NAL or multi-NAL implementations. The H.264 decoder involves both serial and parallel operations, and an important task is to distribute them efficiently among multiple DSP cores. An efficient multi-core architecture must therefore account for the operations that remain inherently serial.

The entropy decoder is a functional block containing serial operations and local loops, and it cannot be split into parallel tasks running on multiple cores. Even with advanced techniques such as context-adaptive binary arithmetic coding (CABAC), however, the complexity of the entropy decoder is lower than that of macroblock reconstruction. As DSP cores become more powerful, the entropy decoding function can be implemented on a single DSP core.

Figure 4. H.264 decoder block diagram.

Figure 4 shows a multi-core architecture that uses a single DSP core for entropy decoding and distributes the more computationally intensive macroblock reconstruction across multiple DSP cores. This data distribution technique keeps inter-task communication on a specific core and enables more efficient cache behavior. Another advantage of the architecture is that it scales from SD to HD while achieving more even load balancing among the DSP cores. Different mappings can be considered, such as a single macroblock row per core or a group of columns per core. Data distribution also helps optimize overall latency, because decoding is pipelined so that a macroblock can be reconstructed as soon as the data from its neighboring macroblocks is available.
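
The pipelining mentioned above follows the usual macroblock dependency rule: a macroblock can be reconstructed once its left neighbor and the neighbors in the row above (up to and including the top-right one) are available. Below is a minimal sketch of that readiness test; the progress bookkeeping is an assumption for illustration, not taken from any particular implementation:

```c
/* progress[row] = number of macroblocks already reconstructed in that row.
 * Macroblock (row, col) depends on its left neighbor and on the row above
 * up to the top-right neighbor, so it is ready when:
 *   - the row above has completed at least col + 2 macroblocks, and
 *   - this row has completed exactly col macroblocks (left neighbor done). */
static int mb_ready(const int *progress, int row, int col, int mb_w)
{
    int above_needed = (col + 2 < mb_w) ? col + 2 : mb_w;  /* clamp at row end */
    if (row > 0 && progress[row - 1] < above_needed)
        return 0;
    return progress[row] == col;
}
```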
