The Freescale i.MX53 application processor is a typical example of an architecture built around hardware accelerators. Its embedded full-hardware VPU supports a wide range of video formats, from H.264, MPEG-4, and DivX to RV10, which covers most video content, and it supports 1080i/p high-definition decoding and 720p encoding. The processor can also decode multiple video streams, or encode and decode multiple streams in full duplex, at the same time, with each stream in a different format if required. This enables dual-display configurations and video conferencing applications.
Typical hardware video processing engine structure
Unlike a conventional full-hardware VPU, this VPU has a significant advantage: it is programmable to a certain extent, so the encoding and decoding process can be updated. This is possible because it contains a small 16-bit programmable DSP. This processor, called BIT, runs different firmware to flexibly control the encoding and decoding process and the interface to the CPU.
For the CPU, controlling the VPU requires no more than 1 MIPS of computing power. This low requirement is also due to the BIT processor, which contains a dedicated hardware accelerator for bitstream processing and implements functions including frame rate control, FMO, ASO, video codec control, and error recovery. Most VPU sub-modules are also highly optimized and are reused across the supported video formats, which reduces gate count and power consumption.
The VPU structure of the i.MX53 is shown in Figure 1. It connects to the ARM processor through standard AXI/APB buses, so that it can access the on-chip cache to achieve high performance. The VPU consists of two main components: the video codec processing IP and the VPU bus converter. The former is the core of the VPU and is composed mainly of the embedded BIT processor, the video codec hardware, and a bus arbiter; the latter is responsible for converting the AMBA APB3 bus to the IP Sky Blue bus inside the VPU.
Video decoding process flow
Because the BIT processor implements comprehensive control of the codec flow, the VPU appears highly autonomous to the external CPU, which only needs to manage the processes related to the VPU. Note that "process" here does not mean an operating system process in the usual sense, but a dedicated process inside the VPU.
The VPU can process up to four video streams in different formats at the same time, and the processing flow is the same for each. It starts with creating a process (the system creates and configures a dedicated process), continues with running the process (the system must run it at a point when the decoder is idle and the bitstream is ready in memory), and ends with exiting the process.
When multiple processes are created, each is assigned a unique process index number based on the order of creation. For example, when one MPEG-4, one H.264, one MPEG-2, and one VC-1 decoding process are created in that order and run at the same time, the MPEG-4 process is assigned index 0 and the VC-1 process is assigned index 3.
In a multi-process environment, processes have no execution priority. After all processes are created, the CPU starts the BIT processor to execute them, and the BIT processor schedules them with a mechanism similar to time slicing.
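The index assignment and priority-free scheduling described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `VpuProcess`, `create_processes`, and `run_round_robin` are hypothetical names, one "time slice" is modeled as one decoded frame, and the real BIT firmware scheduler may behave differently.

```python
from collections import deque


class VpuProcess:
    """Hypothetical model of one VPU-internal decoding process."""

    def __init__(self, index, codec, frames_to_decode):
        self.index = index                  # assigned in creation order
        self.codec = codec
        self.remaining = frames_to_decode   # frames still to be decoded


def create_processes(specs):
    """Create processes from (codec, frame_count) pairs; index follows creation order."""
    return [VpuProcess(i, codec, n) for i, (codec, n) in enumerate(specs)]


def run_round_robin(processes):
    """Give each process one slice in turn, with no priorities, until all are done."""
    queue = deque(processes)
    order = []
    while queue:
        proc = queue.popleft()
        proc.remaining -= 1          # assumption: one slice decodes one frame
        order.append(proc.index)
        if proc.remaining > 0:
            queue.append(proc)       # not finished: back to the end of the queue
    return order


procs = create_processes([("MPEG-4", 2), ("H.264", 1), ("MPEG-2", 2), ("VC-1", 1)])
schedule = run_round_robin(procs)
print([(p.index, p.codec) for p in procs])
print(schedule)
```

With these inputs the MPEG-4 process gets index 0 and the VC-1 process gets index 3, matching the example in the text, and every process receives a slice before any process receives a second one.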
Stepping back from the VPU, consider its operation from the perspective of the whole system, using the example of decoding one H.264 stream and one MPEG-4 stream at the same time.
First, initialize the VPU. This includes loading the firmware required by the BIT processor into memory and setting initialization parameters such as the BIT processor configuration, the working buffer base address, the BIT code address, and the stream buffer control.
Then create the H.264 and MPEG-4 decoding processes, which includes setting the base address and size of each bitstream buffer, the base address of the frame buffers, and so on.
The processes are then executed alternately. A busy flag (Wait BusyFlag) indicates whether a frame of the bitstream has been decoded. Decoded frames are sent to the image processing unit (IPU) for post-processing and display.
Finally, after decoding is completed, the relevant memory resources are released and the processes are destroyed.
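The four steps above (initialize, create, run alternately while waiting on the busy flag, then release) can be sketched from the CPU's point of view. The `SimulatedVpu` class and all of its method names are hypothetical stand-ins invented for this illustration; they are not the actual i.MX53 driver interface, and a frame is represented by a plain string.

```python
class SimulatedVpu:
    """Toy stand-in for the VPU; NOT the real driver API."""

    def __init__(self):
        self.initialized = False
        self.streams = {}       # handle -> list of frames still to decode
        self.codecs = {}        # handle -> codec name
        self.in_flight = None   # (handle, frame) while the busy flag is set
        self.next_handle = 0

    def init(self, firmware):
        if not firmware:
            raise ValueError("BIT firmware image is required")
        self.initialized = True

    def open_decoder(self, codec, frames):
        if not self.initialized:
            raise RuntimeError("VPU not initialized")
        handle = self.next_handle
        self.next_handle += 1
        self.streams[handle] = list(frames)
        self.codecs[handle] = codec
        return handle

    def has_data(self, handle):
        return bool(self.streams[handle])

    def start_one_frame(self, handle):
        if self.in_flight is not None:
            raise RuntimeError("decoder is busy")
        self.in_flight = (handle, self.streams[handle].pop(0))

    def wait_busy_flag(self):
        """Pretend to wait until the busy flag clears, then return the frame."""
        handle, frame = self.in_flight
        self.in_flight = None
        return f"{self.codecs[handle]}:{frame}"

    def close(self, handle):
        del self.streams[handle]
        del self.codecs[handle]


def decode_two_streams():
    vpu = SimulatedVpu()
    vpu.init(firmware=b"\x00")                        # step 1: load firmware
    handles = [vpu.open_decoder("H.264", ["f0", "f1"]),   # step 2: create
               vpu.open_decoder("MPEG-4", ["f0", "f1"])]
    sent_to_ipu = []
    while any(vpu.has_data(h) for h in handles):      # step 3: alternate
        for h in handles:
            if vpu.has_data(h):
                vpu.start_one_frame(h)
                sent_to_ipu.append(vpu.wait_busy_flag())
    for h in handles:                                 # step 4: release
        vpu.close(h)
    return sent_to_ipu, vpu


frames, vpu = decode_two_streams()
print(frames)
```

The point of the sketch is the ordering: only one frame is in flight at a time, and the CPU starts the next frame only after the busy flag has cleared.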
Memory control is a key issue when using the VPU
The VPU has full access to external memory, which it uses to load and store image frames, bitstreams, and the BIT processor's code and data. The amount of memory used depends on the video format and the target application. For example, H.264 decoding uses up to 16 reference frames, whereas H.263 decoding requires only 1. Different formats also require different amounts of temporary memory for de-blocking or overlap smoothing filtering.
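To give a feel for how strongly the reference frame count drives memory use, here is a back-of-envelope estimate. It assumes YUV 4:2:0 frames at 8 bits per sample (1.5 bytes per pixel) with dimensions rounded up to 16-pixel macroblocks. The function names are hypothetical, and a real allocation also needs display frames and per-format temporary buffers that are not counted here.

```python
def align16(value):
    """Round up to the next multiple of 16 (macroblock size)."""
    return (value + 15) // 16 * 16


def frame_bytes(width, height):
    """Bytes for one YUV 4:2:0 frame: full-size luma plus two quarter-size chroma planes."""
    return align16(width) * align16(height) * 3 // 2


def reference_memory(width, height, reference_frames):
    """Memory for the reference frames only (worst-case style estimate)."""
    return frame_bytes(width, height) * reference_frames


h264_1080p = reference_memory(1920, 1080, 16)   # upper bound from the text
h263_cif = reference_memory(352, 288, 1)        # CIF resolution chosen as an example

print(frame_bytes(1920, 1080))   # 3133440
print(h264_1080p)                # 50135040
print(h263_cif)                  # 152064
```

Under these assumptions, 16 reference frames at 1080p come to roughly 50 MB, while a single CIF reference frame for H.263 needs about 150 KB.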
The VPU uses six different memory areas: the frame buffer (stores image frames), the BIT processor code memory, the working buffer (intermediate data for the BIT processor and the video decoding hardware), the bitstream buffer (holds the loaded bitstream), the parameter buffer (BIT processor command arguments and return data), and the search RAM (used by the motion estimation (ME) module to reduce the load on the external memory bus).
Among these, handling of the bitstream buffer is critical. The system must allocate an independent bitstream buffer for each process. The external bitstream buffer is organized as a ring buffer, and once the BIT processor has been given the start address of the ring, it wraps around automatically.
During decoding, the CPU writes the bitstream into the buffer and the BIT processor reads it out. If the two are not coordinated, the bitstream is overwritten or the buffer underflows, and decoding fails. To prevent this, the external CPU and the BIT processor inside the VPU must exchange the current read and write pointers of the bitstream buffer: the write pointer maintained by the CPU and the read pointer maintained by the BIT processor are both written to internal registers. The BIT processor compares the two pointers to determine whether the buffer holds too little data. If so, it stops decoding, to avoid misreading the bitstream, until the CPU has written enough data and updated the write pointer. Conversely, the CPU must check the read pointer before writing to the ring buffer to ensure that it does not overwrite data that has not yet been read.
In applications such as 1080i/p high-definition decoding, the VPU requires very high memory bandwidth. Because most current operating systems are multitasking, memory bandwidth can easily run short, which causes stuttering playback or even decoding errors. System bandwidth usage must therefore be planned carefully.
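The pointer handshake can be modeled as a single-producer, single-consumer ring buffer. This is a minimal sketch, not the VPU's register-level protocol: the class and method names are hypothetical, the two "registers" are plain attributes, and one slot is kept empty so that equal pointers always mean "empty" rather than "full".

```python
class BitstreamRing:
    """Hypothetical model of the bitstream ring shared by the CPU and the BIT processor."""

    def __init__(self, size):
        self.size = size
        self.buf = bytearray(size)
        self.rd = 0   # read pointer, advanced only by the BIT processor
        self.wr = 0   # write pointer, advanced only by the CPU

    def available(self):
        """Bytes written by the CPU and not yet consumed."""
        return (self.wr - self.rd) % self.size

    def free_space(self):
        """Bytes the CPU may write without overwriting unread data."""
        return self.size - 1 - self.available()

    def cpu_write(self, data):
        """CPU side: check the read pointer first, write what fits, return the count."""
        count = min(len(data), self.free_space())
        for byte in data[:count]:
            self.buf[self.wr] = byte
            self.wr = (self.wr + 1) % self.size   # wrap at the end of the ring
        return count

    def bit_read(self, count):
        """BIT side: return None (stall) instead of reading past the write pointer."""
        if self.available() < count:
            return None
        out = bytes(self.buf[(self.rd + i) % self.size] for i in range(count))
        self.rd = (self.rd + count) % self.size
        return out


ring = BitstreamRing(8)
written = ring.cpu_write(b"abcdefghij")   # only 7 bytes fit, nothing is overwritten
stalled = ring.bit_read(8)                # not enough data: decoder must wait
first = ring.bit_read(4)
refill = ring.cpu_write(b"wxyz")          # wraps around the end of the buffer
second = ring.bit_read(7)
print(written, stalled, first, refill, second)
```

Each side only ever advances its own pointer and only reads the other's, which is why exchanging the two pointers is enough to prevent both overwrite and underflow.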
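A rough budget check shows the kind of planning involved. The model below is a deliberate simplification and an assumption of this sketch, not a measured figure for the i.MX53: each decoded frame is written once in YUV 4:2:0, and motion compensation is approximated as a caller-chosen number of full-frame reference reads. Bitstream traffic, post-processing, and display reads are ignored, and all names are hypothetical.

```python
def frame_bytes(width, height):
    """One YUV 4:2:0 frame with dimensions rounded up to 16-pixel macroblocks."""
    aligned_w = (width + 15) // 16 * 16
    aligned_h = (height + 15) // 16 * 16
    return aligned_w * aligned_h * 3 // 2


def decode_bandwidth(width, height, fps, ref_reads_per_frame):
    """Bytes per second: one frame write plus N full-frame reference reads per frame."""
    return frame_bytes(width, height) * fps * (1 + ref_reads_per_frame)


def fits_budget(required, bus_bytes_per_second, vpu_share):
    """True if the VPU's estimated traffic fits in its share of the memory bus."""
    return required <= bus_bytes_per_second * vpu_share


need = decode_bandwidth(1920, 1080, 30, ref_reads_per_frame=2)
print(need)   # 282009600
```

For example, `fits_budget(need, 1_000_000_000, 0.5)` is true while `fits_budget(need, 400_000_000, 0.5)` is false; the bus figures here are made-up inputs, and the real numbers have to come from the memory system in use.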
Conclusion
The analysis above shows that the i.MX53 VPU is simple to use. By encapsulating the encoding and decoding process, the full-hardware VPU hides its complexity and makes video processing an easy task overall, which is one of its significant advantages. Competition in the multimedia device market is fierce, and system manufacturers' development schedules have become very short. For video solutions, application processor suppliers must ensure that their reference designs provide simple, easy-to-use APIs together with fully verified reliability and real-time encoding and decoding performance. A system design based on full-hardware video processing is therefore a very attractive option in the market.