For Video Surveillance over Internet Protocol (VSIP), the hardware that handles network traffic is an important part of the camera system: to cope with the bandwidth limits of the network, the video signal is digitized and compressed by the camera before being transmitted to the video server. Heterogeneous processor architectures such as DSP/GPP combinations help maximize system performance: video acquisition, storage, and streaming are interrupt-intensive tasks well suited to the GPP, while the high-MIPS video compression work is left to the DSP. Once the data reaches the video server, the server stores the compressed video stream as a file on its hard drive, avoiding the quality degradation that plagues traditional analog storage devices. A variety of compression standards have been developed for digital video signals; they fall into two categories:
* Motion estimation (ME) method: every N frames form a group of pictures (GOP). The first frame in the group is encoded independently; for the other N-1 frames, only the temporal difference between the current frame and a previously encoded frame (the forward reference frame) is encoded. Common standards are MPEG-2, MPEG-4, H.263, and H.264 (see the sketch after this list).
* Still image compression method: each video frame is independently encoded as a still image. The most commonly used standard is JPEG; the Motion JPEG (MJPEG) standard applies the JPEG algorithm to every frame.
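To make the GOP structure concrete, below is a minimal C sketch of the frame-type decision described above; `encode_intra`, `encode_inter`, and the GOP size of 15 are illustrative stand-ins rather than details of any particular standard.

```c
#include <stddef.h>

#define GOP_SIZE 15   /* N: one intra-coded frame, then N-1 predicted frames */

/* Toy stand-ins for the two coding modes described above. */
static void encode_intra(const unsigned char *frm) { (void)frm; /* code the frame on its own */ }
static void encode_inter(const unsigned char *frm, const unsigned char *ref)
{ (void)frm; (void)ref; /* code only the difference vs. the reference */ }

void encode_stream(const unsigned char *frames[], size_t count)
{
    const unsigned char *ref = NULL;

    for (size_t i = 0; i < count; i++) {
        if (i % GOP_SIZE == 0)
            encode_intra(frames[i]);       /* first frame of each GOP */
        else
            encode_inter(frames[i], ref);  /* forward prediction */
        ref = frames[i];                   /* previous coded frame becomes the reference */
    }
}
```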
Comparison of Motion Estimation and Still Image Compression
Figure 1 shows the block diagram of the H.264 encoder. Like other ME-based video coding standards, H.264 divides the input image into macroblocks (MBs) of 16 x 16 pixels and processes them block by block. The encoder consists of a forward path and a reconstruction path: the forward path encodes a frame into bits, while the reconstruction path generates a reference frame from those coded bits. In the figure, DCT/IDCT, Q/IQ, ME, and MC denote the (inverse) discrete cosine transform, (inverse) quantization, motion estimation, and motion compensation, respectively.
Figure 1: H.264 encoder structure diagram.
In the forward path (DCT followed by Q), each macroblock can be coded in intra mode or inter mode. In inter mode, the motion estimation (ME) module takes the reference MB from a previously coded frame; in intra mode, the reference MB is formed from already-coded samples of the current frame.
The purpose of the reconstruction path (IQ followed by IDCT) is to ensure that the encoder and the decoder predict from exactly the same reference frame; otherwise, the error between encoder and decoder would accumulate from frame to frame.
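The toy scalar example below (one value standing in for a whole frame, not real H.264) shows why the encoder must predict from its own reconstructed values: both sides apply the same dequantization, so their references stay in lockstep, whereas predicting from the original samples would let quantization error accumulate at the decoder.

```c
#include <stdio.h>

#define QS 10  /* quantization step of the toy codec */

static int quantize(int v)   { return (v + (v > 0 ? QS / 2 : -QS / 2)) / QS; }
static int dequantize(int q) { return q * QS; }

int main(void)
{
    int samples[5] = { 37, 41, 44, 52, 57 };  /* stand-ins for successive frames */
    int ref_enc = 0, ref_dec = 0;             /* reconstructed references */

    for (int i = 0; i < 5; i++) {
        int q = quantize(samples[i] - ref_enc);  /* forward path: code the residual    */
        ref_enc += dequantize(q);                /* reconstruction path at the encoder */
        ref_dec += dequantize(q);                /* the decoder can only do the same   */
        printf("frame %d: sent %d, encoder ref %d, decoder ref %d\n",
               i, q, ref_enc, ref_dec);
    }
    /* Predicting from samples[i] itself instead of ref_enc would desynchronize
     * the two references, and the error would grow with every frame. */
    return 0;
}
```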
Figure 2: JPEG encoder architecture.
Figure 2 shows the structure of the JPEG encoder. The encoder divides the input image into 8x8-pixel blocks and processes them one by one. Each block first passes through the DCT module; the quantizer then rounds the DCT coefficients according to a quantization matrix. By scaling the quantization step size, the encoding quality can be traded off against the compression ratio. Finally, the entropy encoder codes the quantizer output and produces the JPEG bitstream.
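As an illustration of the quantization step, here is a minimal sketch that rounds an already-computed 8x8 block of DCT coefficients against a quantization matrix; scaling the matrix up coarsens the steps, lowering quality and raising compression.

```c
#include <math.h>

/* Quantize one 8x8 block of DCT coefficients against a quantization matrix. */
void quantize_block(const double dct[8][8], const int qmatrix[8][8], int out[8][8])
{
    for (int u = 0; u < 8; u++)
        for (int v = 0; v < 8; v++)
            out[u][v] = (int)lround(dct[u][v] / qmatrix[u][v]);  /* round to the nearest step */
}
```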
Since consecutive video frames usually carry a great deal of redundant information, the ME method achieves much higher compression ratios. For example, at standard NTSC resolution and 30 frames per second, an H.264 encoder can deliver good image quality at about 2 Mbps, an average compression ratio of up to 60:1. At the same image quality, MJPEG reaches only about 10:1 to 15:1.
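As a rough sanity check on these figures: assuming 720 x 480 NTSC frames with 4:2:0 chroma sampling, the raw stream is about 720 x 480 x 1.5 bytes x 30 frames/s ≈ 124 Mbps, so a 2 Mbps H.264 stream indeed corresponds to roughly 60:1.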
MJPEG nevertheless has several advantages over the ME method. First, JPEG requires significantly less computation and power. In addition, most PCs already ship with software to decode and display JPEG images. If only one or a few images are needed to record a specific event, such as a person passing through a door, MJPEG is more efficient. And when network bandwidth is not guaranteed, MJPEG is preferable because the loss or delay of one frame does not affect any other frame. With the ME method, a delayed or lost frame delays or corrupts the rest of its GOP, because each subsequent frame can only be decoded from its previously decoded reference frame.
Many VSIP cameras offer multiple video encoders so users can choose the one best suited to their application; some cameras can even run multiple codecs simultaneously. MJPEG usually places the lowest demands on a VSIP camera, and almost all VSIP cameras can be equipped with a JPEG encoder.
Implementation of the MJPEG standard
In a typical digital surveillance system, video is acquired by sensors, compressed, and then streamed to a video server. Interrupting video-encoder tasks running on a modern DSP is costly, because each context switch forces a large number of registers to be saved and caches to be flushed. A heterogeneous architecture should therefore be adopted, freeing the DSP from the video acquisition and streaming tasks. The following block diagram shows an example of a DSP/GPP processor architecture in a video surveillance application.
Figure 3: Example of a DSP/GPP processor architecture in a video surveillance application.
When using the MJPEG standard on a DSP/GPP SoC, developers should first partition the functional modules appropriately to maximize system performance.
The EMAC driver, the TCP/IP network stack, and the HTTP server work together to stream out the compressed images. The video capture driver and the ATA driver should run on the ARM, relieving the DSP of that load. The JPEG encoder should run on the DSP core, because the DSP's VLIW architecture is particularly well suited to such computationally intensive work.
Once the camera acquires a video frame through the processor's video input port, the raw image is compressed by the JPEG encoder and the compressed image is saved to the device's hard drive.
Figure 4: Demonstration of MJPEG data streaming on the DaVinci technology-based TI DM6446 digital video evaluation board in a video surveillance system.
A PC is typically used to monitor live video: it retrieves the stream from the video server, decodes it, and displays the images on a monitor. Because the encoded JPEG files can be retrieved over the Internet, multiple video streams can be monitored simultaneously on one PC, and the same streams can be viewed from multiple points on the network. The VSIP monitoring client connects to the video server over a TCP/IP network and can therefore sit anywhere in that network, a huge improvement over traditional analog systems; if a fault occurs, it affects only one digital camera rather than the monitoring side. The JPEG image quality can also be configured dynamically to meet different video-quality requirements.
Optimizing the JPEG encoder
Among the three major functional modules of the JPEG encoder, the DCT and the quantizer carry the heaviest computational load. Profiling also shows a large performance gap between highly optimized assembly code and unoptimized C code for these two modules, so they are the ones worth optimizing.
Optimizing the 2-D 8x8 DCT module means reducing the number of additions, subtractions, and multiplications and avoiding the redundant calculations of the direct formula. Many fast DCT algorithms have been published; among them, Chen's algorithm is widely used in industry. For a 2-D 8x8 DCT, Chen's algorithm requires 448 additions/subtractions and 224 multiplications.
These addition, subtraction, and multiplication operations can be distributed across the DSP core's multiple functional units to execute in parallel. With negligible overhead, highly optimized DSP assembly code can complete the 2-D DCT within about 100 cycles. Other fast DCT algorithms require even fewer arithmetic operations but often need more buffers for intermediate results. On modern pipelined VLIW DSPs, memory accesses cost more than multiplications, so developers should weigh computation against memory traffic when choosing an algorithm.
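For reference, the straightforward row-column (separable) formulation below shows the structure that fast algorithms such as Chen's factorize further; this naive version needs on the order of a thousand multiplications for the 2-D transform, which is exactly what the 224-multiply factorization improves on.

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Orthonormal 1-D 8-point DCT-II: 64 multiply-accumulates plus scaling. */
static void dct_1d(const double in[8], double out[8])
{
    for (int k = 0; k < 8; k++) {
        double ck = (k == 0) ? sqrt(0.125) : 0.5;
        double s = 0.0;
        for (int n = 0; n < 8; n++)
            s += in[n] * cos(M_PI * (2 * n + 1) * k / 16.0);
        out[k] = ck * s;
    }
}

/* 2-D 8x8 DCT as 8 row transforms followed by 8 column transforms. */
void dct_8x8(const double in[8][8], double out[8][8])
{
    double tmp[8][8];

    for (int i = 0; i < 8; i++)
        dct_1d(in[i], tmp[i]);           /* rows */

    for (int j = 0; j < 8; j++) {        /* columns */
        double col[8], res[8];
        for (int i = 0; i < 8; i++)
            col[i] = tmp[i][j];
        dct_1d(col, res);
        for (int i = 0; i < 8; i++)
            out[i][j] = res[i];
    }
}
```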
Quantizing each pixel requires a multiplication and an addition. The result usually needs only 16 bits of precision, while the DSP registers are 32 bits wide. The first optimization is therefore to pack two pixels into a single register and operate on both at once; the second is to use multiple DSP functional units in parallel. Since the DSP core in the TMS320DM6446 has two multipliers and two adders, up to four pixels can be quantized simultaneously. Last but not least, the pipelined DSP architecture should be fully exploited: while the core quantizes the current four pixels, it can already fetch the next four from memory, keeping the multipliers and adders fed on every cycle. The first two techniques can be implemented by hand in optimized C or assembly code; the software pipelining can be left to the DSP compiler.
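Below is a rough portable-C sketch of such an unrolled quantizer loop. It replaces the per-coefficient division with multiplication by a precomputed Q16 reciprocal (recip = 65536 / step, a common fixed-point trick, though not necessarily the exact scheme used in the original code) and processes four coefficients per iteration so that a VLIW compiler can software-pipeline the loop; sign and rounding handling are simplified.

```c
#include <stdint.h>

/* Quantize n coefficients, four per iteration. recip[] and bias[] are assumed
 * precomputed per coefficient; the 4-way unrolling exposes enough independent
 * multiply/add work to keep two multipliers and two adders busy. */
void quantize_n(const int16_t *coeff, const uint16_t *recip,
                const int16_t *bias, int16_t *out, int n)
{
    for (int i = 0; i < n; i += 4) {
        out[i]     = (int16_t)(((int32_t)(coeff[i]     + bias[i])     * recip[i])     >> 16);
        out[i + 1] = (int16_t)(((int32_t)(coeff[i + 1] + bias[i + 1]) * recip[i + 1]) >> 16);
        out[i + 2] = (int16_t)(((int32_t)(coeff[i + 2] + bias[i + 2]) * recip[i + 2]) >> 16);
        out[i + 3] = (int16_t)(((int32_t)(coeff[i + 3] + bias[i + 3]) * recip[i + 3]) >> 16);
    }
}
```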
In addition to optimizing each functional module, the Ping-Pong buffering technique can optimize the JPEG encoder at the system level. The DSP core accesses internal RAM (IRAM) much faster than external DDR2 memory, but IRAM is too small to hold an entire input frame, so only a slice of the frame can reside in IRAM at a time. While the DSP core processes one buffer of the Ping-Pong pair, DMA transfers the next slice from DDR2 into the other buffer, so the core can start on the next slice immediately after finishing the current one.
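A minimal sketch of the scheme is shown below; `dma_copy_async`, `dma_wait`, `encode_slice`, and the 16-line slice size are illustrative stand-ins. On the DM6446 the copies would be asynchronous EDMA transfers overlapping with the encoding, whereas the host stubs here run synchronously.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLICE_BYTES (16 * 720)   /* one 16-line slice of a 720-pixel-wide frame */

/* Host stand-ins: on the target these would be EDMA driver calls. */
static void dma_copy_async(void *dst, const void *src, size_t n) { memcpy(dst, src, n); }
static void dma_wait(void) { /* block until the outstanding transfer lands */ }
static void encode_slice(const uint8_t *slice, size_t n) { (void)slice; (void)n; /* JPEG work */ }

void encode_frame_pingpong(const uint8_t *frame_ddr2, int num_slices)
{
    static uint8_t iram_buf[2][SLICE_BYTES];   /* ping-pong pair placed in IRAM */

    dma_copy_async(iram_buf[0], frame_ddr2, SLICE_BYTES);   /* prime the ping buffer */

    for (int s = 0; s < num_slices; s++) {
        int cur = s & 1;
        dma_wait();                            /* slice s has arrived in IRAM */
        if (s + 1 < num_slices)                /* immediately start fetching slice s+1 */
            dma_copy_async(iram_buf[cur ^ 1],
                           frame_ddr2 + (size_t)(s + 1) * SLICE_BYTES,
                           SLICE_BYTES);
        encode_slice(iram_buf[cur], SLICE_BYTES);  /* DSP works while the DMA fills */
    }
}
```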
Clearly, the digitization of video surveillance systems is here to stay. Understanding technologies such as video compression, system partitioning, and codec optimization is critical to developing next-generation video surveillance systems to meet growing demands.