For Video Surveillance over Internet Protocol (VSIP), the hardware that handles network traffic is an important part of the camera system: to cope with the bandwidth limits of the network, the video signal is digitized and compressed by the camera before being transmitted to the video server. Heterogeneous processor architectures such as DSP/GPP combinations help maximize system performance: video acquisition, storage, and streaming are interrupt-intensive tasks well suited to the GPP, while the high-MIPS video compression work is left to the DSP. Once the data reaches the video server, the server stores the compressed video stream as a file on its hard drive, avoiding the quality degradation that plagues traditional analog storage devices. A variety of compression standards have been developed for digital video signals; they fall into two categories:
* Motion estimation (ME) method: every N frames form a group of pictures (GOP). The first frame in the group is encoded independently; for the other N-1 frames, only the temporal difference between the current frame and a previously encoded frame (the forward reference frame) is encoded. Common standards are MPEG-2, MPEG-4, H.263, and H.264 (see the sketch after this list).
* Still image compression method: each video frame is independently encoded as a still image. The most commonly used standard is JPEG; the Motion JPEG (MJPEG) standard applies the JPEG algorithm to every frame.
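To make the GOP structure concrete, below is a minimal C sketch of the frame-type decision described above; `encode_intra`, `encode_inter`, and the GOP size of 15 are illustrative stand-ins rather than details of any particular standard.

```c
#include <stddef.h>

#define GOP_SIZE 15   /* N: one intra-coded frame, then N-1 predicted frames */

/* Toy stand-ins for the two coding modes described above. */
static void encode_intra(const unsigned char *frm) { (void)frm; /* code the frame on its own */ }
static void encode_inter(const unsigned char *frm, const unsigned char *ref)
{ (void)frm; (void)ref; /* code only the difference vs. the reference */ }

void encode_stream(const unsigned char *frames[], size_t count)
{
    const unsigned char *ref = NULL;

    for (size_t i = 0; i < count; i++) {
        if (i % GOP_SIZE == 0)
            encode_intra(frames[i]);       /* first frame of each GOP */
        else
            encode_inter(frames[i], ref);  /* forward prediction */
        ref = frames[i];                   /* previous coded frame becomes the reference */
    }
}
```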
Comparison of Motion Estimation and Still Image Compression
Figure 1 shows the block diagram of the H.264 encoder. Like other ME-based video coding standards, H.264 divides the input image into macroblocks (MBs) of 16 x 16 pixels and processes them block by block. The encoder consists of a forward path and a reconstruction path: the forward path encodes a frame into bits, while the reconstruction path generates a reference frame from those coded bits. In the figure, DCT/IDCT, Q/IQ, ME, and MC denote the (inverse) discrete cosine transform, (inverse) quantization, motion estimation, and motion compensation, respectively.
Figure 1: H.264 encoder structure diagram.
In the forward path (DCT followed by Q), each macroblock can be coded in intra mode or inter mode. In inter mode, the motion estimation (ME) module takes the reference MB from a previously coded frame; in intra mode, the reference MB is formed from already-coded samples of the current frame.
The purpose of the reconstruction path (IQ followed by IDCT) is to ensure that the encoder and the decoder predict from exactly the same reference frame; otherwise, the error between encoder and decoder would accumulate from frame to frame.
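The toy scalar example below (one value standing in for a whole frame, not real H.264) shows why the encoder must predict from its own reconstructed values: both sides apply the same dequantization, so their references stay in lockstep, whereas predicting from the original samples would let quantization error accumulate at the decoder.

```c
#include <stdio.h>

#define QS 10  /* quantization step of the toy codec */

static int quantize(int v)   { return (v + (v > 0 ? QS / 2 : -QS / 2)) / QS; }
static int dequantize(int q) { return q * QS; }

int main(void)
{
    int samples[5] = { 37, 41, 44, 52, 57 };  /* stand-ins for successive frames */
    int ref_enc = 0, ref_dec = 0;             /* reconstructed references */

    for (int i = 0; i < 5; i++) {
        int q = quantize(samples[i] - ref_enc);  /* forward path: code the residual    */
        ref_enc += dequantize(q);                /* reconstruction path at the encoder */
        ref_dec += dequantize(q);                /* the decoder can only do the same   */
        printf("frame %d: sent %d, encoder ref %d, decoder ref %d\n",
               i, q, ref_enc, ref_dec);
    }
    /* Predicting from samples[i] itself instead of ref_enc would desynchronize
     * the two references, and the error would grow with every frame. */
    return 0;
}
```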
Figure 2: JPEG encoder architecture.
Figure 2 shows the structure of the JPEG encoder. The encoder divides the input image into 8x8-pixel blocks and processes them one by one. Each block first passes through the DCT module; the quantizer then rounds the DCT coefficients according to a quantization matrix. By scaling the quantization step size, the encoding quality can be traded off against the compression ratio. Finally, the entropy encoder codes the quantizer output and produces the JPEG bitstream.
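As an illustration of the quantization step, here is a minimal sketch that rounds an already-computed 8x8 block of DCT coefficients against a quantization matrix; scaling the matrix up coarsens the steps, lowering quality and raising compression.

```c
#include <math.h>

/* Quantize one 8x8 block of DCT coefficients against a quantization matrix. */
void quantize_block(const double dct[8][8], const int qmatrix[8][8], int out[8][8])
{
    for (int u = 0; u < 8; u++)
        for (int v = 0; v < 8; v++)
            out[u][v] = (int)lround(dct[u][v] / qmatrix[u][v]);  /* round to the nearest step */
}
```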
Since consecutive video frames usually carry a great deal of redundant information, the ME method achieves much higher compression ratios. For example, at standard NTSC resolution and 30 frames per second, an H.264 encoder can deliver good image quality at about 2 Mbps, an average compression ratio of up to 60:1. At the same image quality, MJPEG reaches only about 10:1 to 15:1.
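As a rough sanity check on these figures: assuming 720 x 480 NTSC frames with 4:2:0 chroma sampling, the raw stream is about 720 x 480 x 1.5 bytes x 30 frames/s ≈ 124 Mbps, so a 2 Mbps H.264 stream indeed corresponds to roughly 60:1.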
MJPEG nevertheless has several advantages over the ME method. First, JPEG requires significantly less computation and power. In addition, most PCs already ship with software to decode and display JPEG images. If only one or a few images are needed to record a specific event, such as a person passing through a door, MJPEG is more efficient. And when network bandwidth is not guaranteed, MJPEG is preferable because the loss or delay of one frame does not affect any other frame. With the ME method, a delayed or lost frame delays or corrupts the rest of its GOP, because each subsequent frame can only be decoded from its previously decoded reference frame.
Many VSIP cameras offer multiple video encoders so users can choose the one best suited to their application; some cameras can even run multiple codecs simultaneously. MJPEG usually places the lowest demands on a VSIP camera, and almost all VSIP cameras can be equipped with a JPEG encoder.
Implementation of the MJPEG standard
In a typical digital surveillance system, video is acquired by sensors, compressed, and then streamed to a video server. Interrupting video-encoder tasks running on a modern DSP is costly, because each context switch forces a large number of registers to be saved and caches to be flushed. A heterogeneous architecture should therefore be adopted, freeing the DSP from the video acquisition and streaming tasks. The following block diagram shows an example of a DSP/GPP processor architecture in a video surveillance application.
Figure 3: Example of a DSP/GPP processor architecture in a video surveillance application.
When using the MJPEG standard on a DSP/GPP SoC, developers should first partition the functional modules appropriately to maximize system performance.
The EMAC driver, the TCP/IP network stack, and the HTTP server work together to stream out the compressed images. The video capture driver and the ATA driver should run on the ARM, relieving the DSP of that load. The JPEG encoder should run on the DSP core, because the DSP's VLIW architecture is particularly well suited to such computationally intensive work.
Once the camera acquires a video frame through the processor's video input port, the raw image is compressed by the JPEG encoder and the compressed image is saved to the device's hard drive.
Figure 4: Demonstration of MJPEG data streaming on the DaVinci technology-based TI DM6446 digital video evaluation board in a video surveillance system.
A PC is typically used to monitor live video: it retrieves the stream from the video server, decodes it, and displays the images on a monitor. Because the encoded JPEG files can be retrieved over the Internet, multiple video streams can be monitored simultaneously on one PC, and the same streams can be viewed from multiple points on the network. The VSIP monitoring client connects to the video server over a TCP/IP network and can therefore sit anywhere in that network, a huge improvement over traditional analog systems; if a fault occurs, it affects only one digital camera rather than the monitoring side. The JPEG image quality can also be configured dynamically to meet different video-quality requirements.
Optimizing the JPEG encoder
Among the three major functional modules of the JPEG encoder, the DCT and the quantizer carry the heaviest computational load. Profiling also shows a large performance gap between highly optimized assembly code and unoptimized C code for these two modules, so they are the ones worth optimizing.
Optimizing the 2-D 8x8 DCT module means reducing the number of additions, subtractions, and multiplications and avoiding the redundant calculations of the direct formula. Many fast DCT algorithms have been published; among them, Chen's algorithm is widely used in industry. For a 2-D 8x8 DCT, Chen's algorithm requires 448 additions/subtractions and 224 multiplications.
These addition, subtraction, and multiplication operations can be distributed across the DSP core's multiple functional units to execute in parallel. With negligible overhead, highly optimized DSP assembly code can complete the 2-D DCT within about 100 cycles. Other fast DCT algorithms require even fewer arithmetic operations but often need more buffers for intermediate results. On modern pipelined VLIW DSPs, memory accesses cost more than multiplications, so developers should weigh computation against memory traffic when choosing an algorithm.
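For reference, the straightforward row-column (separable) formulation below shows the structure that fast algorithms such as Chen's factorize further; this naive version needs on the order of a thousand multiplications for the 2-D transform, which is exactly what the 224-multiply factorization improves on.

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Orthonormal 1-D 8-point DCT-II: 64 multiply-accumulates plus scaling. */
static void dct_1d(const double in[8], double out[8])
{
    for (int k = 0; k < 8; k++) {
        double ck = (k == 0) ? sqrt(0.125) : 0.5;
        double s = 0.0;
        for (int n = 0; n < 8; n++)
            s += in[n] * cos(M_PI * (2 * n + 1) * k / 16.0);
        out[k] = ck * s;
    }
}

/* 2-D 8x8 DCT as 8 row transforms followed by 8 column transforms. */
void dct_8x8(const double in[8][8], double out[8][8])
{
    double tmp[8][8];

    for (int i = 0; i < 8; i++)
        dct_1d(in[i], tmp[i]);           /* rows */

    for (int j = 0; j < 8; j++) {        /* columns */
        double col[8], res[8];
        for (int i = 0; i < 8; i++)
            col[i] = tmp[i][j];
        dct_1d(col, res);
        for (int i = 0; i < 8; i++)
            out[i][j] = res[i];
    }
}
```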
Quantizing each pixel requires a multiplication and an addition. The result usually needs only 16 bits of precision, while the DSP registers are 32 bits wide. The first optimization is therefore to pack two pixels into a single register and operate on both at once; the second is to use multiple DSP functional units in parallel. Since the DSP core in the TMS320DM6446 has two multipliers and two adders, up to four pixels can be quantized simultaneously. Last but not least, the pipelined DSP architecture should be fully exploited: while the core quantizes the current four pixels, it can already fetch the next four from memory, keeping the multipliers and adders fed on every cycle. The first two techniques can be implemented by hand in optimized C or assembly code; the software pipelining can be left to the DSP compiler.
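Below is a rough portable-C sketch of such an unrolled quantizer loop. It replaces the per-coefficient division with multiplication by a precomputed Q16 reciprocal (recip = 65536 / step, a common fixed-point trick, though not necessarily the exact scheme used in the original code) and processes four coefficients per iteration so that a VLIW compiler can software-pipeline the loop; sign and rounding handling are simplified.

```c
#include <stdint.h>

/* Quantize n coefficients, four per iteration. recip[] and bias[] are assumed
 * precomputed per coefficient; the 4-way unrolling exposes enough independent
 * multiply/add work to keep two multipliers and two adders busy. */
void quantize_n(const int16_t *coeff, const uint16_t *recip,
                const int16_t *bias, int16_t *out, int n)
{
    for (int i = 0; i < n; i += 4) {
        out[i]     = (int16_t)(((int32_t)(coeff[i]     + bias[i])     * recip[i])     >> 16);
        out[i + 1] = (int16_t)(((int32_t)(coeff[i + 1] + bias[i + 1]) * recip[i + 1]) >> 16);
        out[i + 2] = (int16_t)(((int32_t)(coeff[i + 2] + bias[i + 2]) * recip[i + 2]) >> 16);
        out[i + 3] = (int16_t)(((int32_t)(coeff[i + 3] + bias[i + 3]) * recip[i + 3]) >> 16);
    }
}
```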
In addition to optimizing each functional module, the Ping-Pong buffering technique can optimize the JPEG encoder at the system level. The DSP core accesses internal RAM (IRAM) much faster than external DDR2 memory, but IRAM is too small to hold an entire input frame, so only a slice of the frame can reside in IRAM at a time. While the DSP core processes one buffer of the Ping-Pong pair, DMA transfers the next slice from DDR2 into the other buffer, so the core can start on the next slice immediately after finishing the current one.
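A minimal sketch of the scheme is shown below; `dma_copy_async`, `dma_wait`, `encode_slice`, and the 16-line slice size are illustrative stand-ins. On the DM6446 the copies would be asynchronous EDMA transfers overlapping with the encoding, whereas the host stubs here run synchronously.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLICE_BYTES (16 * 720)   /* one 16-line slice of a 720-pixel-wide frame */

/* Host stand-ins: on the target these would be EDMA driver calls. */
static void dma_copy_async(void *dst, const void *src, size_t n) { memcpy(dst, src, n); }
static void dma_wait(void) { /* block until the outstanding transfer lands */ }
static void encode_slice(const uint8_t *slice, size_t n) { (void)slice; (void)n; /* JPEG work */ }

void encode_frame_pingpong(const uint8_t *frame_ddr2, int num_slices)
{
    static uint8_t iram_buf[2][SLICE_BYTES];   /* ping-pong pair placed in IRAM */

    dma_copy_async(iram_buf[0], frame_ddr2, SLICE_BYTES);   /* prime the ping buffer */

    for (int s = 0; s < num_slices; s++) {
        int cur = s & 1;
        dma_wait();                            /* slice s has arrived in IRAM */
        if (s + 1 < num_slices)                /* immediately start fetching slice s+1 */
            dma_copy_async(iram_buf[cur ^ 1],
                           frame_ddr2 + (size_t)(s + 1) * SLICE_BYTES,
                           SLICE_BYTES);
        encode_slice(iram_buf[cur], SLICE_BYTES);  /* DSP works while the DMA fills */
    }
}
```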
Clearly, the digitization of video surveillance systems is here to stay. Understanding technologies such as video compression, system partitioning, and codec optimization is critical to developing next-generation video surveillance systems to meet growing demands.