Design of Video Codec IP Core Based on SOPC

Publisher: 平和的心态 | Last updated: 2011-05-27 | Source: 与非网

Introduction

SOPC is a programmable system-on-chip solution proposed by Altera. It integrates the modules required for system design, such as a CPU, memory, I/O interfaces, DSP blocks and phase-locked loops, into a single FPGA to form a programmable system on a chip, and it allows the design to be optimized in terms of scale, reliability, size, power consumption, functionality, time to market, development cycle, product maintenance and hardware upgradability [1].

Currently, IP cores for controllers including UART, SPI, Ethernet, SDRAM, Flash and DMA are integrated in Altera's SOPC Builder. In addition, users can design their own IP cores or purchase them from third-party vendors as the system requires, and attach them to the system through the Avalon bus as easily as stacking building blocks. IP cores are functionally verified intellectual-property blocks, and using them has the following advantages: (1) improved design performance; (2) lower product development cost; (3) shorter design cycles; (4) strong design flexibility; (5) convenient simulation; and (6) risk-free evaluation through OpenCore Plus.

Admittedly, the IP core presented in this paper is not that feature-rich. It is in fact a piece of user logic whose functions have been verified as correct, and a certain gap remains between it and a commercial IP core. The main work of this paper is to describe the logic for video signal acquisition, distribution, storage and color space conversion in a hardware description language, and to verify that it functions correctly.

1. Principle of the video codec logic Camera_Show

In addition to the necessary power supply circuitry, the embedded camera control system also includes storage, communication and download circuits, all of which are connected to the Avalon bus. This paper focuses on the user logic Camera_Show, which converts analog video data into digital video data and displays it on a VGA monitor. It comprises the acquisition, distribution (performed by a serial-to-parallel conversion circuit), storage (performed by storage control logic and on-chip RAM) and color space conversion of the analog video signal. The functional block diagram is shown in Figure 1.


Figure 1 Block diagram of the user logic Camera_Show

2. Video codec IP core Camera_Show design

The main functions of the video codec IP core are the acquisition, distribution, storage and color space conversion of the video signal. After passing through the ADV7181B, the analog video signal becomes a YUV digital stream that complies with ITU-R 656. To process the YUV data, however, the three components must be handled separately and in parallel, so the stream first has to be acquired and split into three channels; this is the function implemented in Section 2.1. Because the analog video signal is interlaced while a CRT display is progressive, displaying it unprocessed would cause line stagger, so the data must be stored and converted from interlaced to progressive under suitable control; this is the function implemented in Section 2.2. Finally, the three processed YUV channels must undergo color space conversion to become an RGB signal; this is the function implemented in Section 2.3.

2.1 YUV Signal Acquisition and Distribution [2]

In the embedded camera control system, the ADV7181B is responsible for decoding the analog camera's video data, converting analog signals such as CVBS into YUV signals of the ITU-R 656 standard. Figure 2 shows the functional block diagram of the ADV7181B.


Figure 2 ADV7181B functional block diagram

As can be seen from the figure, input analog signals such as CVBS are converted by the ADV7181B into a YUV byte stream plus the line synchronization signal HS and the field synchronization signal VS. These are the required digital video signals, which solves the problem of obtaining a digital video source. In the ITU-R 656 stream, the preamble "FF, 00, 00" marks the start of a timing reference code, so a detection circuit must be built for it. Note that SAV and EAV both begin with FF, 00, 00; they differ only in the XY status word that follows. According to the chip's data sheet, XY[4] is the H bit, which divides the active video from the blanking codes: H = 0 indicates SAV, otherwise it is EAV. XY[6] is the field bit F, which distinguishes the fields: 0 is the odd field and 1 is the even field.

One line of the digitized signal occupies 1716 clock cycles, of which 1440 carry active video. During acquisition and distribution only the active samples are needed, so the detection of SAV is used as the trigger that starts the distribution process.

Since the Y, U and V samples are interleaved in the byte stream, a signal selection circuit is required. A 2-bit counter performs the selection: counts 0 and 2 carry the U and V (Cb and Cr) samples, while counts 1 and 3 carry the Y samples. What this actually performs is a serial-to-parallel conversion. The whole process is represented by the block diagram of Figure 3.


Figure 3 Schematic diagram of YUV signal acquisition and distribution

In a hardware description language it is fairly simple to implement the above process. For the detection circuit, for example, it suffices to describe a shift register. The specific code is as follows:

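The original listing was published as an image, so the following is only a minimal sketch of such a detector; TD_D and Y_check are the names used in the text, while clk_27 and the byte registers are assumptions:

```verilog
// Sketch of the FF,00,00 preamble detector (a reconstruction, not the
// original code). TD_D is the 8-bit ITU-R 656 byte stream from the ADV7181B.
reg [7:0] byte1, byte2, byte3;              // 3-stage shift register

always @(posedge clk_27) begin
    byte3 <= byte2;                         // oldest byte
    byte2 <= byte1;
    byte1 <= TD_D;                          // newest byte
end

// Y_check is 1 when the previous three bytes were FF,00,00; the byte
// currently on TD_D is then the XY status word.
wire Y_check = (byte3 == 8'hFF) && (byte2 == 8'h00) && (byte1 == 8'h00);
```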

The wire variable Y_check is a flag that is set to 1 when FF, 00, 00 is detected. As explained above, SAV and EAV are distinguished by XY[4], and odd and even fields by XY[6]. Since the signal distribution circuit should be active only when the code just received is an SAV, a piece of logic must be described to make this judgment. The code is as follows:

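That listing is likewise an image; a minimal sketch of the judgment, under the same naming assumptions, might be:

```verilog
// Sketch of the SAV/EAV judgment (a reconstruction). When Y_check is high,
// TD_D carries the XY status word and XY[4] is the H bit.
reg START;

always @(posedge clk_27) begin
    if (Y_check)
        START <= ~TD_D[4];                  // SAV (H=0) sets START,
end                                         // EAV (H=1) clears it
```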

The START signal marks the start of signal acquisition and distribution. The distribution circuit works only when the status word has TD_D[4] = 0, that is, when START = 1. The serial-to-parallel circuit code is as follows:

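Again the original listing is an image; a sketch of the distributor of Figure 3, with cnt as an assumed counter name, could look like this:

```verilog
// Sketch of the serial-to-parallel distributor (a reconstruction). Cbb, YY,
// Crr and YPix_clock are the names used in the text.
reg [1:0] cnt;                              // position in the Cb,Y,Cr,Y group
reg [7:0] Cbb, YY, Crr;
reg       YPix_clock;                       // 27 MHz / 2 = 13.5 MHz

always @(posedge clk_27) begin
    if (!START)
        cnt <= 2'd0;                        // realign the counter at SAV/EAV
    else begin
        case (cnt)
            2'd0: Cbb <= TD_D;              // count 0: Cb sample
            2'd1: YY  <= TD_D;              // counts 1 and 3: Y samples
            2'd2: Crr <= TD_D;              // count 2: Cr sample
            2'd3: YY  <= TD_D;
        endcase
        cnt        <= cnt + 2'd1;
        YPix_clock <= cnt[0];               // toggles at half the byte rate
    end
end
```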

The above code completes the function of Figure 3. The input signal is called TD_D, and the three output signals are Cbb, YY and Crr. Note that there is also a YPix_clock, which is the 27 MHz clock divided by 2 (13.5 MHz). This clock is very useful and will be explained in detail below.

2.2 Storage of YUV Signals

There are two ways to convert the interlaced video signal to progressive:

The first method is to store a whole frame of data. Odd and even fields can be distinguished by the field bit XY[6]. During the write cycle, because even-field lines must be interleaved between the odd-field lines, the write address has to jump: lines are delimited by the line synchronization signal (or by SAV), and at each line change the address is advanced by an extra 720 locations to reserve room for the even-field line. Once the even field begins (that is, XY[6] = 1), the base address is switched to the initial base address plus 720, and the rest is handled the same way as the odd lines. The specific address allocation is shown in Figure 4.


Figure 4 Address allocation table

In the read cycle, the data only needs to be read out sequentially. Note that the write clock is 13.5 MHz, the read clock is 27 MHz, and the Y, U and V components must be stored separately.
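The article shows no code for method 1; purely as an illustration of the address jumping just described, a write-address generator might be sketched as follows (all names here are assumptions):

```verilog
// Sketch of a method-1 write-address generator (an illustration, not from
// the source). Within the odd field every new line skips an extra 720
// locations for the interleaved even-field line; the even field starts at
// base address 720.
reg [18:0] wr_addr;                         // 720 x 525 = 378000 locations

always @(posedge clk_13_5) begin
    if (new_frame)
        wr_addr <= 19'd0;                   // top of the frame buffer
    else if (new_field)
        wr_addr <= 19'd720;                 // even field: base + 720
    else if (new_line)
        wr_addr <= wr_addr + 19'd720;       // skip the other field's line
    else if (pixel_valid)
        wr_addr <= wr_addr + 19'd1;         // sequential within a line
end
```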

The second method is to store one line of data. Since 1716 clock cycles equal exactly the time of two VGA lines, the active video samples can be read out twice during this period, so the odd-line signal can stand in for the even-line signal, achieving the interlaced-to-progressive conversion. In terms of implementation, it only needs two RAM blocks operated in ping-pong fashion, which is explained in detail later.

Comparing the two methods: the advantage of method 1 is that the image is not distorted, because the odd and even lines are genuinely re-interleaved, which method 2 cannot achieve. Method 1 can also be sped up by the ping-pong technique, but since the read and write clocks are asynchronous, each storage location must be read twice. Method 2 likewise reads the data twice, but it reads each line twice, whereas method 1 reads a whole frame of data twice.

The disadvantage of method 1 is that the amount of data to store is too large: the Y component of one frame alone is 8 bits × 720 × 525 = 3,024,000 bits = 378 KB. This is unsuitable for on-board SRAM and requires SDRAM, but operating SDRAM is relatively complicated. Method 2 is therefore generally preferred, because it needs very little space and can be implemented with the FPGA's on-chip resources. When the image data is refreshed quickly, the human eye essentially cannot distinguish the odd- and even-field signals, so method 2 is feasible. Before discussing method 2, it is necessary to understand the ping-pong operation often used in pipelined processing. It is a common design concept and technique in programmable logic, frequently applied to data flow control. A typical ping-pong operation is shown in Figure 5 [3][4].


Figure 5 Ping-Pong operation diagram

The processing flow of the ping-pong operation is described as follows: the input data stream passes through the "input data stream selection unit" and distributes the data stream to two data buffer modules in a synchronous manner. The data buffer module can be any storage module, and the more commonly used storage units are dual-port RAM (DPRAM), single-port RAM (SPRAM) and FIFO. In the first buffer cycle, the input data stream is cached in "data buffer module 1". In the second buffer cycle, the input data stream is cached in "data buffer module 2" through the switching of the "input data stream selection unit". At the same time, the data of the first cycle cached in "data buffer module 1" is sent to the "data stream operation processing module" for operation processing through the selection of the "output data stream selection unit". In the third buffer cycle, the input data stream is cached in "data buffer module 1" through the switching of the "input data stream selection unit" again. At the same time, the data of the second cycle cached in "data buffer module 2" is sent to the "data stream operation processing module" for operation processing through the selection of the "output data stream selection unit". This cycle repeats over and over again.

The most significant feature of the ping-pong operation is that the buffered data stream is sent to the "data stream operation processing module" without any pause through the rhythmic and coordinated switching of the "input data stream selection unit" and the "output data stream selection unit" to be operated and processed. If we regard ping-pong as a whole and look at the data from both ends of this module, the input data stream and the output data stream are continuous without any pause, so it is very suitable for pipeline processing of data streams. Therefore, the ping-pong method is often used in pipeline algorithms to complete seamless buffering and processing of data.

In an FPGA, the use of the ping-pong operation is an application of the area-versus-speed trade-off principle.

Method 2 can be implemented as follows: use a Megacore function inside the FPGA to construct a dual-port RAM. The hardware-description-language definition of the dual-port RAM's input and output signals is as follows:

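The port list was published as an image; a minimal reconstruction consistent with the signal names given below might be:

```verilog
// Sketch of the dual-port RAM (a reconstruction; the Megacore-generated
// module is not shown in the source). Widths are assumptions; the address
// MSB selects RAM half A or B for the ping-pong operation.
module line_dpram (
    input  wire [7:0]  data_a,    data_b,     // write data, ports A and B
    input  wire        wren_a,    wren_b,     // write enables
    input  wire [10:0] address_a, address_b,  // two 720-sample line buffers
    input  wire        clock_a,   clock_b,    // switched 13.5 MHz / 27 MHz
    output reg  [7:0]  q_a,       q_b         // registered read data
);
    reg [7:0] mem [0:1439];

    always @(posedge clock_a) begin           // port A: synchronous read/write
        if (wren_a) mem[address_a] <= data_a;
        q_a <= mem[address_a];
    end

    always @(posedge clock_b) begin           // port B: synchronous read/write
        if (wren_b) mem[address_b] <= data_b;
        q_b <= mem[address_b];
    end
endmodule
```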

The signals used include: the data signals data_a and data_b; the write-enable signals wren_a and wren_b; the address signals address_a and address_b; the clock signals clock_a and clock_b; and the output data signals q_a and q_b. As can be seen, all signals come in pairs, precisely for ping-pong data transfer. The RAM is divided into two areas, A and B, which correspond to data buffer modules 1 and 2 in the ping-pong scheme described above. The two RAM blocks are written and read alternately (determined by I_a and I_b), and the output data flow is likewise selected by I. As mentioned earlier, the write clock is 13.5 MHz and the read clock is 27 MHz, so clock_a and clock_b must be switched between the read and write clocks, and the address counting differs accordingly: the address advances at 13.5 MHz during the write cycle and at 27 MHz during the read cycle. Each line of data is therefore read twice, which is equivalent to converting interlaced video to progressive. Figure 6 is a simulation of the RAM's ping-pong operation under Quartus II:


Figure 6 RAM Ping-Pong Operation Simulation Diagram
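The control logic around this RAM can be sketched as follows; I, I_a and I_b are the names used above, while line_start, clk_13_5 and clk_27 are assumptions:

```verilog
// Sketch of the ping-pong control (a reconstruction). I flips every line;
// the half being written runs at 13.5 MHz while the other half is read
// twice at 27 MHz. Muxing clocks like this mirrors the description in the
// text; production designs would normally prefer clock enables.
reg I;
always @(posedge line_start)
    I <= ~I;                                 // swap roles at each new line

wire I_a =  I;                               // I_a = 1: port A is written
wire I_b = ~I;                               // I_b = 1: port B is written

assign wren_a  = I_a;
assign wren_b  = I_b;
assign clock_a = I_a ? clk_13_5 : clk_27;
assign clock_b = I_b ? clk_13_5 : clk_27;
assign DATA    = I_a ? q_b : q_a;            // output taken from the read side
```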

The allocation table of the RAM block for ping-pong operation signals is as follows:

Table: RAM block ping-pong operation signal allocation

The final output DATA signal enters the next stage, the conversion from YUV to RGB.

2.3 Design of the Color Space Conversion [5]

Why is this conversion necessary? Because both TVs and CRT monitors display colors by combining the three RGB primaries. Representing colors with the three RGB primaries is indeed very intuitive, but it is certainly not a good choice for image transmission. The main reasons are:

(1) Incompatible with black and white images;

(2) Occupies too much bandwidth;

(3) Poor anti-interference ability.

The image sensor chain of this system outputs a YCbCr signal, which must be converted to an RGB signal for CRT display. YCbCr is converted to RGB according to the following formulas:

R = 1.164 (Y - 16) + 1.596 (Cr - 128)

G = 1.164 (Y - 16) - 0.813 (Cr - 128) - 0.392 (Cb - 128)

B = 1.164 (Y - 16) + 2.017 (Cb - 128)

As the formulas show, the conversion requires multiplications and additions with fractional coefficients, so the coefficients must be scaled up for integer arithmetic. After scaling by 256 and rounding, the formulas become:

R = (1/256) * (298*Y + 409*Cr - 57065)

G = (1/256) * (298*Y - 100*Cb - 208*Cr + 34718)

B = (1/256) * (298*Y + 516*Cb - 70861)
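As a quick check of the scaling: 1.164 × 256 ≈ 298, 1.596 × 256 ≈ 409, and the constant (1.164 × 16 + 1.596 × 128) × 256 ≈ 57065, matching the integer R formula; the constants 34718 and 70861 are obtained in the same way from the G and B coefficients.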

The conversion from YUV to RGB is implemented in Verilog HDL. It comprises three modules and one simulation testbench. The module const_mult implements the multiplication. The main code is as follows:

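The listing is an image in the source; a plausible reconstruction, using the parameter names given below (the port names din and dout are assumptions), is:

```verilog
// Sketch of const_mult (a reconstruction). IN_SIZE, OUT_SIZE and CST_MULT
// are the parameter names mentioned in the text.
module const_mult #(
    parameter IN_SIZE  = 8,                 // input sample width
    parameter OUT_SIZE = 18,                // product width
    parameter CST_MULT = 298                // constant coefficient
)(
    input  wire [IN_SIZE-1:0]  din,
    output wire [OUT_SIZE-1:0] dout
);
    // Multiplication by a constant; synthesis reduces it to shifts and adds.
    assign dout = din * CST_MULT;
endmodule
```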

In the module csc.v, the const_mult module is instantiated; the values of the parameters IN_SIZE, OUT_SIZE and CST_MULT are set through parameter passing, and the addition is then implemented.

Taking R = (1/256) * (298*Y + 409*Cr - 57065) as an example, the main code is as follows:

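With the listing again an image, the R-channel datapath can be sketched as follows (signal names other than const_mult and its parameters are assumptions):

```verilog
// Sketch of the R datapath in csc.v (a reconstruction): two const_mult
// instances and an adder compute R_full = 298*Y + 409*Cr - 57065.
wire [17:0]        y_term, cr_term;
wire signed [19:0] R_full;

const_mult #(.IN_SIZE(8), .OUT_SIZE(18), .CST_MULT(298))
    mul_y  (.din(Y),  .dout(y_term));
const_mult #(.IN_SIZE(8), .OUT_SIZE(18), .CST_MULT(409))
    mul_cr (.din(Cr), .dout(cr_term));

// 298*255 + 409*255 = 180285 fits in 18 unsigned bits; after subtracting
// 57065 the result spans -57065..123220, hence a signed 20-bit sum.
assign R_full = $signed({2'b00, y_term}) + $signed({2'b00, cr_term})
              - 20'sd57065;
```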

The code implementing G and B is similar to the above, so it is not repeated here. The following code implements the R_full × 1/256 scaling:

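A minimal sketch of this step; the clamp to the displayable 0..255 range is an assumption, since the original listing is not available:

```verilog
// Sketch of the final *1/256 step (a reconstruction): an arithmetic right
// shift by 8, then clamping to 8 bits for the VGA DAC.
wire signed [11:0] R_shift = R_full >>> 8;  // divide by 256
wire [7:0]         R       = (R_shift < 0)   ? 8'd0   :
                             (R_shift > 255) ? 8'd255 :
                                               R_shift[7:0];
```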

The top-level module yuv2rgb instantiates the submodules and is simulated with ModelSim. The simulation waveform is shown in Figure 7:


Figure 7 YUV to RGB conversion simulation diagram

3. Conclusion

This paper has presented the design of a video codec controller IP core based on SOPC. Following a top-down design approach, the IP core was partitioned hierarchically by function, then simulated and verified, realizing the acquisition, distribution, storage and color space conversion of video signals. The IP core is highly portable and can easily be applied to any Nios II-based embedded system that requires video codec controller functionality.
