[Project Source Code] Altera SOPC FrameBuffer System Design Tutorial
This article and design code were written by FPGA enthusiast Xiao Meige. Without the author's permission, this article is only allowed to be copied and reproduced on online forums, and the original author must be indicated when reprinting.
In embedded systems, LCD screens are widely used as the friendliest means of human-computer interaction, and they are even more common in ARM-based systems. As a member of the embedded family, an FPGA naturally also needs to drive a display; for example, FPGAs are often used to build LCD test fixtures and display-driver test boards. Many people who have learned FPGA know that driving a VGA monitor from an FPGA is relatively easy, and almost every board vendor provides examples that display color bars, patterns, or even simple text. Displaying more complex content, however, is often difficult. An FPGA implements circuits and hardware: it has plenty of muscle but a simple mind. To display complex content, a controller with a capable mind is needed to do the job.
Over the years, FPGA development has moved through simple logic expansion, complex logic design, SOPC systems, and SOC systems. I won't say much about the logic design part. For SOPC system design, Xilinx has the MicroBlaze soft core and Altera has the well-known NIOS II soft-core processor. In the SOC era, both Xilinx and Altera (since acquired by Intel, becoming its programmable division) have launched SOC chips with an embedded dual-core ARM Cortex-A9: Xilinx's well-known Zynq-7000 series and Altera's Cyclone V SOC, which has been promoted in colleges and universities.
Although SOC applications will become more widespread as technology develops, SOPC technology still has value for everyday learning and use. After all, SOC chips with an embedded hard core cost hundreds of yuan, while a soft-core solution can complete a system design with a BOM cost of less than 40 yuan, so its advantage is still obvious. In this example, we use the lowest-end chips of the Altera Cyclone IV series, the EP4CE6/EP4CE10, to design a system that drives a 640*480 display and dynamically shows complex content such as text and pictures.
We have explained how to drive an RGB display before. The drive timing of an RGB display is almost the same as that of VGA; only the timing parameters differ between resolutions. Many readers will already be familiar with driving a 480*272 display to show complex content dynamically: in our previous SOPC open class we explained one way to implement such a system, namely using an independent SRAM as the display memory. The display driver module reads the data in the SRAM directly and refreshes the display in real time, while the NIOS II CPU writes to the SRAM only when the display content needs to change. That method requires an independent SRAM as display memory plus an SDRAM as the running memory of NIOS II. Besides increasing the BOM cost, the SRAM occupies more FPGA pins, so a 256-pin BGA device is needed. That solution is therefore only suitable where cost is not a major concern. The solution introduced in this section is an improvement and optimization of that system.
The system described in this section only requires the following hardware resources:
1. EP4CE6/EP4CE10 FPGA chip (30 yuan)
2. 16Mbit or above SDRAM chip (3 yuan, such as Winbond w9816g6)
3. 50 MHz active crystal oscillator (3 yuan)
4. 4Mbit SPI FLASH chip (1.5 yuan, such as W25Q80)
5. 3.3V, 2.5V, 1.2V LDO regulated power supply (AMS1117 series, less than 1 yuan in total)
The purpose of this section is to display a frame of image on the display and to display characters at a specified position.
[Figure 1]
The above framework is the minimum hardware system required to implement this system solution after analysis. In this section, we will complete the FPGA system construction and NIOS II software design based on the AC620 learning kit of Core Line. The overall hardware configuration of the AC620 FPGA development board of Core Line is higher than the above analysis, which helps us to verify the prototype.
FPGA system construction
This system is mainly completed in the Qsys system, including the following IP cores:
clk_0: input clock management unit, added by the system by default.
altpll_0: PLL. It multiplies and divides the input clock to produce two 100 MHz clock signals 180 degrees apart, one for the logic in the system (including the NIOS II soft-core processor) and one for the SDRAM chip, plus a 24 MHz clock signal for the VGA driver circuit at 640*480 resolution.
NIOS II CPU: realizes system control and display content processing.
SDRAM: NIOS II CPU running memory and TFT display image frame buffer.
onchip_memory: on-chip memory, used mainly as the descriptor memory of SGDMA. The descriptors stored here specify the data transfers that SGDMA performs.
lcd_sgdma: SGDMA IP, mainly realizes the efficient transfer of large amounts of data, supports stream mode, and is more efficient than the DMA core of the Avalon MM interface.
timing_adapter: timing adapter IP core. When stream data passes between two different modules, it adds a delay to some signals so that the data and the flag signals stay fully synchronized (put simply, it delays some signals through register stages to align data and control signals).
fifo: dual-clock FIFO, which converts the transfer rate of the data stream. SGDMA outputs the stream at the same clock speed as the memory (100 MHz in this case), while the consumer of the data, the VGA display part, takes it at the 24 MHz pixel clock, so a dual-clock FIFO is used to cross the clock domains.
VGA_SINK: This IP is a third-party open source IP provided by Terasic Technology. It mainly converts Avalon ST stream data into VGA line field timing and drives the VGA screen for display.
EPCS: EPCS FLASH memory control IP. With this IP, the EPCS chip can store FPGA firmware and NIOS II programs.
jtag_uart0: Debug serial port, mainly used to print debugging information, which is very useful in the early stage of design debugging.
[Figure 2]
Next, the construction process of this system will be explained in detail.
1. altpll_0
To give the whole system high performance, the system logic runs at 100 MHz, so a PLL is needed to multiply the 50 MHz input clock from the external crystal oscillator and produce two 100 MHz clock signals 180 degrees apart. In addition, a 24 MHz clock signal is required for the 640*480 VGA driver. How to add the PLL and modify its parameters is not explained in detail here; only the resulting settings are given:
Inclk0: 50 MHz
No areset or locked signal is required
C0: 24 MHz
C1: 100 MHz, phase 0
C2: 100 MHz, phase -180 degrees
2. NIOS II CPU
All settings of the NIOS II CPU use the default values. After adding sdram and EPCS, set the reset vector (Reset Vector) to epcs and the exception vector (Exception Vector) to sdram.
3. sdram
The data width (Data Width) is 16 bits.
The architecture is one chip select and 4 banks.
The address width is 12 bits for the row address and 9 bits for the column address.
All other settings are left at their defaults.
[Figure 3]
4. onchip_memory
The on-chip memory onchip_memory is used as the descriptor memory of SGDMA and stores the descriptors of the SGDMA transfers.
The type is RAM;
the data width is 32 bits;
the total memory size is 16384 bytes (users can increase or decrease this value according to the RAM capacity of their own chip).
[Figure 4]
5. lcd_sgdma
Select Memory To Stream as the transfer mode;
the data width is 16 bits, consistent with the Avalon MM bus width of the memory (SDRAM).
Other settings can be left at their defaults, as shown in the figure below.
[Figure 5]
6. timing_adapter
The timing adapter IP core is mainly used to add a delay to some signals when stream data passes between two different modules, so that the data and the flag signals stay fully synchronized (put simply, it delays some signals through register stages to align the data and the control signals).
Set it as shown in the figure below: check all the options, set the input Ready Latency to 0, and set the output Ready Latency to 1. Each data word has two symbols, and each symbol contains 8 bits. (By the actual meaning of an RGB data stream, each data word should have 3 symbols. This design uses the 16-bit RGB565 color mode, in which R, G, and B together occupy 16 bits, so the system is built as if there were only 2 symbols; when the data is finally output to VGA, the 16-bit word is split into its 5-6-5 fields and sent to the R, G, and B channels of VGA respectively.)
[Figure 6]
All the modules above work at a clock frequency of 100 MHz. Their main function is to read the data to be displayed from SDRAM and pass it on to the display. On the display side, the data rate in this system is the 24 MHz pixel clock, so the data must be transferred from the 100 MHz clock domain to the 24 MHz clock domain. The best way to move a data stream across clock domains is a dual-clock FIFO: it buffers the data in the 100 MHz domain until the reading logic in the 24 MHz domain fetches it.
7. fifo
Set the FIFO depth to 512 words. Each word contains 2 symbols of 8 bits each, so the data width is 16 bits. Set the clock mode to dual clock (Dual Clock Mode), use the Avalon ST interface for both input and output, and enable packets (Enable Packet Data). Incidentally, the embedded block RAM in Cyclone IV E devices is M9K memory, and each device contains a number of M9K blocks. An M9K block holds 9 Kb of storage bits, and each block can be configured as follows:
8192 × 1
4096 × 2
2048 × 4
1024 × 8
1024 × 9
512 × 16
512 × 18
256 × 32
256 × 36
Since the data in this design is 16 bits wide, a FIFO built from a single M9K block can have a depth anywhere from 1 to 512. If we chose a depth of 128, only 1/4 of the M9K capacity would be used, and the remaining 3/4 could not be used for anything else, so it would simply be wasted in this design. It is therefore better to set the depth directly to 512, which also keeps the buffer large enough to avoid accidental data loss.
8. VGA_SINK
The VGA_SINK module is modified from the design provided by Terasic. The original IP supports only 24-bit color. In this system, however, the SDRAM is 16 bits wide, so 24-bit color would require reading two words from SDRAM and splicing them into one pixel. That would put an excessive load on the SDRAM, the system bandwidth would become a bottleneck, and the system would not work properly. The IP was therefore modified and reduced to the 16-bit RGB565 mode.
(By the actual meaning of an RGB data stream, each data word should have 3 symbols here. This design uses the 16-bit RGB565 color mode, in which R, G, and B together occupy only 16 bits, so the system is built as if there were only 2 symbols. Only inside the VGA_SINK module, when the data is finally output to VGA, is the 16-bit word split into its 5-6-5 fields and sent to the R, G, and B channels of VGA respectively.) The parameters are set as follows:
H_DISP (valid pixels per line): 640
H_FPORCH: 16 clock cycles
H_SYNC: 96 clock cycles
H_BPROCH: 48 clock cycles
V_DISP (valid lines per frame): 480
V_FPORCH: 1 line
V_SYNC: 2 lines
V_BPROCH: 33 lines
[Figure 8]
These parameters are all found in the VGA standard.
9. EPCS
There is nothing special to pay attention to when adding EPCS, just follow the default settings.
So far, we have completed the component addition work of the entire display system, and the next step is to connect the bus between the components.
Before the bus connection, we must first explain the data flow of this system, because this system is not quite the same as the system architecture we talked about in the SOPC public class before. This system mainly introduces the Avalon ST bus. The various examples mentioned in the previous public class are all based on the Avalon MM bus. All peripheral IPs are connected to the NIOS II CPU through the Avalon MM bus, so the bus connection is very simple. However, the data handling capacity of the Avalon MM bus is relatively weak and cannot support high-speed and large-scale data transmission. In the data flow based on the Avalon ST bus, the NIOS II CPU is actually bypassed and only plays the role of state management. All data flows are interacted between modules through the Avalon ST bus, and NIOS II does not need to perform the handling work, so the data handling efficiency is greatly improved.
This section does not explain the Avalon ST bus too much. In short, we can imagine the Avalon ST bus as a water pipe. Water can flow easily and quickly from one end of the water pipe to the other end without the need for a third party to scoop it up one by one.
Framebuffer system data flow
In the entire FrameBuffer system, the SGDMA module is the core part. It reads the SDRAM directly and outputs the data it reads as a data stream. SGDMA is widely used, not only in this system but also in many official Altera designs, such as the PCIE examples, the Gigabit Ethernet examples, and the VIP examples. For a detailed introduction to SGDMA, please refer to the SGDMA user manual. We also hope to publish a dedicated explanation of SGDMA as soon as possible. Here we only briefly describe its ports and functions.
In this system, SGDMA is configured in memory-mapped to streaming mode: the data source is organized by address mapping, and the data receiver takes the data as a stream. In this mode, SGDMA has 5 ports in total: the control bus (Avalon MM Slave), the source data read bus (Avalon MM Master), the descriptor write bus (Avalon MM Master), the descriptor read bus (Avalon MM Master), and finally the data output bus (Avalon ST Source).
csr: The control bus of the Avalon MM Slave interface, which is connected to the NIOS II CPU. The NIOS II CPU reads and writes the control and status registers inside the SGDMA through this interface to realize the transmission control of the DMA. This is also the only place where NIOS II needs to participate in the entire data stream transmission. NIOS II actually plays the role of a boss, arranging what SGDMA should do. After receiving the task, SGDMA goes down to do the hard work of carrying, while the boss NIOS II can sit in the office and drink coffee, read newspapers, and wait for SGDMA to come back and report after the hard work (interrupt) or call every once in a while to ask about the work status and progress (query).
m_read: The source data read bus of the Avalon MM Master interface. This interface realizes the reading of the data source memory mounted on the Avalon MM bus that needs to be transmitted, such as SDRAM and DDR2 SDRAM. From here, you can also see that in the system built by Qsys, not only NIOS II can read SDRAM, DDR2 and other memories. In fact, as long as it is an IP of Avalon MM Master interface, it can read these memories. We can even write an Avalon MM Master interface logic to replace NIOS II CPU to complete the read and write operations of various IP cores.
descriptor_write: descriptor write port. SGDMA needs a descriptor to realize data transmission. All transmissions of SGDMA are controlled by descriptors. The real-time transmission status of SGDMA is stored through descriptors. Here, an on-chip memory is used to store the descriptor of SGDMA, which can save FPGA resources. Otherwise, if FPGA resources are used to implement descriptors, it will bring very large resource consumption.
descriptor_read: descriptor read port. SGDMA uses this port to read descriptors to obtain transmission information.
out: data stream output port. The data read from SDRAM or DDR2 will flow out through this port to provide to the data user.
For our data flow analysis, we can temporarily ignore the csr, descriptor_write and descriptor_read ports, because these ports are not on the data path. The ports that are actually on the data path are the m_read (data inflow) and out (data outflow) ports. The following figure is the data flow diagram of this system.
[Figure 10]
NIOS II CPU uses SDRAM as running memory to store programs and data. At the same time, SGDMA reads the data that needs to be displayed on the VGA screen from SDRAM in real time. The arbitration between NIOS II and SGDMA for the use of SDRAM is automatically handled by the arbitration mechanism of the Avalon MM bus.
Through the above introduction, we understand the data flow of the system, and we can first complete the connection of the data stream bus. Each module of the st stream interface has two stream ports, one is the stream input port (in: Avalon Streaming Sink), and the other is the stream output port (out: Avalon Streaming Source). When connecting, just connect the stream out of the upper-level module to the stream in port of the lower-level module to achieve automatic connection of the data stream. In this example, the connection of each module port is shown in the following table:
[Figure 11: port connection table]
After the overall connection is completed, the result is as shown in the figure below:
[Figure 12]
Of course, after connecting the buses, don't forget to connect the reset network and the interrupt network, and to export the signals that need to be brought out to the top level of the system.
Change the reset vector of NIOS II to EPCS and the exception vector to SDRAM.
Automatically assign the base addresses.
Copy the instantiation template from the HDL Example tab.
Save and start generating.
At this point, the entire system has been built. The next step is to add the system to the Quartus II project. The first step is to add the mysystem.qsys file to the project, and then complete the writing of the top-level file of the project. These most basic operations will not be repeated here. The following is the top-level code of the project:
module Framebuffer_VGA(
input wire clk_50m,
input wire reset_n,
output wire vga_clk,
output wire vga_de,
output wire [7:0] vga_r,
output wire [7:0] vga_g,
output wire [7:0] vga_b,
output wire vga_hs,
output wire vga_vs,
output wire vga_bl,
output wire sdram_clk,
output wire [11:0] sdram_addr,
output wire [1:0] sdram_ba,
output wire sdram_cas_n,
output wire sdram_cke,
output wire sdram_cs_n,
inout wire [15:0] sdram_dq ,
output wire [1:0] sdram_dqm,
output wire sdram_ras_n,
output wire sdram_we_n,
output wire epcs_dclk,
output wire epcs_sce,
output wire epcs_sdo,
input wire epcs_data0
);
assign vga_bl = 1;
wire vga_clk_r;
assign vga_clk = ~vga_clk_r; // Invert the VGA clock to ensure data center alignment
mysystem u0 (
.clk_50m_clk (clk_50m),
.reset_50m_reset_n (reset_n),
.vga_clk (vga_clk_r),
.vga_de (vga_de),
.vga_r (vga_r),
.vga_g (vga_g),
.vga_b (vga_b),
.vga_hs (vga_hs),
.vga_vs (vga_vs),
.altpll_0_phasedone_conduit_export (),
.altpll_0_locked_conduit_export (),
.altpll_0_areset_conduit_export (),
.epcs_dclk (epcs_dclk),
.epcs_sce (epcs_sce),
.epcs_sdo (epcs_sdo),
.epcs_data0 (epcs_data0),
.sdram_clk_clk (sdram_clk),
.sdram_addr (sdram_addr),
.sdram_ba (sdram_ba),
.sdram_cas_n (sdram_cas_n),
.sdram_cke (sdram_cke),
.sdram_cs_n (sdram_cs_n),
.sdram_dq (sdram_dq),
.sdram_dqm (sdram_dqm),
.sdram_ras_n (sdram_ras_n),
.sdram_we_n (sdram_we_n)
);
endmodule
Then assign pins. For the pin assignment of this system on the AC620 development board, please refer to the AC620 development board pin assignment table.
After assigning pins, remember to set all dual-function pins to user mode in the Quartus II software.
[Figure 13]
Compile the project to get the sof programming file.
At this point, the entire FrameBuffer system hardware design is complete. The next step is to write the corresponding driver and application and run it on the target board.
Create an Eclipse project
Open the NIOS II EDS software and import the two projects we provide, VGA and VGA_bsp. Modify the 7th and 9th lines of the settings.bsp file so that the paths match the corresponding locations on your computer. Close the file and build the project. If the build fails with a message about missing permissions, close all files first, then select the entire folder and right-click to obtain administrator permissions.
[Figure 14]
The following is the content of the main function:
#include "stdio.h"
#include "stdlib.h"
#include "io.h"
#include "sys/alt_alarm.h"
#include "altera_avalon_sgdma.h"
#include "altera_avalon_sgdma_descriptor.h"
#include "altera_avalon_sgdma_regs.h"
#include "alt_types.h"
#include "alt_video_display.h"
#include "unistd.h"
#include "pic1.h"
#include "pic2.h"
#include "system.h"

#define WIDTH 640     // Horizontal resolution in pixels
#define HEIGHT 480    // Vertical resolution in lines
#define NUM_FRAME 1   // Number of frame buffers

int main() {
    // Initialize the display controller
    alt_video_display* display_global;
    // printf("Initializing LCD display controller\n");
    display_global = alt_video_display_init(LCD_SGDMA_NAME, // Name of video controller
        WIDTH,  // Width of display
        HEIGHT, // Height of display
        16,     // Color depth (32 or 16)
        SDRAM_BASE + SDRAM_SPAN / 2, // Frame buffers start in the upper half of SDRAM
        ONCHIP_MEMORY_BASE,          // Descriptors live in the on-chip memory
        NUM_FRAME);
    // if (display_global)
    //     printf(" - LCD Initialization OK\n");
    // else
    //     printf(" - LCD FAILED\n");

    // Fill the frame buffer with 0xff bytes (white in RGB565)
    alt_video_display_clear_screen(display_global, 0xff);
    // show_pic() and the pic1 array come from the project sources (not listed here)
    show_pic(display_global, pic1);
    usleep(1000000);

    // Nothing more to do: SGDMA keeps refreshing the screen from SDRAM
    while (1) {
    }
}
After the build completes, download the sof file and run the program from NIOS II EDS. Connect a VGA monitor, and the picture (a portrait of a beautiful woman in our example) appears on the screen.