[STM32F769Discovery development board trial] Discussion and application of DMA2D screen refresh efficiency
This post was last edited by donatello1996 on 2020-7-19 18:51
The high-end STM32F4 and STM32F7 parts include the DMA2D graphics accelerator, which can quickly move image data into the framebuffer that the LTDC display controller scans out. DMA2D works like a fast dedicated channel: it copies data from one memory address into the framebuffer without the CPU handling each pixel. This is a different mechanism from having the CPU write directly to the framebuffer (video memory), and the two methods suit different uses. The question is: when refreshing a region, or even the whole screen, how large is the efficiency gap between them? The official figures are ideal values, and a real test needs instruments. Here I simply toggle a GPIO pin and read the waveform on an oscilloscope. Connect probe 1 of the oscilloscope to the board and turn on channel 1:
The simplest way to measure how long a piece of code takes with an oscilloscope is to toggle a GPIO pin around it. Here I use pin PJ1 on the board's Arduino header. The initialization code and the register-level toggle code are as follows:
void PJ1_Init(void)
{
    GPIO_InitTypeDef GPIO_InitStruct = {0};

    __HAL_RCC_GPIOJ_CLK_ENABLE();               /* enable the GPIOJ peripheral clock */

    GPIO_InitStruct.Pin   = GPIO_PIN_1;         /* PJ1 on the Arduino header */
    GPIO_InitStruct.Mode  = GPIO_MODE_OUTPUT_PP;
    GPIO_InitStruct.Pull  = GPIO_NOPULL;
    GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_LOW;
    /* .Alternate is only used in alternate-function mode, so it stays 0 */
    HAL_GPIO_Init(GPIOJ, &GPIO_InitStruct);
}
GPIOJ->BSRR = 0x00000002;   /* BS1 bit: drive PJ1 high */
GPIOJ->BSRR = 0x00020000;   /* BR1 bit: drive PJ1 low */
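The two constants follow the BSRR layout: the low 16 bits set pins and the high 16 bits reset them. As a host-side sketch (the helper names `bsrr_set`, `bsrr_reset` and `apply_bsrr` are hypothetical, not HAL functions), the masks and their effect on the output data register can be modelled like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modelling the STM32 GPIO BSRR register on a host PC. */

/* Low half-word (bits 0..15): writing 1 to bit n sets pin n. */
static uint32_t bsrr_set(unsigned pin)
{
    assert(pin < 16);
    return 1u << pin;
}

/* High half-word (bits 16..31): writing 1 to bit n+16 resets pin n. */
static uint32_t bsrr_reset(unsigned pin)
{
    assert(pin < 16);
    return 1u << (pin + 16);
}

/* Effect of one BSRR write on the output data register (ODR).
   Pins whose bits are 0 are untouched; if both the set and the reset
   bit of a pin are written, the set bit has priority. */
static uint16_t apply_bsrr(uint16_t odr, uint32_t bsrr)
{
    uint16_t set_bits   = (uint16_t)(bsrr & 0xFFFFu);
    uint16_t reset_bits = (uint16_t)(bsrr >> 16);
    return (uint16_t)((odr & (uint16_t)~reset_bits) | set_bits);
}
```

Because a BSRR write only affects the pins whose bits are 1, toggling PJ1 needs no read-modify-write of ODR, just a single register write.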
First, let's measure drawing by writing directly to the framebuffer (video memory). The code simply calls the pixel drawing function for every point on the screen:
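/* Fill the whole 800x480 screen one pixel at a time; each
   BSP_LCD_DrawPixel call is a CPU write into the framebuffer */
void fun(uint32_t color)
{
    int i, j;
    for (i = 0; i < 800; i++)
        for (j = 0; j < 480; j++)
            BSP_LCD_DrawPixel(i, j, color);
}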
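while (1)
{
    fun(0xffffff00);            /* full screen yellow (ARGB8888) */
    GPIOJ->BSRR = 0x00000002;   /* PJ1 high: one frame finished */
    fun(0xffff00ff);            /* full screen magenta */
    GPIOJ->BSRR = 0x00020000;   /* PJ1 low: next frame finished */
}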
Then check the oscilloscope waveform:
The waveform shows that refreshing the full screen one pixel at a time takes about 50 to 60 ms per frame.
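Since the pin changes level once per full-screen pass, each high or low pulse on the scope is one frame. Spreading that time over the 800 x 480 pixels gives a rough cost per pixel. The sketch below assumes the 200 MHz core clock mentioned at the end of this post; `cycles_per_pixel` is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: core clock cycles spent per pixel, given the
   measured time for one full frame (one pulse width on the scope). */
static double cycles_per_pixel(double frame_ms, uint32_t width,
                               uint32_t height, double sysclk_hz)
{
    assert(width > 0 && height > 0);
    double frame_cycles = frame_ms * 1e-3 * sysclk_hz;
    return frame_cycles / ((double)width * (double)height);
}
```

For 50 to 60 ms per frame this works out to roughly 26 to 31 cycles per pixel, covering the call overhead, the address calculation and the framebuffer write for every single pixel.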
Next, refresh the screen with BSP_LCD_Clear, which fills the framebuffer through DMA2D:
while (1)
{
BSP_LCD_Clear(0xffffff00);
GPIOJ->BSRR=0x00000002;
BSP_LCD_Clear(0xffff00ff);
GPIOJ->BSRR=0x00020000;
}
The waveform shows a significant improvement: about 7 to 8 ms per frame, roughly 7 times faster than drawing pixel by pixel.
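In the official BSP, BSP_LCD_Clear hands the whole frame to DMA2D in register-to-memory mode: the hardware writes one constant colour into a rectangle of the framebuffer, skipping an output offset (in pixels) at the end of each line, with no CPU work per pixel. The following is only a software model of that transfer on a host PC (`r2m_fill` is a hypothetical name, not the HAL API), useful for checking offsets and addresses:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a DMA2D register-to-memory (R2M) fill.
   dst points at the first pixel of the rectangle; out_offset is the number
   of pixels skipped at the end of each line (framebuffer width - width). */
static void r2m_fill(uint32_t *dst, uint32_t width, uint32_t height,
                     uint32_t out_offset, uint32_t argb)
{
    assert(dst != NULL);
    for (uint32_t line = 0; line < height; line++) {
        /* start of this line = previous line start + width + offset */
        uint32_t *row = dst + (size_t)line * (width + out_offset);
        for (uint32_t px = 0; px < width; px++) {
            row[px] = argb;   /* one ARGB8888 pixel per write */
        }
    }
}
```

For a full-screen clear on the 800-pixel-wide layer the offset is 0; a 600-pixel-wide rectangle on the same layer would use an offset of 200 pixels.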
Now for the main topic of this post: using DMA2D to draw custom image data. The official code has some problems here. It supports three input color formats: ARGB8888, RGB888 and RGB565. I tried all of them, but only RGB888 and RGB565 were displayed correctly. I don't know why, but it doesn't matter much: image arrays in BMP format are most commonly 24-bit RGB888, so losing ARGB8888 is not a real problem. I modified the official BMP drawing function (BSP_LCD_DrawBitmap) so that it takes the image array directly as a parameter instead of parsing a BMP file header:
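/* Draw a raw image array (no BMP header) at (Xpos, Ypos).
   Assumes an ARGB8888 layer (4 bytes per pixel) with its framebuffer in SDRAM
   at 0xC0000000, and an array whose rows are stored bottom to top, like BMP.
   Placed in the BSP LCD driver source, since LL_ConvertLineToARGB8888 is static there. */
void BSP_LCD_DrawBuffer(int Xpos, int Ypos, int width, int height, int bit_pixel, uint8_t *buf)
{
    uint32_t index;
    uint32_t Address;
    uint32_t InputColorMode;

    /* Framebuffer address of the image's top-left pixel */
    Address = 0xC0000000 + (((BSP_LCD_GetXSize() * Ypos) + Xpos) * 4);

    switch (bit_pixel)
    {
    case 16: InputColorMode = DMA2D_INPUT_RGB565;   break;
    case 24: InputColorMode = DMA2D_INPUT_RGB888;   break;
    case 32: InputColorMode = DMA2D_INPUT_ARGB8888; break;
    default: return;                     /* unsupported colour depth */
    }

    /* Start at the last row stored in the array, i.e. the top of the image */
    buf += width * (height - 1) * (bit_pixel / 8);

    for (index = 0; index < height; index++)
    {
        /* One DMA2D transfer with pixel format conversion per line */
        LL_ConvertLineToARGB8888((uint32_t *)buf, (uint32_t *)Address, width, InputColorMode);
        Address += BSP_LCD_GetXSize() * 4;   /* next line down on screen */
        buf -= width * (bit_pixel / 8);      /* previous row in the array */
    }
}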
- Xpos, Ypos: screen coordinates of the image's top-left corner
- width, height: image width and height in pixels
- bit_pixel: bits per pixel; valid values are 16, 24 and 32
- uint8_t *buf: the image data array, generated with the Image2Lcd software
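Like the official BSP_LCD_DrawBitmap it is based on, this function reads the array starting from its last row and works upward (BMP stores rows bottom to top), so the bottom-to-top scan option should be selected in Image2Lcd.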
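To see why the source pointer starts at the last row and walks backwards, here is a host-side model of the loop in BSP_LCD_DrawBuffer with the per-line DMA2D conversion done in software. It is a sketch under assumptions: an ARGB8888 framebuffer, a 24-bit array storing each pixel as the bytes B, G, R (the order of RGB888 in little-endian memory), rows stored bottom row first, output pixels treated as opaque, and hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one line of RGB888 (bytes B, G, R per pixel) to opaque ARGB8888,
   modelling what one DMA2D transfer with pixel format conversion does. */
static void convert_line_rgb888(const uint8_t *src, uint32_t *dst, uint32_t width)
{
    for (uint32_t i = 0; i < width; i++) {
        const uint8_t *p = src + 3u * i;
        dst[i] = 0xFF000000u | ((uint32_t)p[2] << 16) |
                 ((uint32_t)p[1] << 8) | (uint32_t)p[0];
    }
}

/* Model of BSP_LCD_DrawBuffer for bit_pixel == 24: screen row 0 comes from
   the last stored row, and each following screen row from the one before. */
static void draw_rgb888_bottom_up(uint32_t *fb, uint32_t fb_width,
                                  uint32_t x, uint32_t y,
                                  uint32_t width, uint32_t height,
                                  const uint8_t *buf)
{
    assert(height > 0 && x + width <= fb_width);
    size_t stride = (size_t)width * 3u;   /* bytes per source row */
    for (uint32_t row = 0; row < height; row++) {
        const uint8_t *src = buf + (size_t)(height - 1u - row) * stride;
        uint32_t *dst = fb + (size_t)(y + row) * fb_width + x;
        convert_line_rgb888(src, dst, width);
    }
}
```

The real function issues one DMA2D transfer per line instead of the inner loop, but the pointer arithmetic is the same: the source moves backwards while the destination moves down the screen.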
In Image2Lcd, set the image width and height to 600 and 360. Two full-screen 24-bit images cannot be stored in Flash at the same time, so I chose images at 3/4 of the screen width and height. After that, draw the two images alternately in the main loop:
while (1)
{
BSP_LCD_DrawBuffer(100,60,600,360,24,(uint8_t *)image1);
GPIOJ->BSRR=0x00000002;
BSP_LCD_DrawBuffer(100,60,600,360,24,(uint8_t *)image2);
GPIOJ->BSRR=0x00020000;
}
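The size limit above follows from simple arithmetic: a 24-bit image needs width x height x 3 bytes. The check below assumes the 2 MB (2,097,152 bytes) of internal Flash of the STM32F769NI on this board and ignores the space taken by the program itself, so the real margin is smaller; `image_bytes` is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_BYTES (2u * 1024u * 1024u)   /* assumed: 2 MB internal Flash */

/* Bytes needed to store one image of the given size and colour depth. */
static uint32_t image_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    assert(bits_per_pixel % 8u == 0u);
    return width * height * (bits_per_pixel / 8u);
}
```

Two full-screen images need 2,304,000 bytes, more than 2,097,152; two 600x360 images need 1,296,000 bytes, leaving roughly 800,000 bytes for the program.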
The waveform shows no obvious difference in DMA2D transfer time between a full-screen 800x480 24-bit buffer and the smaller 600x360 24-bit image: both take about 7 to 8 ms per frame.
Here is how the image looks on the full screen:
Finally, note that all of the above tests were run with a 200 MHz system clock, and the LTDC clock settings were left at the official BSP defaults used with DMA2D. I don't recommend raising the clock multiplier above the official setting, as it is likely to cause screen tearing.