Is it faster for STM32 code to run in RAM or Flash?

Publisher: JoyfulHearted | Latest update time: 2015-04-14 | Source: eechina | Keywords: STM32
This is a question many people care about. Let's work through an example and see what conclusion it leads to:
 
The test methods are as follows:
 
The main loop continuously increments a variable (sum1++); the test assumes that sum1 does not overflow during the measurement.
 
The SysTick timer inside the Cortex-M3 provides a one-second interval, and the number of times sum1 is incremented in that interval shows which method is faster. For rigor, we measure between the first and second seconds rather than from second 0 to second 1, because there may be a delay between enabling SysTick and actually starting to execute sum1++. On the first entry into the SysTick ISR, record the value of sum1; on the second entry, record it again. The difference between the two values is the number of increments completed in the one-second interval.
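The two-sample measurement described above can be sketched as follows. This is a minimal illustration, not the author's original code: the function name `on_one_second_tick` and the result variables are hypothetical, and the sketch assumes the function is called exactly once per second (on target it would be the body of the SysTick handler, with the timer configured for a one-second period).

```c
#include <stdint.h>

/* Counter incremented by the main loop (sum1++ in the article).
   volatile, because it is read from interrupt context. */
volatile uint32_t sum1 = 0;

/* Hypothetical bookkeeping for the measurement. */
static uint32_t first_sample = 0;
static uint32_t tick_count = 0;
volatile uint32_t increments_per_second = 0; /* valid once measurement_done is 1 */
volatile int measurement_done = 0;

/* Assumed to be called once per second, e.g. from the SysTick ISR. */
void on_one_second_tick(void)
{
    tick_count++;
    if (tick_count == 1) {
        /* First entry: take the baseline, skipping the start-up gap. */
        first_sample = sum1;
    } else if (tick_count == 2) {
        /* Second entry: the difference is the work done in one full second.
           Unsigned subtraction stays correct even if sum1 wrapped once. */
        increments_per_second = sum1 - first_sample;
        measurement_done = 1;
    }
}
```

Taking the baseline on the first tick rather than at start-up is what removes the uncertain delay between enabling the timer and entering the loop.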
 
Same test conditions in both cases: Prefetch Buffer enabled + Flash Latency = 2 (according to the Flash Programming Manual, when 48MHz
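For reference, the configuration "prefetch buffer enabled, latency 2" corresponds to a single register value. The sketch below assumes the STM32F10x layout of the FLASH_ACR register (LATENCY in bits 2:0, PRFTBE in bit 4); the article does not name the exact part, and the helper `flash_acr_value` is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Bit fields of FLASH_ACR, assumed from the STM32F10x register layout. */
#define ACR_LATENCY_MASK 0x7u       /* LATENCY[2:0]: number of wait states */
#define ACR_PRFTBE       (1u << 4)  /* PRFTBE: prefetch buffer enable */

/* Hypothetical helper: compute the FLASH_ACR value for a configuration.
   The half-cycle access bit (bit 3) is left at 0 in this sketch. */
uint32_t flash_acr_value(uint32_t wait_states, int prefetch_enable)
{
    uint32_t acr = wait_states & ACR_LATENCY_MASK;
    if (prefetch_enable) {
        acr |= ACR_PRFTBE;
    }
    return acr;
}
```

With these assumptions, `flash_acr_value(2, 1)` yields 0x12, which on target would be written to the FLASH_ACR register before the system clock is raised.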
 
The test results are as follows:
 
Without code optimization, executing the program in RAM: sum1 counts 69467/sec
Without code optimization, executing the program in FLASH: sum1 counts 43274/sec (runs slower in Flash)
 
/***********The code in the loop body is less than N blocks*************/
(1)LDR R0,[PC, #0x154]
(2)LDR R1,[PC, #0x154]
 
(3)LDR R1,[R1,#0]
(4)ADDS R1, R1,#0x1
 
(5)STR R1,[R0, #0]
 
    ......
/****************************************************/
 
With speed optimization enabled, executing the program in RAM: sum1 counts 98993/sec
With speed optimization enabled, executing the program in FLASH: sum1 counts 115334/sec (runs faster in Flash)
 
/***********The code in the loop body is less than N blocks*************/
(1)LDR R1,[R1,#4]
(2)ADDS R1, R1,#0x1
(3)STR R1,[R0, #0]
    ......
/****************************************************/
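The four measurements above are easier to compare as ratios. The following is a small worked calculation using only the counts reported in the test results; the enum names and the helper `speed_ratio` are illustrative and not part of the original test code.

```c
#include <stdint.h>

/* Measured sum1 increments per second, copied from the results above. */
enum {
    RAM_UNOPT   = 69467,
    FLASH_UNOPT = 43274,
    RAM_OPT     = 98993,
    FLASH_OPT   = 115334
};

/* How many times faster configuration `a` is than `b`
   (a result above 1.0 means `a` completed more increments). */
double speed_ratio(uint32_t a, uint32_t b)
{
    return (double)a / (double)b;
}
```

Without optimization, `speed_ratio(RAM_UNOPT, FLASH_UNOPT)` is about 1.61, so RAM wins by roughly 60%. With optimization, `speed_ratio(FLASH_OPT, RAM_OPT)` is about 1.17, so Flash wins by roughly 17%.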
 
The conclusion is:
 
1) Whether a program runs faster in RAM or in Flash is not absolute and depends on the code;
 
2) For the two specific code sequences above, my view is as follows. Without optimization, when the code executes from Flash, instructions (1) and (2) go through fetch (read Flash) -> decode -> execute (read Flash again). The Flash addresses touched in the fetch and execute stages are not contiguous, so these are non-sequential accesses, which are very slow.
With optimization turned on, instructions (1), (2), and (3) cause no non-sequential Flash accesses, so the advantages of running from Flash show through: instruction and data fetches use separate buses (ICode and DCode), and the prefetch buffer stays effective.
 
Further analysis leads to the following conclusions:
 
Without optimization, executing the instructions requires fetching constants from Flash, which interrupts the instruction prefetch queue. After each constant is fetched, the prefetch queue must be refilled, and every such Flash access inserts wait cycles, so the loop takes longer.
 
After optimization, the instructions no longer fetch constants from Flash, so the instruction prefetch queue is not interrupted, and the wait cycles inserted for Flash accesses are offset by the instruction fetch buffer described in the post referenced below, so execution is faster. In this case execution from RAM is slower because RAM is not on the ICode bus: fetching instructions from RAM takes a detour, which is slower than fetching from Flash over the ICode bus.
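The reasoning above can be made concrete with a deliberately simplified cost model. This is a toy sketch, not a cycle-accurate description of the Cortex-M3 or the STM32 Flash interface: it assumes every instruction costs one cycle when the prefetch queue is full, and that each PC-relative literal load costs the Flash wait states twice (once for the constant, once to refill the queue). The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Kind of Flash traffic an instruction generates in this toy model. */
typedef enum {
    INSN_NO_LITERAL,   /* e.g. ADDS, or LDR/STR to SRAM: instruction fetch only */
    INSN_LITERAL_LOAD  /* PC-relative LDR: also reads a constant from Flash */
} insn_kind_t;

/* Estimate cycles for one pass over `insns` when executing from Flash.
   Assumption: 1 cycle per instruction while the prefetch queue is full;
   a literal load adds wait_states for the data read plus wait_states
   for refilling the interrupted prefetch queue. */
uint32_t toy_flash_cycles(const insn_kind_t *insns, size_t n,
                          uint32_t wait_states)
{
    uint32_t cycles = 0;
    for (size_t i = 0; i < n; i++) {
        cycles += 1;
        if (insns[i] == INSN_LITERAL_LOAD) {
            cycles += 2 * wait_states;
        }
    }
    return cycles;
}
```

Under these assumptions and two wait states, the five-instruction unoptimized sequence (two literal loads, then LDR, ADDS, STR) costs 13 model cycles, while the three-instruction optimized sequence costs 3. The absolute numbers are artifacts of the model; only the direction of the difference is meant to match the explanation.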
 
Regarding the performance of Flash, please see my other analysis: [Analysis] Timing analysis of STM32 running programs from Flash
 
In addition, the bus architecture of the STR9 is the same as that of the STM32. Below is some measured data for the FFT function implemented on the STR9, which further illustrates that code running from Flash can be faster than code running from RAM.
 
There is a DSP function library on ST's website. This is its document "STR91x DSP library (DSPLIB)". In this document, there is a section discussing the FFT operation speed, which gives a comparison of the actual operation time. The excerpt is as follows:
 
Radix-4 Complex FFT     Operation Mode                       Cycle Count     Microseconds
64 Point                Program in Flash & Data in SRAM      2701            28.135
64 Point                Program & Data in SRAM               3432            35.75
64 Point                Program & Data in Flash              3705            38.594
256 Point               Program in Flash & Data in SRAM      13740           143.125
256 Point               Program & Data in SRAM               18079           188.323
256 Point               Program & Data in Flash              19908           207.375
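
The cycle counts and times in the table are related by the core clock. The figures are consistent with a 96 MHz clock; that value is inferred from the numbers themselves (for example 3432 cycles / 35.75 µs) and is not stated in the excerpt. A minimal sketch of the conversion, with `cycles_to_us` as a hypothetical helper name:

```c
/* Convert a cycle count to microseconds for a core clock given in MHz.
   One MHz is one cycle per microsecond, so time = cycles / MHz. */
double cycles_to_us(unsigned long cycles, double clock_mhz)
{
    return (double)cycles / clock_mhz;
}
```

At the inferred 96 MHz, `cycles_to_us(3432, 96.0)` gives 35.75, matching the "Program & Data in SRAM" row for 64 points. The same rows show that keeping the program in Flash and the data in SRAM needs about 21% fewer cycles (2701 vs. 3432) than running everything from SRAM.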