Does STM32 code run faster from RAM or from Flash? Many people care about this question. Let's work through an example and see what conclusion it leads to:
The test method is as follows:
The main loop does nothing but increment a variable (sum1++), on the premise that the variable does not overflow.
The SysTick timer inside the Cortex-M3 provides a one-second time base, and the value sum1 reaches in that second shows which method is faster. To be rigorous, we measure between the first and the second SysTick interrupt rather than from second 0 to second 1, because there may be a gap between enabling SysTick and actually starting to execute sum1++. On the first entry into the SysTick ISR, record the value of sum1; on the second entry, record it again. The difference between the two values is the number of times sum1 was incremented in that one-second interval.
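The measurement described above can be sketched in C. This is a minimal host-side sketch, not the author's original test code: only sum1 comes from the text, while on_systick, run_main_loop, increments_per_second and the snapshot variables are hypothetical names. On real hardware, on_systick would be the body of the SysTick interrupt handler, which is assumed here to fire once per second.

```c
#include <stdint.h>

/* sum1 is the counter named in the text; all other names are illustrative. */
static volatile uint32_t sum1;                /* incremented by the main loop */
static volatile uint32_t tick_count;          /* SysTick interrupts seen so far */
static volatile uint32_t sum1_at_first_tick;  /* snapshot taken on the 1st interrupt */
static volatile uint32_t sum1_at_second_tick; /* snapshot taken on the 2nd interrupt */

/* Stand-in for the SysTick ISR body (assumed to run once per second). */
void on_systick(void)
{
    tick_count++;
    if (tick_count == 1) {
        sum1_at_first_tick = sum1;
    } else if (tick_count == 2) {
        sum1_at_second_tick = sum1;
    }
}

/* Host-side stand-in for the main loop: performs n increments of sum1. */
void run_main_loop(uint32_t n)
{
    while (n--) {
        sum1++;
    }
}

/* Increments counted between the 1st and 2nd tick; 0 until both have occurred. */
uint32_t increments_per_second(void)
{
    if (tick_count < 2) {
        return 0;
    }
    return sum1_at_second_tick - sum1_at_first_tick;
}
```

Whatever happens before the first tick is excluded from the result, which is the point of measuring between the first and second interrupt instead of from the moment SysTick is enabled.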
The test conditions are the same in both cases: Prefetch Buffer enabled + Flash Latency = 2 (according to the Flash Programming Manual, when 48MHz
The test results are as follows:
Without code optimization, executing the program in RAM: sum1 counts 69467/sec
Without code optimization, executing the program in Flash: sum1 counts 43274/sec (runs slower in Flash)
/***********The code in the loop body is less than N blocks*************/
(1) LDR  R0, [PC, #0x154]
(2) LDR  R1, [PC, #0x154]
(3) LDR  R1, [R1, #0]
(4) ADDS R1, R1, #0x1
(5) STR  R1, [R0, #0]
......
/****************************************************/
With speed optimization enabled, executing the program in RAM: sum1 counts 98993/sec
With speed optimization enabled, executing the program in Flash: sum1 counts 115334/sec (runs faster in Flash)
/***********The code in the loop body is less than N blocks*************/
(1) LDR  R1, [R1, #4]
(2) ADDS R1, R1, #0x1
(3) STR  R1, [R0, #0]
......
/****************************************************/
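To put the four measured counts side by side, the ratios can be computed directly. The sketch below only does arithmetic on the numbers quoted above; the enum names and count_ratio are illustrative and are not part of the original test code.

```c
#include <assert.h>

/* Measured sum1 counts per second, copied from the results above. */
enum {
    RAM_NO_OPT   = 69467,
    FLASH_NO_OPT = 43274,
    RAM_OPT      = 98993,
    FLASH_OPT    = 115334
};

/* Ratio of two counts; the second count must be nonzero. */
double count_ratio(unsigned a, unsigned b)
{
    assert(b != 0);
    return (double)a / (double)b;
}
```

count_ratio(RAM_NO_OPT, FLASH_NO_OPT) is roughly 1.6, so without optimization RAM is about 60% faster; count_ratio(FLASH_OPT, RAM_OPT) is roughly 1.17, so with optimization Flash is about 17% faster.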
The conclusion is:
1) Whether a program runs faster in RAM or in Flash is not absolute and depends on the code;
2) For the two code listings above, my view is this: without optimization, when the code runs from Flash, instructions (1) and (2) go through fetch (a Flash read) -> decode -> execute (another Flash read). The Flash addresses accessed in the fetch stage and in the execute stage are not contiguous, so these are non-sequential accesses and they are very slow;
With optimization turned on, instructions (1), (2), and (3) cause no non-sequential Flash accesses, so the advantages of running from Flash show through: instruction fetches and data fetches use separate buses (ICode and DCode), and the prefetch buffer is effective.
Further analysis leads to the following conclusions:
Without optimization, constants have to be fetched from Flash while instructions execute, which interrupts the instruction prefetch queue. After each constant is fetched, the prefetch queue has to be refilled, and every Flash access inserts wait cycles, so the loop takes longer.
With optimization, no constants are fetched from Flash while the loop executes, so the instruction prefetch queue is not interrupted, and the wait cycles inserted for Flash accesses are hidden by the prefetch buffer described in the post referenced below. The loop is therefore faster. In this case execution from RAM is slower because RAM is not on the ICode bus: fetching instructions from RAM takes a detour, which is slower than fetching from Flash over the ICode bus.
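The reasoning above can be made concrete with a deliberately crude cost model. Everything in this sketch is an assumption made for illustration: the type loop_body, the function modelled_cycles, the one-cycle-per-instruction baseline, and the penalty of two wait-state periods per constant load (one for the data read, one for refilling the prefetch queue) are hypothetical and are not taken from any STM32 manual. It is not a cycle-accurate simulation and is not expected to reproduce the measured counts.

```c
/* Toy cost model; every cycle cost below is an assumption, not STM32 data. */
typedef struct {
    unsigned instructions;   /* instructions listed in the loop body */
    unsigned literal_loads;  /* PC-relative loads that read constants from Flash */
} loop_body;

/* Modelled cycles for one pass through the listed instructions. */
unsigned modelled_cycles(loop_body body, unsigned wait_states)
{
    /* Assumed: 1 cycle per instruction while sequential prefetch keeps up. */
    unsigned base = body.instructions;
    /* Assumed: each constant load stalls once for the data read and once
       more while the prefetch queue is refilled. */
    unsigned penalty_per_load = 2u * wait_states;
    return base + body.literal_loads * penalty_per_load;
}
```

With a Flash latency of 2, the unoptimized listing (5 instructions, 2 constant loads) costs 13 modelled cycles against 3 for the optimized listing (3 instructions, no constant loads). The absolute numbers mean nothing; the point is only that the penalty term vanishes once the constant loads leave the loop.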
Regarding the performance of Flash, please see my other analysis: [Analysis] Timing analysis of STM32 running programs from Flash
In addition, the bus architecture of STR9 is the same as that of STM32. Here is some measured data of the FFT function implemented on STR9, which can further illustrate that running code in Flash can be faster than in RAM!
There is a DSP function library on ST's website. This is its document "STR91x DSP library (DSPLIB)". In this document, there is a section discussing the FFT operation speed, which gives a comparison of the actual operation time. The excerpt is as follows:
Radix-4

Complex FFT   Operation Mode                     Cycle Count   Microseconds
64 Point      Program in Flash & Data in SRAM    2701          28.135
64 Point      Program & Data in SRAM             3432          35.75
64 Point      Program & Data in Flash            3705          38.594
256 Point     Program in Flash & Data in SRAM    13740         143.125
256 Point     Program & Data in SRAM             18079         188.323
256 Point     Program & Data in Flash            19908         207.375
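The table can be checked and compared with a few lines of arithmetic. The sketch below copies the figures from the excerpt; the type fft_row, the arrays fft64 and fft256, and the helper functions are hypothetical names introduced here, and the clock frequency is inferred from the table, not stated in the excerpt.

```c
#include <assert.h>

typedef struct {
    const char *mode;      /* where program and data are placed */
    unsigned cycles;       /* cycle count from the table */
    double microseconds;   /* execution time from the table */
} fft_row;

/* Figures copied from the Radix-4 complex FFT table above. */
const fft_row fft64[3] = {
    { "Program in Flash & Data in SRAM", 2701, 28.135 },
    { "Program & Data in SRAM",          3432, 35.75  },
    { "Program & Data in Flash",         3705, 38.594 },
};
const fft_row fft256[3] = {
    { "Program in Flash & Data in SRAM", 13740, 143.125 },
    { "Program & Data in SRAM",          18079, 188.323 },
    { "Program & Data in Flash",         19908, 207.375 },
};

/* Cycles per microsecond, i.e. the clock in MHz implied by one row. */
double implied_mhz(fft_row r)
{
    assert(r.microseconds > 0.0);
    return (double)r.cycles / r.microseconds;
}

/* Cycle count of a row relative to a baseline row (1.0 means equal). */
double relative_cycles(fft_row r, fft_row baseline)
{
    assert(baseline.cycles != 0);
    return (double)r.cycles / (double)baseline.cycles;
}
```

Every row works out to about 96 cycles per microsecond, so the cycle counts and times are consistent with a 96 MHz clock. Taking "Program in Flash & Data in SRAM" as the baseline, running the program from SRAM costs about 1.27 times the cycles for 64 points and about 1.32 times for 256 points.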