How to speed up program execution on the Huada HC32F460 / HC32F4A0?
The Huada HC32F4xx series MCUs (HC32F460, HC32F4A0) can run at up to 200 MHz, but once the CPU clock exceeds 33 MHz the internal Flash needs wait cycles, and the number of wait cycles depends on the clock frequency.
As a result, a program executing from internal Flash cannot keep up with the CPU and does not reach the full 200 MHz execution speed.
As shown in the figure below, at the maximum of 200 MHz it takes 5 CPU clock cycles to read an instruction from Flash, so a CPU clocked at 200 MHz effectively executes from Flash at less than 40 MHz.
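As a rough check of that number, taking the post's figures at face value (5 CPU cycles per Flash read at 200 MHz) and assuming, for simplicity, one instruction per Flash read with no prefetching, the effective rate is bounded by the following. The symbols $f_{\text{eff}}$, $f_{\text{HCLK}}$ and $N_{\text{cycles}}$ are introduced here only for illustration:

$$
f_{\text{eff}} \le \frac{f_{\text{HCLK}}}{N_{\text{cycles}}} = \frac{200\ \text{MHz}}{5} = 40\ \text{MHz}
$$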
So how can we make the program run faster and at the actual CPU frequency?
Two methods:
1. The first method that comes to mind is to move the critical code, or whatever code needs to run faster, into SRAM and execute it from there.
This method is outside the scope of this post, and the principle and details are not complicated.
Running code from SRAM works the same way on any MCU, so here are just two reminders:
1) If you want interrupt handlers to execute from SRAM, remember to remap the interrupt vector table to SRAM as well.
2) If code running from SRAM calls code that is still in Flash, the speed suffers again. Every function in the call chain needs to be moved to SRAM together.
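The two reminders above could look roughly like this with GCC on a Cortex-M4. This is only a sketch under stated assumptions, not the vendor's method: the section name `.ramfunc`, the macro `RAM_FUNC`, the functions `hot_sum` and `relocate_vectors`, and the vector count of 160 are all made up for illustration. The linker script and startup code must actually place `.ramfunc` in SRAM and copy it there, and the real vector count should be checked against the reference manual. The VTOR address 0xE000ED08 is the standard Cortex-M system control block register.

```c
#include <stdint.h>
#include <string.h>

#if defined(__arm__)
/* Hypothetical section name: the linker script must map it to SRAM
 * and the startup code must copy it from Flash before use. */
#define RAM_FUNC __attribute__((section(".ramfunc"), noinline))
#define SCB_VTOR (*(volatile uint32_t *)0xE000ED08u) /* Cortex-M VTOR */
#else
#define RAM_FUNC /* host build: ordinary function, for testing only */
#endif

/* Assumed table size (16 system + 144 external vectors); verify it. */
#define VECTOR_COUNT 160u

/* VTOR needs the table aligned to its size rounded up to a power of
 * two: 160 * 4 = 640 bytes, so 1024-byte alignment. */
static uint32_t ram_vectors[VECTOR_COUNT] __attribute__((aligned(1024)));

/* Reminder 2: a hot function placed in SRAM. It calls nothing that
 * lives in Flash, so the whole call chain runs from SRAM. */
RAM_FUNC uint32_t hot_sum(const uint32_t *data, uint32_t n)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}

/* Reminder 1: copy the vector table to SRAM and point VTOR at it.
 * The caller should disable interrupts around this call. */
const uint32_t *relocate_vectors(const uint32_t *flash_vectors)
{
    memcpy(ram_vectors, flash_vectors, sizeof ram_vectors);
#if defined(__arm__)
    SCB_VTOR = (uint32_t)(uintptr_t)ram_vectors;
    __asm volatile ("dsb\n\tisb" ::: "memory");
#endif
    return ram_vectors;
}
```

The handlers themselves would also need the `RAM_FUNC` treatment; relocating only the table still leaves the handler code in Flash.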
2. The Huada HC32F4xx series MCUs have a 1 KB Flash cache, which speeds up both the execution of code in Flash and the reading of data stored in Flash.
On a cache hit, code in Flash executes at full CPU speed. Of course, since it is a read cache, on a miss the CPU still has to fetch the data or instruction from Flash, with the wait cycles that implies.
So the cache accelerates the program over the run as a whole, but you cannot assume that execution keeps pace with the CPU clock at every moment.
If HCLK is 200 MHz, then with the cache enabled the program can run at up to 200 MHz.
(Because the program does not run at 200 MHz all the time, do not use instruction-counting busy loops as delay functions. Use SysTick as the time base for busy-wait delays instead.)
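A SysTick-based delay of the kind recommended above might be sketched as follows. It assumes SysTick has already been configured for a 1 ms interrupt (for example with the CMSIS call `SysTick_Config(SystemCoreClock / 1000u)`). `SysTick_Handler` is the standard CMSIS handler name, while `g_tick_ms`, `tick_now`, `tick_elapsed` and `delay_ms` are names invented for this example.

```c
#include <stdint.h>

/* Millisecond counter, advanced only by the SysTick interrupt, so it
 * is independent of how fast instructions are fetched from Flash. */
static volatile uint32_t g_tick_ms;

/* Assumes a 1 ms SysTick period has been configured elsewhere. */
void SysTick_Handler(void)
{
    g_tick_ms++;
}

uint32_t tick_now(void)
{
    return g_tick_ms;
}

/* Wrap-safe check: unsigned subtraction stays correct even when the
 * 32-bit counter overflows between start and now. */
int tick_elapsed(uint32_t start, uint32_t duration_ms)
{
    return (uint32_t)(g_tick_ms - start) >= duration_ms;
}

/* Still a busy wait, but timed by hardware ticks rather than by loop
 * iterations. The real delay can be up to 1 ms shorter than requested
 * because the first tick may arrive right after the call. */
void delay_ms(uint32_t ms)
{
    uint32_t start = g_tick_ms;
    while (!tick_elapsed(start, ms)) {
        /* spin until enough ticks have passed */
    }
}
```

Because the loop exit depends on the tick counter and not on the loop count, the delay stays the same whether the loop body hits or misses the cache.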
I ran an experiment to compare the performance of the same piece of code running without cache and with cache:
The code I tested:
Running results:
Without cache, the flash_run_performance_test function takes 728023 units of 10 ns (10 nanoseconds);
with cache, it takes 259880 units of 10 ns. From these counts you can calculate the time taken by the test function and compare the speed with and without cache.
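Working that calculation through with the two counts reported above (the symbols $t_{\text{no cache}}$ and $t_{\text{cache}}$ are introduced here only for illustration):

$$
\begin{aligned}
t_{\text{no cache}} &= 728023 \times 10\ \text{ns} \approx 7.28\ \text{ms} \\
t_{\text{cache}} &= 259880 \times 10\ \text{ns} \approx 2.60\ \text{ms} \\
\frac{t_{\text{no cache}}}{t_{\text{cache}}} &= \frac{728023}{259880} \approx 2.80
\end{aligned}
$$

So for this particular test function the cache makes execution roughly 2.8 times faster, or about 64% less run time. Other code will see a different ratio depending on its cache hit rate.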
To sum up: the speed of the Huada HC32F4xx series ARM Cortex-M4 MCUs is still quite good.