Preface
Dhrystone is a synthetic benchmark program designed by Reinhold P. Weicker in 1984 to measure CPU integer performance. Its output is the number of Dhrystones per second, that is, the number of main-loop iterations per second. We use it here to test the integer performance of the CPU.
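As a rough illustration of how the score is derived, the sketch below converts a run count and an elapsed time into Dhrystones per second, and then into the VAX MIPS rating that the benchmark prints. The divisor 1757 is the commonly cited Dhrystones-per-second figure for the VAX 11/780 reference machine. The function names are illustrative assumptions and do not come from the benchmark source.

```c
/* Dhrystones per second: main-loop iterations divided by elapsed seconds. */
double dhrystones_per_second(unsigned long runs, double seconds)
{
    if (seconds <= 0.0) {
        return 0.0; /* avoid dividing by zero if the timer did not advance */
    }
    return (double)runs / seconds;
}

/* VAX MIPS rating: score relative to the VAX 11/780 (1757 Dhrystones/s). */
double vax_mips(double dhry_per_sec)
{
    return dhry_per_sec / 1757.0;
}
```

For example, 1,757,000 runs completed in 10 seconds would give 175,700 Dhrystones per second, which corresponds to a rating of about 100 VAX MIPS.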
Process
Add code
Get the code
http://www.roylongbottom.org.uk/classic_benchmarks.tar.gz
Unzip classic_benchmarks.tar.gz. The Dhrystone sources are in the \classic_benchmarks\classic_benchmarks\source_code\dhrystone2 folder.
Add code
Copy the \classic_benchmarks\source_code\dhrystone2 folder to the project directory and add it to the project
Porting interface
dhry_1.c
Comment out the cpuidh.h include:
//#include "cpuidh.h"
Add
#include "definitions.h"
The original code measures the execution time in seconds and stores it in the global variable User_Time:
start_time();
......
end_time();
User_Time = secs;
We use plib_systick.c to get the time instead.
SysTick is configured in the configuration tool with an interrupt period of 1 ms.
SYSTICK_TimerInitialize
SYSTICK_GetTickCounter returns the number of interrupts so far, i.e. the elapsed time in milliseconds.
So the timing code can be changed to
uint32_t s_stime_u32 = SYSTICK_GetTickCounter();
......
uint32_t s_etime_u32 = SYSTICK_GetTickCounter();
User_Time = (s_etime_u32 - s_stime_u32)/1000.0;
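The subtraction above stays correct even if the tick counter wraps around between the two readings, because unsigned arithmetic is modulo 2^32. A minimal sketch of that behaviour, assuming a free-running 32-bit millisecond counter as described above; elapsed_seconds is a hypothetical helper and not part of plib_systick.c.

```c
#include <stdint.h>

/* Elapsed seconds between two readings of a free-running 32-bit ms counter.
 * Unsigned subtraction wraps modulo 2^32, so the result is still correct
 * if the counter overflowed once between the two readings. */
double elapsed_seconds(uint32_t start_ms, uint32_t end_ms)
{
    uint32_t delta_ms = end_ms - start_ms; /* wrap-safe difference */
    return (double)delta_ms / 1000.0;
}
```

With a 1 ms tick, a 32-bit counter wraps after roughly 49.7 days, so a single wrap is the only case a benchmark run could realistically hit.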
void main (int argc, char *argv[])
Change to
void dhry_main(int argc, char *argv[])
Comment out the following
///getDetails();
///for (i=1; i<10; i++)
///{
/// printf("%s\n", configdata);
///}
///printf("\n");
///fprintf (Ap, " ######################################### ###########\n\n");
///for (i=1; i<10; i++)
///{
/// fprintf(Ap, "%s \n", configdata);
///}
///fprintf (Ap, "\n");
Line 185
#endif "Register option Selected."
Change to
#endif // "Register option Selected."
Comment out the code at line 452
///local_time();
///fprintf (Ap, " ######################################### ###########\n\n");
///fprintf (Ap, " Dhrystone Benchmark 2.1 %s via C/C++ %s\n", options, timeday);
///fprintf (Ap, " VAX MIPS rating: %12.2lf\n\n",Vax_Mips);
Comment out the code at line 130
///if ((Ap = fopen("Dhry.txt","a+")) == NULL)
/// {
/// printf(" Can not open Dhry.txt\n\n");
/// printf(" Press Enter\n\n");
/// int g = getchar();
/// exit(1);
/// }
Line 113
int nopause = 1;
Change to
int nopause = 0;
Test
firmware\src\config\default\stdio\keil_monitor.c
Modify fputc so that sending \n sends \r\n instead
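int fputc(int c, FILE* stream)
{
    int chenter = '\r';
    uint8_t size = 0;
    /* Send a carriage return before every line feed. */
    if(c == '\n')
    {
        do
        {
            size = SERCOM2_USART_Write((void*)&chenter, 1);
        }while (size != 1);
    }
    /* Send the character itself, retrying until the driver accepts it. */
    do
    {
        size = SERCOM2_USART_Write((void*)&c, 1);
    }while (size != 1);
    return c;
}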
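The same newline translation can be tried on a host PC before flashing. The sketch below is only a stand-in under stated assumptions: uart_write_byte and the tx_log buffer are hypothetical replacements for SERCOM2_USART_Write, and put_char_crlf mirrors the logic of the fputc above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the UART driver: records bytes in a buffer. */
uint8_t tx_log[64];
size_t tx_len = 0;

/* Returns the number of bytes accepted (always 1 here). When the log is
 * full the byte is dropped, so this host-side sketch can never spin forever. */
size_t uart_write_byte(uint8_t byte)
{
    if (tx_len < sizeof tx_log) {
        tx_log[tx_len++] = byte;
    }
    return 1;
}

/* Send one character, expanding '\n' to "\r\n". */
int put_char_crlf(int c)
{
    if (c == '\n') {
        while (uart_write_byte((uint8_t)'\r') != 1) { /* retry */ }
    }
    while (uart_write_byte((uint8_t)c) != 1) { /* retry */ }
    return c;
}
```

Passing the byte by value also avoids taking the address of an int and sending its first byte, which only works as intended on a little-endian target.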
main.c
Add the declaration
void dhry_main (int argc, char *argv[]);
int main ( void )
{
    /* Initialize all modules */
    SYS_Initialize ( NULL );
    SYSTICK_TimerStart();
    PORT_REGS->GROUP[1].PORT_PINCFG[24] = 0x1U;
    PORT_REGS->GROUP[1].PORT_PINCFG[25] = 0x1U;
    PORT_REGS->GROUP[1].PORT_PMUX[12] = 0x33U;
    dhry_main(0,0);
    while ( true )
    {
        /* Maintain state machines of all polled MPLAB Harmony modules. */
        SYS_Tasks ( );
    }
    /* Execution should not come here during normal operation */
    return ( EXIT_FAILURE );
}
Download the program and run the test.
The printed output is as follows
The score is 88.33
Next we rebuild with -Ofast optimization
The score improves significantly, to 130.05
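To put the two scores side by side, the relative gain can be computed as below. This is only an arithmetic sketch, and improvement_percent is a hypothetical helper name. For 88.33 and 130.05 it gives a gain of roughly 47%.

```c
/* Relative improvement of `after` over `before`, in percent. */
double improvement_percent(double before, double after)
{
    if (before <= 0.0) {
        return 0.0; /* no meaningful baseline to compare against */
    }
    return (after - before) / before * 100.0;
}
```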
We then enable the D-cache
startup_keil.c
Added at line 135
ICache_Enable();
DCache_Enable();
The score is unchanged, indicating that data access is not the bottleneck.
If the I-cache is not enabled
//ICache_Enable();
DCache_Enable();
The scores differ considerably, so slow instruction fetch from flash is the performance bottleneck. To speed up execution, you can run the code from internal RAM.
I won’t do any comparative tests here, but you can try it if you are interested.
Summary
In the tests above, the highest Dhrystone score was about 130.
The main factor affecting performance is the speed of instruction fetches from flash.
Generally the CPU runs faster than flash can be read, so wait states are inserted on flash accesses. The higher the CPU clock frequency, the more wait states are needed. STM32 chips, for example, document a table of the required wait states for each frequency range. If fewer wait states are configured than the minimum required at a given frequency, instruction fetches may return wrong data. For best performance, set the wait states as close as possible to the minimum allowed for the operating frequency. For stability, for example across different temperature and humidity conditions, it is safer to set them toward the maximum value, which gives more margin and better reliability.
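The relationship between clock frequency and wait states can be sketched as follows. The model is a simplifying assumption and not taken from any datasheet: it supposes the flash can be read with zero wait states up to flash_max_hz, and that each additional wait state extends the usable range by the same amount. The function name and parameters are hypothetical, so take the real limits from the device's own table.

```c
#include <stdint.h>

/* Minimum wait states ws such that cpu_hz / (ws + 1) <= flash_max_hz.
 * Equivalent to ceil(cpu_hz / flash_max_hz) - 1, computed without overflow.
 * Both parameters are hypothetical inputs, not values for a specific chip. */
uint32_t min_flash_wait_states(uint32_t cpu_hz, uint32_t flash_max_hz)
{
    if (cpu_hz == 0u || flash_max_hz == 0u) {
        return 0u; /* invalid input: caller must pass nonzero frequencies */
    }
    return (cpu_hz - 1u) / flash_max_hz;
}
```

With an assumed 24 MHz zero-wait-state limit, a 48 MHz core would need 1 wait state and a 120 MHz core would need 4. Adding one or two on top of the computed minimum is the margin-for-stability trade-off described above.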
I did not find a flash wait-state setting for this chip; the wait states may be inserted automatically. I have not read the manual carefully, so look it up if you are interested.