How to measure code execution time on ARM Cortex-M MCUs
In many real-time applications, such as motor control, engine control, wireless communication, and other time-sensitive systems, the CPU may spend 95% or more of its time executing less than 5% of the code. These embedded systems are usually written in C, and developers may hand-optimize parts of the code in assembly language to meet deadlines. Measuring the actual execution time of sections of code helps identify these time-critical parts.
This article will show how to easily measure and display the execution time of code snippets on Cortex-M based MCUs.
Methods for measuring execution time
There are many ways to measure code execution time. Embedded engineers often use a digital output and an oscilloscope: set the output high before the monitored code executes and set it low afterwards. Of course, this requires a fair amount of setup: finding an output pin that is easy to probe, configuring the port as an output, writing the code, compiling, and so on. Once you have a signal, you may want to monitor it for a while to see the minimum and maximum execution times.
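A minimal sketch of the output-pin method, assuming a memory-mapped GPIO output data register. The pointer `probe_port`, the mask `PROBE_PIN_MASK`, and the helpers `probe_high()`/`probe_low()` are hypothetical; on real hardware you would point `probe_port` at your MCU's output register (the address is device-specific) and pick a pin that is easy to probe. So that the sketch also builds on a host, `probe_port` defaults to a dummy variable.

```c
#include <stdint.h>

/* Hypothetical pin: bit 5 of the port. Choose a pin that is easy to probe. */
#define PROBE_PIN_MASK (1u << 5)

/* Stand-in register so the sketch builds on a host. On target, set
 * probe_port to the address of the GPIO output data register (hypothetical,
 * device-specific) after configuring the pin as an output. */
static volatile uint32_t dummy_port;
volatile uint32_t *probe_port = &dummy_port;

void probe_high(void) { *probe_port |= PROBE_PIN_MASK; }
void probe_low(void)  { *probe_port &= ~PROBE_PIN_MASK; }

void monitored_section(void)
{
    probe_high();   /* rising edge: code under test starts */
    /* ... code under test ... */
    probe_low();    /* falling edge: pulse width = execution time */
}
```

The read-modify-write on the output register is not atomic. If an interrupt handler also drives pins on the same port, a bit set/reset register (where the MCU provides one) is the safer choice.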
Another way to measure execution time is to use a debugging tool with a trace feature. You can simply run the code, look at the trace, manually calculate the delta time and convert CPU cycles to microseconds. Unfortunately, the trace only provides one instance of execution, and you may need to look further into the trace capture to find the worst-case execution time, which can be a tedious process.
Cortex-M cycle counter
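The Cortex-M3, Cortex-M4, Cortex-M7, and other ARMv7-M and ARMv8-M Mainline processors include a 32-bit free-running counter, CYCCNT, that counts CPU clock cycles. The counter is part of the Data Watchpoint and Trace (DWT) unit, one of the CoreSight debug components, and is easy to use for measuring code execution time. (Cortex-M0 and Cortex-M0+ cores do not implement it.) To enable it, set the TRCENA bit in the Debug Exception and Monitor Control Register (DEMCR), reset CYCCNT to zero, and set the CYCCNTENA bit in the DWT control register (DWT_CTRL).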
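The enable sequence can be sketched with raw register accesses. The addresses and bit positions are the architectural ones from the ARMv7-M reference manual; the function name `dwt_cycle_counter_init()` is hypothetical, and the target guard assumes the `__ARM_ARCH_*` macros that GCC and Clang predefine for Cortex-M targets. On any other build, plain variables stand in for the registers so the logic can be exercised on a host.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMCR_TRCENA        (1u << 24)  /* enables the DWT and ITM units  */
#define DWT_CTRL_NOCYCCNT   (1u << 25)  /* reads as 1 if CYCCNT is absent */
#define DWT_CTRL_CYCCNTENA  (1u << 0)   /* starts the cycle counter       */

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__) || defined(__ARM_ARCH_8M_MAIN__)
/* Real memory-mapped registers on the target. */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)
#else
/* Host build: plain variables stand in for the registers. */
static volatile uint32_t sim_demcr, sim_dwt_ctrl, sim_dwt_cyccnt;
#define DEMCR      sim_demcr
#define DWT_CTRL   sim_dwt_ctrl
#define DWT_CYCCNT sim_dwt_cyccnt
#endif

/* Returns false if the core does not implement the cycle counter. */
bool dwt_cycle_counter_init(void)
{
    DEMCR |= DEMCR_TRCENA;              /* 1. enable DWT access first     */
    if (DWT_CTRL & DWT_CTRL_NOCYCCNT) { /* 2. is CYCCNT implemented?      */
        return false;
    }
    DWT_CYCCNT = 0u;                    /* 3. reset the counter           */
    DWT_CTRL |= DWT_CTRL_CYCCNTENA;     /* 4. start counting CPU cycles   */
    return true;
}
```

With a CMSIS device header, the same steps are commonly written as `CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk; DWT->CYCCNT = 0; DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;`. On some Cortex-M7 devices the DWT may also have to be unlocked through its Lock Access Register before the write to DWT_CTRL takes effect; check the device documentation.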
Measuring code execution time using the DWT cycle counter
We can measure the execution time of a code snippet by reading the cycle counter immediately before and immediately after the snippet and subtracting the two values.
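The unsigned difference, delta, is the actual execution time of the code under test in CPU clock cycles. Because the subtraction is performed in unsigned 32-bit arithmetic, the result is correct even if the counter wraps around once between the two reads. The counter wraps every 2^32 cycles (about 53.7 seconds at 80 MHz), so this method only works for snippets shorter than that.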
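A minimal sketch of the before/after read, assuming the counter has already been enabled. The names `elapsed_cycles()` and `measure_snippet()` are hypothetical; the host fallback again replaces CYCCNT with a variable.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__) || defined(__ARM_ARCH_8M_MAIN__)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)
#else
static volatile uint32_t sim_dwt_cyccnt;  /* host stand-in for the register */
#define DWT_CYCCNT sim_dwt_cyccnt
#endif

/* Global so a debugger can watch it. */
volatile uint32_t delta;

/* Modulo-2^32 subtraction: correct even if CYCCNT wrapped once in between. */
uint32_t elapsed_cycles(uint32_t start, uint32_t stop)
{
    return stop - start;
}

void measure_snippet(void)
{
    uint32_t start = DWT_CYCCNT;   /* read just before the code under test */
    /* ... code under test ... */
    uint32_t stop = DWT_CYCCNT;    /* read just after it */
    delta = elapsed_cycles(start, stop);
}
```

The result also includes the few cycles spent reading the counter; measuring an empty snippet gives that overhead so it can be subtracted if needed.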
Interrupts may occur while the code is executing, so the timing obtained may differ from one execution to the next. We could disable interrupts during the measurement to remove their effect. However, it is generally better to leave interrupts enabled, because in the deployed system they will also interrupt this code and stretch its execution time.
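If you do want a measurement free of interrupt interference, one option is to mask interrupts through PRIMASK around the measured region and restore the previous state afterwards. The helper names `irq_save()`/`irq_restore()` and `measure_with_irqs_masked()` are hypothetical; the inline assembly assumes GCC or Clang targeting ARMv7-M or ARMv8-M Mainline (CMSIS provides `__get_PRIMASK()`, `__set_PRIMASK()`, and `__disable_irq()` for the same job). On a host build the helpers are no-ops.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__) || defined(__ARM_ARCH_8M_MAIN__)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

/* Save PRIMASK, then mask all configurable-priority interrupts
 * (NMI and HardFault are still taken). */
static inline uint32_t irq_save(void)
{
    uint32_t primask;
    __asm volatile ("mrs %0, primask" : "=r" (primask) : : "memory");
    __asm volatile ("cpsid i" : : : "memory");
    return primask;
}

/* Restore PRIMASK: interrupts come back only if they were enabled before. */
static inline void irq_restore(uint32_t primask)
{
    __asm volatile ("msr primask, %0" : : "r" (primask) : "memory");
}
#else
/* Host build: nothing to mask, and a variable stands in for CYCCNT. */
static volatile uint32_t sim_dwt_cyccnt;
#define DWT_CYCCNT sim_dwt_cyccnt
static inline uint32_t irq_save(void) { return 0u; }
static inline void irq_restore(uint32_t primask) { (void)primask; }
#endif

volatile uint32_t delta_no_irq;  /* global so a debugger can watch it */

void measure_with_irqs_masked(void)
{
    uint32_t primask = irq_save();
    uint32_t start = DWT_CYCCNT;
    /* ... code under test ... */
    delta_no_irq = DWT_CYCCNT - start;
    irq_restore(primask);
}
```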
If the code being measured contains conditional statements, loops, or anything else that can make its execution time vary, a single measurement may not represent the worst-case execution time. To capture it, add peak detection: keep the largest delta seen so far in a variable, max, which must be declared and initialized to its minimum value (0) before any measurements are taken.
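Peak detection is a single comparison after each measurement. A minimal sketch; the helper name `record_max()` is hypothetical, and it would be called with the freshly computed delta.

```c
#include <stdint.h>

/* Worst case seen so far, in CPU cycles. Starts at the smallest
 * possible value so the first measurement always replaces it. */
volatile uint32_t max = 0u;

void record_max(uint32_t delta)
{
    if (delta > max) {
        max = delta;
    }
}
```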
Similarly, to find the shortest execution time, declare a variable, min, initialize it to the maximum count value (0xFFFFFFFF) before measuring, and keep the smallest delta seen so far.
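The minimum works the same way with the comparison reversed and the opposite starting value. Again, `record_min()` is a hypothetical name.

```c
#include <stdint.h>

/* Best case seen so far, in CPU cycles. Starts at the largest uint32_t
 * value so the first measurement always replaces it. */
volatile uint32_t min = 0xFFFFFFFFu;

void record_min(uint32_t delta)
{
    if (delta < min) {
        min = delta;
    }
}
```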
Execution time also depends on whether the CPU has a cache. The Cortex-M7 can include L1 instruction and data caches, and some Cortex-M4-based MCUs add vendor-specific flash caches or accelerators. When caches are in use, repeated measurements of the same code segment may not agree; a run that starts with a cold cache is typically the slowest. Consider disabling the caches when you need to measure the worst case.
Most debuggers can display these variables in real time. Declare them as global variables so that they retain their values and can be monitored live. The values are in CPU clock cycles, and most debuggers cannot scale variables for display. With a 16 MHz CPU clock, for example, reading 70.19 microseconds is much more convenient than reading 1123 cycles. A better way to display live variables is the μC/Probe real-time monitoring tool, which can also scale values and present them in an easy-to-read form.
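When no display tool can do the scaling, the conversion can be done in code. A minimal sketch, assuming the core clock frequency is known at compile or run time; the name `cycles_to_us()` is hypothetical. Doing this on the target costs cycles of its own, so it belongs outside the measured region.

```c
#include <stdint.h>

/* Converts a cycle count to microseconds for a core clock given in Hz.
 * float is cheap on cores with a single-precision FPU; above 2^24 cycles
 * the conversion of the count to float starts to round. */
float cycles_to_us(uint32_t cycles, uint32_t cpu_hz)
{
    float cycles_per_us = (float)cpu_hz / 1000000.0f;
    return (float)cycles / cycles_per_us;
}
```

For the example above, `cycles_to_us(1123, 16000000)` evaluates to 70.1875, which displays as 70.19 microseconds.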
Displaying Measurement Values Using μC/Probe
In this example, measurements are added to the application to monitor the execution time of four code snippets, and the results are displayed with μC/Probe.
The following figure shows the raw measurements in IAR's Live Watch window (left) and in μC/Probe's Tree View control (right). elapsed_time_tbl[] is an array that stores the measurements for the different code snippets.
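One possible layout for such a table is sketched below. Only the name `elapsed_time_tbl` comes from the article; the entry fields (start, current, max, min), the count of four entries, and the functions `et_init()`, `et_start()`, and `et_stop()` are hypothetical. Each monitored snippet is bracketed by `et_start(i)` and `et_stop(i)` with its own index.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__) || defined(__ARM_ARCH_8M_MAIN__)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)
#else
static volatile uint32_t sim_dwt_cyccnt;  /* host stand-in for the register */
#define DWT_CYCCNT sim_dwt_cyccnt
#endif

#define ET_NUM_SNIPPETS 4u  /* four monitored snippets, as in the example */

/* Hypothetical entry layout; all values are in CPU clock cycles. */
typedef struct {
    uint32_t start;    /* CYCCNT captured by et_start() */
    uint32_t current;  /* most recent measurement */
    uint32_t max;      /* longest measurement so far */
    uint32_t min;      /* shortest measurement so far */
} elapsed_time_entry_t;

/* Global so a debugger or uC/Probe can locate it by symbol name. */
volatile elapsed_time_entry_t elapsed_time_tbl[ET_NUM_SNIPPETS];

void et_init(void)
{
    for (uint32_t i = 0u; i < ET_NUM_SNIPPETS; i++) {
        elapsed_time_tbl[i].start = 0u;
        elapsed_time_tbl[i].current = 0u;
        elapsed_time_tbl[i].max = 0u;           /* any measurement is larger  */
        elapsed_time_tbl[i].min = 0xFFFFFFFFu;  /* any measurement is smaller */
    }
}

void et_start(uint32_t i)
{
    if (i < ET_NUM_SNIPPETS) {
        elapsed_time_tbl[i].start = DWT_CYCCNT;
    }
}

void et_stop(uint32_t i)
{
    uint32_t stop = DWT_CYCCNT;  /* read first, before any bookkeeping */
    if (i >= ET_NUM_SNIPPETS) {
        return;
    }
    volatile elapsed_time_entry_t *e = &elapsed_time_tbl[i];
    e->current = stop - e->start;
    if (e->current > e->max) {
        e->max = e->current;
    }
    if (e->current < e->min) {
        e->min = e->current;
    }
}
```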
You can also assign the min, max, and current values to gauge and numeric indicator controls, as shown in the following figure. The values are displayed in microseconds: the CPU in this example runs at 80 MHz, so a scaling factor of 0.0125 microseconds per cycle (1/80) is applied. Only the longest execution time is shown.
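The scaling factor is simply the duration of one CPU cycle in microseconds. Worked through for the 80 MHz example (the 800-cycle count is illustrative only, not a measured value):

$$
k = \frac{1}{f_{\mathrm{CPU}} / 10^{6}} = \frac{1}{80} = 0.0125\ \mu\mathrm{s\ per\ cycle},
\qquad
t = k \cdot N_{\mathrm{cycles}} = 0.0125 \times 800 = 10\ \mu\mathrm{s}
$$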
As embedded developers, we have many tools to test and verify our designs. μC/Probe lets users monitor application variables with gauges, meters, numeric indicators, Microsoft Excel, or graphs and plots. Combining the DWT cycle counter of the Cortex-M processor with μC/Probe makes it easy to verify the timing of Cortex-M MCU applications.