"Time Analysis of Embedded Software" reading activity: Chapter 5 Reading Notes - Software Time Analysis Methods
Chapter 5 is the focus of this book and also its largest chapter. It introduces the various software timing analysis methods used in different development stages.
The table of contents is as follows:
Chapter 5 Software Timing Analysis Methods
5.1 Overview and Classification at Different Levels
- 5.1.1 Communication Level
- 5.1.2 Scheduling Level
- 5.1.3 Code Level
5.2 Definition of Terms
- 5.2.1 Tracing
- 5.2.2 Profiling, Timing Measurement, and (Again) Tracing
5.3 Static Code Analysis
- 5.3.1 Basic Functionality and Workflow
- 5.3.2 Use Cases
- 5.3.3 Limitations of Static Code Analysis
- 5.3.4 Interview with a Static Code Analysis Expert
5.4 Code Simulation
- 5.4.1 Functionality and Workflow
- 5.4.2 Use Cases
- 5.4.3 Limitations of Code Simulation
- 5.4.4 Interview with a Code Simulation Expert
5.5 Runtime Measurement
- 5.5.1 Basic Functionality and Workflow
- 5.5.2 Use Cases
- 5.5.3 Limitations of Runtime Measurement
5.6 Hardware-Based Tracing
- 5.6.1 Basic Functionality and Workflow
- 5.6.2 Use Cases
- 5.6.3 Limitations of Hardware-Based Tracing
- 5.6.4 Interview with a Hardware-Based Tracing Expert
5.7 Software-Based Tracing
- 5.7.1 Basic Functionality and Workflow
- 5.7.2 Use Cases
- 5.7.3 Limitations of Measurement-Based Tracing
- 5.7.4 Interview with a Measurement-Based Tracing Expert
5.8 Scheduling Simulation
- 5.8.1 Basic Functionality and Workflow
- 5.8.2 Use Cases
- 5.8.3 Limitations of Scheduling Simulation
- 5.8.4 Interview with a Scheduling Simulation Expert
5.9 Static Scheduling Analysis
- 5.9.1 Basic Functionality and Workflow
- 5.9.2 Use Cases
- 5.9.3 Limitations of Static Scheduling Analysis
- 5.9.4 Interview with a Static Scheduling Analysis Expert
5.10 Optimization Using Evolutionary Algorithms
5.11 Timing Analysis Methods in the V-Model
1. Concept and classification
- Communication layer: The time at the communication layer is usually related to the elapsed time on the network bus. The focus is generally on the end-to-end time at the communication layer (for example, from sensor to actuator), or the time difference from one event in the software to another event on the server;
- Scheduling level: timing at the scheduling level concerns all timing-related behavior of the operating system; hence the scheduling level is also called the operating-system level or RTOS level;
- Code level: when performing timing analysis on elements at the code level, the focus is on the processing of the element and the time it takes. The central timing parameter at the code level is the net execution time, i.e. the core execution time (CET).
2. Terminology
2.1. Tracing
Tracing is the process of recording events in a memory and attaching a timestamp to each. At a later point in time, this trace storage can be used to reconstruct the circumstances under which the original events occurred.
2.2. Profiling, timing and (re)tracing
Profiling is the process of determining the timing parameters of a running embedded system or simulation. Two profiling approaches are shown: one performs runtime measurements to determine the desired timing parameters directly; the other works indirectly via tracing. When tracing, initially only the trace data is collected, from which the timing parameters can then be extracted.
3. Static Code Analysis
This section focuses on static code analysis of software timing, mainly studying how to determine the maximum possible core execution time of a specific code fragment (such as a function).
The maximum possible CET is referred to as WCET (worst case execution time), which should actually be expressed as WCCET (worst case core execution time).
3.1. Analysis process
First, static code analysis reads the executable file and disassembles it. From the disassembled code, the control flow and function call tree (the calling relationship of the function) can be deduced. This analysis can also determine the maximum number of loop iterations.
The collected data is merged together and the possible maximum runtime is accumulated along the control flow. Because the executable contains the storage addresses of all machine instructions and data, this analysis can even take into account the effects of caches and pipelining on runtime. However, this analysis also requires a very accurate processor model.
As can be seen, this analysis method places extremely high demands on engineers, so tools are generally used.
The book mentions a WCET analysis tool from AbsInt that can determine the static maximum execution time. I searched and found it offers a 30-day trial, but there is no direct download link; you have to fill out and submit a PDF application form. Since downloading is troublesome, I didn't try it.
3.2. Limitations of Static Analysis
There are still many difficulties in static code analysis, as follows:
- Indirect function calls: in some cases, static analysis cannot determine which values a function pointer parameter can take. The function call tree is then incomplete.
- Recursion: The recursion depth is difficult to determine.
- Loop upper bound: the analysis often cannot determine the maximum number of loop iterations, which is needed to calculate the WCET. A value that may be far too large is then usually assumed, resulting in significant overestimation.
- Annotations: Users must manually clarify the three "stumbling blocks" mentioned above by providing additional information. In other words, the code must be annotated (provided by the code developer).
- Operation modes and mutually exclusive code: a program generally has several operation modes (such as "on the ground" and "in flight" for an airplane). In static analysis the code of the different modes is superimposed, which makes the analysis difficult.
- Overestimation: Even if the analysis can be fully completed and the application working patterns annotated, the analysis results often still give unexpectedly high WCETs. One reason for this can be overestimation, i.e. a large difference between the reported upper bound of the WCET and the actual WCET.
- Interrupts, Multicore and Transient Errors: Static Code Analysis is a code-level approach and therefore does not cover any scheduling aspects. Interrupts or access violations on memory interfaces (which often occur on multicore devices) are not considered. For completeness, static code analysis also ignores transient errors.
3.3. Code Simulation
A code simulator can execute machine code built for a target processor on a PC (x86 architecture); the PC software emulates the target processor.
A compiler for a target processor is also called a cross-compiler.
Code simulation usually involves the examination of a small portion of code, such as a single function or algorithm. The simulator consists of software that can simulate the target processor and can execute executable files generated for the target processor. In order to make unit tests closer to the real target system, they can be compiled for the target processor and then executed using the code simulator.
Limitations of code simulation: if the simulation is not fed the "correct" input values, its results may differ greatly from the real situation.
The book explains code simulation and the problems one may run into through an interview between the author and a code simulation expert, which is very helpful.
3.4. Runtime measurement
- Test method 1: measurement via a normal IO port
The traditional way to measure time is to toggle an IO port. The method is very simple:
- During initialization, set an IO port to high level;
- Set the IO port to low level immediately before the code under test;
- Set the IO port back to high level immediately after the code under test;
- Use an oscilloscope or logic analyzer to capture the falling and rising edges; the duration of the low level then corresponds to the execution time of the code segment under test.
Personal summary (not from the book): although this method is simple, the execution time of the IO port toggle itself must be taken into account. If the code under test runs only very briefly, the results may be inaccurate.
- Testing without an IO port (using a hardware timer)
Testing with IO ports also has disadvantages: the test cannot be automated, an oscilloscope or logic analyzer is needed, and the procedure is relatively cumbersome. A hardware timer can be used instead.
Test method:
- Read the timer's counter value before the code under test;
- Read the counter value again after the code under test;
- Subtract the two values; the execution time follows from the timer frequency.
The overhead of reading the timer value itself must be subtracted: call the read function back to back, record how many counts one call consumes, and subtract this overhead from the measured total.
Timer overflow must also be considered. With unsigned variables, one overflow is handled correctly by the subtraction, but multiple overflows are not; in that case, reduce the timer frequency or switch to a timer with a wider bit width.
- OSEK PreTaskHook/PostTaskHook
OSEK introduces a runtime measurement mechanism that can also be used for AUTOSAR CP, namely the PreTaskHook/PostTaskHook mechanism. PreTaskHook runs before the application code of a task is called, and PostTaskHook runs when the task ends.
However, there are drawbacks: for example, it is not directly apparent from the hooks alone which task triggered the measurement, and the hook functions are called very frequently, so the test results are not accurate.
- Idle cycle counter measures CPU load
A complaint: it took me half a day to understand this method. How to put it? It reads like machine translation and is really hard to follow; expressed the way Chinese readers are used to, it would be much easier to understand.
The method is as follows:
- Some preparation is required before the measurement: disable global interrupts, the watchdog, all monitoring functions, and everything else that could interrupt code execution;
- In the idle loop, increment a counter. The value it reaches after a fixed period is Z0 (the calibration reference value);
- Then start normal operation (interrupts on, watchdog on, ...);
- Count the value Z that the idle-loop counter reaches in the same period during normal operation;
- The CPU load is then 1 - (Z/Z0).
- "Performance Counters" for measurement
Many processors offer hardware features that can determine time-critical parameters during code execution, including the number of cache misses, conflicts when multiple cores (simultaneously) access shared memory, and pipeline stalls.
This needs to be implemented according to the specific processor.
- Ping for measurement
Using the network diagnostic tool ping, you can measure the response time for a simple request to be sent from the client, pass through the network to the host, and then return.
4. Hardware-based tracing
Current processors generally provide on-chip trace logic that debug tools such as TRACE32 can use to record instruction traces. The book only introduces the relevant concepts; for concrete usage you need to refer to the tool manuals.
Exactly how the trace logic works depends on the processor used, but all have one thing in common: only a few deterministic events are recorded in the trace memory, while most of the instructions executed are interpolated during reconstruction.
Hardware-based tracing provides a solution: the software to be examined can be observed during runtime (i.e., without stopping) and the execution path recorded. Unlike software-based tracing, this method does not require software modifications to achieve this goal.
Besides the "debugging" use case, hardware-based tracing is particularly useful for run-time analysis. With this approach, all time parameters of functions executed during the trace can be determined. There is usually a view showing all executed functions sorted by CPU load, which is particularly useful for run-time optimization.
This section also includes the author's interview with hardware tracing experts, which is very inspiring.
5. Software-based tracing
The book illustrates, in a figure, three variants: hardware-based tracing, software-based tracing with external storage for the trace data, and purely software-based tracing.
Software-based tracing (external trace-data storage): the trace data is acquired by software measurement. The connection to external trace hardware can go over any available interface (SPI, Ethernet, etc.). The timestamp can either be generated by the tracing software itself (higher overhead) and transmitted together with the other information, or generated by the external hardware (at some cost in accuracy).
Purely software-based tracing: the processor's RAM is used to store the trace data, which is then transmitted off the device by other means.
There are many details to watch out for in trace-based measurement, as well as in tracing different kinds of objects; all of this is described in the book and worth reading carefully.
6. Scheduling simulation
Scheduling simulation means simulating the logic by which the operating system organizes the execution of tasks and interrupts.
Scheduling simulation process:
- The simulation must be configured for a specific scheduling method, i.e., the operating system to be used for the simulation must be selected;
- Create tasks and interrupts and define scheduling-related parameters. The most important parameter is the priority, others involve multiple activations of the task or the "preemptibility" setting.
Scheduling simulation working method:
During the simulation, tasks are activated and interrupts are triggered according to the activation pattern. For each simulated execution of a task or interrupt, the core execution time (CET) is drawn at random according to a specified distribution between the BCET and the WCET. Preemption by another task or interrupt is also mapped into the simulation, likewise with randomly selected CETs. A trace of all scheduling-relevant events is usually generated (at least the activation, start, and end of all tasks, and the start and end of all interrupts). The trace data can be visualized, and timing parameters can be calculated from it.
The author also includes an interview with a scheduling simulation expert, which was very helpful; it mentions a tool, the TA Tool Suite, that can be used for scheduling simulation.
7. Static Scheduling Analysis
Static scheduling analysis is performed at the scheduling level. It is a mechanism for determining worst-case timing parameters "mathematically", without simulation, measurement, or tracing. The timing parameters here are those of the scheduling level, especially response times.
Static scheduling analysis can also be performed at the communication level. For example, CAN bus communication can be verified and optimized in this way to ensure that all messages are transmitted within their respective deadlines despite a bus load significantly higher than the widely used 40%.
The book then explains how scheduling analysis works and mentions a tool, INCHRON's chronVAL, which can be used for static scheduling analysis.
8. Optimization using evolutionary algorithms
Evolutionary algorithms are used to solve problems where scheduling can be very complex, to the point where time parameters (such as the response time RT of a task) cannot be easily calculated.
The process:
- First, specify the optimization goal, such as minimizing the response time of a task;
- Define the degrees of freedom, i.e. the parameters that may be changed during the optimization process. These parameters may include the offset of periodic tasks or the priority of certain tasks;
- Finally, the actual optimization is started. In simple terms, the parameters forming the degrees of freedom are changed at random, the system is analyzed again, and the impact on the optimization goal is evaluated. Successful modifications are kept, and the whole process repeats. Randomly modifying parameters is analogous to mutation in evolution: the "genes" of successful modifications gradually prevail, and over several generations the configuration keeps improving and gets closer and closer to the optimization goal. Evolution stops when the goal is fully reached or a previously defined time budget is exceeded.
9. Conclusion
Static code analysis: Static code analysis requires compiled and linked code. However, it is also possible to link the functions to be tested into a test application (e.g., via unit tests). For static code analysis, the analysis tool must support the processor used.
Code Simulation: Code simulation is similar to the case of static code analysis, except that it can also be done at a higher level than the function level.
Runtime measurement: As long as an evaluation version with the required processor and the corresponding compiler toolchain is available, runtime measurement can be performed in PIL (Processor In the Loop) testing. This method can be used later in the development process as well as in the final product to monitor runtime in regular operation.
Hardware-based tracing: This also requires a running processor. When using this approach, extensive analysis is performed at the code level and the scheduler level. If possible, these analyses can be extended to the HIL (Hardware In the Loop) environment and even to the final product environment.
Software-based tracing: Strictly speaking, tracing can be started at the same time as runtime measurement, but scheduling is usually a critical aspect of the analysis. Therefore, a prerequisite for this approach is to have a system available and an operating system already running on it.
Scheduling simulation: This analysis is largely independent of the availability of processors, compilers, hardware, or software. A general understanding of the system is sufficient to simulate, analyze, and optimize at a high level. Additional information during subsequent development can be added to the model, making it more detailed over time.
Static scheduling analysis: Static scheduling analysis is very similar to scheduling simulation, but the focus of the analysis is different because it more explicitly considers the worst case. However, static scheduling analysis cannot provide analysis of the normal behavior of the system.
This chapter describes a variety of timing analysis methods, many of which I encountered for the first time. It also introduces some timing analysis tools, and several sections record conversations with relevant experts. Overall, this chapter is quite rewarding.