Tips for debugging MCU exceptions, using ARM cores as an example

Last updated: 2024-09-14

Table of contents

1. Cortex-M7 exception handling

2. Cortex-R52+ exception handling

3. Summary

When debugging a board, we inevitably run into unexplained faults. What methods can we use to find where they occurred?

Today, following up on the article from a few days ago that introduced the units inside the core, I will talk about exception and fault handling on ARM cores.

1. Cortex-M7 exception handling

The Cortex-M7 implements the Armv7-M architecture, and the core has built-in system exceptions including NMI, HardFault, MemManage, BusFault and UsageFault, as shown in the following figure:

We can find the handlers for these exceptions in the code. Taking the S32K344 as an example, its interrupt vector table looks like this:

Basically, the implementation of each handler is an empty while(1) loop, as shown below:
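As a generic illustration (not the actual S32K344 startup code), the first 16 entries of an Armv7-M vector table with default handlers might be sketched as follows. The handler names follow the common CMSIS naming convention, and `STACK_TOP` is a hypothetical address; both are assumptions.

```c
#include <stdint.h>

typedef void (*vector_t)(void);

/* Hypothetical initial stack pointer; a real project takes this from the linker script. */
#define STACK_TOP 0x20010000u

/* Default behaviour: spin forever so a debugger can halt here and inspect the state. */
static void Default_Handler(void)
{
    while (1) {
    }
}

/* Placeholder handlers; names are assumed, following the CMSIS convention. */
void Reset_Handler(void)      { Default_Handler(); }
void NMI_Handler(void)        { Default_Handler(); }
void HardFault_Handler(void)  { Default_Handler(); }
void MemManage_Handler(void)  { Default_Handler(); }
void BusFault_Handler(void)   { Default_Handler(); }
void UsageFault_Handler(void) { Default_Handler(); }
void SVC_Handler(void)        { Default_Handler(); }
void DebugMon_Handler(void)   { Default_Handler(); }
void PendSV_Handler(void)     { Default_Handler(); }
void SysTick_Handler(void)    { Default_Handler(); }

/* System exception part of the table; device interrupts would follow entry 15.
   On a real target the table is placed at the vector base by the linker script. */
const vector_t vector_table[16] = {
    (vector_t)(uintptr_t)STACK_TOP, /*  0: initial main stack pointer */
    Reset_Handler,                  /*  1: Reset */
    NMI_Handler,                    /*  2: NMI */
    HardFault_Handler,              /*  3: HardFault */
    MemManage_Handler,              /*  4: MemManage */
    BusFault_Handler,               /*  5: BusFault */
    UsageFault_Handler,             /*  6: UsageFault */
    0, 0, 0, 0,                     /*  7-10: reserved */
    SVC_Handler,                    /* 11: SVCall */
    DebugMon_Handler,               /* 12: DebugMonitor */
    0,                              /* 13: reserved */
    PendSV_Handler,                 /* 14: PendSV */
    SysTick_Handler,                /* 15: SysTick */
};
```

With handlers like these, a fault simply parks the core; the fault registers still have to be inspected by hand or by extra code in the handler.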

So what can we do in this handler? Obviously, the primary goal is to find where the fault occurred.

In the Armv7-M architecture, there is a register region called the System Control Block (SCB), which contains registers designed to record information when the exceptions mentioned above occur.
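A minimal sketch of capturing the SCB fault registers is shown below. The offsets are the Armv7-M ones relative to the SCB base at 0xE000ED00. The struct, the function names, and the choice to pass the base address as a parameter (so the code can also be exercised off-target against a plain array) are assumptions made for illustration.

```c
#include <stdint.h>

/* Armv7-M System Control Block base address and fault register offsets. */
#define SCB_BASE       0xE000ED00u
#define SCB_CFSR_OFF   0x28u /* Configurable Fault Status Register */
#define SCB_HFSR_OFF   0x2Cu /* HardFault Status Register */
#define SCB_MMFAR_OFF  0x34u /* MemManage Fault Address Register */
#define SCB_BFAR_OFF   0x38u /* BusFault Address Register */

/* Hypothetical container for one snapshot of the fault registers. */
typedef struct {
    uint32_t cfsr;
    uint32_t hfsr;
    uint32_t mmfar;
    uint32_t bfar;
} fault_snapshot_t;

static uint32_t reg_read32(uintptr_t base, uint32_t offset)
{
    return *(const volatile uint32_t *)(base + offset);
}

/* On the MCU, pass SCB_BASE as base. The address registers are read before
   the status registers, so the valid flags checked later are at least as
   recent as the addresses that were captured. */
void fault_snapshot_capture(uintptr_t base, fault_snapshot_t *out)
{
    out->mmfar = reg_read32(base, SCB_MMFAR_OFF);
    out->bfar  = reg_read32(base, SCB_BFAR_OFF);
    out->cfsr  = reg_read32(base, SCB_CFSR_OFF);
    out->hfsr  = reg_read32(base, SCB_HFSR_OFF);
}
```

Calling `fault_snapshot_capture(SCB_BASE, &snap)` at the top of a fault handler, before anything else can disturb the registers, would leave the values in a variable that is easy to inspect from the debugger.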

Let's take a commonly triggered MPU protection fault as an example and see how to locate errors such as out-of-bounds array accesses.

When the fault occurs, MMFAR (MemManage Fault Address Register) holds the memory address that triggered the MPU fault.

Why was it triggered? Most likely the core's load/store unit (LSU) performed a load or store to an address that the MPU configuration does not permit. To find out which specific operation triggered the MPU, we need to look at the details in MMFSR (MemManage Fault Status Register).

Although MMFSR is called a register, it is actually a bit field in CFSR (Configurable Fault Status Register):

As you can see, MMFSR occupies the lower 8 bits of CFSR and provides the MemManage status information, as follows:

MMARVALID (bit 7): Set to 1 to indicate that the address in MMFAR is valid;
MLSPERR (bit 5): Set to 1 to indicate that a MemManage fault occurred during floating-point lazy state preservation;
MSTKERR (bit 4): Set to 1 to indicate that a MemManage fault occurred while stacking on exception entry;
MUNSTKERR (bit 3): Set to 1 to indicate that a MemManage fault occurred while unstacking on exception return;
DACCVIOL (bit 1): Set to 1 to indicate that a data access triggered the fault; MMFAR holds the address of the attempted load or store;
IACCVIOL (bit 0): Set to 1 to indicate that an instruction fetch hit an MPU-protected or XN (execute-never) region; in this case MMFAR is not updated, and the stacked PC points to the faulting instruction.
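The bit fields above can be decoded with a few masks. The following is a minimal sketch: the bit positions are those of the Armv7-M MMFSR, while the enum, the function names, and the order in which the cause bits are checked are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* MMFSR bit positions within the low byte of CFSR (Armv7-M). */
#define MMFSR_IACCVIOL  (1u << 0)
#define MMFSR_DACCVIOL  (1u << 1)
#define MMFSR_MUNSTKERR (1u << 3)
#define MMFSR_MSTKERR   (1u << 4)
#define MMFSR_MLSPERR   (1u << 5)
#define MMFSR_MMARVALID (1u << 7)

/* CFSR is split into MMFSR [7:0], BFSR [15:8] and UFSR [31:16]. */
static inline uint8_t  cfsr_mmfsr(uint32_t cfsr) { return (uint8_t)(cfsr & 0xFFu); }
static inline uint8_t  cfsr_bfsr(uint32_t cfsr)  { return (uint8_t)((cfsr >> 8) & 0xFFu); }
static inline uint16_t cfsr_ufsr(uint32_t cfsr)  { return (uint16_t)(cfsr >> 16); }

/* Hypothetical classification of the MemManage cause. */
typedef enum {
    MM_NONE,
    MM_INSTRUCTION_FETCH,
    MM_DATA_ACCESS,
    MM_STACKING,
    MM_UNSTACKING,
    MM_LAZY_FP
} mm_cause_t;

/* Returns the first cause bit found; several bits can be set at once,
   so the raw MMFSR value should still be kept for inspection. */
mm_cause_t mm_classify(uint32_t cfsr)
{
    uint8_t mmfsr = cfsr_mmfsr(cfsr);
    if (mmfsr & MMFSR_IACCVIOL)  return MM_INSTRUCTION_FETCH;
    if (mmfsr & MMFSR_DACCVIOL)  return MM_DATA_ACCESS;
    if (mmfsr & MMFSR_MSTKERR)   return MM_STACKING;
    if (mmfsr & MMFSR_MUNSTKERR) return MM_UNSTACKING;
    if (mmfsr & MMFSR_MLSPERR)   return MM_LAZY_FP;
    return MM_NONE;
}

/* Copies MMFAR into *addr only when MMARVALID says it is meaningful. */
bool mm_fault_address(uint32_t cfsr, uint32_t mmfar, uint32_t *addr)
{
    if ((cfsr_mmfsr(cfsr) & MMFSR_MMARVALID) == 0u) {
        return false;
    }
    *addr = mmfar;
    return true;
}
```

For instance, a CFSR value of 0x00000082 has MMARVALID and DACCVIOL set, so `mm_classify` reports a data access and `mm_fault_address` hands back the MMFAR value.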

By analyzing these two registers together, we can locate the fault site and determine whether the problem lies in an instruction fetch or in a data read or write.
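One way to combine the two registers with the stacked exception frame is sketched below. On Armv7-M the core pushes r0-r3, r12, lr, the return address and xPSR on exception entry, and for an instruction access violation the address of interest is the stacked PC rather than MMFAR. The struct layout reflects that basic frame; the report type and function name are assumptions, and the frame itself cannot be trusted if the fault happened during stacking (MSTKERR).

```c
#include <stdbool.h>
#include <stdint.h>

/* Basic Armv7-M exception stack frame, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exc_frame_t;

#define MMFSR_IACCVIOL  (1u << 0)
#define MMFSR_DACCVIOL  (1u << 1)
#define MMFSR_MMARVALID (1u << 7)

/* Hypothetical summary of a MemManage fault. */
typedef struct {
    bool     instruction_fetch;  /* IACCVIOL was set */
    bool     data_access;        /* DACCVIOL was set */
    bool     data_address_valid; /* data_address comes from a valid MMFAR */
    uint32_t data_address;       /* faulting data address, 0 if not valid */
    uint32_t stacked_pc;         /* instruction that was executing or being fetched */
} mm_report_t;

mm_report_t mm_report(uint32_t cfsr, uint32_t mmfar, const exc_frame_t *frame)
{
    mm_report_t r;
    r.instruction_fetch  = (cfsr & MMFSR_IACCVIOL) != 0u;
    r.data_access        = (cfsr & MMFSR_DACCVIOL) != 0u;
    r.data_address_valid = r.data_access && ((cfsr & MMFSR_MMARVALID) != 0u);
    r.data_address       = r.data_address_valid ? mmfar : 0u;
    r.stacked_pc         = frame->pc;
    return r;
}
```

In a real handler the frame pointer is usually obtained by testing bit 2 of the EXC_RETURN value in LR and then reading MSP or PSP accordingly; looking up `stacked_pc` in the map file or disassembly then points at the offending source line.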

The M core is relatively simple, but I ran into a lot of interesting things recently while working with the R core.

2. Cortex-R52+ exception handling

The Cortex-R52+ implements the Armv8-R architecture and has only one execution state, AArch32.
In this state, the R52+ can execute in 8 different modes: User, System, FIQ, IRQ, Supervisor, Abort, Undefined and Hyp. These modes are associated with different exception levels. The R52+ has 3 exception levels, EL0 to EL2. The recommended usage is as follows:

The modes map to the exception levels as follows: User runs at EL0; System, FIQ, IRQ, Supervisor, Abort and Undefined run at EL1; Hyp runs at EL2.
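When reading a CPSR or SPSR value in the debugger, the mode can be recovered from the low five bits. The sketch below uses the AArch32 mode encodings for the eight modes listed above; the struct and function names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: mode name and the exception level it runs at. */
typedef struct {
    const char *name; /* NULL if the encoding is not one of the eight modes */
    int         el;   /* 0, 1 or 2; -1 if unknown */
} cpu_mode_t;

/* Decodes the M[4:0] field of a CPSR or SPSR value. */
cpu_mode_t decode_mode(uint32_t psr)
{
    switch (psr & 0x1Fu) {
    case 0x10u: return (cpu_mode_t){"User", 0};
    case 0x11u: return (cpu_mode_t){"FIQ", 1};
    case 0x12u: return (cpu_mode_t){"IRQ", 1};
    case 0x13u: return (cpu_mode_t){"Supervisor", 1};
    case 0x17u: return (cpu_mode_t){"Abort", 1};
    case 0x1Au: return (cpu_mode_t){"Hyp", 2};
    case 0x1Bu: return (cpu_mode_t){"Undefined", 1};
    case 0x1Fu: return (cpu_mode_t){"System", 1};
    default:    return (cpu_mode_t){NULL, -1};
    }
}
```

For example, a PSR value ending in 0x13 decodes to Supervisor mode at EL1, and one ending in 0x1A decodes to Hyp mode at EL2.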

When an exception occurs, the core switches to a different mode. Because each mode has its own banked registers, much of the information is not visible in the registers of the current mode.

For example, suppose the program is running in SVC (Supervisor) mode. When a Data Abort exception occurs, the core switches to Abort mode, and the SP, LR and SPSR shown for the current mode are the Abort-mode copies. To see what happened, we first have to work out which mode the program was running in, and then look at the SP and LR of that mode to reconstruct the state before the exception occurred.
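The Abort-mode banked registers already narrow things down. As a sketch, assuming the exception was taken to Abort mode with the standard AArch32 link-register offsets (LR_abt is the faulting instruction address plus 8 for a Data Abort and plus 4 for a Prefetch Abort), the faulting instruction and the previous mode can be derived as follows. The function names are hypothetical.

```c
#include <stdint.h>

typedef enum { ABORT_DATA, ABORT_PREFETCH } abort_kind_t;

/* Address of the instruction that caused the abort, derived from LR_abt.
   Assumes the abort was taken to Abort mode, not to Hyp mode. */
uint32_t abort_fault_pc(abort_kind_t kind, uint32_t lr_abt)
{
    return lr_abt - ((kind == ABORT_DATA) ? 8u : 4u);
}

/* SPSR_abt holds the CPSR from before the exception, so its M[4:0] field
   gives the mode the program was running in. */
uint32_t previous_mode_bits(uint32_t spsr_abt)
{
    return spsr_abt & 0x1Fu;
}
```

With the previous mode known (0x13 for Supervisor, for example), the debugger's banked register view for that mode shows the SP and LR that were live before the abort.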

We can find these clues in the R52+ SVD description:

If we want to understand what caused the exception, we can go on to look at the following registers.

The R52+ also has a set of system control registers, in which groups c5 and c6 contain the fault-related registers, as follows:

Judging from the register names, this is quite similar to the M core. The difference is that on the R52 these are system registers rather than memory-mapped registers, so they have to be read with dedicated system register access instructions (MRC/MCR).

To access DFAR (Data Fault Address Register), the required instructions are:
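The AArch32 CP15 encodings for the fault registers can be kept in a small table. The sketch below lists them and formats the corresponding MRC instruction as text; the table type and helper names are assumptions. The inline-assembly reader at the end uses GCC-style syntax, is only compiled for 32-bit Arm targets, and is meant to run on the R52 at a privileged level.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One CP15 register: MRC p15, opc1, <Rt>, CRn, CRm, opc2 */
typedef struct {
    const char *name;
    uint8_t opc1, crn, crm, opc2;
} cp15_reg_t;

static const cp15_reg_t fault_regs[] = {
    {"DFSR",  0, 5, 0, 0}, /* Data Fault Status Register */
    {"IFSR",  0, 5, 0, 1}, /* Instruction Fault Status Register */
    {"ADFSR", 0, 5, 1, 0}, /* Auxiliary Data Fault Status Register */
    {"AIFSR", 0, 5, 1, 1}, /* Auxiliary Instruction Fault Status Register */
    {"DFAR",  0, 6, 0, 0}, /* Data Fault Address Register */
    {"IFAR",  0, 6, 0, 2}, /* Instruction Fault Address Register */
};

/* Looks a register up by name; returns NULL if it is not in the table. */
const cp15_reg_t *find_fault_reg(const char *name)
{
    for (size_t i = 0; i < sizeof fault_regs / sizeof fault_regs[0]; ++i) {
        if (strcmp(fault_regs[i].name, name) == 0) {
            return &fault_regs[i];
        }
    }
    return NULL;
}

/* Writes the read instruction for a register into buf, e.g. for a debugger script. */
int format_mrc(const cp15_reg_t *r, char *buf, size_t n)
{
    return snprintf(buf, n, "MRC p15, %u, <Rt>, c%u, c%u, %u",
                    (unsigned)r->opc1, (unsigned)r->crn,
                    (unsigned)r->crm, (unsigned)r->opc2);
}

#if defined(__arm__)
/* Reads DFAR directly on the target. */
static inline uint32_t read_dfar(void)
{
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c6, c0, 0" : "=r"(value));
    return value;
}
#endif
```

Writing a register uses MCR with the same operands, and readers for the other registers only differ in the CRn, CRm and opc2 fields taken from the table.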

As usual, R52 has the following exception handling functions:

When an exception occurs, we can read DFSR, DFAR, IFSR, and IFAR.

Taking DFSR and DFAR as examples, DFSR holds the status information of the most recent data abort, as follows:

Among its fields, the most important is bits [5:0], which encode the fault status, as follows:

To know whether the abort was caused by a read or a write, we can check bit 11, WnR (0 for a read, 1 for a write):

Then, based on the address recorded by DFAR, we can find out where the error occurred.
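Putting the DFSR fields and DFAR together might look like the sketch below. It only extracts the two fields discussed above, bits [5:0] and WnR at bit 11; the meaning of each status code should be looked up in the R52+ documentation. The struct and function names are assumptions, and DFAR is only meaningful for synchronous aborts.

```c
#include <stdbool.h>
#include <stdint.h>

#define DFSR_STATUS_MASK 0x3Fu      /* bits [5:0]: fault status code */
#define DFSR_WNR         (1u << 11) /* bit 11: 1 = write, 0 = read */

/* Hypothetical summary of a data abort. */
typedef struct {
    uint8_t  status;   /* raw status code from DFSR[5:0] */
    bool     is_write; /* true if the aborted access was a write */
    uint32_t address;  /* DFAR value */
} data_abort_info_t;

data_abort_info_t decode_data_abort(uint32_t dfsr, uint32_t dfar)
{
    data_abort_info_t info;
    info.status   = (uint8_t)(dfsr & DFSR_STATUS_MASK);
    info.is_write = (dfsr & DFSR_WNR) != 0u;
    info.address  = dfar;
    return info;
}
```

For example, a DFSR value of 0x0000080D decodes to status code 0x0D on a write access.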

It is worth mentioning that the c5 group also provides two auxiliary registers, ADFSR and AIFSR, to help locate the problem further.

Once DFSR has told us what kind of fault occurred, ADFSR tells us which interface the error occurred on, and with that information we can go to the vendor's FAE for support.

For example, if PORT indicates BTCM and TYPE indicates an ECC error on data, we should check whether ECC protection is enabled on the TCM while the TCM contents have never been initialized, which would cause these exceptions.
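A common remedy for this situation is to write the whole TCM once at startup, before any read, so that every location has valid ECC. The sketch below fills a region with zeros using 64-bit stores. The store width, the function name, and the idea of passing the region as parameters are assumptions; the real base address, size and required access width come from the device's reference manual.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes zeros over [start, start + size_bytes) with 64-bit stores.
   start is assumed to be 8-byte aligned and size_bytes a multiple of 8. */
void ecc_init_region(volatile uint64_t *start, size_t size_bytes)
{
    size_t words = size_bytes / sizeof(uint64_t);
    for (size_t i = 0; i < words; ++i) {
        start[i] = 0u;
    }
}
```

This has to run before the C runtime starts using the memory, typically early in the startup code, because a partial write or a read of an uninitialized location can itself raise the ECC error.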

3. Summary

These are some brief notes on what I have learned from recent debugging. They are not especially in-depth yet; I will keep studying and work toward building my own small debugging workflow.
