ARM Cortex-M3 core architecture understanding summary
The Cortex-M3 core mainly consists of the Nested Vectored Interrupt Controller (NVIC), the instruction fetch unit, the instruction decoder, the arithmetic logic unit (ALU), the register bank, and the memory map (a unified 4 GB address space in which the division and function of each region is predefined). For developers, the main focus really comes down to three areas:

1. Register bank
2. Address map and the function of each region
3. Interrupt mechanism (NVIC)

1. Register bank

1) Core registers. The Cortex-M3 has 16 core registers, R0-R15, each 32 bits wide (R13 is banked, so there are two physical stack pointers), plus a small group of special-function registers.

R0-R12 (general-purpose registers): the low registers R0-R7 can be accessed by both 32-bit Thumb-2 instructions and 16-bit Thumb instructions. The high registers R8-R12 can be accessed by 32-bit Thumb-2 instructions and by only a few 16-bit Thumb instructions.

R13 (stack pointer, SP): there are two stack pointers, the main stack pointer MSP (Main-SP) and the process stack pointer PSP (Process-SP), and only one of them is visible at any given time. MSP is used by the operating system kernel and by interrupt (exception) handlers; PSP is used only by the user's application code (for details, see part 3 on the NVIC). The stack pointer is 4-byte aligned, so its lowest two bits are always 00.

R14 (link register, LR): holds the return address of a subroutine call, that is, the value to be loaded back into the PC on return.

R15 (program counter, PC): points to the address of the code currently being executed.

2) Special-function registers.

xPSR (program status registers): 32 bits in total. It can be accessed as three separate registers, or as a whole under the name PSR or xPSR:
Application PSR (APSR)
Interrupt Number PSR (IPSR)
Execution PSR (EPSR)

Interrupt mask registers:
PRIMASK is a single bit. When it is set, all exceptions except NMI and hard fault are masked.
FAULTMASK is a single bit. When it is set, all exceptions except NMI are masked.
BASEPRI is up to 8 bits wide (how many bits are implemented depends on the chip).
BASEPRI masks by priority level, not by interrupt number: once it is set to a non-zero value, every exception whose priority value is numerically greater than or equal to that value (that is, of the same or lower priority) is masked. Writing 0 turns the masking off.

Control register (CONTROL):
CONTROL[0]: 0 = privileged thread mode; 1 = user-level (unprivileged) thread mode
CONTROL[1]: 0 = main stack (MSP); 1 = process stack (PSP)
The CONTROL register can only be written in privileged mode. Handler mode is always privileged and is only allowed to use the main stack MSP. After reset, the processor is in privileged thread mode.

2. Address map and the function of each region

The Cortex-M3 is a 32-bit processor. Its address bus and data bus are both 32 bits wide, so it can address 4 GB. The Cortex-M3 core defines the basic framework of this 4 GB space and assigns each region a purpose.

0x0000 0000 - 0x1FFF FFFF (512 MB): the code region (flash), accessed over the instruction bus and data bus to fetch instructions and data. Code can be executed here.

0x2000 0000 - 0x3FFF FFFF (512 MB): the on-chip SRAM region. Chip manufacturers place RAM here, code can be copied into it, and it can also execute code. The lowest 1 MB is a bit-band region, which bit-band aliasing expands into a 32 MB alias region where each word maps to one bit.

0x4000 0000 - 0x5FFF FFFF (512 MB): the on-chip peripheral region, mainly the registers of the on-chip peripherals, that is, the special-function register area.
As in the SRAM region, the lowest 1 MB of the peripheral region is bit-band addressable. Code cannot be executed from the peripheral region.

0x6000 0000 - 0x9FFF FFFF (1 GB): the off-chip RAM region. Code can be executed here.

0xA000 0000 - 0xDFFF FFFF (1 GB): the off-chip peripheral region. Code cannot be executed here.

0xE000 0000 - 0xFFFF FFFF (512 MB): the system region. Code cannot be executed here.

So the leading hex digit of each region's start address can be remembered simply as 0, 2, 4, 6, A, E.

The private peripheral part of the system region is divided into two parts:
Internal private peripheral bus, 0xE000 0000 - 0xE003 FFFF (256 KB): mainly the NVIC, FPB, DWT, ITM, etc.
External private peripheral bus, 0xE004 0000 - 0xE00F FFFF (512 + 256 = 768 KB): the ROM table, ETM, TPIU, etc.

Data endianness: the CM3 supports both big-endian and little-endian modes. For big-endian, ARM7 uses word-invariant big-endian, while the CM3 uses byte-invariant big-endian. Although both modes are supported, little-endian is still recommended in most cases. If some peripheral is big-endian, the conversion is easily done with the byte-reversal instructions REV and REV16.

3. Interrupt mechanism (NVIC)

The chip is called an MCU rather than an MPU because it is mainly used for control. A key measure of control is real-time performance, the ability to respond to changes in time, and this is mainly achieved through the interrupt mechanism. Apart from computing performance, the main improvement of the Cortex-M3 core is in the real-time performance of control, that is, in how quickly it responds to interrupts.

How does the CM3 define an interruption? One kind is an interruption of the running program caused by an abnormal condition inside the CM3 core (a system exception); the other kind is caused by external events (an external interrupt).
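Stepping back to the memory map for a moment: the bit-band aliasing mentioned in part 2 follows a simple address formula, and a small sketch may make it concrete. The code below is only an illustration written for a host compiler; the macro names and the function bitband_alias are my own and not part of any vendor header. The alias bases 0x22000000 (SRAM) and 0x42000000 (peripherals) are the standard Cortex-M3 values.

```c
#include <assert.h>
#include <stdint.h>

/* Region bases from the Cortex-M3 memory map (names are illustrative). */
#define SRAM_BITBAND_BASE    0x20000000u  /* 1 MB SRAM bit-band region       */
#define SRAM_ALIAS_BASE      0x22000000u  /* its 32 MB alias region          */
#define PERIPH_BITBAND_BASE  0x40000000u  /* 1 MB peripheral bit-band region */
#define PERIPH_ALIAS_BASE    0x42000000u  /* its 32 MB alias region          */
#define BITBAND_REGION_SIZE  0x00100000u  /* 1 MB                            */

/*
 * Return the alias word address for bit `bit` (0..7) of the byte at
 * `byte_addr`, or 0 if the byte is not inside a bit-band region.
 * Each bit gets one 32-bit alias word, so one byte spans 8 * 4 = 32 bytes.
 */
uint32_t bitband_alias(uint32_t byte_addr, unsigned bit)
{
    assert(bit < 8u);

    uint32_t region_base;
    uint32_t alias_base;

    /* Unsigned subtraction wraps, so addresses below the base fail the test. */
    if (byte_addr - SRAM_BITBAND_BASE < BITBAND_REGION_SIZE) {
        region_base = SRAM_BITBAND_BASE;
        alias_base  = SRAM_ALIAS_BASE;
    } else if (byte_addr - PERIPH_BITBAND_BASE < BITBAND_REGION_SIZE) {
        region_base = PERIPH_BITBAND_BASE;
        alias_base  = PERIPH_ALIAS_BASE;
    } else {
        return 0u;
    }

    return alias_base + (byte_addr - region_base) * 32u + bit * 4u;
}
```

On a real chip, writing 1 or 0 to the returned address through a volatile uint32_t pointer would set or clear that single bit. The last bit of the last byte of the 1 MB region lands on the last word of the 32 MB alias region, which is where the 1 MB to 32 MB expansion comes from.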
System exceptions belong mainly to the CM3 core level. Reset, NMI and hard fault have fixed priorities, which are also the highest. In addition there are bus fault, memory management fault, usage fault, etc. The priority of SVC (system service call), SysTick and so on can be set by software.

All of these are placed in a vector table that stores the entry addresses of the interrupt service routines. Each entry is 32 bits and there are 256 entries in total. The first 16 belong to the system; excluding the reserved entries, there are 16 - 5 - 1 = 10 system exceptions. The remaining 240 are external interrupts (IRQs). Three exceptions have fixed priorities: reset, NMI and hard fault, with priority levels -3, -2 and -1 respectively. The smaller the priority value, the higher the priority. All other priorities are programmable.

The reset and startup process of a CM3-based MCU also differs from that of a traditional microcontroller. A traditional microcontroller starts executing directly at address 0 and runs the jump instruction stored there to reach the start of the program. After a CM3 reset, the core first reads the initial value of the main stack pointer MSP from address 0x00000000. (Because the CM3 stack grows downward, this initial value is generally set to the end address of the RAM area + 1 so that the stack is as large as possible. For example, if the RAM area is 0x20000000 - 0x3FFFFFFF, the initial value is set to 0x40000000.) The core then uses the reset exception number to locate the word that holds the reset handler's entry address, which is at 0x00000004. The value stored in that word, the address of the first instruction to execute (say 0x00000100), is loaded into the PC, and execution proceeds sequentially from there.

The address used for instruction execution needs careful attention: the CM3 always runs in Thumb state, so the least significant bit (LSB) of any value loaded into the PC must be 1. An even value, with LSB = 0, would indicate ARM state, which the CM3 does not support.
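The two words the core reads after reset can be sketched in C as follows. This is a host-side illustration only, under my own assumptions: reset_words_t, make_vector, vector_target, vector_is_thumb and make_reset_words are hypothetical helper names, not part of any startup file, and the RAM range is the one used as the example in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The first two vector-table words, read by the core right after reset. */
typedef struct {
    uint32_t initial_msp;   /* word at 0x00000000: initial main stack pointer */
    uint32_t reset_vector;  /* word at 0x00000004: reset handler entry        */
} reset_words_t;

/* Turn a (halfword-aligned) handler address into a vector-table entry
 * by setting bit 0, which marks Thumb state. */
uint32_t make_vector(uint32_t handler_addr)
{
    assert((handler_addr & 1u) == 0u);
    return handler_addr | 1u;
}

/* Address the core actually fetches from: the vector with bit 0 cleared. */
uint32_t vector_target(uint32_t vector)
{
    return vector & ~UINT32_C(1);
}

/* True if loading this value into the PC selects Thumb state. */
bool vector_is_thumb(uint32_t vector)
{
    return (vector & 1u) != 0u;
}

/* Build the two reset words: the stack starts at the end of RAM + 1 and
 * grows downward; the reset vector carries the Thumb bit. */
reset_words_t make_reset_words(uint32_t ram_base, uint32_t ram_size,
                               uint32_t reset_handler_addr)
{
    reset_words_t words;
    words.initial_msp  = ram_base + ram_size;
    words.reset_vector = make_vector(reset_handler_addr);
    return words;
}
```

With a handler at 0x00000100, make_vector gives 0x00000101, and vector_target(0x00000101) gives back 0x00000100. In a real project the linker and the startup file produce these words; the toolchain normally sets the Thumb bit on function addresses by itself.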
So if execution should start at address 0x00000100, the reset entry stored in the vector table must be 0x00000101; the value 0x00000101 stands for the address 0x00000100. Don't confuse the address where data is stored with the value loaded into the PC. Only a value loaded into the PC must have its LSB set to 1; other bus accesses have no such restriction. (I am being long-winded here even by my own standards, but I want this point to be clear.)

Why is the main stack pointer MSP initialized first? Because exceptions such as NMI or hard fault may occur even during the reset process, and handling them needs a stack.

From occurrence to completion, an interrupt goes through the following steps: 1. the interrupt is captured and accepted, 2. the context is saved, 3. the interrupt service routine is entered, 4. return. Following this thread, here is a summary of the important points that affect the interrupt response speed of the Cortex-M3 (the order of explanation assumes the reader already has some background).

Talking about interrupts inevitably involves priority and nesting. In the CM3, the priority of each interrupt is programmed with 8 bits, which allows up to 256 priority levels. These 8 bits are split into two fields: one sets the preemption priority and the other sets the sub-priority. At least 3 priority bits must be implemented (8 levels), and the sub-priority field is never less than 1 bit, so the preemption priority in the CM3 has at most 7 bits, that is, 128 levels. Where the split falls is set by the PRIGROUP field of the Application Interrupt and Reset Control Register. In practice, chip manufacturers generally implement only the highest few bits. With 5 bits implemented, for example, the upper three bits (7, 6, 5) can be used for the preemption priority and the remaining two bits (4, 3) for the sub-priority.
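The split into preemption priority and sub-priority can be sketched as below. This is a host-side illustration under my own assumptions: prio_fields_t, split_priority and preempts are hypothetical names, and the code only models the rule that bits [7:PRIGROUP+1] form the preemption field and bits [PRIGROUP:0] form the sub-priority field, with unimplemented low bits reading as zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned preempt;  /* preemption (group) priority field */
    unsigned sub;      /* sub-priority field                */
} prio_fields_t;

/*
 * Split an 8-bit priority value.
 *   prigroup  : PRIGROUP setting, 0..7
 *   impl_bits : number of priority bits the chip implements, 3..8
 */
prio_fields_t split_priority(uint8_t prio, unsigned prigroup, unsigned impl_bits)
{
    assert(prigroup <= 7u);
    assert(impl_bits >= 3u && impl_bits <= 8u);

    unsigned unimpl   = 8u - impl_bits;                 /* low bits that read as 0 */
    unsigned value    = prio & ((0xFFu << unimpl) & 0xFFu);
    unsigned sub_bits = prigroup + 1u;                  /* width of bits [PRIGROUP:0] */

    prio_fields_t fields;
    fields.preempt = value >> sub_bits;
    fields.sub     = (value & ((1u << sub_bits) - 1u)) >> unimpl;
    return fields;
}

/* A new exception preempts the running one only if its preemption field is
 * numerically smaller; the sub-priority never causes preemption. */
bool preempts(uint8_t new_prio, uint8_t running_prio,
              unsigned prigroup, unsigned impl_bits)
{
    return split_priority(new_prio, prigroup, impl_bits).preempt <
           split_priority(running_prio, prigroup, impl_bits).preempt;
}
```

For the 5-bit example in the text (PRIGROUP = 4), the value 0xB8 splits into preemption priority 5 (bits 7 to 5) and sub-priority 3 (bits 4 to 3). On real hardware one would normally use the CMSIS helpers such as NVIC_SetPriorityGrouping and NVIC_SetPriority rather than hand-written code like this.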
In this example the split is made at bit 4; the grouping field PRIGROUP determines the bit at which the sub-priority field starts.

Now that priorities are settled, what mechanisms does the CM3 use to speed up the response when it faces many interrupts of different priorities? First we need to list the work the processor must do before it can respond: saving the context, that is, protecting the environment of the currently running program. The following 8 registers are stacked: the program status register xPSR, the program counter PC, the link register LR, R12, and R0-R3, all done automatically by hardware. The registers are pushed onto whichever stack is in use at that moment: the process stack (PSP) if the interrupted code was using PSP, otherwise the main stack (MSP). Once inside the interrupt service routine, MSP is always used.

Note that the speed-up depends on the order in which these registers are pushed. The stack lives in on-chip RAM and is accessed over the system bus. (Why must it be in RAM? Because the stack is constantly pushed and popped and its contents keep changing, while flash or ROM is only written when the chip is programmed.) The instructions are stored in flash, that is, in the code region, and are fetched over the instruction bus (I-Code). So during stacking, the PC and xPSR are pushed first.
Once the PC has been pushed, the entry address of the interrupt service routine can be fetched over the instruction bus according to the vector number and loaded into the PC so that instruction prefetch can begin, while the stacking of the remaining registers continues over the system bus. The two operations run in parallel without interfering with each other, which speeds up the interrupt response. The registers involved can be looked up in the authoritative guide.

All the preparation before responding is now done. So when many interrupts come knocking, how can a series of interrupts be handled with the shortest delay? Needless to say, interrupts of the same priority level are arbitrated by interrupt number. Beyond that, the CM3 has some mechanisms that speed up the whole process when interrupts follow one another.

Suppose a low-priority interrupt is left pending because a high-priority interrupt got in ahead of it. In the traditional flow, when the high-priority handler finishes, the saved context is popped off the stack, then the same content is pushed back onto the stack, and only then is the pending low-priority interrupt handled. As the book puts it, this is smashing the pot to get the iron and then casting the pot again. It is completely unnecessary. So when the CM3 core handles such a series of interrupts, it stacks only once at the start and unstacks only once at the end. When several interrupts are handled in a row, many steps are skipped and time is saved, especially when there are many of them.

However, do not nest too deeply: each nesting level pushes at least 8 32-bit register values (32 bytes) onto the stack.
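The 32-byte figure can be checked with a small sketch of the hardware-stacked frame. This is an illustration under my own assumptions: exception_frame_t and worst_case_stack_bytes are hypothetical names, the field order is the layout in memory from the lowest address upward, and the optional alignment padding word that the hardware may insert is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 8 registers stacked by hardware on exception entry,
 * in memory order starting at the new stack pointer. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* link register of the interrupted code */
    uint32_t pc;    /* return address                        */
    uint32_t xpsr;  /* program status register              */
} exception_frame_t;

static_assert(sizeof(exception_frame_t) == 32,
              "hardware frame is 8 words = 32 bytes");

/*
 * Rough upper bound on stack use for `levels` nested exceptions, where each
 * handler additionally pushes `handler_bytes` of its own (saved R4-R11,
 * locals). Alignment padding is not counted.
 */
size_t worst_case_stack_bytes(unsigned levels, size_t handler_bytes)
{
    return (size_t)levels * (sizeof(exception_frame_t) + handler_bytes);
}
```

For example, 4 nesting levels cost at least 4 * 32 = 128 bytes from the hardware frames alone, and 192 bytes if every handler also uses 16 bytes of its own. In a fault handler, a pointer to such a struct placed at the stacked SP is a common way to read back the PC of the interrupted code.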
If the interrupt handler itself uses further registers, its own code must push those onto the stack as well, which adds to the pressure on stack memory. If the stack is used up and overflows, the program will most likely run away, which is very dangerous. So in applications, keep the interrupt nesting depth as small as possible.

The mechanism above takes effect after a high-priority interrupt has got in ahead of a low-priority one and has finished being served. The translator calls it "tail-biting interrupt" (tail-chaining). The next mechanism takes effect while the processor is still preparing to respond. Before the high-priority interrupt arrives, the low-priority interrupt has already started its preparation (stacking). Then the high-priority interrupt arrives. It takes over the preparation work already done for the low-priority interrupt and goes straight into its own service routine, while the low-priority interrupt is forced to stay pending, having done the work for someone else. There is nothing to be done about it; the core is defined this way. The translator renders it as "late arrival (high priority) exception", and I call it the "latecomer" interrupt.

Interrupt latency is the time from the detection of the interrupt to the execution of the first instruction of the interrupt service routine. The book says: if the memory system is fast enough, stacking and vector fetch can proceed in parallel, and the interrupt can be served immediately without being preempted, then the latency is a fixed 12 cycles (which satisfies the determinism required by hard real-time systems). When several interrupts are served one after another, the stacking time saved by tail-chaining brings the latency of each following interrupt down to 6 cycles.

Finally, let's talk about exceptions, or faults, at the level of core operation.
This topic deserves to be written about in detail, especially for tracking down bugs while debugging, and it is an area that really shows a person's skill level. For lack of time and depth of study, I will not go into it for now. In short, errors that occur at run time are often not programming syntax errors. If you don't understand the core architecture and some of its usage rules, it is hard to find where the bug is. You have to trace back to the point where the program went wrong and caused the core to fault. Thanks to the CM3's complete debug architecture, many faults are detected and recorded in the corresponding status registers. By following these clues you can find out where the problem is and which instruction caused it, and then check, analyze and fix the code in a targeted way.
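As a small example of following these clues, the sketch below decodes the Configurable Fault Status Register (CFSR, at address 0xE000ED28 on the CM3) into flag names. It is a host-side illustration only: fault_flag_t, CFSR_FLAGS, cfsr_decode and cfsr_has are names I made up, while the bit positions follow the ARMv7-M definition of the register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t    mask;
    const char *name;
} fault_flag_t;

/* CFSR = MMFSR (bits 7:0) | BFSR (bits 15:8) | UFSR (bits 31:16). */
static const fault_flag_t CFSR_FLAGS[] = {
    { UINT32_C(1) << 0,  "IACCVIOL"    },  /* MemManage: instruction access    */
    { UINT32_C(1) << 1,  "DACCVIOL"    },  /* MemManage: data access           */
    { UINT32_C(1) << 3,  "MUNSTKERR"   },  /* MemManage: fault on unstacking   */
    { UINT32_C(1) << 4,  "MSTKERR"     },  /* MemManage: fault on stacking     */
    { UINT32_C(1) << 7,  "MMARVALID"   },  /* MMFAR holds a valid address      */
    { UINT32_C(1) << 8,  "IBUSERR"     },  /* BusFault: instruction fetch      */
    { UINT32_C(1) << 9,  "PRECISERR"   },  /* BusFault: precise data access    */
    { UINT32_C(1) << 10, "IMPRECISERR" },  /* BusFault: imprecise data access  */
    { UINT32_C(1) << 11, "UNSTKERR"    },  /* BusFault: fault on unstacking    */
    { UINT32_C(1) << 12, "STKERR"      },  /* BusFault: fault on stacking      */
    { UINT32_C(1) << 15, "BFARVALID"   },  /* BFAR holds a valid address       */
    { UINT32_C(1) << 16, "UNDEFINSTR"  },  /* UsageFault: undefined instruction */
    { UINT32_C(1) << 17, "INVSTATE"    },  /* UsageFault: e.g. Thumb bit was 0 */
    { UINT32_C(1) << 18, "INVPC"       },  /* UsageFault: bad exception return */
    { UINT32_C(1) << 19, "NOCP"        },  /* UsageFault: no coprocessor       */
    { UINT32_C(1) << 24, "UNALIGNED"   },  /* UsageFault: unaligned access     */
    { UINT32_C(1) << 25, "DIVBYZERO"   },  /* UsageFault: divide by zero       */
};

#define CFSR_FLAG_COUNT (sizeof CFSR_FLAGS / sizeof CFSR_FLAGS[0])

/* Store the names of all set flags into names[] (at most max_names)
 * and return how many were stored. */
size_t cfsr_decode(uint32_t cfsr, const char *names[], size_t max_names)
{
    assert(names != NULL || max_names == 0);

    size_t count = 0;
    for (size_t i = 0; i < CFSR_FLAG_COUNT && count < max_names; ++i) {
        if (cfsr & CFSR_FLAGS[i].mask) {
            names[count++] = CFSR_FLAGS[i].name;
        }
    }
    return count;
}

/* True if the flag called `name` is set in `cfsr`; false if it is clear
 * or the name is unknown. */
bool cfsr_has(uint32_t cfsr, const char *name)
{
    for (size_t i = 0; i < CFSR_FLAG_COUNT; ++i) {
        if (strcmp(CFSR_FLAGS[i].name, name) == 0) {
            return (cfsr & CFSR_FLAGS[i].mask) != 0u;
        }
    }
    return false;
}
```

On the target, the value passed in would be read from the CFSR inside the fault handler. An INVSTATE flag, for instance, points straight back to the Thumb-bit rule discussed earlier: a vector or branch target whose LSB was 0.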