ARM Linux interrupt vector table migration design process-EEWORLD

Preface

Here I will describe how Linux initializes the interrupt vector table on the ARM architecture. Because the method is quite general, I call it code migration. You may say that everyone knows how to move code; isn't it just copying? That is true, but there is skill in copying too. The copy itself is simple: it is just memcpy, and needs no explanation. What I want to talk about is how to design code so that it can be copied anywhere and still run, in other words position-independent code. I have used a similar method in startup code before, and today I will walk through it.

Scenario 1 copy

Let's look at the actual action first. The code is in arch/arm/kernel/traps.c, kernel version 2.6.27. This is initialization code, reached through setup_arch()->early_trap_init(). Friends who are familiar with the initialization part may have seen it.

void __init early_trap_init(void)

{

unsigned long vectors = CONFIG_VECTORS_BASE;

extern char __stubs_start[], __stubs_end[];

extern char __vectors_start[], __vectors_end[];

extern char __kuser_helper_start[], __kuser_helper_end[];

int kuser_sz = __kuser_helper_end - __kuser_helper_start;

/*

 * Copy the vectors, stubs and kuser helpers (in entry-armv.S)

 * into the vector page, mapped at 0xffff0000, and ensure these

 * are visible to the instruction stream.

 */

memcpy((void *)vectors, __vectors_start, __vectors_end - __vectors_start);

memcpy((void *)vectors + 0x200, __stubs_start, __stubs_end - __stubs_start);

memcpy((void *)vectors + 0x1000 - kuser_sz, __kuser_helper_start, kuser_sz);

…

}

The copy operation is clear at a glance: two memcpy calls (the third copies something else on the same principle, so I won't discuss it here). The destination of the copy is vectors, whose value is CONFIG_VECTORS_BASE, generally 0xffff0000; you can of course configure this value according to your hardware. What is copied? The first part is the code from __vectors_start to __vectors_end, copied to the start of vectors. The second part is the code from __stubs_start to __stubs_end, copied to vectors + 0x200. In other words, in the destination the two parts start 0x200 (512) bytes apart.
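The layout produced by the three memcpy calls can be tried out in user space. The following is a minimal sketch, not kernel code: `vector_page`, `fill_vector_page` and the buffer parameters are hypothetical stand-ins for the page at CONFIG_VECTORS_BASE and for the linker symbols, and only the offsets (0, 0x200, end of the 0x1000-byte page) are taken from early_trap_init().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VECTOR_PAGE_SIZE 0x1000u  /* the vector page is one 4 KiB page */
#define STUBS_PAGE_OFF   0x200u   /* stubs are placed 0x200 bytes in   */

/* Hypothetical stand-in for the page mapped at CONFIG_VECTORS_BASE. */
static unsigned char vector_page[VECTOR_PAGE_SIZE];

/*
 * Mirrors the three memcpy calls of early_trap_init(), with plain
 * buffers standing in for __vectors_start, __stubs_start and
 * __kuser_helper_start.
 */
static void fill_vector_page(const unsigned char *vec, size_t vec_sz,
                             const unsigned char *stubs, size_t stubs_sz,
                             const unsigned char *kuser, size_t kuser_sz)
{
    /* The three regions must not overlap inside the page. */
    assert(vec_sz <= STUBS_PAGE_OFF);
    assert(STUBS_PAGE_OFF + stubs_sz + kuser_sz <= VECTOR_PAGE_SIZE);

    memcpy(vector_page, vec, vec_sz);                       /* vectors at +0      */
    memcpy(vector_page + STUBS_PAGE_OFF, stubs, stubs_sz);  /* stubs at +0x200    */
    memcpy(vector_page + VECTOR_PAGE_SIZE - kuser_sz,       /* helpers end at the */
           kuser, kuser_sz);                                /* top of the page    */
}
```

Calling `fill_vector_page` with three small test buffers and then inspecting `vector_page` at offsets 0, 0x200 and 0x1000 minus the helper size shows where each region ends up.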

Let's take a look at what __vectors_start, __vectors_end, __stubs_start, and __stubs_end are. Once we know where they are defined, we will know what is going on.

Scenario 2 The protagonist makes his debut

They are hidden in arch/arm/kernel/entry-armv.S. This file contains the entry code for each processor mode on ARM. Friends who are familiar with ARM know that it has several modes; those who don't can look them up, so I won't cover them here. Let's take the fragment relevant to our discussion. To make it clearer, I have deleted some code and comments and kept only the main trunk. Interested friends can read the full source, which is well worth studying.

.globl __stubs_start

__stubs_start:

/* Interrupt dispatcher */

vector_stub irq, IRQ_MODE, 4

// Note that vector_stub is a macro that expands into a block of code, and that a jump table follows it. Expanded, the code looks like this (vector_stub dabt, ABT_MODE, 8 and the others expand the same way, so they are not shown):

// -------------------------------- begin Expand

.align 5

vector_irq:

sub lr, lr, #4

@ Save r0, lr_<exception> (parent PC) and spsr_<exception>

@ (parent CPSR)

stmia sp, {r0, lr} @ save r0, lr

mrs lr, spsr

str lr, [sp, #8] @ save spsr

@ Prepare for SVC32 mode. IRQs remain disabled.

mrs r0, cpsr

eor r0, r0, #(IRQ_MODE ^ SVC_MODE)

msr spsr_cxsf, r0

@ the branch table must immediately follow this code

and lr, lr, #0x0f

mov r0, sp

ldr lr, [pc, lr, lsl #2]

movs pc, lr @ branch to handler in SVC mode

// -------------------------------- end Expand

.long __irq_usr @ 0 (USR_26 / USR_32)

.long __irq_invalid @ 1 (FIQ_26 / FIQ_32)

.long __irq_invalid @ 2 (IRQ_26 / IRQ_32)

.long __irq_svc @ 3 (SVC_26 / SVC_32)

. . .

.long __irq_invalid @ f

/*

 * Data abort dispatcher

 * Enter in ABT mode, spsr = USR CPSR, lr = USR PC

 */

vector_stub dabt, ABT_MODE, 8

.long __dabt_usr @ 0 (USR_26 / USR_32)

.long __dabt_invalid @ 1 (FIQ_26 / FIQ_32)

.long __dabt_invalid @ 2 (IRQ_26 / IRQ_32)

.long __dabt_svc @ 3 (SVC_26 / SVC_32)

. . .

.long __dabt_invalid @ f

/*

 * Prefetch abort dispatcher

 * Enter in ABT mode, spsr = USR CPSR, lr = USR PC

 */

vector_stub pabt, ABT_MODE, 4

.long __pabt_usr @ 0 (USR_26 / USR_32)

.long __pabt_invalid @ 1 (FIQ_26 / FIQ_32)

.long __pabt_invalid @ 2 (IRQ_26 / IRQ_32)

.long __pabt_svc @ 3 (SVC_26 / SVC_32)

. . .

.long __pabt_invalid @ f

/*

 * Undef instr entry dispatcher

 * Enter in UND mode, spsr = SVC/USR CPSR, lr = SVC/USR PC

 */

vector_stub und, UND_MODE

.long __und_usr @ 0 (USR_26 / USR_32)

.long __und_invalid @ 1 (FIQ_26 / FIQ_32)

.long __und_invalid @ 2 (IRQ_26 / IRQ_32)

.long __und_svc @ 3 (SVC_26 / SVC_32)

. . .

.long __und_invalid @ f

.align 5

vector_fiq:

disable_fiq

subs pc, lr, #4

vector_addrexcptn:

b vector_addrexcptn

/*

 * We group all the following data together to optimize

 * for CPUs with separate I & D caches.

 */

.align 5

.LCvswi:

.word vector_swi

.globl __stubs_end

__stubs_end:

.equ stubs_offset, __vectors_start + 0x200 - __stubs_start

.globl __vectors_start

__vectors_start:

swi SYS_ERROR0

b vector_und + stubs_offset

ldr pc, .LCvswi + stubs_offset

b vector_pabt + stubs_offset

b vector_dabt + stubs_offset

b vector_addrexcptn + stubs_offset

b vector_irq + stubs_offset

b vector_fiq + stubs_offset

.globl __vectors_end

__vectors_end:

To make it clearer for everyone, I simplified the code structure again as follows:

.globl __stubs_start

__stubs_start:

.align 5

vector_irq:

[code part] // Expand code

[jump table part] //Address jump table

. . .

.align 5

vector_dabt:

[code part]

[jump table part]

. . .

.align 5

vector_pabt:

[code part]

[jump table part]

. . .

.align 5

vector_und:

[code part]

[jump table part]

. . .

.align 5

vector_fiq:

. . .

.globl __stubs_end

__stubs_end:

.globl __vectors_start

__vectors_start:

swi SYS_ERROR0

b vector_und + stubs_offset

ldr pc, .LCvswi + stubs_offset

b vector_pabt + stubs_offset

b vector_dabt + stubs_offset

b vector_addrexcptn + stubs_offset

b vector_irq + stubs_offset

b vector_fiq + stubs_offset

.globl __vectors_end

__vectors_end:

I won't spend much time explaining what the code means, as that is not the purpose of this article. As long as you understand the structure, you have what you need. But I will take some time to study the expanded code part (the part between the Expand markers above). This part of the code is position-independent. Let's look at why it is written this way.

.align 5

vector_irq:

[code part] // Expand code

[jump table part] //Address jump table

. . .

First of all, each of these blocks has roughly the same structure: some code in front and a jump table behind. The jump table holds a list of addresses. Let's cut out this part and take a look:

. . .

@ the branch table must immediately follow this code

and lr, lr, #0x0f (1) // lr currently holds the saved status register (spsr); the AND keeps its low four bits,

// which tell whether the CPU was in user mode or kernel (SVC) mode before the interrupt. This value is used as

// the jump table index

mov r0, sp (2) // sp is put to another use here: its value is passed as the first argument to the handler that follows

ldr lr, [pc, lr, lsl #2] (3) // pc reads as the address of the current instruction plus 8, i.e. the base address of the jump table; lr is the index

// A very good technique: an address taken relative to pc is correct wherever the code has been placed

movs pc, lr @ branch to handler in SVC mode

[jump table]

.long __irq_usr @ 0 (USR_26 / USR_32)

.long __irq_invalid @ 1 (FIQ_26 / FIQ_32)

.long __irq_invalid @ 2 (IRQ_26 / IRQ_32)

.long __irq_svc @ 3 (SVC_26 / SVC_32)

The actual jump is done by the last instruction, as everyone can see. Where does it jump to? If the CPU was in SVC mode before the interrupt, it jumps to __irq_svc. Notice that b (or bl, bx, etc.) is not used directly here, for two reasons:

First, b encodes an offset relative to pc, and that offset has a limited range (about ±32 MB in ARM state), so the target cannot be too far away.

Second, because the code is going to be moved, you cannot be sure that the offset encoded in b will still be right after the copy. If you are not sure the offset stays the same after the move, use an absolute address instead. The first three instructions above fetch the absolute address from the jump table, and the last one writes it to pc to complete the jump.

Those are the tips. In short, when writing position-independent code you need to pay attention to the jumps. Either jump with b, or load an absolute address into pc directly (implemented through a jump table). If you cannot guarantee that the offset stays consistent after the move, use the absolute address.
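The dispatch done by the stub (mask the saved mode bits, index the table, jump to the absolute address found there) can be mimicked in C with a table of function pointers. This is only an illustrative sketch: the handler functions, the `handler_id` codes and `dispatch_irq` are hypothetical names, while the 0x0f mask, the 16-entry table and the mode values 0x10 (USR) and 0x13 (SVC) follow the listing and the ARM mode encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mode field values from the ARM architecture (low bits of CPSR/SPSR). */
#define USR_MODE 0x10u
#define SVC_MODE 0x13u

/* Hypothetical codes so the sketch can report which handler ran. */
enum handler_id { RAN_IRQ_USR, RAN_IRQ_SVC, RAN_IRQ_INVALID };

typedef enum handler_id (*handler_fn)(void);

/* Stand-ins for __irq_usr, __irq_svc and __irq_invalid. */
static enum handler_id irq_usr(void)     { return RAN_IRQ_USR; }
static enum handler_id irq_svc(void)     { return RAN_IRQ_SVC; }
static enum handler_id irq_invalid(void) { return RAN_IRQ_INVALID; }

/* 16 absolute addresses, like the .long list that follows the stub. */
static const handler_fn irq_table[16] = {
    irq_usr,     irq_invalid, irq_invalid, irq_svc,
    irq_invalid, irq_invalid, irq_invalid, irq_invalid,
    irq_invalid, irq_invalid, irq_invalid, irq_invalid,
    irq_invalid, irq_invalid, irq_invalid, irq_invalid,
};

/* spsr is the status register saved when the interrupt was taken. */
static enum handler_id dispatch_irq(uint32_t spsr)
{
    uint32_t index = spsr & 0x0fu;          /* and  lr, lr, #0x0f          */
    handler_fn target = irq_table[index];   /* ldr  lr, [pc, lr, lsl #2]   */

    assert(target != NULL);
    return target();                        /* movs pc, lr                 */
}
```

Because the table stores absolute addresses, the lookup gives the same target no matter where the table itself has been copied; only the way the table is located (relative to pc in the assembly) depends on position.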

You can also compile a small function with gcc's -fPIC and -S options and read the generated assembly. -fPIC is the position-independent code option; people who have built shared libraries will be familiar with it. Look at how the compiler does it and you will find that it is similar in nature.
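A small function that touches a global variable is enough for this experiment. The file name pic_demo.c and the names `counter` and `bump` below are made up for illustration. Compile it once with `gcc -S -fPIC pic_demo.c -o pic.s` and once without -fPIC, then compare how the two outputs reach `counter`. The exact instructions depend on the target and on the toolchain defaults, so treat the comparison as exploratory.

```c
/* pic_demo.c (hypothetical file name) */

/*
 * A global with external linkage: position-independent code cannot
 * assume a fixed absolute address for it.
 */
int counter = 0;

/* Adds step to the global counter and returns the new value. */
int bump(int step)
{
    counter += step;
    return counter;
}
```

On many targets the -fPIC output reaches `counter` relative to pc or through the global offset table, which is the same idea as the pc-relative jump table lookup in the stub.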

Scenario 3: The Big Move

I use this section to introduce the big move itself, together with some problems Linux runs into during the migration and how it solves them. I put the entire migration process into one diagram and then discuss some technical details. It is a big diagram, and everything in this section is in it.

[Figure: ARM Linux interrupt vector table migration design process]

We call the code organization before migration the Code/Load view, because this is how the code is laid out in the source (or image). The organization after migration is called the Exec view, which reflects how the code sits in memory when it executes. The two memcpy calls from the first scenario (go back and look if you have forgotten) are also shown in the figure, as the blue and red dotted arrows. They copy the code from the Code view to the Exec view.

Now there is a problem. The code between __vectors_start and __vectors_end looks a bit odd. Let's take a look at it again:

.equ stubs_offset, __vectors_start + 0x200 - __stubs_start

.globl __vectors_start

__vectors_start:

swi SYS_ERROR0

b vector_und + stubs_offset

ldr pc, .LCvswi + stubs_offset

b vector_pabt + stubs_offset

b vector_dabt + stubs_offset

b vector_addrexcptn + stubs_offset

b vector_irq + stubs_offset

b vector_fiq + stubs_offset

.globl __vectors_end

__vectors_end:

In the second scenario we said that this code must be position-independent, because it is copied to another place, and it is full of jump instructions. Leaving aside the swi in the first slot, every entry except the third, which jumps through an absolute address, uses b. Take b vector_dabt + stubs_offset as an example (vector_dabt lies between __stubs_start and __stubs_end). A plain b vector_dabt would certainly be wrong, because the layout (map) of the Exec view after the copy differs from the Code view, so the offset encoded in b would be wrong. We have to adjust the offset, and stubs_offset is that adjustment value. By its definition, stubs_offset = __vectors_start + 0x200 - __stubs_start, so adding it to a label in the stubs area makes the assembler encode the branch as if the stubs began 0x200 bytes after __vectors_start, which is exactly their position in the Exec view. The detailed derivation is worked out in the figure.
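The adjustment can be checked with ordinary arithmetic. The following is a minimal sketch, not kernel code: the three helper functions are hypothetical, and the only facts assumed are the definition of stubs_offset from the listing and the rule that an ARM b encodes its displacement relative to pc, which reads as the branch address plus 8.

```c
#include <assert.h>
#include <stdint.h>

/* stubs_offset exactly as defined in the listing. */
static uint32_t stubs_offset(uint32_t vectors_start, uint32_t stubs_start)
{
    return vectors_start + 0x200u - stubs_start;
}

/*
 * Displacement the assembler encodes for "b target" located at
 * branch_addr (Code view). pc reads as branch_addr + 8.
 */
static int32_t branch_displacement(uint32_t branch_addr, uint32_t target)
{
    return (int32_t)(target - (branch_addr + 8u));
}

/*
 * Address the same encoded branch reaches once the instruction has
 * been copied to new_branch_addr (Exec view).
 */
static uint32_t landing_address(uint32_t new_branch_addr, int32_t displacement)
{
    assert((new_branch_addr & 3u) == 0);  /* ARM instructions are word aligned */
    return new_branch_addr + 8u + (uint32_t)displacement;
}
```

As an example with made-up link addresses, take __stubs_start = 0xc0022000, __vectors_start = 0xc0022300 and vector_irq at the very start of the stubs. stubs_offset is then 0x500, the branch in the IRQ slot (offset 0x18 in the vectors) gets a displacement of 0x1e0, and after the copy to 0xffff0018 it lands on 0xffff0200, which is where vector_irq now lives. Without stubs_offset the same branch would land on 0xfffefd00, outside the vector page.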

In fact, although the instruction ldr pc, .LCvswi + stubs_offset jumps to an absolute address taken from a table, finding the table entry itself uses the same technique. Look at:

.align 5

.LCvswi:

.word vector_swi

The .LCvswi location stores an address, which is where we will jump to. .align 5 means 32-byte alignment, which keeps the data aligned to a cache line. To find this location in the Exec view we must add the offset, on the same principle: .LCvswi lies between __stubs_start and __stubs_end, and this area has been moved, so the label address cannot be used directly. vector_swi itself has not been moved, so its address can be used as is.
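The two halves of this instruction, a pc-relative lookup of a word that itself holds an absolute address, can be simulated on a byte buffer. This is a sketch under stated assumptions: `literal_disp` and `ldr_pc_relative` are hypothetical helpers, the page is an ordinary array rather than the real vector page, and the offsets used when calling them are examples.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STUBS_PAGE_OFF 0x200u  /* stubs are copied 0x200 bytes into the page */

/*
 * Displacement encoded for "ldr pc, .LCvswi + stubs_offset": the link
 * addresses cancel out, leaving only offsets inside the two regions.
 * pc reads as the instruction address plus 8.
 */
static int32_t literal_disp(uint32_t insn_off_in_vectors,
                            uint32_t literal_off_in_stubs)
{
    return (int32_t)(STUBS_PAGE_OFF + literal_off_in_stubs
                     - (insn_off_in_vectors + 8u));
}

/*
 * Simulates the load at byte offset insn_off inside the copied page:
 * reads the 32-bit word at insn_off + 8 + disp and returns it, i.e.
 * the value that would be written to pc.
 */
static uint32_t ldr_pc_relative(const unsigned char *page,
                                uint32_t insn_off, int32_t disp)
{
    uint32_t word;
    uint32_t where = insn_off + 8u + (uint32_t)disp;

    assert((where & 3u) == 0);  /* the literal is word aligned */
    memcpy(&word, page + where, sizeof word);
    return word;
}
```

For instance, if the literal sits 0xe0 bytes into the stubs and the ldr occupies the third vector slot (offset 8), the displacement is 0x2d0 and the load reads the word at page offset 0x2e0. The value found there is the unmodified absolute address of vector_swi, because the copy moves the literal but does not change its contents.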

To sum up, what I have described are technical details of Linux, along with the principles and pitfalls of the code migration process. What matters more is how to turn this process around: how to design for situations that involve code migration, and how to use these techniques to implement the design. You can follow these steps:

1. Draw the big picture: determine the Code view and the Exec view according to your requirements, and decide which section is migrated and where it goes.

2. Write the code to be moved, using the position-independent techniques mentioned above, and verify it.

3. Move it with memcpy or similar code.
