s3c2440 bare metal - code relocation - 4 - clear bss optimization and position-independent code

1. Improvements in code relocation

Use ldr, str instead of ldrb, strb to speed up code relocation.

Previously, relocation used ldrb to read 1 byte at a time from the NOR flash and strb to write that byte to the SDRAM. On our 2440 development board the NOR flash has a 16-bit data bus and the SDRAM has a 32-bit data bus. Suppose we need to copy 16 bytes of data.

Instructions used	Times the CPU accesses NOR	Times the CPU writes to SDRAM
ldrb, strb	16	16
ldr, str	8	4

Switching to the word-sized instructions reduces the number of bus accesses, so the CPU completes the copy faster.
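The counts in the table follow from simple arithmetic, which can be checked on a host machine. The sketch below is a simplified model, not a description of the S3C2440 memory controller: the function name bus_accesses and its parameters are hypothetical, and it assumes that an access wider than the bus is split into several bus cycles while a narrower access still costs one cycle.

```c
#include <assert.h>

/* Simplified model (an assumption, not taken from the S3C2440 manual):
 * total_bytes  - amount of data to move
 * access_bytes - bytes moved per CPU instruction (1 for ldrb/strb, 4 for ldr/str)
 * bus_bytes    - width of the memory data bus in bytes (2 for NOR, 4 for SDRAM)
 * Returns the number of bus cycles needed. */
unsigned int bus_accesses(unsigned int total_bytes,
                          unsigned int access_bytes,
                          unsigned int bus_bytes)
{
    assert(access_bytes > 0 && bus_bytes > 0);

    /* Number of load or store instructions executed. */
    unsigned int instructions = (total_bytes + access_bytes - 1) / access_bytes;

    /* Bus cycles per instruction: a 4-byte access on a 2-byte bus needs 2. */
    unsigned int cycles_each = (access_bytes + bus_bytes - 1) / bus_bytes;

    return instructions * cycles_each;
}
```

For the 16-byte example, bus_accesses(16, 1, 2) and bus_accesses(16, 4, 2) give the two NOR figures, and bus_accesses(16, 1, 4) and bus_accesses(16, 4, 4) give the two SDRAM figures.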

The relocation loop in the modified start.s is listed below. Only the relocation part is shown:

...
cpy:
    ldr r4, [r1]        /* read one word from NOR */
    str r4, [r2]        /* write it to SDRAM */
    add r1, r1, #4      /* r1 plus 4 */
    add r2, r2, #4      /* r2 plus 4 */
    cmp r2, r3          /* if r2 < r3, continue copying */
    blt cpy
...

Implementing relocation using C language

Add the following linker script:

SECTIONS
{
    . = 0x30000000;
    __code_start = .;

    . = ALIGN(4);
    .text :
    {
        *(.text)
    }

    . = ALIGN(4);
    .rodata : { *(.rodata) }

    . = ALIGN(4);
    .data : { *(.data) }

    . = ALIGN(4);
    __bss_start = .;
    .bss : { *(.bss) *(COMMON) }
    _end = .;
}

Add the following function implementation in main.c:

void copy2sdram(void)
{
    /* Get __code_start and __bss_start from the lds file,
     * then copy the image from address 0 to __code_start. */
    extern int __code_start, __bss_start;

    volatile unsigned int *dest = (volatile unsigned int *)&__code_start;
    volatile unsigned int *end  = (volatile unsigned int *)&__bss_start;
    volatile unsigned int *src  = (volatile unsigned int *)0;

    while (dest < end)
    {
        *dest++ = *src++;   /* copy from address 0 to __code_start (the run-time address of the code segment) */
    }
}

Then, once the stack pointer sp has been set in start.s, execute bl copy2sdram to relocate the code. For how to set the stack pointer, refer to Clock Programming (II. Configuring Clock Registers); the code is not repeated here.

2. Improvements on clearing bss

Use ldr, str instead of ldrb, strb to speed up the clearing of bss

Similar to the above relocation, the code is as follows:

    ldr r1, =__bss_start
    ldr r2, =_end
    mov r3, #0
clean:
    str r3, [r1]        /* write one zero word */
    add r1, r1, #4
    cmp r1, r2          /* if r1 < r2, continue clearing */
    blt clean

    bl main
halt:
    b halt

Clearing bss in C

The implementation mirrors the relocation code above, except that it writes zeros over the whole bss segment. In start.s, execute bl clean_bss after bl copy2sdram to clear the bss segment.

void clean_bss(void)
{
    /* Get __bss_start and _end from the lds file. */
    extern int _end, __bss_start;

    volatile unsigned int *start = (volatile unsigned int *)&__bss_start;
    volatile unsigned int *end   = (volatile unsigned int *)&_end;

    while (start < end)
    {
        *start++ = 0;
    }
}

Note: A symbol defined in the linker script is only an address; no variable is stored there. In assembly, ldr r1, =__bss_start loads that address directly. In C, using the name __bss_start by itself would read the contents of memory at that address, so whether you implement relocation or bss clearing in C, you must apply the address-of operator (&__bss_start) to get the address.
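The difference described in the note can be seen on a host machine. In the sketch below, fake_bss_start is a hypothetical stand-in for a linker-script symbol such as __bss_start; it is an ordinary variable here only so that the example can run without a linker script, and the function names are likewise illustrative.

```c
#include <stdint.h>

/* Hypothetical stand-in for a linker-script symbol such as __bss_start.
 * On the target the symbol would come from the lds file and would have
 * no meaningful contents; here it is a normal variable for illustration. */
unsigned int fake_bss_start = 0x12345678u;

/* Using the bare name in C reads the word stored AT the symbol. */
unsigned int symbol_contents(void)
{
    return fake_bss_start;
}

/* Relocation and bss clearing need the location OF the symbol,
 * which is what the address-of operator yields. */
uintptr_t symbol_address(void)
{
    return (uintptr_t)&fake_bss_start;
}
```

With a real linker symbol, only symbol_address() would return something useful; the equivalent of symbol_contents() would return whatever happens to be in memory at the start of the segment.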

Ensure that the start addresses of all segments are aligned to 4 bytes

To speed up relocation and bss clearing, we switched to ldr and str, which read and write 4 bytes at a time. This introduces a requirement: every segment must start on a 4-byte boundary. If the linker script does not use ALIGN(4) to align each segment, a word access can touch bytes that belong to a neighboring segment.

Let me give you an example:

#include "s3c2440_soc.h"
#include "uart.h"
#include "init.h"

char g_Char = 'A';          /* .data */
char g_Char3 = 'a';         /* .data */
const char g_Char2 = 'B';   /* .rodata */
int g_A = 0;                /* .bss */
int g_B;                    /* .bss */

int main(void)
{
    uart0_init();

    puts("\n\rg_A = ");
    printHex(g_A);
    puts("\n\r");
    putchar(g_Char);

    return 0;
}

Remove the ALIGN(4) between the .data section and the .bss section in the linker script. When the program runs, g_Char now comes out as 0. Why is that? We clearly initialized g_Char = 'A'.

Let's analyze the disassembly and see:

Our .bss segment comes right after the .data segment. Without ALIGN(4), __bss_start is not 4-byte aligned, but bss is cleared in 4-byte units. The first word store that clears g_A therefore lands on the aligned word that also holds g_Char and g_Char3, and their values are cleared as well. This is why ALIGN(4) is added between the .data segment and the .bss segment. After the modification, the bss segment starts at 0x30000248.
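The overwrite described above can be reproduced on a host with a small memory model. This is a sketch under stated assumptions: it assumes that a word store ignores address bits [1:0] (the behaviour of older ARM cores when alignment checking is disabled), and the byte offsets 4, 5, 6 and 8 are scaled-down, hypothetical stand-ins for the real addresses. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 16u

/* Assumed behaviour: a word store ignores address bits [1:0],
 * so it always writes the aligned word containing the address. */
static void store_word(uint8_t *mem, uint32_t addr, uint32_t value)
{
    uint32_t aligned = addr & ~3u;
    assert(aligned + 4u <= MEM_SIZE);
    memcpy(&mem[aligned], &value, sizeof value);
}

/* Word-wise clear of [bss_start, bss_end), like clean_bss. */
static void clear_bss(uint8_t *mem, uint32_t bss_start, uint32_t bss_end)
{
    for (uint32_t addr = bss_start; addr < bss_end; addr += 4u)
        store_word(mem, addr, 0u);
}

/* Places two initialized chars at offsets 4 and 5 (standing in for
 * g_Char and g_Char3), clears an 8-byte bss starting at bss_start,
 * and returns the byte that stands in for g_Char afterwards. */
uint8_t g_char_after_clear(uint32_t bss_start)
{
    uint8_t mem[MEM_SIZE];

    assert(bss_start + 8u <= MEM_SIZE);
    memset(mem, 0xff, sizeof mem);
    mem[4] = 'A';   /* g_Char  */
    mem[5] = 'a';   /* g_Char3 */

    clear_bss(mem, bss_start, bss_start + 8u);
    return mem[4];
}
```

Calling g_char_after_clear(6) models the linker script without ALIGN(4): the first store falls back to offset 4 and wipes the two chars. Calling g_char_after_clear(8) models the aligned layout, where the data survives.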

3. Position-independent code

Consider the instruction bl sdram_init and look at its disassembly (the link address of the code segment is 0x30000000):

Here, bl 3000036c does not actually jump to 3000036c. At this point the SDRAM has not been initialized, so that physical address is not accessible. To verify this, we do another experiment: modify the linker script sdram.lds, change the link address to 0x3000,0800, compile, and view the disassembly:

The disassembly now shows bl 300003ec, yet the machine code of the bl instruction is identical in both builds. Identical machine code must do the same thing, so the instruction cannot be jumping to the absolute address shown. It jumps to pc + offset, and the offset is fixed by the linker.

If the program runs from 0x30000000, the address of the current instruction is 0x3000005c and it jumps to 0x3000036c. If the program runs from 0, the address of the current instruction is 0x5c and it jumps to 0x0000036c.

The destination is not an absolute address stored in the bl instruction; it is computed from the current pc value plus the encoded offset. The disassembler prints the resulting address only to make the code easier to read.

Key point: in the disassembly, the address shown after b or bl is only there for readability; it is not necessarily where the instruction jumps at run time.
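The pc-relative calculation can be worked through with the two addresses quoted above (an instruction at 0x3000005c targeting 0x3000036c). The sketch below follows the ARM encoding of bl: a 24-bit signed word offset in the low bits, with the target computed as the instruction address plus 8 plus the offset times 4. The function names bl_encode and bl_target are hypothetical, and the encoded value is derived from those two addresses rather than copied from the original disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Encode an always-executed ARM "bl" located at insn_addr that targets target. */
uint32_t bl_encode(uint32_t insn_addr, uint32_t target)
{
    assert(((target - insn_addr) & 3u) == 0u);      /* word-aligned distance */

    /* pc reads as insn_addr + 8 when the instruction executes. */
    uint32_t offset = target - (insn_addr + 8u);
    return 0xEB000000u | ((offset >> 2) & 0x00FFFFFFu);
}

/* Compute where a "bl" actually goes when it executes at insn_addr. */
uint32_t bl_target(uint32_t insn_addr, uint32_t insn)
{
    assert(((insn >> 24) & 0x0Fu) == 0x0Bu);        /* must be a bl encoding */

    uint32_t offset = (insn & 0x00FFFFFFu) << 2;    /* word offset to bytes */
    if (offset & 0x02000000u)                       /* sign-extend 26 bits */
        offset |= 0xFC000000u;
    return insn_addr + 8u + offset;
}
```

The same encoding comes out whether the pair of addresses is taken at 0x30000000 or at 0, and decoding it at 0x3000005c or at 0x5c yields a target 0x310 bytes further on in each case, which is the sense in which bl is position-independent.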

How to write position-independent code?

Use the relative jump instructions b and bl.
Before relocation, do not use absolute addresses: do not access global or static variables, and do not access arrays with initial values (the initial values are placed in rodata and are accessed through absolute addresses).
After relocation, use ldr pc, =xxx to jump to the run-time address.

Writing position-independent code comes down to not using absolute addresses. Beyond the rules above, the most reliable way to check whether absolute addresses are used is to read the disassembly.

The earlier example calls main with bl, which is a relative jump, so the program keeps executing in NOR/SRAM. To have main execute in SDRAM, change the code as follows:

/* bl main */       /* relative jump: the program would keep executing in NOR/SRAM */
ldr pc, =main       /* absolute jump: continue in SDRAM */
