s3c2440 bare metal - code relocation - 4 - clear bss optimization and position-independent code

Publisher: tau29 Latest update time: 2024-07-05 Source: elecfans

1. Improvements in code relocation

Use ldr, str instead of ldrb, strb to speed up code relocation.

Previously, relocation used ldrb to read one byte at a time from NOR Flash and strb to write that byte to SDRAM. The NOR Flash on our 2440 development board has a 16-bit data bus, and the SDRAM has a 32-bit data bus. Suppose we need to copy 16 bytes:

Instructions  | CPU reads from NOR Flash | CPU writes to SDRAM
ldrb, strb    | 16                       | 16
ldr, str      | 8                        | 4
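One way to check the table is to count bus cycles per load or store. The sketch below assumes a simplified model (not a timing model of the 2440 memory controller): an access wider than the bus is split into bus-width cycles, and a narrower access still costs one cycle. bus_cycles and print_copy_costs are hypothetical helper names.

```c
#include <stdio.h>

/* Simplified model (an assumption for illustration): each CPU access of
   access_bytes is split into cycles of bus_bytes; an access narrower than
   the bus still costs one full cycle. */
static unsigned bus_cycles(unsigned total_bytes, unsigned access_bytes,
                           unsigned bus_bytes)
{
    unsigned accesses = total_bytes / access_bytes;   /* ldrb: 1 byte, ldr: 4 bytes */
    unsigned per_access = (access_bytes + bus_bytes - 1) / bus_bytes;
    return accesses * per_access;
}

/* Print both rows of the table for a 16-byte copy
   (NOR Flash bus = 2 bytes wide, SDRAM bus = 4 bytes wide). */
void print_copy_costs(void)
{
    printf("ldrb/strb: NOR reads %u, SDRAM writes %u\n",
           bus_cycles(16, 1, 2), bus_cycles(16, 1, 4));
    printf("ldr/str:   NOR reads %u, SDRAM writes %u\n",
           bus_cycles(16, 4, 2), bus_cycles(16, 4, 4));
}
```

Under this model, a 32-bit ldr from the 16-bit NOR Flash costs two bus reads, which is where the 8 in the table comes from.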

With the word-sized ldr and str instructions, the CPU needs fewer bus accesses to move the same 16 bytes, so the copy runs faster.

The modified start.s is shown in the figure below; here I list only the relocation loop:

...
cpy:
    ldr r4, [r1]        /* read one word from the load address */
    str r4, [r2]        /* write it to the runtime address */
    add r1, r1, #4      /* r1 += 4 */
    add r2, r2, #4      /* r2 += 4 */
    cmp r2, r3          /* keep copying while r2 < r3 */
    blt cpy
...

Implementing relocation in C

Add the following linker script:


SECTIONS
{
    . = 0x30000000;

    __code_start = .;

    . = ALIGN(4);
    .text :
    {
        *(.text)
    }

    . = ALIGN(4);
    .rodata : { *(.rodata) }

    . = ALIGN(4);
    .data : { *(.data) }

    . = ALIGN(4);
    __bss_start = .;
    .bss : { *(.bss) *(COMMON) }
    _end = .;
}
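To see what the ALIGN(4) lines do to the symbols, here is a toy C model of the location counter in the script above. The section sizes are made-up inputs, align4 and place_sections are hypothetical names, and the real ld also honours each input section's own alignment, which this sketch ignores.

```c
#include <stdint.h>

/* ALIGN(4) rounds the location counter up to the next multiple of 4. */
static uint32_t align4(uint32_t addr)
{
    return (addr + 3u) & ~3u;
}

struct layout {
    uint32_t code_start, text, rodata, data, bss_start, end;
};

/* Toy model of the script: walk "." through each section in order.
   Sizes are hypothetical; real values come from the object files. */
struct layout place_sections(uint32_t text_size, uint32_t rodata_size,
                             uint32_t data_size, uint32_t bss_size)
{
    struct layout l;
    uint32_t dot = 0x30000000u;          /* . = 0x30000000; */

    l.code_start = dot;                  /* __code_start = .; */
    dot = align4(dot);
    l.text = dot;   dot += text_size;
    dot = align4(dot);
    l.rodata = dot; dot += rodata_size;
    dot = align4(dot);
    l.data = dot;   dot += data_size;
    dot = align4(dot);
    l.bss_start = dot;                   /* __bss_start = .; */
    dot += bss_size;
    l.end = dot;                         /* _end = .; */
    return l;
}
```

For example, with a hypothetical 0x241-byte .text, empty .rodata and 2 bytes of .data, .data starts at 0x30000244 and __bss_start is rounded up to 0x30000248.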

Add the following function implementation in main.c:


void copy2sdram(void)
{
    /* __code_start and __bss_start are defined in the linker script.
       Copy everything from address 0 (NOR Flash) up to __bss_start
       to __code_start (the runtime address of the code segment). */
    extern int __code_start, __bss_start;

    volatile unsigned int *dest = (volatile unsigned int *)&__code_start;
    volatile unsigned int *end = (volatile unsigned int *)&__bss_start;
    volatile unsigned int *src = (volatile unsigned int *)0;

    while (dest < end)
    {
        *dest++ = *src++; /* copy one 32-bit word */
    }
}

Then, after setting the stack pointer sp in start.s, call bl copy2sdram to relocate the code. For how to set the stack pointer, refer to Clock Programming (II. Configuring Clock Registers); I will not repeat that code here.
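copy2sdram() only makes sense on the board, but its loop can be exercised on a PC. In this hosted sketch two ordinary arrays stand in for NOR Flash and SDRAM, and the first six words play the part of everything below __bss_start; the array names, sizes and helper names are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: nor_flash plays "address 0", sdram plays
   "__code_start". Pointers into arrays replace real addresses. */
#define IMAGE_WORDS 8
static uint32_t nor_flash[IMAGE_WORDS];
static uint32_t sdram[IMAGE_WORDS];

/* Same loop shape as copy2sdram(): one 32-bit word per iteration,
   stopping before end (the __bss_start boundary). */
void copy_words(volatile uint32_t *dest, volatile uint32_t *end,
                const volatile uint32_t *src)
{
    while (dest < end)
        *dest++ = *src++;
}

/* Fill the fake flash, relocate the first 6 words (code + rodata + data),
   and return how many SDRAM words now match the flash image. */
size_t relocate_demo(void)
{
    size_t i, same = 0;
    for (i = 0; i < IMAGE_WORDS; i++)
        nor_flash[i] = 0xE0000000u + (uint32_t)i;
    copy_words(sdram, sdram + 6, nor_flash);
    for (i = 0; i < IMAGE_WORDS; i++)
        same += (sdram[i] == nor_flash[i]);
    return same;
}
```

relocate_demo() returns 6: the words below the boundary are copied, and the two "bss" words are left alone for clean_bss to handle.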


2. Improvements on clearing bss

Use ldr, str instead of ldrb, strb to speed up the clearing of bss

As with relocation, the code is as follows:

    ldr r1, =__bss_start
    ldr r2, =_end
    mov r3, #0
clean:
    str r3, [r1]        /* clear one word */
    add r1, r1, #4
    cmp r1, r2          /* keep clearing while r1 < r2 */
    blt clean

    bl main

halt:
    b halt

Clearing bss in C

The implementation follows the same pattern as the relocation code above: write 0 to every word of the bss segment. In start.s, call bl copy2sdram first and then bl clean_bss to clear the bss segment.

void clean_bss(void)
{
    /* Obtain __bss_start and _end from the linker script */
    extern int _end, __bss_start;
    volatile unsigned int *start = (volatile unsigned int *)&__bss_start;
    volatile unsigned int *end = (volatile unsigned int *)&_end;

    while (start < end)
    {
        *start++ = 0; /* clear one 32-bit word */
    }
}
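The loop bound of clean_bss() can be checked the same way on a PC. In this hosted sketch a small array holds four hypothetical bss words followed by a guard word standing in for whatever lies past _end; with start < end the guard survives, whereas a <= bound would overwrite it too. The array and function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical layout: ram[0..3] is ".bss" ([__bss_start, _end)),
   ram[4] is a guard word beyond _end. */
static uint32_t ram[5] = { 1, 2, 3, 4, 0xDEADBEEFu };

/* Same loop as clean_bss(): store 0 word by word over [start, end). */
void clear_words(volatile uint32_t *start, volatile uint32_t *end)
{
    while (start < end)
        *start++ = 0;
}

void clean_bss_demo(void)
{
    clear_words(&ram[0], &ram[4]);   /* &ram[4] plays the role of _end */
}
```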

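Note: a symbol defined in the linker script is only an address; no storage is reserved for it. In assembly, ldr r1, =__bss_start loads that address directly. In C, after extern int __bss_start;, the expression __bss_start would instead read the memory stored at that address. So whether you use C for relocation or for clearing bss, you must take the address with the address operator, as in &__bss_start.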
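The address-versus-value distinction can be shown without a linker script. In this hosted sketch an array element stands in for the memory at a linker symbol's address; fake_bss_start, symbol_address and symbol_value are hypothetical names, not part of the original code.

```c
#include <stdint.h>

/* On the board, __bss_start has no storage of its own; only its address
   matters. Here memory[1] plays "whatever happens to sit at that address". */
static uint32_t memory[3] = { 0x11111111u, 0x22222222u, 0x33333333u };
#define fake_bss_start (memory[1])

/* Like ldr r1, =__bss_start in assembly: the address itself. */
uintptr_t symbol_address(void) { return (uintptr_t)&fake_bss_start; }

/* Like writing plain __bss_start in C: the word stored at that address. */
uint32_t symbol_value(void) { return fake_bss_start; }
```

symbol_address() gives the boundary you want for the loops above; symbol_value() returns unrelated data (0x22222222 here), which is why the C code needs &.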

Ensure that the start addresses of all segments are aligned to 4 bytes

To speed up relocation and bss clearing, we now read and write in 4-byte units with ldr and str. This introduces a risk: if the linker script does not use ALIGN(4) to align each segment to 4 bytes, a word access can straddle two segments and overwrite data that belongs to the neighbouring one.

Let me give you an example:

#include "s3c2440_soc.h"
#include "uart.h"
#include "init.h"

char g_Char = 'A';          //.data
char g_Char3 = 'a';         //.data
const char g_Char2 = 'B';   //.rodata
int g_A = 0;                //.bss
int g_B;                    //.bss

int main(void)
{
    uart0_init();
    puts("\n\rg_A = ");
    printHex(g_A);
    puts("\n\r");
    putchar(g_Char);
    return 0;
}

Now remove the ALIGN(4) between the .data section and the .bss section in the linker script. When the program runs, it prints g_A = 0 as expected, but putchar(g_Char) no longer prints 'A'. Why, when g_Char was clearly initialized to 'A'?

Let's look at the disassembly:

The .bss segment immediately follows the .data segment, so __bss_start lands on an address that is not a multiple of 4. We clear bss 4 bytes at a time, and a word store to an unaligned address effectively writes the aligned word that contains it, so the first store intended for the bss region also clears g_Char and g_Char3. The fix is to add ALIGN(4) between the .data segment and the .bss segment. After this change, the bss segment starts at 0x30000248, as shown below:
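The failure can be reproduced in a hosted model of a few bytes of RAM. The layout is an assumption (offsets relative to 0x30000000: g_Char at 0x244, g_Char3 at 0x245, g_A at 0x248), and str_word models a word store that ignores the two low address bits; it is a sketch, not a cycle-accurate ARM920T model, and all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* 12-byte RAM window starting at offset 0x244 (hypothetical layout). */
static uint8_t ram[12];
#define BASE 0x244u

/* Word store with the low two address bits ignored:
   the aligned word containing addr is written. */
static void str_word(uint32_t addr, uint32_t value)
{
    uint32_t aligned = addr & ~3u;
    memcpy(&ram[aligned - BASE], &value, 4);
}

/* Set up .data, then clear "bss" word by word from bss_start to end. */
void run_clear(uint32_t bss_start, uint32_t end)
{
    memset(ram, 0xFF, sizeof ram);   /* 0xFF marks untouched memory */
    ram[0] = 'A';                    /* g_Char  at 0x244 */
    ram[1] = 'a';                    /* g_Char3 at 0x245 */
    for (uint32_t a = bss_start; a < end; a += 4)
        str_word(a, 0);
}
```

run_clear(0x246, 0x24C), the unaligned start you get without ALIGN(4), zeroes g_Char and g_Char3; run_clear(0x248, 0x24C) clears only g_A and leaves g_Char as 'A'.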

3. Position-independent code

Consider the instruction bl sdram_init and look at its disassembly (the code segment is linked at 0x30000000):

Here, bl 3000036c does not actually jump to 0x3000036c: at this point SDRAM has not been initialized, so that address is not accessible yet. To verify this, we run another experiment: modify the linker script sdram.lds, change the link address to 0x30000800, recompile, and view the disassembly:

The disassembly now shows bl 300003ec, yet the machine code of the bl instruction is identical in both builds. Identical machine code must do the same thing when executed, so the CPU does not jump to the address printed in the disassembly; it jumps to pc + offset, where the offset is computed by the linker and encoded in the instruction.


If the program runs from 0x30000000, the instruction sits at 0x3000005c and jumps to 0x3000036c; if the program runs from 0, the same instruction sits at 0x5c and jumps to 0x0000036c.
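The pc + offset rule can be written out directly. The sketch below decodes the signed 24-bit word offset of an ARM B/BL instruction and adds it to the instruction address plus 8 (the value pc reads as in ARM state). The encoding 0xEB0000C2 in the usage note is computed from the addresses in the text (0x5c to 0x36c), not copied from the author's disassembly; branch_target is a hypothetical name, and the sign extension assumes GCC-style arithmetic right shift.

```c
#include <stdint.h>

/* Real target of an ARM B/BL: bits [23:0] hold a signed word offset,
   and pc reads as the instruction's own address + 8. */
uint32_t branch_target(uint32_t insn, uint32_t insn_addr)
{
    /* Shift imm24 to the top, then arithmetic-shift right by 6:
       this sign-extends it and multiplies by 4 in one step. */
    int32_t offset = (int32_t)(insn << 8) >> 6;
    return insn_addr + 8u + (uint32_t)offset;
}
```

branch_target(0xEB0000C2, 0x3000005C) returns 0x3000036C, while branch_target(0xEB0000C2, 0x5C) returns 0x36C: the same word lands in different places depending on where it executes.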


The jump target is not fixed by the bl instruction alone; it depends on the current pc value. The disassembler prints an absolute target, computed from the link address, only to make the code easier to read.


Key point: the target address shown for B or BL in the disassembly is only a reading aid; the CPU does not necessarily jump there.


How do you write position-independent code?


- Use the relative branch instructions b and bl.
- Before relocation, do not use absolute addresses: do not access global or static variables, and do not access arrays that have initial values (their initial values are placed in .rodata and accessed through absolute addresses).
- After relocation, use ldr pc, =xxx to jump to the runtime address.

Writing position-independent code really comes down to not using absolute addresses. Beyond the rules above, the most reliable way to tell whether absolute addresses are used is to look at the disassembly.
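When reading the disassembly, absolute addresses usually show up as loads from a literal pool (ldr rX, [pc, #imm]), because that is where the compiler keeps the addresses of globals, strings and functions. The toy scanner below classifies raw ARM instruction words; the helper names are hypothetical, and it is a rough filter (it also matches a few other pc-based loads), not a replacement for reading the objdump output.

```c
#include <stddef.h>
#include <stdint.h>

/* B/BL: bits [27:25] == 101. The offset is pc-relative,
   so these instructions are position-independent. */
int is_branch(uint32_t insn)
{
    return ((insn >> 25) & 7u) == 5u;
}

/* LDR Rd, [pc, ...]: single data transfer (bits [27:26] == 01),
   load (bit 20 set), base register pc (bits [19:16] == 15).
   The loaded literal is typically an absolute address. */
int is_literal_load(uint32_t insn)
{
    return (insn & 0x0C000000u) == 0x04000000u
        && (insn & (1u << 20)) != 0
        && ((insn >> 16) & 0xFu) == 15u;
}

/* Count literal-pool loads: a rough hint that code uses absolute addresses. */
size_t count_literal_loads(const uint32_t *code, size_t n)
{
    size_t i, hits = 0;
    for (i = 0; i < n; i++)
        hits += (size_t)is_literal_load(code[i]);
    return hits;
}
```

For example, 0xE59FF004 (ldr pc, [pc, #4]) is flagged, while a bl such as 0xEB0000C2 and a mov such as 0xE1A0C00D are not.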


So in the earlier example, bl main is a relative jump, and main still executes from NOR/SRAM. To run main from SDRAM, change the code to:


//bl main /*bl relative jump, the program is still executed in NOR/sram*/

ldr pc, =main/*absolute jump, jump to SDRAM*/
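The difference between the two lines above can be modelled with a little arithmetic. The addresses are hypothetical (image linked at 0x30000000, main linked at 0x30000400, the bl main at link address 0x30000070), and the function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical link-time addresses. */
#define LINK_BASE 0x30000000u
#define MAIN_LINK 0x30000400u

/* bl main: the linker stores (MAIN_LINK - (call_link + 8)) as an offset;
   the CPU adds it to the address the instruction is really running at. */
uint32_t bl_main_lands_at(uint32_t call_link, uint32_t run_base)
{
    uint32_t run_addr = call_link - LINK_BASE + run_base;
    int32_t offset = (int32_t)(MAIN_LINK - (call_link + 8u));
    return run_addr + 8u + (uint32_t)offset;
}

/* ldr pc, =main: pc is loaded from the literal pool, which the linker
   filled with main's link address, wherever the code is running. */
uint32_t ldr_pc_main_lands_at(void)
{
    return MAIN_LINK;
}
```

Running from NOR (run_base 0), bl_main_lands_at(0x30000070, 0) gives 0x400, the copy of main still in NOR, while ldr_pc_main_lands_at() always gives 0x30000400 in SDRAM.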


