1. Improvements in code relocation
Use ldr, str instead of ldrb, strb to speed up code relocation.
Previously, the relocation code used ldrb to read 1 byte at a time from NOR Flash and strb to write that byte to SDRAM. On our 2440 development board, the NOR Flash has a 16-bit data bus and the SDRAM has a 32-bit data bus. Suppose we need to copy 16 bytes of data:
Instructions used | CPU reads from NOR Flash | CPU writes to SDRAM |
---|---|---|
ldrb, strb | 16 | 16 |
ldr, str | 8 | 4 |
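Switching to ldr/str reduces the number of bus accesses. Each 32-bit ldr from the 16-bit NOR Flash takes two read cycles to fetch 4 bytes, where ldrb needs one cycle per byte. Each str writes a full 32-bit word to SDRAM in a single access. Fewer accesses make the copy faster.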
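The counts in the table follow from a simple rule. Each load or store instruction moves its own access size, and an access wider than the memory's data bus is split by the memory controller into several bus cycles. The sketch below reproduces the table for a 16-byte copy. It assumes each instruction costs exactly ceil(access size / bus width) bus cycles, and it ignores caches and burst transfers. The helper name bus_cycles is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bus cycles needed to move `bytes` bytes using
 * instructions of `access_size` bytes over a `bus_width`-byte data bus.
 * Assumes bytes is a multiple of access_size; ignores caches and bursts. */
static size_t bus_cycles(size_t bytes, size_t access_size, size_t bus_width)
{
    assert(access_size > 0 && bus_width > 0);
    size_t instructions = bytes / access_size;
    /* An access wider than the bus is split into several cycles;
     * a narrower access still occupies one full cycle. */
    size_t cycles_per_access = (access_size + bus_width - 1) / bus_width;
    return instructions * cycles_per_access;
}
```

For example, bus_cycles(16, 4, 2) models ldr from the 16-bit NOR Flash and gives 8, which matches the ldr row of the table.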
Here is just the relocation part of the modified start.s:
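...
cpy:
    ldr r4, [r1]        @ load one word from the source (NOR Flash)
    str r4, [r2]        @ store it at the destination (SDRAM)
    add r1, r1, #4      @ r1 += 4
    add r2, r2, #4      @ r2 += 4
    cmp r2, r3          @ r3 holds the end address
    blo cpy             @ continue copying while r2 < r3
...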
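The copy loop tests its condition only after each copy, so it behaves like a C do-while loop. The host-side sketch below models that loop to show how the exit test changes the number of words written. It compares a "lower or same" test (like ble/bls) with a strictly "lower" test (like blt/blo). The helper name words_written is hypothetical, and it uses unsigned comparisons. For addresses around 0x30000000, signed and unsigned comparisons agree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the relocation loop: copy one word, advance r2 by 4,
 * then compare r2 with r3 (the end address). `inclusive` selects a
 * "lower or same" exit test instead of a strictly "lower" one.
 * Returns how many words the loop writes. */
static unsigned words_written(uint32_t r2, uint32_t r3, int inclusive)
{
    unsigned count = 0;
    int again;
    assert(r2 % 4 == 0 && r3 % 4 == 0);
    do {
        count++;                          /* str r4, [r2] */
        r2 += 4;                          /* add r2, r2, #4 */
        again = inclusive ? (r2 <= r3) : (r2 < r3);
    } while (again);
    return count;
}
```

For a 16-byte region (4 words), the inclusive test writes 5 words, which is one word past the end. The strict test writes exactly 4. Because the test comes after the store, both forms still write one word when the region is empty.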
Implementing relocation in C
Add the following linker script:
SECTIONS
{
. = 0x30000000;
__code_start = .;
. = ALIGN(4);
.text :
{
*(.text)
}
. = ALIGN(4);
.rodata : { *(.rodata) }
. = ALIGN(4);
.data : { *(.data) }
. = ALIGN(4);
__bss_start = .;
.bss : { *(.bss) *(COMMON) }
_end = .;
}
Add the following function implementation in main.c:
void copy2sdram(void)
{
//Get the addresses of __code_start and __bss_start defined in the linker script,
//then copy the data starting at address 0 (NOR Flash) to __code_start
extern int __code_start, __bss_start;
volatile unsigned int *dest = (volatile unsigned int *)&__code_start;
volatile unsigned int *end = (volatile unsigned int *)&__bss_start;
volatile unsigned int *src = (volatile unsigned int *)0;
while (dest < end)
{
*dest++ = *src++; //Copy from address 0 to __code_start (the running address of the code segment)
}
}
Then, after setting the stack pointer sp in start.s, execute bl copy2sdram to relocate the code. For how to set the stack pointer, see Clock Programming (II. Configuring Clock Registers). I will not repeat that code here.
2. Improvements in clearing bss
Use ldr, str instead of ldrb, strb to speed up clearing bss
As with relocation, the code is as follows:
    ldr r1, =__bss_start    @ r1 = start of bss
    ldr r2, =_end           @ r2 = end of bss
    mov r3, #0
clean:
    str r3, [r1]            @ clear one word
    add r1, r1, #4
    cmp r1, r2
    blo clean               @ continue while r1 < r2
    bl main
halt:
    b halt
Clearing bss in C
The idea is the same as the C relocation code above: write 0 to every word of the bss segment. In start.s, execute bl copy2sdram first, then bl clean_bss to clear the bss segment.
void clean_bss(void)
{
    /* Get __bss_start and _end from the linker script */
    extern int _end, __bss_start;
    volatile unsigned int *start = (volatile unsigned int *)&__bss_start;
    volatile unsigned int *end = (volatile unsigned int *)&_end;
    while (start < end)     /* stop before _end: it is one past the last bss word */
    {
        *start++ = 0;
    }
}
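The order bl copy2sdram, then bl clean_bss can be tried out on a PC with a small model. In the sketch below, ram[] stands in for SDRAM and flash[] for the image at address 0. Word indices take the place of __code_start, __bss_start and _end. All names are hypothetical, and the layout (3 words of code/data, 2 words of bss, 1 guard word) is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, in words: [CODE_START, BSS_START) is code/rodata/data,
 * [BSS_START, END) is bss, and the word at END is a guard that must survive. */
enum { CODE_START = 0, BSS_START = 3, END = 5, RAM_WORDS = 6 };

/* Same loop as clean_bss: zero every word in [start, end). */
static void clear_words(volatile uint32_t *start, volatile uint32_t *end)
{
    assert(start <= end);
    while (start < end)              /* strictly less: never touch _end itself */
        *start++ = 0;
}

/* Models "bl copy2sdram" followed by "bl clean_bss". */
static void startup(uint32_t *ram, const uint32_t *flash)
{
    /* Like copy2sdram: copy everything from __code_start up to __bss_start. */
    for (int i = CODE_START; i < BSS_START; i++)
        ram[i] = flash[i];
    /* Like clean_bss: zero [__bss_start, _end). */
    clear_words(ram + BSS_START, ram + END);
}
```

After startup(), the first three words hold the copied image, the two bss words are zero, and the guard word at END is unchanged.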
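Note: a symbol defined in a linker script is only an address; no variable is actually stored there. In assembly, ldr r1, =__bss_start loads that address directly. In C, extern int __bss_start; makes the compiler treat the symbol as a variable located at that address. Writing __bss_start would therefore read the memory at that address instead of giving the address itself. That is why both the C relocation code and the C bss-clearing code must use the address-of operator, as in &__bss_start.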
Ensure that the start addresses of all segments are aligned to 4 bytes
To speed up relocation and bss clearing, we now read and write in 4-byte units with ldr and str. This introduces a pitfall. If the linker script does not use ALIGN(4) to align each section to a 4-byte boundary, a word access at a section boundary can also touch bytes that belong to the neighbouring section.
Let me give you an example:
#include "s3c2440_soc.h"
#include "uart.h"
#include "init.h"

char g_Char = 'A';          // .data
char g_Char3 = 'a';         // .data
const char g_Char2 = 'B';   // .rodata
int g_A = 0;                // .bss
int g_B;                    // .bss (or COMMON, depending on compiler flags)

int main(void)
{
    uart0_init();
    puts("\n\rg_A = ");
    printHex(g_A);
    puts("\n\r");
    putchar(g_Char);
    return 0;
}
Now remove the ALIGN(4) between the .data section and the .bss section in the linker script and run the program again. It prints g_A = 0 as expected, but the character printed for g_Char is no longer 'A'. Why? We clearly initialized g_Char = 'A'.
The disassembly explains it. The .bss segment immediately follows the .data segment, so without ALIGN(4), __bss_start is the unaligned address where .data ends. The clearing loop writes whole 4-byte words starting from that address, so the first word it clears overlaps g_Char and g_Char3. Both are zeroed along with g_A. The fix is to put ALIGN(4) between the .data and .bss segments. After this change, the disassembly shows the bss segment starting at 0x30000248, a 4-byte-aligned address.
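The sketch below models the failure with a byte array standing in for SDRAM. It assumes the ARM920T behaviour that a word store to an unaligned address writes the word at that address rounded down to a multiple of 4. The layout is hypothetical: g_Char at 0x30000244, g_Char3 at 0x30000245, and .data ending at 0x30000246. Only the aligned bss start 0x30000248 comes from the text above. align_up mirrors what ALIGN(4) does in the linker script. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round addr up to a multiple of align (a power of two), like ALIGN(align). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Assumed word-store behaviour: the two low address bits are ignored.
 * mem[] models memory starting at address `base`. */
static void store_word(uint8_t *mem, uint32_t base, uint32_t addr, uint32_t value)
{
    uint32_t aligned = addr & ~3u;
    assert(aligned >= base);
    memcpy(mem + (aligned - base), &value, sizeof value);
}

/* Clear [bss_start, end) with word stores, as the clean loop does. */
static void clean_bss_model(uint8_t *mem, uint32_t base,
                            uint32_t bss_start, uint32_t end)
{
    for (uint32_t p = bss_start; p < end; p += 4)
        store_word(mem, base, p, 0);
}
```

With __bss_start left at 0x30000246, the first store lands on 0x30000244 and wipes g_Char and g_Char3. With align_up(0x30000246, 4), which is 0x30000248, the .data bytes survive.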
3. Position-independent code
Consider the bl sdram_init instruction. In the disassembly (with the code segment linked at 0x30000000), it appears as bl 3000036c.
The instruction bl 3000036c does not actually jump to 0x3000036c. If it did, the program would fail, because SDRAM has not been initialized yet at this point and that address is not accessible. To verify this, do another experiment: change the link address in the linker script sdram.lds to 0x30000080, rebuild, and look at the disassembly again.
The disassembly now shows bl 300003ec, yet the machine code of this bl instruction is identical in both builds. Identical machine code must behave identically, so bl does not jump to the displayed address. It jumps to pc + offset, where the offset is computed by the linker and encoded in the instruction.
Suppose the program runs from 0x30000000: the bl instruction is at 0x3000005c, so it jumps to 0x3000036c. If the program runs from 0, the same instruction is at 0x5c, so it jumps to 0x0000036c.
The jump target is therefore determined not by the address printed after bl, but by the current pc value. The disassembler prints an absolute address only to make the code easier to read, and it computes that address as if the code ran at its link address.
Key point: in the disassembly, the address shown after B or BL is only a reading aid; the CPU does not necessarily jump there.
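The pc-relative behaviour can be checked against the ARM (A32) BL encoding. The low 24 bits hold a signed word offset. The target is the instruction's own address, plus 8 (pc reads two instructions ahead), plus four times that offset. The sketch below encodes and decodes BL with condition AL, using the addresses 0x3000005c and 0x3000036c from the example. The helper names are hypothetical, and range checking (±32 MB) is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Encode "bl target" (condition AL) placed at insn_addr.
 * Assumes target is within +-32 MB of the instruction. */
static uint32_t encode_bl(uint32_t insn_addr, uint32_t target)
{
    uint32_t diff = target - (insn_addr + 8);  /* pc reads as insn_addr + 8 */
    assert((diff & 3u) == 0);                  /* targets are word-aligned */
    return 0xEB000000u | ((diff >> 2) & 0x00FFFFFFu);
}

/* Where a BL instruction located at insn_addr actually jumps. */
static uint32_t decode_bl_target(uint32_t insn_addr, uint32_t insn)
{
    int32_t imm24 = (int32_t)(insn & 0x00FFFFFFu);
    if (imm24 & 0x00800000)                    /* sign-extend the 24-bit field */
        imm24 -= 0x01000000;
    return insn_addr + 8 + (uint32_t)(imm24 * 4);
}
```

Both encode_bl(0x3000005c, 0x3000036c) and encode_bl(0x5c, 0x36c) give 0xEB0000C2. Decoding that same word at 0x5c yields 0x36c, and decoding it at 0x3000005c yields 0x3000036c. This is why the same bytes work whether the code runs from NOR Flash at 0 or from SDRAM.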
How to write position-independent code?
- Use the relative jump instructions b and bl.
- Before relocation, do not use absolute addresses. Do not access global or static variables, and do not access arrays with initial values (their initial values are placed in .rodata and are accessed through absolute addresses).
- After relocation, use ldr pc, =xxx to jump to the runtime address.
Writing position-independent code essentially means not using absolute addresses. Beyond the rules above, the most reliable way to check whether absolute addresses are used is to read the disassembly.
In the earlier example program, start.s calls main with bl, a relative jump, so main still runs in NOR Flash or SRAM. To run main from SDRAM, change the call to an absolute jump:
@ bl main          /* bl is a relative jump: main would still run in NOR Flash/SRAM */
ldr pc, =main      /* absolute jump: main runs in SDRAM */