Preface
ARM has been popular for a long time, and it is hard to do embedded development without running into it. Because it delivers high performance at low power consumption, it has become the first choice for most embedded devices.
Beginners, however, may hit some strange problems. Most people are used to programming on IA-32, and although both are 32-bit processors, the architectures are completely different, which leads to some hidden pitfalls. Here I want to describe one that is particularly confusing: accessing data at an unaligned address on ARM produces a so-called "unpredictable" result.
Alignment problem of ARM memory access
According to the ARM documentation, the access rules are as follows:
1. When accessing 4 bytes at a time, the starting address must be 4-byte aligned;
2. When accessing 2 bytes at a time, the starting address must be 2-byte aligned;
(Single-byte accesses have no such restriction, so they are not considered here.)
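The two rules can be expressed as a small check on the address value. This is only a sketch: the helper name is_aligned is made up for illustration, and it assumes a compiler that provides the C99 header <stdint.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if address p is a multiple of n bytes, 0 otherwise.
   n must be a power of two (2 or 4 for the rules above).
   Hypothetical helper, not part of any ARM toolchain. */
static int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p & (uintptr_t)(n - 1)) == 0;
}
```

A 4-byte access through p follows rule 1 only when is_aligned(p, 4) is true, and a 2-byte access follows rule 2 only when is_aligned(p, 2) is true.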
Since these are the rules, they should be followed. Still, restless people like to break rules just to see what happens, and even careful people occasionally overlook something and make a mistake. So let's look at the consequences of getting it wrong. Take the following code:
char buff[8] = {0x12, 0x34, 0x56, 0x78, 0x9a, 0xab, 0xbc, 0xcd};
int v32, *p32;
short v16, *p16;
p32 = (int*)&( buff[1] ); // unaligned address
p16 = (short*)&( buff[1] ); // unaligned address
v32 = *p32; //what's the result?
v16 = *p16; //what's the result?
If the above code runs on IA-32, the result should be as follows:
v32 = 0x9a785634
v16 = 0x5634
Even if the access is on an unaligned address, IA-32 will only sacrifice a little performance, but the result is guaranteed to be correct. This is what we expect.
But what about switching to ARM? Let's look at the results of running the code after compiling it with ADS 1.2:
v32 = 0x12785634
v16 = 0x1234
This result is strange. The pointer refers to the byte 0x34, so a big-endian read should give v32 = 0x3456789a, and a little-endian read should give the same result as IA-32. The actual value is neither: the byte 0x12 at the lower address has inexplicably been pulled in. The assembly generated for the two assignments is very simple, using the ldr and ldrsh instructions respectively. Nothing is wrong with the instructions themselves; they are the most basic instructions for reading 32-bit and 16-bit data. This is the unaligned-access problem we set out to describe.
The cause of the problem (personal guess, unofficial information...)
Personally, I think this comes from how the ARM architecture is implemented, or rather that it is by design, since it simplifies the processor. An IA-32 implementation has to check whether the address being read is aligned and translate the access into the corresponding operations. ARM does not do this. It assumes that everyone follows the rules, and if you break them, you get a nasty surprise.
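The guess can be made a little more concrete for the 32-bit case. As I understand it, on older cores such as the ARM7TDMI a word load from an address that is not word-aligned fetches the enclosing aligned word and rotates it right by 8 bits for each byte of misalignment. The article does not say which core or byte order was used, so the model below assumes such a core in little-endian mode; ror32 and ldr_rotated are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by r bits (r is reduced to 0..31). */
static uint32_t ror32(uint32_t x, unsigned r)
{
    r &= 31u;
    return r ? (x >> r) | (x << (32u - r)) : x;
}

/* Model of a little-endian word load that ignores the low two address
   bits and rotates the aligned word instead. mem is byte-addressed
   memory and addr is an offset into it. Assumed behaviour, for
   illustration only. */
static uint32_t ldr_rotated(const uint8_t *mem, uint32_t addr)
{
    uint32_t base = addr & ~3u;
    uint32_t word = (uint32_t)mem[base]
                  | ((uint32_t)mem[base + 1] << 8)
                  | ((uint32_t)mem[base + 2] << 16)
                  | ((uint32_t)mem[base + 3] << 24);
    return ror32(word, 8u * (addr & 3u));
}
```

With the buffer from the example, the aligned word at offset 0 is 0x78563412, and ldr_rotated(buff, 1) rotates it to 0x12785634, which matches the v32 value observed on ARM. The 16-bit case is not modelled here.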
Is there any way to solve it?
In fact, ARM is aware of this problem and has added some support for it in the compiler. You may ask: then what about the case above? Why is the result still wrong? It looks as if no support has been added at all.
ARM has indeed made some efforts, but they cannot cover that case. What the compiler does is ensure that accesses are correct whenever it can know, at compile time, that an access may be unaligned. That statement is rather general, so let's go through the specific situations one by one.
Compiler's efforts (1) - All local/global/static variables are placed on 4-byte aligned addresses
This effort is very common. Accessing 4 bytes at a time is the most efficient on 32-bit platforms, so most compilers for 32-bit platforms do this, and ARM's ADS is no exception.
Compiler's efforts (2) - Padding, padding, and more padding
This is also common. Many compilers automatically insert padding into structure definitions whose fields would otherwise be misaligned, in order to improve access efficiency (for example, accessing a misaligned structure on IA-32 adds 1 cycle). The ARM compiler does the same, but there it is not only a matter of efficiency; it also avoids the misalignment problem.
Compiler's efforts (3) - Generate special code
This is the key point, and it is what sets the ARM compiler apart. Let's take a look at a piece of code:
__packed typedef struct _test
{
char a;
short c;
int d;
} test;
char buff[8] = {0x12, 0x34, 0x56, 0x78, 0x9a, 0xab, 0xbc, 0xcd};
test *p = (test *)buff;
v32 = p->d; // v32 is the variable defined in the first example
Apart from the extra struct, which is qualified with __packed so that d ends up misaligned, this does not look very different from the first example. But if you run it, you will find that the result here is correct. Let's take a look at the assembly code generated by ADS.
v32 = p->d;
[0xe2890003] add r0,r9,#3
[0xeb000088] bl __rt_uread4
[0xe1a05000] mov r5,r0
Notice the instruction "bl __rt_uread4". Anyone with some knowledge of ARM instructions knows that bl is a function call, so the code here calls the __rt_uread4 function provided by ADS itself, which reads four bytes from an address that may be unaligned. ADS provides a family of similar functions covering signed/unsigned and 4-byte/2-byte reads and writes.
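The source of __rt_uread4 is not shown here, so the following is only a guess at what a helper of this kind could do: read the four bytes one at a time and assemble them, which never requires an aligned word access. The name uread4_sketch is made up, and little-endian byte order is assumed.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from any address, one byte at a
   time. Hypothetical stand-in for a helper like __rt_uread4; this is
   not the ADS implementation. */
static uint32_t uread4_sketch(const void *addr)
{
    const uint8_t *p = (const uint8_t *)addr;
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With the buffer from the example, uread4_sketch(buff + 3) assembles the bytes 0x78, 0x9a, 0xab, 0xbc into 0xbcab9a78, which is the value d should have at offset 3.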
I guess you will ask: what if there is no __packed qualifier? Without __packed, the compiler pads the struct as described above, so the position of d within the struct is 4-byte aligned (this is compile-time information, not the actual runtime address). The access is then compiled as an ordinary load, and we are back to the original example.
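The two layouts can be compared with offsetof. This sketch is not ADS code: it uses the GCC/Clang packed attribute as a stand-in for the __packed qualifier, the struct names are made up, and the offsets in the comments assume a 2-byte short and a 4-byte int.

```c
#include <stddef.h>

/* Natural layout: the compiler inserts padding so each field is aligned. */
struct test_padded {
    char  a;   /* offset 0, followed by 1 byte of padding */
    short c;   /* offset 2 */
    int   d;   /* offset 4 */
};

/* Packed layout, written with the GCC/Clang attribute as a stand-in
   for the __packed qualifier used in the article. */
struct test_packed {
    char  a;   /* offset 0 */
    short c;   /* offset 1 */
    int   d;   /* offset 3 */
} __attribute__((packed));
```

Under those assumptions offsetof(struct test_packed, d) is 3, which corresponds to the #3 in the add instruction of the listing, while offsetof(struct test_padded, d) is 4.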
There is one more situation: what does the generated code look like when the struct is __packed but its fields happen to meet the alignment requirements? In the code actually generated, the only difference from the assembly above is that the first instruction uses #4 instead of #3; the __rt_uread4 function is still called afterwards. So the conclusion is:
When __packed is used, the compiler automatically adds special code to every 4-byte/2-byte access to ensure the correct result.
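For pointers that come from a plain cast, as in the first example, the compiler has no such information to work with. One portable alternative, which is my addition and was not part of the ADS experiments above, is to copy the bytes into an aligned variable with memcpy. The names load_u32 and load_u16 are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy 4 bytes from any address into an aligned variable.
   The value is interpreted in the target's native byte order. */
static uint32_t load_u32(const void *addr)
{
    uint32_t v;
    memcpy(&v, addr, sizeof v);
    return v;
}

/* Same idea for a 2-byte value. */
static uint16_t load_u16(const void *addr)
{
    uint16_t v;
    memcpy(&v, addr, sizeof v);
    return v;
}
```

On a little-endian target, load_u32(buff + 1) gives 0x9a785634 and load_u16(buff + 1) gives 0x5634, the same values IA-32 produced in the first example.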
That is more or less the whole story. Where possible, rely on these compiler features, and be extremely careful about the cases the compiler cannot do anything about.
P.S. There are many other things that can be done to prevent such problems. For example, embedded projects often manage memory allocation themselves, so a memory allocation function you write yourself can guarantee that every address it returns is 4-byte aligned.
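As a rough illustration of that last idea, here is a minimal bump allocator that only ever hands out 4-byte-aligned addresses. All names (pool, pool_alloc, round_up4) and the pool size are invented for this sketch, and it has no way to free memory. A real project would need more than this.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_WORDS 256u

/* Backing store declared as uint32_t so the pool itself starts on a
   4-byte boundary. */
static uint32_t pool[POOL_WORDS];
static size_t pool_used; /* bytes handed out so far, always a multiple of 4 */

/* Round n up to the next multiple of 4. */
static size_t round_up4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Return a 4-byte-aligned block of at least n bytes, or NULL when the
   pool is exhausted. */
static void *pool_alloc(size_t n)
{
    size_t need = round_up4(n);
    unsigned char *base = (unsigned char *)pool;
    void *p;

    if (need < n || need > sizeof pool - pool_used)
        return NULL;
    p = base + pool_used;
    pool_used += need;
    return p;
}
```

Because every request is rounded up to a multiple of 4 and the pool starts aligned, each returned address is 4-byte aligned, so int and short fields placed at the start of an allocation satisfy the access rules.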