Address alignment issues on ARM platforms

Preface
ARM has been popular for a long time, and anyone doing embedded development has almost certainly come across it. Thanks to its high performance at low power consumption, it has become the first choice for most embedded devices.
However, those who are just starting out may run into some strange problems. After all, most people are used to programming on IA-32. Although both are 32-bit processors, the architectures are completely different, which leads to some hidden pitfalls. Here I want to describe one that is somewhat confusing: accessing data at an unaligned address on ARM produces a so-called "unpredictable" result.
Alignment problem of ARM memory access
According to the ARM documentation, the access rules are as follows:
1. When accessing 4 bytes of content at a time, the starting address of the content must be at a 4-byte aligned position;
2. When accessing 2 bytes of content at a time, the starting address of the content must be at a 2-byte aligned position;
(Single-byte content does not have this problem, so don't consider it.)
Since these are the rules, they should be followed. However, restless people like to break rules just to see what happens, and even those who do follow them will sometimes slip up through carelessness. So let's look at the consequences of getting it wrong, using the following code:
char buff[8] = {0x12, 0x34, 0x56, 0x78, 0x9a, 0xab, 0xbc, 0xcd};
int v32, *p32;
short v16, *p16;
p32 = (int*)&( buff[1] ); // unaligned
p16 = (short*)&( buff[1] ); // unaligned
v32 = *p32; //what's the result?
v16 = *p16; //what's the result?
If the above code runs on IA-32, the result should be as follows:
v32 = 0x9a785634
v16 = 0x5634
Even if the access is at an unaligned address, IA-32 only sacrifices a little performance, and the result is guaranteed to be correct. This is what we expect...
But what happens when we switch to ARM? Here are the results of running the code after compiling it with ADS1.2:
v32 = 0x12785634
v16 = 0x1234
This result is a bit strange. The pointer points at 0x34, so on a big-endian system v32 should be 0x3456789a, and on a little-endian system it should match the IA-32 result. What we actually get is neither, and the byte 0x12 from the lower address has inexplicably crept in... If you look at the assembly code the compiler generates, the two assignments are very simple: they use the ldr and ldrsh instructions respectively. There is nothing wrong with the instructions themselves. They read 32-bit and 16-bit data respectively and are among the most basic instructions. This is exactly the unaligned-access problem we set out to describe.
The cause of the problem (personal guess, unofficial information...)
Personally, I think this comes from how the ARM architecture is implemented, or rather it is by design, since it simplifies the processor. An IA-32 implementation surely checks whether the address being read is aligned and then translates the access into the corresponding operations. ARM does not do this. It assumes that everyone follows the rules, and if you dare to break them, you get a taste of the consequences.
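For what it is worth, the v32 value above is consistent with the behaviour that ARM's architecture documentation describes for older cores (before ARMv6): a word load from an unaligned address fetches the aligned word containing that address and rotates it right by 8 bits for each byte of misalignment. The sketch below emulates that rule in plain C. Note that legacy_ldr is a hypothetical helper written for illustration, not an ARM or ADS function, and it assumes little-endian memory. It makes no attempt to model ldrsh.

```c
#include <stdint.h>

/* Hypothetical helper (not an ARM or ADS API): emulates the rotated-load
 * rule described for pre-ARMv6 cores. Assumes little-endian memory.
 * mem is the start of a byte array, addr is a byte offset into it. */
static uint32_t legacy_ldr(const uint8_t *mem, uint32_t addr)
{
    uint32_t base = addr & ~3u;            /* aligned word containing addr */
    uint32_t word = (uint32_t)mem[base]
                  | (uint32_t)mem[base + 1] << 8
                  | (uint32_t)mem[base + 2] << 16
                  | (uint32_t)mem[base + 3] << 24;
    uint32_t rot = 8u * (addr & 3u);       /* 8 bits per byte of misalignment */

    /* Rotate right by rot bits; rot == 0 is handled separately to avoid
     * an undefined 32-bit shift. */
    return rot ? (word >> rot) | (word << (32u - rot)) : word;
}
```

With the buff array from the example, legacy_ldr(buff, 1) reads the aligned word 0x78563412 and rotates it right by 8 bits, giving 0x12785634, the same value observed under ADS1.2.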
Is there any way to solve it?
In fact, ARM is aware of this problem and has added some support in the compiler. Some may ask: then what about the case above? Why is the result still wrong? It looks as if no support was added at all...
ARM has indeed made an effort, but it cannot cover this case. What the compiler does is try to guarantee correct access wherever it can know about the misalignment at compile time. That is a bit abstract, so let's look at the specific cases one by one.
Compiler's efforts (1) - All local/global/static variables are placed on 4-byte aligned addresses
In fact, this one is very common. Since accessing 4 bytes at a time is the most efficient on 32-bit platforms, most compilers for 32-bit platforms handle it this way, and ARM's ADS is no exception.
Compiler's efforts (2) - Pad, pad, and pad again
This is also common. Many compilers automatically insert padding into structure definitions whose fields would otherwise be misaligned, to improve access efficiency (for example, accessing a misaligned structure on IA-32 adds 1 cycle). The ARM compiler does the same, but here it seems to be not only for efficiency but also to avoid the misalignment problem.
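To see the padding for yourself, offsetof from <stddef.h> reports where the compiler placed each field. The following is a minimal sketch using a struct with a char, a short and an int. The struct name padded and the two helper functions are made up for illustration, and the exact offsets depend on the ABI.

```c
#include <stddef.h>

/* Illustrative struct without any packing qualifier, so the compiler is
 * free to insert padding between the fields. */
struct padded {
    char  a;
    short c;
    int   d;
};

/* Byte offset of c from the start of the struct. */
static size_t offset_of_c(void)
{
    return offsetof(struct padded, c);
}

/* Byte offset of d from the start of the struct. */
static size_t offset_of_d(void)
{
    return offsetof(struct padded, d);
}
```

On a typical ABI with a 2-byte short and a 4-byte int, one would expect c at offset 2 and d at offset 4, with one padding byte after a.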
Compiler's efforts (3) - Generate special code
This is the key point, and it is what sets the ARM compiler apart. Let's take a look at a piece of code:
__packed typedef struct _test
{
char a;
short c;
int d;
} test;
char buff[8] = {0x12, 0x34, 0x56, 0x78, 0x9a, 0xab, 0xbc, 0xcd};
test *p = (test *)buff;
v32 = p->d; // v32 as defined above
Compared with before, the only addition is a struct qualified with __packed, which causes the misalignment; otherwise there is not much difference to see. But if you run it, you will find that the result here is correct. Let's take a look at the assembly code generated by ADS.
v32 = p->d;
[0xe2890003] add r0,r9,#3
[0xeb000088] bl __rt_uread4
[0xe1a05000] mov r5,r0
Note the instruction "bl __rt_uread4" here. Anyone with some knowledge of ARM instructions knows that bl is a function call. So the code here actually calls the __rt_uread4 function provided by ADS itself, which reads four bytes. ADS provides a similar family of functions for signed/unsigned and 4-byte/2-byte read/write operations.
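The source of __rt_uread4 is not shown here, so the following is only a guess at the general technique such a helper can use: read the four bytes one at a time, which is always allowed, and assemble them into a word. The name uread4_le is hypothetical, and the sketch assumes little-endian byte order.

```c
#include <stdint.h>

/* Hypothetical stand-in for an unaligned 4-byte read helper. This is NOT
 * the ADS implementation of __rt_uread4, only an illustration of the idea.
 * Single-byte loads have no alignment requirement, so p may be any address.
 * The bytes are combined in little-endian order. */
static uint32_t uread4_le(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;

    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}
```

Applied to the first example, uread4_le(&buff[1]) yields 0x9a785634, the value IA-32 produced.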
You may ask: what if there is no __packed qualifier? Without __packed, the compiler pads the struct as described above, so as far as the compiler knows, the position of d in this struct is 4-byte aligned (this is compile-time information, not actual runtime information). So we are back to the original example.
Then there is one more case: with __packed, but with fields in the struct that do meet the alignment requirements, what does the generated code look like? Judging from the actual generated code, the only difference from the assembly above is that the first instruction changes #3 to #4, and __rt_uread4 is still called afterwards. So the conclusion is: when __packed is used, the compiler automatically adds special code to 4-byte/2-byte accesses to ensure the correct result.
That is about all there is to say about this problem. Where possible, rely on these compiler features, and be extremely careful about the parts that the compiler cannot help with...
P.S. In fact, there are many things that can be done to prevent such problems. For example, embedded projects often manage memory allocation themselves, so a memory allocation function you write yourself can ensure that the returned address is 4-byte aligned...
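As a rough illustration of that last point, here is a minimal sketch of a bump allocator that only ever hands out 4-byte aligned addresses. Everything in it (POOL_SIZE, pool_words, round_up4, pool_alloc) is invented for this example. It never frees memory and is not meant as a real allocator design.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 1024u

/* Declaring the pool as uint32_t keeps its base address 4-byte aligned. */
static uint32_t pool_words[POOL_SIZE / 4u];
static size_t pool_used;                 /* bytes handed out so far */

/* Round n up to the next multiple of 4. */
static size_t round_up4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Hypothetical bump allocator: every returned address is 4-byte aligned
 * because the base is aligned and each request is rounded up to a
 * multiple of 4. Returns NULL when the pool is exhausted. */
static void *pool_alloc(size_t n)
{
    size_t need = round_up4(n);

    if (need < n || need > POOL_SIZE - pool_used)  /* overflow or no space */
        return NULL;

    void *p = (uint8_t *)pool_words + pool_used;
    pool_used += need;
    return p;
}
```

A request for 1 byte followed by a request for 5 bytes returns two addresses 4 bytes apart, both on 4-byte boundaries.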