1. What is memory alignment? Why do we need memory alignment?
Memory in modern computers is addressed in bytes. In theory, a variable of any type could start at any address. In practice, a variable of a given type is usually placed only at addresses that are multiples of a certain value. This constraint is called alignment.
The reasons for byte alignment are roughly as follows:
1. Platform reasons (portability): Not all hardware platforms can access arbitrary data at arbitrary addresses. Some platforms can only fetch certain types of data at certain addresses and raise a hardware exception otherwise.
2. Performance reasons: Data structures (especially stacks) should be aligned on natural boundaries wherever possible, because the processor may need two memory accesses to read unaligned data, whereas an aligned access needs only one.
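To make the "two accesses" point concrete, here is a minimal sketch of a toy memory model, not a description of any real processor. The function name model_load_u32, the word-array memory, and the little-endian byte layout are all assumptions made for illustration. The model can only read whole aligned 32-bit words, so an unaligned 32-bit load has to be stitched together from two of them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a bus that can only read whole, aligned 32-bit words.
   mem[w] holds bytes 4w..4w+3, with the lowest address in the lowest 8 bits
   (a little-endian layout, which is an assumption of this sketch). */
static uint32_t model_load_u32(const uint32_t *mem, size_t byte_addr,
                               int *accesses)
{
    size_t word = byte_addr / 4;
    unsigned shift = (unsigned)(byte_addr % 4) * 8;

    uint32_t lo = mem[word];          /* first aligned access */
    *accesses = 1;
    if (shift == 0)
        return lo;                    /* aligned address: one access is enough */

    uint32_t hi = mem[word + 1];      /* second aligned access */
    *accesses = 2;
    /* Combine the upper bytes of the first word with the lower bytes of the next. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

With memory holding the bytes 11 22 33 44 55 66 77 88, a load at byte address 0 takes one access, while a load at byte address 1 takes two and yields the value built from bytes 22 33 44 55.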
2. Alignment rules
The compiler on each platform has its own default "alignment coefficient" (also called the alignment modulus). The programmer can change it with the preprocessor directive #pragma pack(n), n=1,2,4,8,16, where n is the alignment coefficient to use. Rules:
1. Data member alignment rules: For the data members of a structure (struct) or union, the first data member is placed at offset 0, and each subsequent data member is aligned to the smaller of the value specified by #pragma pack and the length of the data member itself.
That is, the offset of each member relative to the start of the structure must be an integer multiple of that smaller value. When #pragma pack does not restrict it, this is the number of bytes occupied by the member's type.
2. Overall alignment rules for structures (or unions): After the data members have been aligned, the structure (or union) itself must also be aligned, to the smaller of the value specified by #pragma pack and the length of its largest data member.
3. Combining 1 and 2: if n is greater than or equal to the number of bytes occupied by the member's type, the offset must follow the default alignment. If n is smaller than that, the offset only needs to be a multiple of n and need not follow the default alignment.
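The rules above can be checked directly with offsetof. The following is a small sketch under stated assumptions: int is 4 bytes with 4-byte natural alignment, and the compiler accepts the #pragma pack(push, n) / #pragma pack(pop) form (GCC, Clang and MSVC do). The struct names Pack1, Pack2 and PackDefault are hypothetical.

```c
#include <stddef.h>

/* Hypothetical example types: identical members, only the packing value differs. */
#pragma pack(push, 1)
struct Pack1 { char c; int i; };        /* n = 1 < sizeof(int): i goes at a multiple of 1 */
#pragma pack(pop)

#pragma pack(push, 2)
struct Pack2 { char c; int i; };        /* n = 2 < sizeof(int): i goes at a multiple of 2 */
#pragma pack(pop)

struct PackDefault { char c; int i; };  /* no restriction: i uses its natural alignment */

/* Compile-time checks, assuming a 4-byte int. */
_Static_assert(offsetof(struct Pack1, i) == 1, "pack(1): offset is a multiple of 1");
_Static_assert(offsetof(struct Pack2, i) == 2, "pack(2): offset is a multiple of 2");
_Static_assert(offsetof(struct PackDefault, i) == _Alignof(int),
               "default: offset follows the natural alignment of int");
```

Under these assumptions the three structures occupy 5, 6 and 8 bytes, because rule 2 rounds each structure up to its own effective alignment (1, 2 and 4).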
3. X86 Alignment Experiment
The following briefly reviews the alignment rules above and analyzes them with examples:
1. The alignment value of the data type itself: 1 byte for char, 2 bytes for short, and 4 bytes for int and float. For double it is 8 bytes under MSVC, but 4 bytes inside structures under the 32-bit x86 System V ABI used by GCC on Linux.
2. The self-alignment value of a structure: the value with the largest self-alignment value among its members.
3. Specified alignment value: #pragma pack(n) sets n-byte alignment. For the offset at which a variable is stored, n-byte alignment gives two cases. First, if n is greater than or equal to the number of bytes occupied by the variable, the offset must follow the default alignment. Second, if n is less than the number of bytes occupied by the variable's type, the offset is a multiple of n and need not follow the default alignment.
4. Effective alignment value of data members and structures: the smaller of the self-alignment value (of the data member or of the structure) and the specified alignment value. When every data member is aligned, the structure as a whole is aligned too.
With these four concepts in place, we can discuss how the members of a specific data structure, and the structure itself, are aligned. The effective alignment value N is what finally determines where data is stored: "aligned on N" means that the storage start address of the data satisfies address % N = 0. The members of a data structure are laid out in the order of definition, and the start address of the first member is the start address of the structure. Each member must be aligned, and the structure itself must be rounded up according to its own effective alignment value (the total length of the structure must be an integer multiple of the structure's effective alignment value). The following examples, compiled with VS2005, illustrate this:
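The four concepts above can be combined into a small layout calculator. This is a sketch, not a model of any particular compiler: the function names are hypothetical, and it assumes that each member's self-alignment equals its size, which holds for the char, short and int members used in this article.

```c
#include <stddef.h>

/* Effective alignment: the smaller of the self-alignment and the pack value. */
static size_t effective_align(size_t self_align, size_t pack)
{
    return self_align < pack ? self_align : pack;
}

/* Rounds offset up to the next multiple of align (align must be nonzero). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) / align * align;
}

/* Computes the size of a struct whose members have the given sizes, in
   declaration order. Simplifying assumption: self-alignment == member size. */
static size_t struct_size(const size_t *sizes, size_t count, size_t pack)
{
    size_t offset = 0;
    size_t max_align = 1;   /* effective alignment of the struct itself */

    for (size_t k = 0; k < count; k++) {
        size_t a = effective_align(sizes[k], pack);
        offset = align_up(offset, a) + sizes[k];   /* place the member */
        if (a > max_align)
            max_align = a;
    }
    return align_up(offset, max_align);            /* round the whole struct */
}
```

For members of sizes 1, 4 and 2 (a char, an int and a short), the calculator gives 12 bytes with a pack value of 4 or 8, 8 bytes with a pack value of 2, and 7 bytes with a pack value of 1.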
Example B analysis:
struct B
{
char b;
int a;
short c;
};
Assume that B starts at address 0x0000. No alignment value is specified in this example, so the VS2005 default of 8 (/Zp8) applies.
The member variable b's own alignment value is 1, which is smaller than the default alignment value of 8, so its effective alignment value is 1. Its storage address 0x0000 satisfies 0x0000%1=0, which meets the byte alignment principle.
The member variable a's own alignment value is 4, which is smaller than the default alignment value of 8, so its effective alignment value is 4. To stay aligned, a can only be stored in the four consecutive bytes from 0x0004 to 0x0007, and 0x0004%4=0 holds. The bytes from 0x0001 to 0x0003 are padding.
The member variable c's own alignment value is 2, which is smaller than the default alignment value of 8, so its effective alignment value is 2. It is stored in the two bytes from 0x0008 to 0x0009, which satisfies 0x0008%2=0.
At this point every data member is aligned. Next, consider the alignment of structure B itself. The self-alignment value of B is the largest self-alignment value among its members (that of member variable a), which is 4. This is smaller than 8, so the effective alignment value of B is 4. The members occupy 0x0000 to 0x0009, which is 10 bytes. Since 10 is not a multiple of 4, the structure is rounded up by 2 bytes: (10 + 2) % 4 = 0. Therefore 0x000A to 0x000B also belong to structure B, B occupies 12 bytes from 0x0000 to 0x000B, and sizeof(struct B) = 12.
The 2 bytes are added after member c so that the compiler can access arrays of the structure quickly and correctly. Suppose we define an array of B structures. The first element starts at address 0, which is fine, but what about the second? By the definition of an array, all elements are adjacent. If the size of the structure were not rounded up to an integer multiple of its alignment value (4), the next element would start at 0x000A, which clearly does not satisfy the alignment of the structure.
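The layout derived for Example B can be confirmed at compile time. This is a minimal sketch that assumes a 4-byte int, a 2-byte short and the compiler's default packing. The helper b_array_stride is a hypothetical name used only to show the array argument.

```c
#include <stddef.h>

struct B {
    char  b;   /* offset 0, followed by 3 padding bytes */
    int   a;   /* offset 4 */
    short c;   /* offset 8, followed by 2 bytes of tail padding */
};

/* Compile-time checks of the analysis (assuming 4-byte int, 2-byte short). */
_Static_assert(offsetof(struct B, b) == 0, "b starts the structure");
_Static_assert(offsetof(struct B, a) == 4, "a is aligned on 4");
_Static_assert(offsetof(struct B, c) == 8, "c is aligned on 2");
_Static_assert(sizeof(struct B) == 12, "size is rounded up to a multiple of 4");

/* Byte distance between two adjacent elements of an array of struct B. */
static size_t b_array_stride(void)
{
    static struct B arr[2];
    return (size_t)((char *)&arr[1] - (char *)&arr[0]);
}
```

Because the stride equals sizeof(struct B), which is a multiple of 4, the member a of every array element stays on a 4-byte boundary.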
Example C analysis:
#pragma pack(2)
struct C
{
char b;
int a;
short c;
};
#pragma pack()
Similarly, in Example C the member variable b has its own alignment value of 1 and the specified alignment value is 2, so its effective alignment value is 1. Assuming that C starts at 0x0000, b is stored at 0x0000, which satisfies 0x0000%1=0 and meets the byte alignment principle.
The member variable a has its own alignment value of 4 and the specified alignment value is 2, so its effective alignment value is 2. It is stored in the four consecutive bytes 0x0002, 0x0003, 0x0004 and 0x0005, which satisfies 0x0002%2=0 and meets the byte alignment principle.
The member variable c has its own alignment value of 2, which is equal to the specified alignment value, so its effective alignment value is 2. It is stored in 0x0006 and 0x0007, which satisfies 0x0006%2=0 and meets the byte alignment principle.
The eight bytes from 0x0000 to 0x0007 therefore hold the members of structure C. The self-alignment value of C is 4, which is larger than the specified alignment value of 2, so the effective alignment value of C is 2. Since 8%2=0, no rounding is needed and C occupies only the eight bytes from 0x0000 to 0x0007. Therefore sizeof(struct C)=8. Besides different specified alignment values, different compilers may also lay out the same structure differently.
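The layout derived for Example C can be checked the same way. This sketch sets the specified alignment value of 2 with #pragma pack, the directive introduced in the alignment rules section, in its push/pop form. It assumes a 4-byte int and a 2-byte short.

```c
#include <stddef.h>

#pragma pack(push, 2)
struct C {
    char  b;   /* offset 0, followed by 1 padding byte */
    int   a;   /* offset 2: effective alignment is min(4, 2) = 2 */
    short c;   /* offset 6 */
};
#pragma pack(pop)

/* Compile-time checks of the analysis (assuming 4-byte int, 2-byte short). */
_Static_assert(offsetof(struct C, a) == 2, "a is aligned on 2, not on 4");
_Static_assert(offsetof(struct C, c) == 6, "c is aligned on 2");
_Static_assert(sizeof(struct C) == 8, "8 is already a multiple of 2");
```

Compared with the default layout of the same members, the packed version saves 4 bytes, at the cost of a member a that is no longer on a 4-byte boundary.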
4. Alignment issues on the ARM platform
ARM processors have two instruction sets: ARM and Thumb.
ARM instructions: Each time an instruction is executed, the value of PC increases by 4 bytes (32 bits). To access 4 bytes at a time, the starting address must be 4-byte aligned, that is, bits [1:0] of the address are 0b00, which means that the address must be a multiple of 4.
Thumb instructions: Each time an instruction is executed, the value of PC increases by 2 bytes (16 bits). To access 2 bytes at a time, the starting address must be 2-byte aligned, that is, bit [0] of the address is 0b0, which means that the address must be a multiple of 2.
Accesses that follow these rules are called aligned accesses. Accesses that do not are called unaligned accesses.
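The bit conditions above translate directly into mask tests. The following is a minimal sketch with hypothetical helper names. It works on plain integer addresses and is not tied to any ARM toolchain.

```c
#include <stdint.h>

/* Nonzero if addr is a multiple of 4, i.e. bits [1:0] of the address are 0b00. */
static int is_word_aligned(uintptr_t addr)
{
    return (addr & 0x3u) == 0;
}

/* Nonzero if addr is a multiple of 2, i.e. bit [0] of the address is 0. */
static int is_halfword_aligned(uintptr_t addr)
{
    return (addr & 0x1u) == 0;
}
```

For example, address 0x1002 passes the halfword test but fails the word test, so a 4-byte access there would be an unaligned access.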
5. ARM platform byte alignment keywords
1. __align(num) sets the byte boundary on which a top-level object is aligned.
A. __align(8) is needed when LDRD or STRD is used in assembly, to guarantee that the data object is aligned accordingly.
B. The largest boundary this keyword can set is 8 bytes. It can raise alignment, for example aligning a 2-byte object on 4 bytes, but it cannot lower alignment, for example aligning a 4-byte object on 2 bytes.
C. __align is a storage class modifier. It applies only to top-level objects and cannot be used on structure or function objects.
2. __packed specifies one-byte alignment.
A. Objects declared __packed cannot be aligned;
B. All reads and writes of __packed objects are performed as unaligned accesses;
C. float, structures and unions containing float, and objects not declared with __packed cannot be byte aligned;
D. __packed has no effect on local integer variables;
E. Casting an unpacked object to a packed object is undefined. Integer pointers, however, can legally be declared as packed:
__packed int* p; // __packed int by itself has no meaning.
3. __unaligned is used to modify the variable so that it can be accessed in an unaligned manner.
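The keywords above are specific to the ARM compiler. As a rough portable counterpart, and explicitly a swapped-in technique rather than the ARM keywords themselves, standard C11 offers _Alignas for raising the alignment of an object, and memcpy for reading a value from an address that may be unaligned. The names dword_buf and read_u16_any are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Rough counterpart of __align(8): C11 _Alignas raises the alignment of a
   top-level object to an 8-byte boundary. */
static _Alignas(8) unsigned char dword_buf[16];

/* Rough counterpart of an unaligned access: copying the bytes avoids
   dereferencing a misaligned pointer, so it is valid for any address. */
static uint16_t read_u16_any(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

A value stored at dword_buf + 1, an odd address, can be read back safely with read_u16_any, whereas casting that address to uint16_t * and dereferencing it would be an unaligned access.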
6. How to find problems with byte alignment
If alignment or assignment problems occur, check the following first:
1. The endianness (big-endian or little-endian) setting of the compiler;
2. Whether the system itself supports unaligned access;
3. If it does, whether alignment has been set. If it has not, check whether the access needs a special modifier to mark it as a special access operation.
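For the first item in the checklist, the byte order actually in effect can be confirmed at run time. This is a minimal sketch with a hypothetical function name. It inspects the first byte of a known 16-bit value.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
static int is_little_endian(void)
{
    const uint16_t probe = 1;       /* bytes are 01 00 on little-endian */
    unsigned char first;
    memcpy(&first, &probe, 1);      /* read the byte at the lowest address */
    return first == 1;
}
```

If the result does not match what the message format or the peer processor expects, a wrong value after assignment may be a byte order problem rather than an alignment problem.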
7. Conclusion
For data structures used locally on a 32-bit processor, use four-byte alignment to improve memory access efficiency. At the same time, to reduce memory overhead, arrange the structure members sensibly so that the gaps introduced by four-byte alignment are kept to a minimum.
For data structures exchanged between processors, the length of a message must not change with the compilation platform or the processor. Pack the message structure with one-byte alignment, and, to keep access to it efficient, use explicit padding bytes so that the members of the message are still aligned on four bytes.
The order of the members in a data structure should take into account the relationships between members, data access efficiency and space utilization. The ordering principle is: four-byte members come first, two-byte members immediately follow the last four-byte member, one-byte members immediately follow the last two-byte member, and padding bytes go at the end. For example:
typedef struct tag_T_MSG{
long ParaA;
long ParaB;
short ParaC;
char ParaD;
char Pad;
} T_MSG;
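The effect of this ordering can be sketched with fixed-width types. Using int32_t, int16_t and int8_t instead of long, short and char is an assumption made here so that the sizes do not depend on the platform (long is 8 bytes on many 64-bit systems). T_MSG_FIXED and T_MSG_UNORDERED are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Members ordered from largest to smallest, with explicit padding at the end. */
typedef struct {
    int32_t ParaA;   /* offset 0 */
    int32_t ParaB;   /* offset 4 */
    int16_t ParaC;   /* offset 8 */
    int8_t  ParaD;   /* offset 10 */
    int8_t  Pad;     /* offset 11, explicit padding byte */
} T_MSG_FIXED;

/* Hypothetical poorly ordered variant holding the same four fields. */
typedef struct {
    int8_t  ParaD;   /* offset 0, followed by 3 hidden padding bytes */
    int32_t ParaA;   /* offset 4 */
    int16_t ParaC;   /* offset 8, followed by 2 hidden padding bytes */
    int32_t ParaB;   /* offset 12 */
} T_MSG_UNORDERED;

_Static_assert(sizeof(T_MSG_FIXED) == 12, "no hidden padding");
_Static_assert(sizeof(T_MSG_UNORDERED) == 16, "5 bytes of hidden padding");
```

The ordered layout has no compiler-inserted gaps, so its size is the same with default packing and with one-byte packing, which is the property a cross-processor message needs.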