ARM memory alignment summary-EEWORLD


1. What is memory alignment? Why do we need memory alignment?

     Memory in modern computers is addressed in bytes. In theory a variable of any type could start at any address, but in practice a variable of a given type is usually placed only at addresses that are a multiple of some value. This is alignment.
The reasons for byte alignment are roughly as follows:
     1. Platform reasons (portability): Not all hardware platforms can access any data at any address; some platforms can fetch certain types of data only at certain addresses and raise a hardware exception otherwise.
     2. Performance reasons: Data structures (especially stacks) should be aligned on natural boundaries as far as possible, because an unaligned access may require the processor to make two memory accesses, whereas an aligned access needs only one.
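One portable way to avoid the hardware-exception case is to copy the bytes into an aligned variable rather than dereference a misaligned pointer. The following is a minimal sketch, assuming a hosted C99 compiler; the helper names load_u32_unaligned and demo_unaligned_load are hypothetical and do not come from the original article.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from an address that may not be
   4-byte aligned. memcpy lets the compiler choose a safe access sequence
   instead of a single word load that could fault on strict platforms. */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Round trip: store a value at an odd offset, then read it back. */
void demo_unaligned_load(void)
{
    unsigned char buf[8] = {0};
    uint32_t original = 0x11223344u;

    memcpy(buf + 1, &original, sizeof original);   /* buf + 1 is misaligned */
    assert(load_u32_unaligned(buf + 1) == original);
}
```

Casting buf + 1 to uint32_t * and dereferencing it would instead be the kind of access that can trap on platforms without unaligned support.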

2. Alignment rules
The compiler on each platform has its own default "alignment coefficient" (also called the alignment modulus). The programmer can change it with the directive #pragma pack(n), n = 1, 2, 4, 8, 16, where n is the alignment coefficient to use. Rules:
1. Data member alignment: For the data members of a structure (struct) or union, the first member is placed at offset 0, and each subsequent member is aligned to the smaller of the value specified by #pragma pack and the length of the member itself. That is, the offset of each member from the start of the structure must be an integer multiple of that smaller value.
2. Overall alignment of the structure (or union): After the data members have been aligned, the structure (or union) itself must also be aligned, to the smaller of the value specified by #pragma pack and the length of its largest data member. The total size is rounded up to a multiple of this value.
3. Combining 1 and 2: if n is greater than or equal to the number of bytes occupied by a member's type, the member's offset follows the default alignment. If n is smaller than that, the offset only has to be a multiple of n and need not satisfy the default alignment.
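The three rules can be sketched as a small layout calculator. This is an illustration only: the function names are hypothetical, and it assumes that each member is a primitive type whose own alignment equals its size.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be non-zero). */
size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) / align * align;
}

/* Rule 1: effective alignment = min(member's own alignment, pack value). */
size_t effective_align(size_t self_align, size_t pack)
{
    return self_align < pack ? self_align : pack;
}

/* Size of a struct whose members have the given sizes, in declaration
   order, under #pragma pack(pack). Assumes own alignment == size. */
size_t struct_size(const size_t *member_sizes, size_t count, size_t pack)
{
    size_t offset = 0;
    size_t max_align = 1;

    for (size_t i = 0; i < count; i++) {
        size_t a = effective_align(member_sizes[i], pack);
        offset = align_up(offset, a);      /* rule 1: place the member */
        offset += member_sizes[i];
        if (a > max_align)
            max_align = a;
    }
    return align_up(offset, max_align);    /* rule 2: round the whole struct */
}

/* Members of size 1, 4, 2 (char, int, short on a typical 32-bit target). */
void demo_struct_size(void)
{
    const size_t sizes[] = {1, 4, 2};

    assert(struct_size(sizes, 3, 4) == 12);
    assert(struct_size(sizes, 3, 2) == 8);
    assert(struct_size(sizes, 3, 1) == 7);
}
```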

3. X86 Alignment Experiment
The following is a brief review and explanation of the above alignment rules, combined with examples for analysis:
1. The self-alignment value of a data type: 1 byte for char, 2 bytes for short, and 4 bytes for int and float. For double it is 8 bytes under Visual C++ (the 32-bit x86 System V ABI used by GCC on Linux aligns a double inside a structure to 4 bytes).
2. The self-alignment value of a structure: the largest self-alignment value among its members.
3. The specified alignment value: the value n set with #pragma pack(n). There are two cases for the offset at which a variable is stored. First, if n is greater than or equal to the number of bytes occupied by the variable, the offset must satisfy the default alignment. Second, if n is less than the number of bytes occupied by the variable's type, the offset is a multiple of n and need not satisfy the default alignment.
4. The effective alignment value of a data member or structure: the smaller of its self-alignment value and the specified alignment value. Once the data members are aligned, the structure as a whole is aligned as well.
      With these four concepts in place, we can discuss how the members of a specific structure, and the structure itself, are aligned. The effective alignment value N is the value that finally determines where data is stored. Effective alignment N means "aligned on N", that is, the data's storage start address satisfies address % N = 0. The variables in a structure are laid out in the order in which they are defined, and the starting address of the first variable is the starting address of the structure. Each member must be aligned, and the structure itself must also be rounded according to its own effective alignment value (the total size of the structure must be an integer multiple of the structure's effective alignment value). The following examples, compiled with VS2005, illustrate this:
Example B analysis:
struct B
{
char b;
int a;
short c;
};
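Assume that B starts at address 0x0000. In this example the alignment value N is not explicitly specified, and the analysis below takes the specified or default value to be 4. (The documented default packing of Visual C++ for 32-bit targets is 8, but because no member of B is wider than 4 bytes the resulting layout is the same.)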
   The member variable b's own alignment value is 1, which is smaller than the specified or default alignment value of 4, so the effective alignment value is 1, and its storage address 0x0000 complies with 0x0000%1=0, which meets the byte alignment principle.
   The member variable a's own alignment value is 4, which is equal to the specified or default alignment value of 4, so the effective alignment value is also 4. To stay aligned, a cannot follow b directly at 0x0001; it is stored in the four consecutive bytes 0x0004 to 0x0007, and 0x0004%4=0 holds. The three bytes 0x0001 to 0x0003 are padding.
   The member variable c has its own alignment value of 2, which is smaller than the specified or defaulted alignment value of 4. Therefore, the effective alignment value is 2, which can be stored in the two-byte space from 0x0008 to 0x0009 in sequence, which meets 0x0008%2=0.
   So far the data members are all aligned. Next, look at the alignment of structure B itself. Its self-alignment value is the largest self-alignment value among its members (that of member variable a), which is 4, so the effective alignment value of structure B is also 4. The members occupy 0x0000 to 0x0009, which is 10 bytes; the structure must be rounded up, and (10 + 2) % 4 = 0, so 0x000A to 0x000B are also occupied by structure B. B therefore occupies 12 bytes in total, from 0x0000 to 0x000B, and sizeof(struct B) = 12.
   The 2 bytes are added after member c so that the compiler can access arrays of the structure quickly and correctly. Imagine an array of B structures: the first element starts at address 0, which is fine, but what about the second? By the definition of an array, all elements are adjacent. If the size of the structure were not rounded up to an integer multiple of the alignment value (4), the second element would start at 0x000A, which does not satisfy the structure's 4-byte alignment.
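The layout worked out above can be checked with offsetof and sizeof. This is a sketch under the assumption of a compiler with 4-byte int, 2-byte short, and default packing; check_struct_b is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

struct B {
    char  b;   /* offset 0, followed by 3 padding bytes */
    int   a;   /* offset 4 */
    short c;   /* offset 8, followed by 2 bytes of tail padding */
};

/* Confirms the offsets and total size derived in the analysis. */
void check_struct_b(void)
{
    assert(offsetof(struct B, b) == 0);
    assert(offsetof(struct B, a) == 4);
    assert(offsetof(struct B, c) == 8);
    assert(sizeof(struct B) == 12);

    /* The tail padding keeps every element of an array 4-byte aligned. */
    assert(sizeof(struct B[2]) == 24);
}
```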
Example C analysis:

#pragma pack(2) /* specify 2-byte alignment */
struct C
{
char b;
int a;
short c;
};
#pragma pack() /* restore the default alignment */
  Similarly, in Example C, the member variable b has its own alignment value of 1, and the specified alignment value is 2, so the effective alignment value is 1. Assuming that C starts at 0x0000, then b is stored at 0x0000, which meets 0x0000%1=0 and meets the byte alignment principle.
  The member variable a has its own alignment value of 4 and the specified alignment value is 2, so the effective alignment value is 2. It is stored in the four consecutive bytes 0x0002, 0x0003, 0x0004, and 0x0005, which satisfies 0x0002%2=0 and meets the byte alignment principle.
  The member variable c has its own alignment value of 2, which is equal to the specified alignment value, so the effective alignment value is 2. It is stored in 0x0006 and 0x0007, which satisfies 0x0006%2=0 and meets the byte alignment principle.
  The eight bytes from 0x0000 to 0x0007 hold the members of structure C. The structure's own alignment value is 4, which is larger than the specified alignment value of 2, so the effective alignment value of C is 2. Since 8%2=0, C occupies exactly the eight bytes from 0x0000 to 0x0007, and sizeof(struct C)=8. Besides different specified alignment values placing data at different addresses, different compilers may also lay out structures differently.
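The same kind of check can be written for Example C, with the specified alignment value of 2 expressed as #pragma pack. This is a sketch assuming a compiler that accepts the push/pop form of the pragma (GCC, Clang, and Visual C++ do) and has 4-byte int and 2-byte short; check_struct_c is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#pragma pack(push, 2)      /* specified alignment value: 2 */
struct C {
    char  b;   /* offset 0, followed by 1 padding byte */
    int   a;   /* offset 2: effective alignment min(4, 2) = 2 */
    short c;   /* offset 6 */
};
#pragma pack(pop)          /* restore the previous packing */

/* Confirms the offsets and total size derived in the analysis. */
void check_struct_c(void)
{
    assert(offsetof(struct C, b) == 0);
    assert(offsetof(struct C, a) == 2);
    assert(offsetof(struct C, c) == 6);
    assert(sizeof(struct C) == 8);
}
```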

4. Alignment issues on the ARM platform
   ARM processors have two instruction sets: ARM and Thumb.
   ARM instructions: Each instruction is 4 bytes (32 bits), so the PC advances by 4 bytes per instruction. To access 4 bytes at a time, the starting address must be 4-byte aligned, that is, the lower two bits of the address are 0b00 and the address is a multiple of 4.
   Thumb instructions: Each instruction is 2 bytes (16 bits), so the PC advances by 2 bytes per instruction. To access 2 bytes at a time, the starting address must be 2-byte aligned, that is, the lowest bit of the address is 0b0 and the address is a multiple of 2.
    An access that follows these rules is called an aligned access; one that does not is called an unaligned memory access.
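The two address conditions reduce to bit masks on the low address bits. A minimal sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Word (4-byte) aligned: the lower two address bits are 0b00. */
int is_word_aligned(uintptr_t addr)
{
    return (addr & 0x3u) == 0;
}

/* Halfword (2-byte) aligned: the lowest address bit is 0b0. */
int is_halfword_aligned(uintptr_t addr)
{
    return (addr & 0x1u) == 0;
}

void demo_alignment_checks(void)
{
    assert(is_word_aligned(0x1000));        /* multiple of 4 */
    assert(!is_word_aligned(0x1002));       /* multiple of 2 only */
    assert(is_halfword_aligned(0x1002));
    assert(!is_halfword_aligned(0x1003));   /* odd address */
}
```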

5. ARM platform byte alignment keywords
  1. __align(num) sets the byte boundary of a top-level object.
   A. When LDRD or STRD is used in assembly, __align(8) is needed to ensure that the data object is aligned accordingly.
   B. The largest boundary this keyword can set is 8 bytes. It can raise an alignment, making a 2-byte object 4-byte aligned, but it cannot lower one: a 4-byte object cannot be made 2-byte aligned.
   C. __align is a storage class modifier. It applies only to top-level objects and cannot be used on structure types or function parameters.
  2. __packed specifies one-byte alignment.
   A. A packed object cannot be aligned;
   B. All reads and writes of a packed object use unaligned accesses;
   C. float, structures and unions containing float, and objects not declared __packed are not packed to one-byte alignment;
   D. __packed has no effect on local integer variables;
   E. Casting an unpacked object to a packed object is undefined. An integer pointer can legally be declared packed:

__packed int* p; // a plain __packed int has no meaning

  3. __unaligned marks a variable so that it can be accessed in an unaligned manner.
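The keywords above belong to the ARM compiler and will not build with other toolchains. For readers without that compiler, the sketch below uses stand-ins instead: C11 _Alignas in place of __align(8), and the GCC/Clang attribute __attribute__((packed)) in place of __packed. These are analogues chosen for illustration, not the ARM keywords themselves, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for __align(8): raise the alignment of a top-level object,
   as would be needed for a buffer accessed with LDRD/STRD. */
_Alignas(8) unsigned char dword_buffer[8];

/* Stand-in for __packed: no padding is inserted between members, so
   value sits at offset 1 and must be accessed as unaligned data. */
struct __attribute__((packed)) PackedMsg {
    char tag;
    int  value;
};

/* Assumes a 4-byte int. */
void check_qualifiers(void)
{
    assert(((uintptr_t)dword_buffer & 7u) == 0);
    assert(offsetof(struct PackedMsg, value) == 1);
    assert(sizeof(struct PackedMsg) == 5);
}
```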

6. How to find problems with byte alignment
If alignment or assignment problems appear, first check:
1. The big-endian or little-endian setting of the compiler;
2. Whether the system itself supports unaligned access;
3. If it does, whether an alignment has been specified. If not, check whether a special qualifier is needed at the point of access to mark it as a special access operation.
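For the first check, the byte order actually in effect can be probed at run time. A minimal sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian target and 0 on a big-endian one, by
   looking at which byte of a known value is stored at the lowest address. */
int is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

void demo_endianness(void)
{
    int le = is_little_endian();
    assert(le == 0 || le == 1);
}
```

Comparing this result on both ends of a link is a quick way to rule byte order in or out before looking at alignment.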

7. Conclusion
  For data structures used locally on a 32-bit processor, use four-byte alignment to improve memory access efficiency. At the same time, order the structure members sensibly so that the gaps caused by four-byte alignment are reduced and memory overhead is kept down.
  For data structures exchanged between processors, the length of a message must not change with the compilation platform or the processor. Compact the message structure with one-byte alignment and, to keep memory access efficient, add explicit padding bytes so that the members of the message still fall on four-byte boundaries.
  The order of the members should take into account the relationship between members, data access efficiency, and space utilization. The principle is: four-byte members come first, two-byte members immediately follow the last four-byte member, one-byte members immediately follow the last two-byte member, and padding bytes go at the end. For example:
typedef struct tag_T_MSG{
long ParaA;
long ParaB;
short ParaC;
char ParaD;
char Pad;
} T_MSG;
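The effect of the ordering principle can be seen by comparing two structures with the same members. This is a sketch with hypothetical names; it uses fixed-width types from <stdint.h> in place of long and short so that the sizes do not depend on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Members in arbitrary order: padding follows each one-byte member. */
struct Unordered {
    char    d;     /* offset 0, then 3 padding bytes */
    int32_t a;     /* offset 4 */
    char    pad;   /* offset 8, then 3 padding bytes */
    int32_t b;     /* offset 12 */
    int16_t c;     /* offset 16, then 2 bytes of tail padding */
};

/* Same members, widest first, as in T_MSG: no hidden padding. */
struct Ordered {
    int32_t a;     /* offset 0 */
    int32_t b;     /* offset 4 */
    int16_t c;     /* offset 8 */
    char    d;     /* offset 10 */
    char    pad;   /* offset 11, explicit padding byte */
};

void check_ordering(void)
{
    assert(sizeof(struct Unordered) == 20);
    assert(sizeof(struct Ordered) == 12);
    assert(offsetof(struct Ordered, pad) == 11);
}
```

Because the ordered layout contains no compiler-inserted padding, it has the same size whether or not one-byte packing is applied, which is what a message format shared between processors needs.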
