Let's learn about the past and present of assembly language!

Latest update time：2024-10-22


Learning programming is actually learning high-level languages, that is, computer languages designed for humans.

However, computers do not understand high-level languages; a program must be converted into binary code by a compiler before it can run. Learning a high-level language therefore does not mean understanding how a computer actually operates.

What computers can really understand is low-level language, which is specifically used to control hardware. Assembly language is a low-level language that directly describes/controls the operation of the CPU. If you want to understand what the CPU does and the steps of code operation, you must learn assembly language.

Assembly language is not easy to learn, and even a concise introduction is hard to find. Below I will try to write the most understandable assembly language tutorial to explain how the CPU executes code.

1. What is assembly language?

We know that the CPU is only responsible for calculations and does not have intelligence. You enter an instruction, it runs once, and then stops and waits for the next instruction.

These instructions are all binary and are called operation codes (opcodes); for example, the addition instruction is 00000011. The role of the compiler is to translate programs written in high-level languages into opcodes.

For humans, binary programs are unreadable and it is impossible to tell what the machine has done. In order to solve the problem of readability and occasional editing needs, assembly language was born.

"Assembly language is the text form of binary instructions" and has a one-to-one correspondence with instructions. For example, the addition instruction 00000011 is written in assembly language as ADD. As long as it is restored to binary, assembly language can be directly executed by the CPU, so it is the lowest-level language.

2. Origin

In the early days, programming was done by handwriting binary instructions, and then inputting them into the computer through various switches. For example, if you wanted to add, you would press the addition switch. Later, a paper tape puncher was invented, which automatically input binary instructions into the computer by punching holes in paper tape.

To solve the readability problem of binary instructions, engineers wrote those instructions in octal. Converting binary to octal is easy, but octal is not readable.

Naturally, it was finally expressed in words, with the addition instruction written as ADD. Memory addresses were no longer referenced directly, but instead represented by labels.

This adds an extra step: translating the text instructions into binary. This step is called assembling, and the program that performs it is called an assembler. The text it processes is naturally called assembly code. Once standardized, it became known as assembly language, abbreviated asm.

The machine instructions of each CPU are different, so the corresponding assembly languages are different too. This article introduces the most common one, x86 assembly language, which is the one used by Intel CPUs.

3. Register

To learn assembly language, you must first understand two things: registers and the memory model.

Let's look at registers first. The CPU itself is only responsible for calculations, not for storing data. Data is generally stored in memory, and the CPU reads and writes data to memory when it needs it.

However, the CPU's computing speed is much higher than the memory's read and write speed. In order to avoid being slowed down, the CPU has its own L1 cache and L2 cache. Basically, the CPU cache can be regarded as a memory with faster read and write speeds.

However, the CPU cache is still not fast enough. In addition, the address of the data in the cache is not fixed, and the CPU has to seek the address every time it reads and writes, which slows down the speed.

Therefore, in addition to the cache, the CPU also has its own registers to store the most frequently used data. In other words, the most frequently read and written data (such as loop variables) will be placed in the registers, and the CPU will read and write registers first, and then the registers will exchange data with the memory.

Registers do not distinguish data by address, but by name. Each register has its own name. We tell the CPU which specific register to get data from, which is the fastest. Some people compare registers to the zero-level cache of the CPU.

4. Types of Registers

The early x86 CPUs had only 8 registers, each with a different purpose. Today's CPUs have more than 100 registers, which have largely become general-purpose registers with no fixed role, but the names of the early registers have been preserved.

EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP

Among the 8 registers above, the first seven are general purpose. The ESP register has a specific purpose: it stores the current Stack address (see the Stack section below for details).

We often see names like 32-bit CPU and 64-bit CPU, which actually refer to the size of the registers. The register size of a 32-bit CPU is 4 bytes.

5. Memory Model: Heap

Registers can only store a small amount of data. Most of the time, the CPU has to command registers to exchange data directly with memory. Therefore, in addition to registers, you must also understand how memory stores data.

When a program runs, the operating system allocates a section of memory to it to store the program and the data it generates. This section of memory has a starting address and an ending address, for example from 0x1000 to 0x8000; the starting address is the smaller address and the ending address is the larger one.

While the program is running, dynamic memory requests (such as creating a new object or calling malloc) are served by handing the user a portion of this pre-allocated memory. The rule is that allocation starts from the starting address (in fact there is a block of static data at the starting address, which is ignored here).

For example, if the user requests 10 bytes of memory, they are allocated from the starting address 0x1000 up to address 0x100A. If the user then requests another 22 bytes, the allocation extends to 0x1020.

The memory area carved out by these explicit user requests is called the heap. It begins at the starting address and grows from low addresses toward high addresses. An important feature of the heap is that it does not disappear automatically; it must be released manually or reclaimed by a garbage collector.

6. Memory Model: Stack

Besides the Heap, the other kind of memory usage is called the Stack. Simply put, the Stack is a memory area that functions occupy temporarily while they run.

Please see the example below.

int main() 
{
    int a = 2;
    int b = 3;
}

In the above code, when the system starts to execute the main function, it creates a frame for it in memory, and all of main's internal variables (such as a and b) are stored in this frame. After main finishes executing, the frame is recycled, all internal variables are released, and no space remains occupied.

What happens if a function calls other functions?

int main() 
{
   int a = 2;
   int b = 3;
   return add_a_and_b(a, b);
}

In the above code, the function add_a_and_b is called inside the function main. When this line is executed, the system also creates a new frame for add_a_and_b to store its internal variables. In other words, there are now two frames at the same time: main and add_a_and_b. Generally speaking, there are as many frames as there are levels in the call stack.

When add_a_and_b finishes, its frame is recycled, and the system returns to the point where main was interrupted and continues executing. Through this mechanism, functions can be called layer by layer, and each layer can use its own local variables.

All frames are stored in the Stack. Because frames are stacked layer by layer, the Stack is called a stack. Creating a new frame is called pushing it onto the stack (push); recycling a frame is called popping it off the stack (pop). The characteristic of the Stack is that the frame pushed most recently is popped first (because the innermost function call finishes first), which is known as a "last in, first out" data structure.

Each time a function finishes executing, one frame is released automatically. When all functions have finished, the entire Stack is released.

The stack is allocated from the ending address of the memory area, from high addresses to low addresses. For example, if the ending address of the memory area is 0x8000 and the first frame is assumed to be 16 bytes, the next allocation starts from 0x7FF0; if the second frame is assumed to need 64 bytes, the address moves to 0x7FB0.

7. CPU Instructions

7.1 An Example

Now that we understand registers and the memory model, we can look at what assembly language actually is. Below is a simple program, example.c.

int add_a_and_b(int a, int b) {
   return a + b;
}

int main() {
   return add_a_and_b(2, 3);
}

gcc converts this program into assembly language.

$ gcc -S example.c

After the above command is executed, a text file example.s is generated, containing the assembly language: dozens of lines of instructions. In other words, a simple operation in a high-level language may consist of several or even dozens of CPU instructions underneath. The CPU executes these instructions in sequence to complete the operation.

After simplification, example.s looks like this. (The listing uses a simplified notation in which register names carry a % prefix and the destination operand is written first.)

_add_a_and_b:
   push   %ebx
   mov    %eax, [%esp+8] 
   mov    %ebx, [%esp+12]
   add    %eax, %ebx 
   pop    %ebx 
   ret  

_main:
   push   3
   push   2
   call   _add_a_and_b 
   add    %esp, 8
   ret

It can be seen that the two functions of the original program, add_a_and_b and main, correspond to the two labels _add_a_and_b and _main. Each label is followed by the CPU operations that the function was converted into.

Each line is an operation performed by the CPU. It is divided into two parts, and we will take one of the lines as an example.

push   %ebx

This line contains the CPU instruction push and the operand %ebx used by that instruction. A CPU instruction can have zero or more operands.

I will explain this assembly program line by line below. Readers may want to keep a copy of the program open in another window to avoid scrolling back up while reading.

7.2 push instruction

By convention, the program starts executing from the _main label. At this point a frame is created for main on the Stack, and the address the Stack currently points to is written into the ESP register. Any data later written into main's frame is written at the address held in the ESP register.

Then, start executing the first line of code.

push   3

The push instruction places its operand on the Stack; here it writes 3 into main's frame.

Although it looks simple, the push instruction actually performs a preliminary step. It first takes the address in the ESP register, subtracts 4 bytes from it, and writes the new address back into the ESP register.

Subtraction is used because the Stack grows from high addresses to low addresses, and 4 bytes are used because 3 is of type int, which occupies 4 bytes. Once the new address is obtained, 3 is written into the four bytes starting at that address.

push   2

The second line works the same way: the push instruction writes 2 into main's frame, right next to the 3 written earlier. At this point, the ESP register has been decreased by another 4 bytes (8 in total).

7.3 call instruction

The call instruction on the third line is used to call a function.

call   _add_a_and_b

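The above code calls the add_a_and_b function. The call instruction first pushes the return address (the address of the instruction after the call, 4 bytes on a 32-bit CPU) onto the Stack so that ret can later find its way back, then the program jumps to the _add_a_and_b label, where a new frame is created for the function.

Now let's start executing the code of _add_a_and_b.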

push   %ebx

This line writes the value in the EBX register into the frame of _add_a_and_b. The register is about to be used, so its current value is saved first and written back after use.

Here the push instruction subtracts another 4 bytes from the address in the ESP register (16 in total, counting the 4-byte return address pushed by call).

7.4 mov instruction

The mov instruction is used to write a value into a register.

mov    %eax, [%esp+8]

This line first adds 8 bytes to the address in the ESP register to obtain a new address, and then reads data from the Stack at that address. At this point ESP points at the saved EBX value and the return address pushed by call sits 4 bytes above it, so the value read at ESP+8 is 2, which is then written into the EAX register.

The next line of code does the same thing.

mov    %ebx, [%esp+12]

The above code adds 12 bytes to the value of the ESP register, and then takes out the data from the Stack according to this address. This time, it takes out the data 3 and writes it to the EBX register.

7.5 add instruction

The add instruction adds two operands and writes the result to the first operand.

add    %eax, %ebx

The above code adds the value of the EAX register (i.e. 2) to the value of the EBX register (i.e. 3), obtaining the result 5, and then writes this result to the first operand, the EAX register.

7.6 pop instruction

The pop instruction retrieves the most recently written value on the Stack (i.e. the value at the lowest address) and writes it to the location specified by the operand.

pop    %ebx

The above code takes the most recently written value off the Stack (that is, the original value of the EBX register) and writes it back into the EBX register (the addition is complete, so the EBX register is no longer needed).

Note that the pop instruction also adds 4 to the address in the ESP register, which reclaims 4 bytes.

7.7 ret instruction

The ret instruction terminates the execution of the current function and returns control to the calling function. In other words, the frame of the current function is recycled.

ret

As you can see, this instruction has no operands.

When the add_a_and_b function terminates, the system returns to the point where main was interrupted and continues execution.

add    %esp, 8

The above code manually adds 8 bytes to the address in the ESP register and writes the result back to the ESP register. This is needed because the ESP register holds the address at which the Stack is written. The earlier pop reclaimed 4 bytes and ret reclaimed the 4-byte return address pushed by call; reclaiming the remaining 8 bytes here (the two arguments 2 and 3) releases everything that was pushed.

ret

Finally, the main function ends, and the ret instruction exits program execution.

8. Reference Links

Introduction to reverse engineering and Assembly, by Youness Alaoui
x86 Assembly Guide, by University of Virginia Computer Science

Copyright Statement: This article is reproduced from the Internet. The copyright belongs to the original author. If there is any infringement, please contact us to delete it!

Source: http://www.ruanyifeng.com/blog
