All data a program works with must be placed in memory. Different types of data occupy different numbers of bytes; for example, an int typically occupies 4 bytes and a char occupies 1 byte. To access this data correctly, every byte must be numbered, much like a house number or an ID number. Each byte's number is unique, so any byte can be located exactly from its number.
The following figure shows the number of each byte in 4GB of memory (expressed in hexadecimal):
We call the number assigned to a byte in memory its address, or a pointer. Addresses start at 0 and increase by one for each byte. In a 32-bit environment, a program can use 4GB of memory: the smallest address is 0 and the largest address is 0XFFFFFFFF.
The following code demonstrates how to output an address:
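    #include <stdio.h>

    int main(void) {
        int a = 100;
        char str[20] = "c.biancheng.net";
        /* %#X expects an unsigned int; the casts assume a 32-bit environment where an address fits in one. */
        printf("%#X, %#X\n", (unsigned int)&a, (unsigned int)str);
        return 0;
    }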
Running results:
0X28FF3C, 0X28FF10
%#X prints a value in hexadecimal with a 0X prefix. a is a variable that stores an integer, so you must put & in front of it to get its address. str is an array name; used this way it already evaluates to the address of the string's first character, so no & is needed.
C also has a conversion specifier, %p, intended specifically for printing addresses. Its output format is implementation-defined, however: some compilers print a 0x prefix and others do not, so we did not use it here. Strictly speaking, %p with a void * argument is still the portable choice, because %X expects an unsigned int rather than a pointer.
Everything is an address
C uses variables to store data and functions to define reusable blocks of code. Both must eventually be placed in memory before the CPU can use them.
Data and code are both stored in memory in binary form, so the computer cannot tell from the format alone whether a block of memory holds data or code. When a program is loaded into memory, the operating system assigns different permissions to different memory blocks: blocks with read and execute permission hold code, and blocks with read and write permission (or read-only permission) hold data.
The CPU can reach the code and data in memory only through addresses. While a program runs, it tells the CPU the address of the code to execute and the addresses of the data to read and write. If the program, by accident or by the developer's intent, gives the CPU an address in the code area when it wants to write data, a memory access error occurs. The hardware and operating system intercept this error and force the program to crash, and the programmer has no chance to recover from it.
What the CPU needs in order to access memory is the address, not the variable name or function name. Variable names and function names are just mnemonics for addresses. When the source files are compiled and linked into an executable program, the names are replaced with addresses. An important task of compilation and linking is to find the address that corresponds to each name.
Assuming that the addresses of variables a, b, and c in memory are 0X1000, 0X2000, and 0X3000 respectively, the addition operation c = a + b; will be converted into a form similar to the following:
0X3000 = (0X1000) + (0X2000);
Here ( ) denotes fetching the value stored at an address. The whole expression means: take the values at addresses 0X1000 and 0X2000, add them together, and store the result in the memory at address 0X3000.
Variable names and function names are a convenience: they let us write code with readable English identifiers instead of working directly with binary addresses, which would be painfully tedious and error-prone.
Note that variable names, function names, string names, and array names are all essentially mnemonics for addresses. When writing code, however, we treat a variable name as standing for the data itself, while a function name, string name, or array name stands for the first address of a block of code or data.