In C, remember to initialize your local variables
What is the value of an uninitialized local variable in C?
The answer is often one of:
It depends on the compiler.
It may be initialized to 0, but that is not guaranteed.
It is indeterminate.
In short, these are all solemn, metaphysical answers, which is annoying.
Whenever someone talks at length about compilers, C libraries, and processor architectures but cannot give you a concrete scenario that reproduces the problem, that person is probably talking nonsense.
In fact, the question itself is badly posed. A complete answer could run to 100,000 words. It is enough to pin down the behavior in one specific scenario, and that calls for a reasonably well-designed experiment.
Before looking at actual code behavior, here is one piece of background. The CPU knows nothing about variables, let alone their names. It only loads values from, and stores values to, specific memory locations. So to ask what value a variable has, you must first know where that variable's value is stored.
Let's look at the following code:
#include <stdio.h>

void func1()
{
    int a;                      /* deliberately left uninitialized */
    printf("func1:%d\n", a);
    a = 12345;
}

void func2()
{
    int b;
    printf("func2:%d\n", b);
}

void func4()
{
    int d;
    printf("func3:%d\n", d);    /* the label reads "func3", but this is func4 */
}

void func3()
{
    int c;
    printf("func3:%d\n", c);
    c = 54321;
    func4();
}

void test_call()
{
    func3();
}

int main(int argc, char **argv)
{
    func1();
    func2();
    test_call();
}
We have a total of 4 functions, func1 to func4, each of which has an uninitialized local variable. What are their values?
For local variables like these, the value depends on:
The position of the variable on the stack.
Whether that stack position has been written to before.
As you can see, the first point identifies a memory location and the second describes code behavior: once some code has stored a value at that location, and no later code overwrites it, the location keeps the value that was stored.
Verification is very simple, just try it and you will know:
[root@localhost test]# ./a.out
func1:0
func2:12345
func3:0
func3:0

Given how the call stack grows and shrinks, local variable a in func1 and local variable b in func2 clearly occupy the same location. When func1 is called, this location is fresh memory as far as our code is concerned (although some stack frame may already have reached it before main was entered). The value of a is therefore whatever the page backing that location contained at that offset when it was mapped in, which depends on the operating system:
The operating system may zero a page before handing it to the program.
The stack is not allocated by the C library, so the C library's behavior plays no part here; memory allocated with malloc, by contrast, does go through the C library.
The output shows that a is 0, so we conclude that the operating system handed the application a zeroed page. func1 then stores 12345 in a and returns. When func2 is called next, its stack frame is built exactly where func1's frame used to be, and the slot still holds 12345.
I found no instruction that clears the stack to 0 when func1 returns. For efficiency, there should not be one.
Now look at test_call. func3 and func4 are active at the same time, so they cannot share a stack frame: func4's frame lies beyond func3's, further toward the top of the stack. Assigning 54321 to c in func3 therefore has no effect on d, which lives in func4's frame. In this run, c and d both still read 0.
So, what is the difference between initializing a local variable and not initializing a local variable at the instruction level?
Here is func1 with a left uninitialized (the instructions that follow the function prologue):

  4005b5: 8b 45 fc              mov    -0x4(%rbp),%eax
  4005b8: 89 c6                 mov    %eax,%esi
  4005ba: bf 90 07 40 00        mov    $0x400790,%edi
  4005bf: b8 00 00 00 00        mov    $0x0,%eax
  4005c4: e8 b7 fe ff ff        callq  400480 <printf@plt>
  4005c9: c7 45 fc 39 30 00 00  movl   $0x3039,-0x4(%rbp)
  4005d0: c9                    leaveq
  4005d1: c3                    retq

Now let's look at the version that initializes the local variable a:

// int a = 2222;
00000000004005ad <func1>:
  4005ad: 55                    push   %rbp
  4005ae: 48 89 e5              mov    %rsp,%rbp
  4005b1: 48 83 ec 10           sub    $0x10,%rsp
  4005b5: c7 45 fc 00 00 00 00  movl   $0x0,-0x4(%rbp)
  4005bc: 8b 45 fc              mov    -0x4(%rbp),%eax
  4005bf: 89 c6                 mov    %eax,%esi
  4005c1: bf 90 07 40 00        mov    $0x400790,%edi
  4005c6: b8 00 00 00 00        mov    $0x0,%eax
  4005cb: e8 b0 fe ff ff        callq  400480 <printf@plt>
  4005d0: c7 45 fc 39 30 00 00  movl   $0x3039,-0x4(%rbp)
  4005d7: c9                    leaveq
  4005d8: c3                    retq

The two versions differ by a single instruction:

  4005b5: c7 45 fc 00 00 00 00  movl   $0x0,-0x4(%rbp)
The initialization operation is completed by real instructions.
In summary, when a function returns and its stack frame is popped, the data it left in that frame is not cleaned up. When a later function call reuses the same stack memory, its uninitialized local variables pick up the leftover data, so their values are unpredictable!
So, remember to initialize your local variables.