C language compilation process

可乐zzZ · Published on 2023-11-28 22:01


Compiling and linking a C program converts the source code we write into executable code that can run on the hardware. Two steps are required: compilation and linking. The process diagram is as follows:

This article explains what happens at each stage of compiling a C program, which helps in understanding how header files, libraries, and so on work. A clear picture of compilation and linking also helps in locating errors and in getting the compiler to catch as many mistakes as possible while programming.

Compilation

Compilation reads the source program as a character stream, performs lexical and syntactic analysis on it, and converts the high-level language statements into functionally equivalent assembly code. Compiling a source file involves two main stages: preprocessing, and compilation with optimization.

Preprocessing

The first stage is preprocessing, which takes place before compilation proper. The preprocessor modifies the contents of the source file according to the preprocessing directives placed in it. For example, #include is a preprocessing directive that inserts the contents of a header file into the .c file. Modifying the source before compilation provides a lot of flexibility for coping with the constraints of different computers and operating systems. The code required for one environment may differ from the code required for another because the available hardware or operating system differs. In many cases, the code for several environments can be kept in the same file and adapted to the current environment during preprocessing.

The preprocessor mainly handles the following:

Macro definition directives, such as #define a b

For this directive, the preprocessor replaces every later occurrence of the identifier a in the program with b, except where a appears inside a string literal. There is also #undef, which cancels a macro definition so that later occurrences of the name are no longer replaced.
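As a rough illustration of the replacement rules just described, here is a small sketch. The macro BUFFER_SIZE and the three function names are made up for this example and do not come from any real project.

```c
#include <stddef.h>
#include <string.h>

#define BUFFER_SIZE 64

/* The preprocessor rewrites BUFFER_SIZE to 64 before the compiler sees it. */
int buffer_size_value(void)
{
    return BUFFER_SIZE;
}

/* Inside a string literal the name is left alone, so this measures the
   text "BUFFER_SIZE" itself and not the text "64". */
size_t macro_name_length(void)
{
    return strlen("BUFFER_SIZE");
}

#undef BUFFER_SIZE

/* After #undef the name is no longer replaced, so it can be used as an
   ordinary identifier again. */
int after_undef(void)
{
    int BUFFER_SIZE = 8;   /* a plain local variable now */
    return BUFFER_SIZE;
}
```

With GCC, for instance, running only the preprocessor with the -E option should show the first function returning the literal 64 while the string is unchanged.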

Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, #endif, etc.

These directives let programmers decide which code the compiler will process by defining different macros. The preprocessor filters out the code that is not needed, based on which macros are defined.
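The sketch below shows one way conditional compilation can keep code for several environments in a single file. TARGET_EMBEDDED, MAX_CONNECTIONS, and the other names are hypothetical and chosen only for illustration; _WIN32 is a macro that Windows compilers predefine.

```c
/* TARGET_EMBEDDED is a hypothetical macro. It would normally be defined
   on the compiler command line, not in the source file. */
#ifdef TARGET_EMBEDDED
#define LOG_BUFFER_BYTES 128
#else
#define LOG_BUFFER_BYTES 4096
#endif

/* Supply a default only when nothing else has defined the name. */
#ifndef MAX_CONNECTIONS
#define MAX_CONNECTIONS 16
#endif

/* Choose between alternatives depending on the platform. */
#if defined(_WIN32)
#define PATH_SEPARATOR '\\'
#else
#define PATH_SEPARATOR '/'
#endif

int log_buffer_bytes(void) { return LOG_BUFFER_BYTES; }
int max_connections(void)  { return MAX_CONNECTIONS; }
char path_separator(void)  { return PATH_SEPARATOR; }
```

Only one branch of each #ifdef or #if group reaches the compiler. The other branch is dropped by the preprocessor, so it does not even have to be valid code for the current platform.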

Header file inclusion directives, such as #include "FileName" or #include <FileName>.

Header files typically use #define to define a large number of macros (most commonly symbolic constants) and also contain declarations of various external symbols. The main purpose of a header file is to make certain definitions available to several different C source files: a source file that needs them only has to add an #include line instead of repeating the definitions. The preprocessor copies everything in the header file into the output file it generates for the compiler to process. The header files included in a C source program can be provided by the system. On UNIX-like systems these are generally placed in the /usr/include directory and are included with angle brackets (< >). Developers can also write their own header files, which are generally placed in the same directory as the C source files and are included with double quotes ("").
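To make the copying concrete, the sketch below shows roughly what the compiler sees after a source file has included a small header. Both the header name geometry.h and everything in it are hypothetical. The #ifndef/#define pair at the top is the common "include guard" idiom, which keeps the text from being processed twice if the header is included more than once.

```c
/* ---- text of a hypothetical header, geometry.h ---- */
#ifndef GEOMETRY_H
#define GEOMETRY_H

#define MAX_SIDE 1000          /* symbolic constant shared by every user  */
int square_area(int side);     /* declaration of an external function     */

#endif /* GEOMETRY_H */
/* ---- end of header text ---- */

/* ---- the source file that contained: #include "geometry.h" ---- */
int clamped_square_area(int side)
{
    if (side > MAX_SIDE) {
        side = MAX_SIDE;
    }
    return square_area(side);
}

/* In a real project this definition would usually live in another .c
   file. It is placed here only so that the sketch builds on its own. */
int square_area(int side)
{
    return side * side;
}
```

The source file can use MAX_SIDE and call square_area without repeating either definition, which is the whole point of the header.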

Special symbols. The preprocessor recognizes some special symbols.

For example, __LINE__ in the source program is replaced by the current line number (a decimal integer), and __FILE__ is replaced by the name of the C source file currently being compiled. The preprocessor substitutes the appropriate values for these symbols wherever they appear in the source program.
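In standard C these special symbols are spelled __LINE__ and __FILE__. A common use is a logging macro that reports where a message came from. The sketch below assumes made-up names (LOG, report, first_marker, second_marker, this_file) purely for illustration.

```c
#include <stdio.h>

/* Each use of LOG expands to a call that carries the file name and line
   number of the place where LOG was written. */
#define LOG(msg) fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, (msg))

void report(const char *msg)
{
    LOG(msg);
}

/* These two functions sit on consecutive lines, so their results differ by one. */
int first_marker(void)  { return __LINE__; }
int second_marker(void) { return __LINE__; }

/* The name of the file being compiled, as a string literal. */
const char *this_file(void) { return __FILE__; }
```

The exact values depend on where the code sits in the file and on how the file name was passed to the compiler, so none are shown here.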

The preprocessor essentially performs "replacement" on the source program. The result is an output file without macro definitions, conditional compilation directives, or special symbols. This file means the same as the source file before preprocessing, but its content is different. In the next step, this output file is the input to the compiler, which translates it toward machine instructions.

Compilation and optimization

The second stage is compilation and optimization. The output file obtained after preprocessing contains only constants (such as numbers and strings), variable definitions, identifiers such as main, C keywords such as if, else, for, and while, and operators and punctuation such as {, }, +, -, *, and /.

The compiler performs lexical analysis and syntactic analysis and, after confirming that all statements conform to the grammar, translates them into an equivalent intermediate representation or assembly code.
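To give a feel for what lexical analysis does, here is a deliberately tiny token counter. It is a simplified sketch and not how any real compiler is written: it only knows identifiers or keywords, unsigned integer literals, and one-character symbols, so multi-character operators, comments, and string literals are not handled. The function name count_tokens is made up.

```c
#include <ctype.h>

/* Splits C-like text into tokens and returns how many were found. */
int count_tokens(const char *text)
{
    int count = 0;
    const char *p = text;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;                                   /* skip white space */
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            while (isalnum((unsigned char)*p) || *p == '_') {
                p++;
            }
            count++;                               /* identifier or keyword */
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p)) {
                p++;
            }
            count++;                               /* integer literal */
        } else {
            p++;
            count++;                               /* one-character symbol */
        }
    }
    return count;
}
```

For the text "int x = a + 42;" this yields seven tokens: int, x, =, a, +, 42, and the semicolon. Syntactic analysis would then check that this token sequence forms a valid declaration.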

Optimization is one of the harder parts of a compilation system. The issues involved concern not only compilation techniques themselves but also the hardware environment of the machine. One kind of optimization works on the intermediate code and does not depend on the specific computer. The other kind is carried out mainly during generation of the target code.

For the former, the main work is common subexpression elimination, loop optimization (loop-invariant code motion, strength reduction, changing loop control conditions, constant folding, etc.), copy propagation, and removal of useless assignments.
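Several of these machine-independent transformations can be imitated by hand at the source level. The two functions below are a made-up example: the first is written naively, and the second applies the transformations manually. A real compiler works on its intermediate code rather than on C source, so this is only an analogy.

```c
/* Straightforward version, as it might first be written. */
long weighted_sum_naive(const int *values, int count, int a, int b)
{
    long total = 0;
    for (int i = 0; i < count; i++) {
        int scale = 2 * 8;                       /* constant expression        */
        long offset = (long)(a + b) * (a + b);   /* same value every iteration */
        total += (long)values[i] * scale + offset + i * 4;
    }
    return total;
}

/* The same computation after applying the transformations by hand. */
long weighted_sum_optimized(const int *values, int count, int a, int b)
{
    long total = 0;
    int sum = a + b;                  /* common subexpression computed once */
    long offset = (long)sum * sum;    /* loop-invariant code moved out      */
    int step = 0;                     /* strength reduction: i * 4 becomes repeated addition */

    for (int i = 0; i < count; i++) {
        total += (long)values[i] * 16 + offset + step;   /* 2 * 8 folded to 16 */
        step += 4;
    }
    return total;
}
```

Both functions return the same result for the same inputs. The second simply does less work per iteration.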

The latter kind of optimization is closely tied to the hardware structure of the machine. The most important consideration is how to keep the values of relevant variables in the machine's hardware registers to reduce the number of memory accesses. In addition, adjusting the instructions to suit the way the hardware executes them (pipelining, RISC, CISC, VLIW, etc.) so that the target code is shorter and runs more efficiently is also an important research topic.

Assembly

Assembly is the process of translating assembly language code into machine instructions for the target machine. Every C source file handled by the translation system eventually goes through this step to produce a corresponding object file, which stores target machine code equivalent to the source program. An object file consists of segments, and usually contains at least two:

Code segment: This segment mainly contains program instructions.

This segment is normally readable and executable, but normally not writable.

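Data segment: This segment mainly stores the global variables and static data used in the program. Data segments are generally readable and writable. On older systems they were often executable as well, but modern systems usually map them as non-executable.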
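The sketch below marks where each kind of C object typically ends up. The names are invented for the example, and the exact section names (such as .text, .data, .bss, or .rodata) depend on the toolchain and object file format, so the comments describe typical behaviour only.

```c
int request_count = 0;                  /* global variable: data segment          */
static int error_count = 3;             /* file-scope static: data segment        */
static const char banner[] = "ready";   /* constant data, often kept read-only    */

/* The machine instructions of every function go into the code segment. */
int record_request(void)
{
    static int calls = 0;   /* static local: stored in the data segment, so it
                               keeps its value between calls                  */
    int scratch = 1;        /* automatic local: lives on the stack or in a
                               register, not in the object file's data        */
    calls += scratch;
    request_count = calls;
    return calls;
}

int total_errors(void)          { return error_count; }
const char *status_banner(void) { return banner; }
```

Because calls is static, a second call to record_request continues counting from where the first one stopped.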

There are three main types of object files in the UNIX environment:

Relocatable files

It contains code and data suitable for linking with other object files to create an executable or shared object file.

Shared object files

This file contains code and data suitable for linking in two contexts: the first is that the linker can process it with other relocatable files and shared object files to create another object file; the second is that the dynamic linker combines it with another executable file and other shared object files to create a process image.

Executable files

This file contains a program that the operating system can load to create a process.

The assembler actually generates the first type of object file. The latter two require further processing, which is the work of the linker.

Linking process

The object file generated by the assembler cannot be executed immediately, because it may contain many unresolved references.

For example, a function in one source file may reference a symbol (a variable, a function, etc.) defined in another source file, or the program may call a function from a library. The linker has to resolve all of these.

The main task of the linker is to connect the related object files to each other, that is, to connect each symbol referenced in one file with the definition of that symbol in another file, so that all the object files become a unified whole that the operating system can load and execute.
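The sketch below imitates the situation with two hypothetical source files, main.c and counter.c, shown one after the other in a single listing so that it builds on its own. In a real project each half would be compiled to its own object file, and the first object file would carry shared_counter and bump_counter as undefined symbols for the linker to match up.

```c
/* ---- as it might appear in a hypothetical main.c ---- */
extern int shared_counter;       /* declared here, defined elsewhere      */
void bump_counter(int amount);   /* likewise only declared in this file   */

int run_twice(void)
{
    bump_counter(2);
    bump_counter(3);
    return shared_counter;
}

/* ---- as it might appear in a hypothetical counter.c ---- */
int shared_counter = 0;          /* the definition the linker connects to */

void bump_counter(int amount)
{
    shared_counter += amount;
}
```

If the second half were missing, each file would still compile, but linking would fail with an undefined reference error, since no object file supplies the definitions.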

Depending on how the developer chooses to link a given library function, linking falls into two types:

Static Linking

In this mode, the code of the function is copied from the static library that contains it into the final executable program. When the program is executed, this code is loaded into the virtual address space of the process. A static library is actually a collection of object files, each of which contains the code of one function or a group of related functions in the library.

Dynamic Linking

In this mode, the function code is placed in an object file called a dynamic link library or shared object. All the linker does is record the name of the shared object and a small amount of other bookkeeping information in the final executable program. When the executable file is run, the contents of the dynamic link library are mapped into the virtual address space of the corresponding process. The dynamic linker then finds the corresponding function code based on the information recorded in the executable program.

Function calls in an executable file can use either dynamic linking or static linking. Dynamic linking makes the final executable file smaller and saves memory when the shared object is used by multiple processes, because only one copy of the shared object code needs to be kept in memory. However, dynamic linking is not necessarily better than static linking. In some cases, it can carry a performance penalty.

丨The article is organized to spread relevant technologies, the copyright belongs to the original author丨

丨If there is any infringement, please contact us to delete丨

bellwind

Preprocessing is done before compilation

heleijunjie72 · Published on 2023-11-28 22:01

Understanding the program compilation process helps to deepen the understanding and application of programming thinking
