A brief talk about RISC-V GCC: Linker script learning notes (I)

Moiiiiilter


When we use RISC-V GCC for embedded development, we have to deal with startup files and linker scripts. This article records some learning notes on linker scripts.

1. Basic concepts

The main purpose of a linker script is to describe how the sections in the input files should be mapped into the output file and to control the memory layout of the output file. Most linker scripts perform similar functions. However, if necessary, a linker script can also instruct the linker to do many other things using the commands described below.

The linker always uses a linker script. If you do not supply one, the linker uses a default script that is compiled into the linker executable. You can display the default linker script with the '--verbose' command-line option.

To describe the linker script language, we need to define some basic concepts and vocabulary.

The linker combines many input files into one output file. The output file and each input file are in a specific data format known as an object file format, and each file is called an object file. The output file is often called an executable, but here we call it an object file as well. Each object file has, among other things, a list of sections. A section in an input file is called an input section; similarly, a section in the output file is called an output section.

Each section in an object file has a name and a size. Most sections also have an associated block of data, known as the section contents. A section may be marked as loadable, which means that its contents must be loaded into memory when the output file is run. A section with no contents may be allocatable, which means that an area of memory is set aside for it but nothing in particular is loaded there (in some cases this memory must be zeroed). A section that is neither loadable nor allocatable typically contains debugging information.

Each loadable or allocatable output section has two addresses. The first is the VMA, or virtual memory address. This is the address the section will have when the output file is run. The second is the LMA, or load memory address. This is the address at which the section will be loaded. In most cases the two addresses are the same. An example of where they differ is when a data section is loaded into ROM and then copied into RAM when the program starts (this technique is often used to initialize global variables). In this case, the ROM address is the LMA and the RAM address is the VMA.
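The ROM-to-RAM case above can be sketched in a linker script as follows. This is only an illustration: the region names, addresses, sizes, and the symbols _data_start, _data_end, and _data_load are all assumptions, not taken from any particular board. The startup code would be expected to copy the bytes from _data_load to the range between _data_start and _data_end.

```ld
/* Hypothetical regions; addresses and sizes are placeholders. */
MEMORY
{
  rom (rx) : ORIGIN = 0x00000000, LENGTH = 256K
  ram (rw) : ORIGIN = 0x40000000, LENGTH = 4M
}

SECTIONS
{
  .text : { *(.text .text.*) } > rom

  .data :
  {
    _data_start = .;            /* VMA: run-time address in ram */
    *(.data .data.*)
    _data_end = .;
  } > ram AT> rom               /* VMA in ram, LMA in rom */

  /* LMA: where the initial values are stored in rom */
  _data_load = LOADADDR(.data);
}
```

Running objdump with '-h' on the linked output should then show different VMA and LMA columns for .data.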

You can view the sections in an object file with the '-h' option of the objdump program.

Each object file also has a list of symbols, known as the symbol table. A symbol may be defined or undefined. Each symbol has a name, and each defined symbol has an address, among other information. When a C or C++ program is compiled into an object file, every defined function and every global or static variable becomes a defined symbol. Every function or global variable that an input file references but does not define becomes an undefined symbol.

2. Common keywords and usage

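ENTRY(symbol) specifies the entry point of the program, that is, the first instruction to execute

MEMORY describes the memory regions available on the target

SECTIONS describes how input sections are mapped into output sections and how the output sections are laid out in memory

.text program code

.rodata read-only data

.data readable and writable data that must be initialized

.bss readable and writable data that is initialized to zero

ASSERT(exp, message) makes the link fail with message if exp evaluates to zero

PROVIDE(symbol = expression) defines a symbol only if it is referenced and not defined by any object file in the link

AT(lma) sets the load address (LMA) of an output section; AT>region places the load address in a region defined by MEMORY

ALIGN(align) returns the location counter rounded up to the next align-byte boundary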
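To show how these keywords fit together, here is a minimal skeleton. It is a sketch under stated assumptions: the region names flash and sram, their addresses and sizes, and the symbols _start, __stack_size, __bss_start, __bss_end, and __stack_top are hypothetical and would have to match the startup file and the chip actually used.

```ld
ENTRY(_start)   /* _start is assumed to be defined in the startup file */

MEMORY
{
  /* Placeholder addresses and sizes */
  flash (rx) : ORIGIN = 0x20000000, LENGTH = 512K
  sram  (rw) : ORIGIN = 0x80000000, LENGTH = 64K
}

/* Takes effect only if no object file defines __stack_size */
PROVIDE(__stack_size = 2K);

SECTIONS
{
  .text :
  {
    *(.text .text.*)
  } > flash

  .rodata : ALIGN(4)
  {
    *(.rodata .rodata.*)
  } > flash

  .data : ALIGN(4)
  {
    *(.data .data.*)
  } > sram AT> flash            /* runs from sram, stored in flash */

  .bss (NOLOAD) : ALIGN(4)
  {
    __bss_start = .;
    *(.bss .bss.*)
    *(COMMON)
    . = ALIGN(4);
    __bss_end = .;
  } > sram

  .stack (NOLOAD) : ALIGN(16)
  {
    . += __stack_size;          /* reserve space for the stack */
    __stack_top = .;
  } > sram
}

/* The standard RISC-V calling convention keeps sp 16-byte aligned */
ASSERT((__stack_top & 0xF) == 0, "stack top must be 16-byte aligned")
```

The ASSERT is placed after SECTIONS because it depends on a symbol that is assigned inside a section definition.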

3. MEMORY

By default, the linker allows all available memory to be allocated. You can override this with the MEMORY command.

The MEMORY command describes the location and size of blocks of memory in the target. You can use it to describe which memory regions may be used by the linker and which must be avoided. You can then assign sections to particular memory regions. The linker sets section addresses based on the memory regions and warns about regions that become too full. It does not shuffle sections around to make them fit into the available regions.

A linker script may use the MEMORY command many times, but all memory blocks defined are treated as if they were specified inside a single MEMORY command. The syntax of MEMORY is:

MEMORY
  {
    name [(attr)] : ORIGIN = origin, LENGTH = len
    ...
  }

name is the name the linker script uses to refer to the memory region. Region names have no meaning outside the linker script. Region names are stored in a separate name space and do not conflict with symbol names, file names, or section names. Each memory region must have a distinct name within the MEMORY command. However, you can later add aliases to existing memory regions with the REGION_ALIAS command.
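As a small sketch of REGION_ALIAS, the fragment below lets the SECTIONS part refer to generic alias names, so that only the alias lines need to change for a different memory layout. The region names, alias names, addresses, and sizes are assumptions chosen for illustration.

```ld
MEMORY
{
  /* Placeholder addresses and sizes */
  flash (rx) : ORIGIN = 0x20000000, LENGTH = 512K
  sram  (rw) : ORIGIN = 0x80000000, LENGTH = 64K
}

/* Hypothetical aliases: point these elsewhere to retarget the layout */
REGION_ALIAS("REGION_TEXT", flash);
REGION_ALIAS("REGION_DATA", sram);

SECTIONS
{
  .text : { *(.text .text.*) } > REGION_TEXT
  .data : { *(.data .data.*) } > REGION_DATA AT> REGION_TEXT
  .bss  : { *(.bss .bss.*) *(COMMON) } > REGION_DATA
}
```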

The attr string is an optional list of attributes that determine whether an input section that is not explicitly mapped in the script should use a specific memory region. As explained in SECTIONS , if you do not specify an output section for an input section, the linker will create an output section with the same name as the input section. If you define region attributes, the linker will use them to determine the memory region in which the created output section should be placed.

The attr string may contain only the following characters:

'R' Read-only section

'W' Read/write section

'X' Executable section

'A' Allocatable section

'I' Initialized section

'L' Same as 'I'

'!' Inverts the sense of all the attributes that follow it

If an unmapped section matches any of the listed attributes other than '!', it is placed in the memory region. The '!' attribute reverses the test for the characters that follow it, so an unmapped section is placed in the memory region only if it does not match any of the attributes listed after the '!'.

origin is a numeric expression giving the start address of the memory region. The expression must evaluate to a constant before memory allocation is performed, so it cannot use section-relative symbols. The keyword ORIGIN may be abbreviated to org or o (but not, for example, ORG).

len is an expression giving the size in bytes of the memory region. Similar to the origin expression, the expression must be a numeric value and must evaluate to a constant. The keyword LENGTH can be abbreviated to len or l .

The following example specifies two memory regions available for allocation: one starting at '0' with a length of 256 kilobytes, and the other starting at '0x40000000' with a length of 4 megabytes. The linker places into the 'rom' region every section that is not explicitly mapped to a memory region and is either read-only or executable. It places all other sections that are not explicitly mapped to a memory region into the 'ram' region.

MEMORY
  {
    rom (rx)  : ORIGIN = 0, LENGTH = 256K
    ram (!rx) : org = 0x40000000, l = 4M
  }

Once you have defined a memory region, you can direct the linker to place a particular output section in it with the '>region' output section attribute. For example, if you have a memory region named 'mem', you would use '>mem' in the output section definition. See Output Section Region. If no address is specified for the output section, the linker sets its address to the next available address within the memory region. If the combined output sections directed to a memory region are too large for the region, the linker issues an error.

The starting address and length of the memory area can be obtained through the ORIGIN(memory) and LENGTH(memory) functions:

_fstack = ORIGIN(ram) + LENGTH(ram) - 4;

4. Section Description

4.1 Output Section

The complete output section description is as follows:

section [address] [(type)] :
  [AT(lma)]
  [ALIGN(section_align) | ALIGN_WITH_INPUT]
  [SUBALIGN(subsection_align)]
  [constraint]
  {
    output-section-command
    output-section-command
    ...
  } [>region] [AT>lma_region] [:phdr :phdr ...] [=fillexp] [,]

address is an expression for the VMA (virtual memory address) of the output section. It is optional, but if given, the output address is set exactly to the given value.

If no address is given, one is chosen by the rules below and then adjusted to meet the alignment requirement of the output section, which is the strictest alignment of any input section it contains. The rules are tried in order:

If a memory region is set for the section, the section is placed in that region, and its address is the next free address in the region.

If the MEMORY command has been used to create a list of memory regions, the first region whose attributes are compatible with the section is chosen to contain it, and the section address is the next free address in that region. See MEMORY.

If no memory regions were specified, or none matches the section, the output address is based on the current value of the location counter.
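The rules above can be illustrated with a short sketch. The region name, its address and size, and the .vectors section name are assumptions made for the example; the comments describe the behavior expected from the rules as stated, not the output of a verified build.

```ld
MEMORY
{
  /* Placeholder region; the 'a' attribute lets allocatable sections match */
  ram (wxa) : ORIGIN = 0x80000000, LENGTH = 64K
}

SECTIONS
{
  /* Explicit address: the VMA is set to exactly 0x80001000 */
  .vectors 0x80001000 : { *(.vectors) } > ram

  /* Region given, no address: next free address in ram */
  .data : { *(.data .data.*) } > ram

  /* Neither address nor region: the first region with compatible
     attributes is used, which here is expected to be ram */
  .bss : { *(.bss .bss.*) *(COMMON) }
}
```

Without any MEMORY command, each of the last two sections would simply start at the current value of the location counter.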

4.2 Input Section

An input section description appears inside an output section description and specifies which input sections are placed in that output section, and in what order. Common input sections are .text, .data, .rodata, .bss, and COMMON. An input section description consists of a file name, optionally followed by a list of section names in parentheses. Both the file name and the section names may be wildcard patterns, for example:

*main.o(.text) or directly *(.text)

The first matches the .text sections of any file whose name ends in main.o, and the second matches the .text sections of all input files. Files can also be excluded:

EXCLUDE_FILE (*filename.o) *(.text)
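A sketch combining these forms is shown below. The file name slow.o and the output section name .slow_text are hypothetical. It uses the variant in which EXCLUDE_FILE is written inside the section list, where it applies only to the section pattern that immediately follows it.

```ld
SECTIONS
{
  .text :
  {
    /* .text from files whose names end in main.o goes first */
    *main.o(.text)

    /* then .text from every other file except slow.o (hypothetical name) */
    *(EXCLUDE_FILE (*slow.o) .text)
  }

  /* the excluded file's code gets its own output section */
  .slow_text :
  {
    *slow.o(.text)
  }
}
```

An input section is placed by the first description that matches it, so the order of the lines inside an output section determines the final layout.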

5. Some built-in functions

ABSOLUTE(exp)

Returns the absolute (non-relocatable, as opposed to non-negative) value of the expression exp. This is mainly used to assign an absolute value to a symbol within a section definition, where symbol values are normally relative to the section.

ADDR(section)

Returns the address ( VMA ) of the section named 'section' . Your script must have previously defined the location of this section. In the following example, start_of_output_1, symbol_1, symbol_2 are assigned the same value, except that symbol_1 is relative to section .output1 while the other two are absolute values:

SECTIONS { ...
  .output1 :
    {
    start_of_output_1 = ABSOLUTE(.);
    ...
    }
  .output :
    {
    symbol_1 = ADDR(.output1);
    symbol_2 = start_of_output_1;
    }
... }

LENGTH(memory)

Returns the length of the memory named memory .

MAX(exp1, exp2)

Returns the maximum of exp1 and exp2 .

MIN(exp1, exp2)

Returns the minimum of exp1 and exp2 .

ORIGIN(memory)

Returns the starting address of the memory area named memory .

SIZEOF(section)

Returns the number of bytes in the named section . If the section has not yet been allocated and this function is evaluated, an error will occur.
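Several of these functions are often combined to derive layout symbols. The following is a minimal sketch, assuming a single hypothetical ram region and made-up symbol names (__bss_size, __heap_start, __ram_end, __heap_size); the 16K cap is an arbitrary value chosen for illustration.

```ld
MEMORY
{
  /* Placeholder address and size */
  ram (rw) : ORIGIN = 0x80000000, LENGTH = 64K
}

SECTIONS
{
  .data : { *(.data .data.*) } > ram
  .bss  : { *(.bss .bss.*) *(COMMON) } > ram

  /* size of .bss in bytes */
  __bss_size   = SIZEOF(.bss);

  /* first 8-byte aligned address after the end of .bss */
  __heap_start = ALIGN(ADDR(.bss) + SIZEOF(.bss), 8);

  /* one past the last byte of the region */
  __ram_end    = ORIGIN(ram) + LENGTH(ram);

  /* remaining space, capped at an arbitrary 16K */
  __heap_size  = MIN(__ram_end - __heap_start, 16K);
}
```

The symbol assignments come after the section definitions because ADDR and SIZEOF require the sections to be placed before they are evaluated.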

freebsder

Thank you for sharing, gcc's linker script is very powerful

Jacktang

Proficiency in linker scripts is a must for experts

bigbat · Published on 2021-11-13 09:10

Thanks for sharing. Linking fundamentals really are a core skill for low-level system work.

lugl4313820

It looks a bit boring, it seems I need to study harder!

le062

Just in time, thank you
