Learning experience of memory management under Linux (Part 2)-EEWORLD


Continuing from Part 1, this article summarizes the following topics:

1. Concepts such as physical pages and zones

2. Kernel functions for obtaining memory

3. Allocating bytes and allocating pages

1. Zones and Pages

The concept of a page was discussed in Part 1 of this series on Linux memory management. The kernel treats the physical page (page frame) as the basic unit of memory management and represents each physical page in the system with a struct page. This structure records, among other things, whether the page is locked in memory, whether it is dirty, how many references to it exist, and the page's virtual address. The key point is that struct page is associated with physical pages, not virtual pages: it describes the physical memory itself, not the data stored in that memory.
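A heavily simplified model of struct page can make the "describes the frame, not its contents" point concrete. This is a hypothetical user-space sketch: the field names echo the real struct page, but the types, the SIM_PG_* flag bits and the helper functions are made up, and the real structure has many more fields and uses atomic counters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; the real kernel defines its own
 * page flag bits (PG_locked, PG_dirty, ...) with a different layout. */
enum sim_page_flag {
    SIM_PG_LOCKED = 1u << 0,
    SIM_PG_DIRTY  = 1u << 1,
};

/* Simplified model: one instance per physical page frame. */
struct sim_page {
    unsigned long flags;  /* status bits such as locked / dirty */
    int refcount;         /* number of users holding a reference to the frame */
    void *virtual;        /* kernel virtual address of the frame, NULL if unmapped */
};

static void sim_page_set(struct sim_page *p, unsigned long flag)   { p->flags |= flag; }
static void sim_page_clear(struct sim_page *p, unsigned long flag) { p->flags &= ~flag; }
static bool sim_page_test(const struct sim_page *p, unsigned long flag)
{
    return (p->flags & flag) != 0;
}

static void sim_get_page(struct sim_page *p) { p->refcount++; }

/* Drops one reference; returns true when the last one is gone and the frame could be freed. */
static bool sim_put_page(struct sim_page *p) { return --p->refcount == 0; }
```

Note that nothing in the structure says what bytes the frame holds; it only tracks the frame's state.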

The following explains how a physical page is addressed:

The memory addresses that appear in machine instructions are logical addresses. Before memory can be accessed, a logical address must be converted into a linear address and then into a physical address by the MMU (the memory management unit in the CPU).

Let's write the simplest possible hello world program, compile it with gcc, and then disassemble it. Among the instructions we find:

mov 0x80495b0, %eax

The memory address 0x80495b0 here is a logical address: it must be added to the base address of the data segment implied by DS to form a linear address. In other words, 0x80495b0 is an offset within the current task's DS data segment.

In x86 protected mode, the segment information (base linear address, limit, access rights, etc.), i.e. the segment descriptor, occupies 8 bytes, so it cannot be stored directly in a segment register, which is only 2 bytes wide. Intel's design therefore stores segment descriptors centrally in the GDT or LDT, and the segment register holds a selector containing the index of the descriptor in the GDT or LDT.

In Linux, logical addresses are equal to linear addresses. Why? Because every Linux segment (user code, user data, kernel code, and kernel data) has a base linear address of 0x00000000 and a length of 4 GB, so linear address = logical address + 0x00000000; that is, the logical address equals the linear address.
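The selector lookup and the flat-segment arithmetic can be sketched in a few lines of C. The selector bit layout (index in bits 15..3, table indicator in bit 2, RPL in bits 1..0) is the x86 format; the toy GDT, in which entries 12 to 15 are flat 4 GB segments, is an assumption modelled on the layout used by 32-bit Linux 2.6, where the user data selector __USER_DS evaluates to 0x7b (treat the exact indices as an assumption). The names seg_desc, gdt and logical_to_linear are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified view of a segment descriptor: only base and limit are modelled. */
struct seg_desc {
    uint32_t base;   /* segment base linear address */
    uint32_t limit;  /* highest valid offset within the segment */
};

/* x86 selector fields: bits 15..3 = index, bit 2 = TI (0 = GDT, 1 = LDT), bits 1..0 = RPL. */
static unsigned sel_index(uint16_t sel) { return sel >> 3; }
static unsigned sel_ti(uint16_t sel)    { return (sel >> 2) & 1u; }
static unsigned sel_rpl(uint16_t sel)   { return sel & 3u; }

/* Toy GDT: entries 12..15 are flat segments (base 0, limit 4 GB - 1); the rest are empty. */
#define FLAT { 0x00000000u, 0xFFFFFFFFu }
static const struct seg_desc gdt[16] = { [12] = FLAT, [13] = FLAT, [14] = FLAT, [15] = FLAT };

/* logical (selector:offset) -> linear = descriptor base + offset.
 * Returns false for selectors this toy does not model or offsets past the limit. */
static bool logical_to_linear(uint16_t sel, uint32_t offset, uint32_t *linear)
{
    unsigned i = sel_index(sel);
    if (sel_ti(sel) != 0 || i >= sizeof gdt / sizeof gdt[0])
        return false;          /* only the GDT above is modelled */
    if (offset > gdt[i].limit)
        return false;          /* real hardware would raise a general-protection fault */
    *linear = gdt[i].base + offset;
    return true;
}
```

With selector 0x7b (index 15, GDT, RPL 3), logical_to_linear(0x7b, 0x80495b0, &lin) stores 0x80495b0 in lin: because the base is 0, the logical and linear addresses coincide.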

Since logical addresses equal linear addresses in Linux, how does a linear address map to a physical address? Through the paging mechanism: the linear address is translated by looking it up in the page tables.

Strictly speaking, paging is a mechanism provided by the CPU; Linux merely uses it, following its rules, to implement memory management.

In protected mode, the PG flag, the most significant bit (bit 31) of control register CR0, controls whether paging is enabled. If PG=1, paging is enabled and linear addresses are converted to physical addresses by walking the page tables. If PG=0, paging is disabled and the linear address is used directly as the physical address.

The basic idea of paging is to divide memory into fixed-size units called pages. Each page covers 4 KB of address space (to keep the analysis simple, we ignore extended paging modes), so every page starts at a 4 KB-aligned address. To perform the translation, we must give the CPU a lookup table that maps the current task's linear addresses to physical addresses: the page table. Note that, so that each task can have its own flat virtual address space, each task has its own page directory and page tables.

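To save the memory occupied by page tables, x86 translates a linear address into a physical address through a two-level lookup: first the page directory, then a page table. (A single flat table covering 4 GB would need 2^20 entries of 4 bytes, i.e. 4 MB per task, even for a task that uses very little memory; with two levels, page tables for unused regions simply need not exist.)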

The 32-bit linear address is divided into three parts:

The highest 10 bits (Directory) index the page directory, the middle 10 bits (Table) index the page table, and the lowest 12 bits (Offset) give the byte offset within the physical page.
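Splitting a linear address into its three fields is just shifting and masking. A minimal sketch for classic 32-bit (non-PAE) paging; the struct and function names are hypothetical:

```c
#include <stdint.h>

/* The three fields of a 32-bit linear address under classic (non-PAE) x86 paging. */
struct lin_parts {
    uint32_t dir;     /* bits 31..22: page directory index (0..1023) */
    uint32_t table;   /* bits 21..12: page table index (0..1023) */
    uint32_t offset;  /* bits 11..0: byte offset within the 4 KB page */
};

static struct lin_parts split_linear(uint32_t lin)
{
    struct lin_parts p;
    p.dir    = lin >> 22;
    p.table  = (lin >> 12) & 0x3FFu;
    p.offset = lin & 0xFFFu;
    return p;
}
```

split_linear(0x80495b0) yields dir = 32, table = 73, offset = 0x5b0, the same values worked out by hand for the example address in this article.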

The page directory is 4 KB (exactly one page) and holds 1024 entries of 4 bytes (32 bits) each. Each entry holds the physical address of a page table; because page tables are 4 KB-aligned, only the upper 20 bits of the address are needed, and the low 12 bits carry flags such as Present. An entry whose page table has not been allocated is filled with 0.

A page table is also 4 KB and also holds 1024 entries of 4 bytes each; each entry holds the physical start address of the final physical page (again with flags in the low 12 bits).

Each active task must first be given a page directory, whose physical address is loaded into the CR3 register when the task runs. Page tables can be allocated in advance or on demand when they are first used.

Let's take the address in mov 0x80495b0, %eax as an example and walk through the conversion of a linear address into a physical address.

As mentioned earlier, in Linux the logical address equals the linear address, so the linear address to convert is 0x80495b0. The CPU performs the conversion automatically; all Linux has to do is prepare the page directory and page tables the conversion needs. Here we assume they are already prepared; allocating physical memory for the page directory and page tables is quite complicated and will be analyzed later.

The kernel first loads CR3 with the physical address of the current task's page directory.

In binary, the linear address 0x80495b0 is 0000 1000 0000 0100 1001 0101 1011 0000. The top 10 bits, 0000 1000 00, are 32 in decimal, so the CPU reads entry 32 (counting from 0) of the page directory, which holds the physical address of a page table. The middle 10 bits, 00 0100 1001, are 73 in decimal, so the CPU reads entry 73 of that page table, which holds the physical base address of the final physical page. Adding the lowest 12 bits of the linear address (0101 1011 0000, i.e. the offset 0x5b0) to that base address gives the physical memory location the linear address finally corresponds to.
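The whole walk, including the CR0.PG check, can be simulated in user space. This is a toy model of what the MMU does in hardware, not Linux code: "physical memory" is a 64 KB array, and the frame numbers used in the usage example (page directory in frame 1, page table in frame 2, data page in frame 5) are arbitrary choices. Entries are treated as holding a 4 KB-aligned physical address in the upper 20 bits and a Present flag in bit 0, as on real x86; the TOY_* names and translate() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NR_FRAMES 16u            /* toy "physical memory": 16 frames = 64 KB */
#define PTE_PRESENT   0x001u         /* bit 0 of a PDE/PTE: entry is valid */
#define FRAME_MASK    0xFFFFF000u    /* upper 20 bits: 4 KB-aligned physical address */
#define CR0_PG        0x80000000u    /* CR0 bit 31: paging enabled */

/* Simulated physical memory, accessed as 32-bit words. */
static uint32_t phys_mem[TOY_NR_FRAMES * TOY_PAGE_SIZE / 4];

static uint32_t read_phys32(uint32_t pa)
{
    /* Addresses outside the toy memory read as 0, i.e. "not present". */
    return pa / 4 < sizeof phys_mem / sizeof phys_mem[0] ? phys_mem[pa / 4] : 0;
}

static void write_phys32(uint32_t pa, uint32_t v)
{
    if (pa / 4 < sizeof phys_mem / sizeof phys_mem[0])
        phys_mem[pa / 4] = v;
}

/* Mimics the MMU: returns false where real hardware would raise a page fault. */
static bool translate(uint32_t cr0, uint32_t cr3, uint32_t lin, uint32_t *phys)
{
    if (!(cr0 & CR0_PG)) {            /* PG = 0: the linear address is the physical address */
        *phys = lin;
        return true;
    }
    uint32_t pde = read_phys32((cr3 & FRAME_MASK) + (lin >> 22) * 4);
    if (!(pde & PTE_PRESENT))
        return false;                 /* no page table for this 4 MB region */
    uint32_t pte = read_phys32((pde & FRAME_MASK) + ((lin >> 12) & 0x3FFu) * 4);
    if (!(pte & PTE_PRESENT))
        return false;                 /* page not present */
    *phys = (pte & FRAME_MASK) | (lin & 0xFFFu);
    return true;
}
```

With PDE 32 pointing at frame 2 and PTE 73 pointing at frame 5, translate() maps 0x80495b0 to 0x55b0; an address whose PTE is still 0 returns false, which is where real hardware would raise a page fault.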

We know that user processes in Linux can address linear addresses from 0 to 3 GB. Do we then need to build page tables for all 3 GB of virtual memory in advance? Physical memory is usually much smaller than 3 GB and many processes run at the same time, so building page tables for a full 3 GB linear address space for every process up front is impossible. Linux solves this with a CPU mechanism. After a process is created, its page directory entries can simply be filled with 0. When the CPU walks the page tables and finds an entry that is 0 (not present), it raises a page fault exception and the faulting process is suspended. The Linux kernel then allocates a physical page through a series of fairly complex algorithms, writes the page's address into the entry, and lets the process resume. The process is unaware of all this: from its point of view, it simply accessed memory normally.
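The demand-paging idea can be modelled with a toy page directory whose entries start out empty. Everything here is hypothetical: handle_fault() stands in for the kernel's page fault handler, and the "frame allocator" simply hands out frame numbers 1, 2, 3 and so on. Real Linux decides what to do in the fault handler based on the process's memory areas, which this sketch ignores.

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_PRESENT 1u

/* Toy per-process page directory: every entry starts as NULL (nothing mapped). */
struct toy_mm {
    uint32_t *pgd[1024];   /* pointer to a 1024-entry page table, or NULL */
    uint32_t next_frame;   /* hypothetical frame allocator: hands out 1, 2, 3, ... */
    unsigned faults;       /* number of page faults handled so far */
};

/* Stand-in for the fault handler: allocate whatever is missing, then return. */
static int handle_fault(struct toy_mm *mm, uint32_t lin)
{
    uint32_t d = lin >> 22, t = (lin >> 12) & 0x3FFu;
    if (mm->pgd[d] == NULL) {
        mm->pgd[d] = calloc(1024, sizeof(uint32_t));   /* new all-zero page table */
        if (mm->pgd[d] == NULL)
            return -1;
    }
    mm->pgd[d][t] = (mm->next_frame++ << 12) | TOY_PRESENT;  /* map a fresh frame */
    mm->faults++;
    return 0;
}

/* Translate lin; on a missing entry, "fault", fix it up, and retry transparently. */
static int access_addr(struct toy_mm *mm, uint32_t lin, uint32_t *phys)
{
    for (;;) {
        uint32_t *pt = mm->pgd[lin >> 22];
        uint32_t pte = pt ? pt[(lin >> 12) & 0x3FFu] : 0;
        if (pte & TOY_PRESENT) {
            *phys = (pte & 0xFFFFF000u) | (lin & 0xFFFu);
            return 0;
        }
        if (handle_fault(mm, lin) != 0)
            return -1;
    }
}

static void toy_mm_free(struct toy_mm *mm)
{
    for (int i = 0; i < 1024; i++) {
        free(mm->pgd[i]);
        mm->pgd[i] = NULL;
    }
}
```

The first access to 0x80495b0 triggers one fault, which creates the page table and maps a frame; a second access to the same page finds the entry present and causes no new fault, so the caller of access_addr() never sees the interruption.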

However, not all pages can be used in the same way (for example, some hardware can perform DMA only to certain physical addresses), so the kernel divides pages into different zones:

Linux mainly uses four zones:

ZONE_DMA: pages that can be used for DMA operations.

ZONE_DMA32: like ZONE_DMA, pages that can be used for DMA, but only by devices that can address 32 bits (on x86-64, memory below 4 GB).

ZONE_NORMAL: normal, regularly mapped pages.

ZONE_HIGHMEM: "high memory", pages that are not permanently mapped into the kernel's address space.

In this way, Linux partitions the system's pages into zones, forming separate memory pools for different purposes. Note that zones are only a logical grouping the kernel uses to keep track of pages; they have no physical meaning to the memory hardware itself.
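Which zone a page falls into depends on its physical address and on the architecture. This sketch assumes the classic 32-bit x86 boundaries (ZONE_DMA below 16 MB, ZONE_NORMAL from 16 MB to 896 MB, ZONE_HIGHMEM above that); ZONE_DMA32 does not exist there, and the SIM_ZONE_* names and zone_of() are hypothetical.

```c
#include <stdint.h>

enum sim_zone { SIM_ZONE_DMA, SIM_ZONE_NORMAL, SIM_ZONE_HIGHMEM };

#define SIM_MB(x) ((uint64_t)(x) << 20)

/* Classic 32-bit x86 layout (assumption; other architectures differ):
 *   ZONE_DMA     : below 16 MB (ISA DMA limit)
 *   ZONE_NORMAL  : 16 MB .. 896 MB (permanently mapped by the kernel)
 *   ZONE_HIGHMEM : 896 MB and above (not permanently mapped) */
static enum sim_zone zone_of(uint64_t phys)
{
    if (phys < SIM_MB(16))
        return SIM_ZONE_DMA;
    if (phys < SIM_MB(896))
        return SIM_ZONE_NORMAL;
    return SIM_ZONE_HIGHMEM;
}
```

A page at 1 MB lands in ZONE_DMA, one at 100 MB in ZONE_NORMAL, and one at 1 GB in ZONE_HIGHMEM under these assumptions.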


2. Page allocation APIs

1. Getting pages:

struct page *alloc_pages(gfp_t gfp_mask, unsigned int order);

This function allocates 2^order consecutive physical pages and returns a pointer to the page structure of the first page.
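The order arithmetic can be made concrete with a small user-space sketch. order_for_size() below is a hypothetical reimplementation of the idea behind the kernel's get_order() helper, assuming 4 KB pages; it is not the kernel's code.

```c
#include <stddef.h>

#define SIM_PAGE_SHIFT 12
#define SIM_PAGE_SIZE  (1UL << SIM_PAGE_SHIFT)   /* assume 4 KB pages */

/* Pages and bytes covered by one allocation of the given order. */
static unsigned long pages_for_order(unsigned int order) { return 1UL << order; }
static unsigned long bytes_for_order(unsigned int order) { return SIM_PAGE_SIZE << order; }

/* Smallest order whose block can hold `size` bytes (size must be > 0 and not
 * close to SIZE_MAX, or the round-up below overflows). */
static unsigned int order_for_size(size_t size)
{
    unsigned int order = 0;
    size_t pages = (size + SIM_PAGE_SIZE - 1) >> SIM_PAGE_SHIFT;  /* round up to whole pages */
    while ((1UL << order) < pages)
        order++;
    return order;
}
```

For example, a 12 KB request needs 3 pages, but the allocator deals only in power-of-two runs, so order_for_size(12288) is 2 and an order-2 block (4 pages, 16 KB) is handed out.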

To get the logical address of a page, use void *page_address(struct page *page);

Alternatively, allocate the pages and get the logical address of the first page directly:

unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);

2. Freeing pages:

void __free_pages(struct page *page, unsigned int order);

void free_pages(unsigned long addr, unsigned int order);

void free_page(unsigned long addr);

Note: you may free only pages that you allocated yourself, passing the same order you used to allocate them. The functions above allocate memory in units of pages; to allocate memory in units of bytes, use the following:

kmalloc()

Prototype: void *kmalloc(size_t size, gfp_t flags); The memory it returns is physically contiguous. Note: we said the basic unit of memory allocation is the page, so why does kmalloc() work in bytes? The answer requires the slab allocator, introduced below.
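To see why byte-sized requests still end up backed by pages, here is a sketch of how a kmalloc-style allocator might round a request up to a fixed object size. The size classes (powers of two from 8 bytes to 8 KB) and the function names are assumptions for illustration; the real kmalloc cache sizes depend on the kernel version and configuration.

```c
#include <stddef.h>

/* Hypothetical kmalloc-style size classes: 8, 16, 32, ..., 8192 bytes. */
#define MIN_CLASS 8u
#define MAX_CLASS 8192u

/* Object size of the cache that would serve `size`, or 0 if no class is large enough. */
static size_t size_class(size_t size)
{
    size_t c = MIN_CLASS;
    if (size > MAX_CLASS)
        return 0;
    while (c < size)
        c <<= 1;
    return c;
}

/* Bytes wasted inside the object because the request was rounded up. */
static size_t internal_waste(size_t size)
{
    size_t c = size_class(size);
    return c ? c - size : 0;
}
```

size_class(100) returns 128, so a 100-byte request in this model consumes a 128-byte object and wastes 28 bytes; the objects themselves live in pages obtained from the page allocator.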

The corresponding free function is kfree():

Prototype: void kfree(const void *ptr);

vmalloc() function:

vmalloc() is similar to kmalloc(), but the memory it allocates is only virtually contiguous; the underlying physical pages need not be contiguous. This is also how user-space malloc() works: the pages it returns are contiguous in the process's virtual address space, but there is no guarantee that they are contiguous in physical RAM. kmalloc(), by contrast, guarantees that the pages are physically contiguous (and therefore also virtually contiguous). Hardware devices generally need physically contiguous memory, because they access memory without going through the CPU's MMU. For performance reasons, kmalloc() is generally preferred: because the pages obtained by vmalloc() are not physically contiguous, they must be mapped one by one, which requires setting up dedicated page table entries.
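The difference between kmalloc() and vmalloc() lies in the physical frames behind a virtually contiguous buffer. The sketch below checks whether a list of frame numbers is physically contiguous; the function name and the frame numbers in the usage note are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* frames[i] = physical frame number backing the i-th virtual page of a buffer.
 * Returns true if the frames form one physically contiguous run. */
static bool physically_contiguous(const unsigned long *frames, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (frames[i] != frames[i - 1] + 1)
            return false;
    return true;
}
```

A kmalloc()-like buffer backed by frames 100, 101, 102, 103 passes the check; a vmalloc()-like buffer backed by frames 100, 57, 313, 58 does not, so a device that bypasses the MMU could not treat it as one block, and the kernel needs one page table entry per page to make it look contiguous to the CPU.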

The following figure is a diagram of memory allocation in the kernel:

3. Allocating bytes and allocating pages

1. kmalloc() allocates physically contiguous memory for small allocations.
2. __get_free_page() allocates physically contiguous memory in whole pages.

The notes below explain why these functions return physically contiguous memory, and whether the addresses they return are physical or virtual.
kmalloc() itself is implemented on top of the slab allocator, an efficient mechanism for allocating small chunks of memory. The slab allocator is not independent, however: it is built on the page allocator and subdivides the pages it obtains into finer-grained pieces for callers. In other words, the system first uses the page allocator to obtain physically contiguous memory in units of pages, and kmalloc() then divides that memory according to the caller's needs.
To confirm this, look at the implementation of kmalloc(), which lives in __do_kmalloc(). In __do_kmalloc(), __cache_alloc() is ultimately called to allocate an object from a slab cache; kmem_cache_alloc() and similar functions also go through __cache_alloc(). Following the call path of __cache_alloc(), we find that when a cache needs a new slab, cache_grow() calls kmem_getpages() to allocate physical pages. kmem_getpages() calls alloc_pages_node(), which finally uses __alloc_pages() to return a struct page, the structure the system uses to describe a physical page. This confirms the point above: slabs are built on top of physical pages, and the memory kmalloc() hands out is physically contiguous memory obtained from the page allocator.
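The "carve pages into small objects" idea can be sketched in user space. In this toy, aligned_alloc() stands in for the page allocator, and a single 4 KB page is split into equal-sized objects kept on a free list. The toy_slab_* names are hypothetical, and the real slab allocator adds per-CPU caches, colouring, multiple slabs per cache and much more.

```c
#include <stddef.h>
#include <stdlib.h>

#define SLAB_PAGE_SIZE 4096u

/* One "slab": a single page carved into equal-sized objects on a free list. */
struct toy_slab {
    void *page;        /* backing page from the stand-in "page allocator" */
    void *free_list;   /* singly linked list threaded through the free objects */
    size_t obj_size;
    size_t free_count;
};

static int toy_slab_init(struct toy_slab *s, size_t obj_size)
{
    if (obj_size < sizeof(void *) || obj_size > SLAB_PAGE_SIZE)
        return -1;
    /* Round up so every object can hold an aligned free-list pointer. */
    obj_size = (obj_size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
    s->page = aligned_alloc(SLAB_PAGE_SIZE, SLAB_PAGE_SIZE);  /* stands in for alloc_pages() */
    if (s->page == NULL)
        return -1;
    s->obj_size = obj_size;
    s->free_list = NULL;
    s->free_count = 0;
    for (size_t off = 0; off + obj_size <= SLAB_PAGE_SIZE; off += obj_size) {
        void **obj = (void **)((char *)s->page + off);
        *obj = s->free_list;            /* push the object onto the free list */
        s->free_list = obj;
        s->free_count++;
    }
    return 0;
}

static void *toy_slab_alloc(struct toy_slab *s)
{
    void **obj = s->free_list;
    if (obj == NULL)
        return NULL;                    /* a real cache would grow by adding another slab */
    s->free_list = *obj;
    s->free_count--;
    return obj;
}

static void toy_slab_free(struct toy_slab *s, void *p)
{
    *(void **)p = s->free_list;
    s->free_list = p;
    s->free_count++;
}

static void toy_slab_destroy(struct toy_slab *s)
{
    free(s->page);                      /* give the page back to the "page allocator" */
    s->page = NULL;
    s->free_list = NULL;
    s->free_count = 0;
}
```

With 64-byte objects, one page yields 64 objects; every pointer returned by toy_slab_alloc() lies inside that single, contiguous page, which is why small allocations built this way inherit the page's physical contiguity.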
__get_free_page() is the lowest-level memory allocation function the page allocator offers to callers, and it allocates physically contiguous memory. It is implemented on top of the buddy system, in which the minimum allocation granularity is one page. Looking at its implementation, __get_free_page() turns out to be a very thin wrapper: all it does is unconditionally call __alloc_pages() to allocate physical memory. As noted above for kmalloc(), slab management likewise relies on __alloc_pages() to allocate physical pages.

So how does this function allocate physical pages, and from which zone? To answer that, we have to look at the implementation. In __alloc_pages(), get_page_from_freelist() is called several times (some of the call branches exist because of different flags) to take a zone from the zonelist and return an available struct page from it. So a physical page is always handed out from a zone in the zonelist (an array of zone structures).

How, then, are the zonelist and zones associated with physical pages, and how are they initialized? Look at free_area_init_nodes(). It is called by zone_sizes_init() during system initialization; zone_sizes_init() fills in the limits of three zones, ZONE_DMA, ZONE_NORMAL, and ZONE_HIGHMEM, and passes them as parameters to free_area_init_nodes(). That function works with a pglist_data structure, which contains the zonelist/zone structures and the node's array of struct page. At the end, it calls free_area_init_node() with this structure as a parameter. free_area_init_node() first calls calculate_node_totalpages() to compute the page totals for the node's zones, then calls alloc_node_mem_map() to initialize the struct page array in pglist_data, and finally calls free_area_init_core() to tie pglist_data to its zones.

With this analysis, the process by which __get_free_page() allocates physical memory is clear. But it raises new questions: how are the physical pages allocated by this function mapped, and where are they mapped? To answer them, we have to look at the VMM-related boot code.
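In a binary buddy system such as the one behind __get_free_page(), the buddy of a block is found with one XOR on its page frame number (PFN). The helpers below are a minimal sketch of that rule, assuming blocks of 2^order pages aligned to their own size; the function names are hypothetical.

```c
/* A free block of 2^order pages starting at `pfn` (aligned to 2^order) has exactly
 * one buddy: the neighbouring block of the same size it was split from. */
static unsigned long buddy_of(unsigned long pfn, unsigned int order)
{
    return pfn ^ (1UL << order);
}

/* Start PFN of the 2^(order+1)-page block formed when pfn and its buddy merge. */
static unsigned long merged_block(unsigned long pfn, unsigned int order)
{
    return pfn & ~(1UL << order);
}

/* A block of the given order must start at a multiple of 2^order pages. */
static int is_aligned_block(unsigned long pfn, unsigned int order)
{
    return (pfn & ((1UL << order) - 1)) == 0;
}
```

For an order-2 block (4 pages) at PFN 8, buddy_of(8, 2) is 12; if both blocks are free they merge into the order-3 block starting at merged_block(12, 2) = 8. Because blocks are always split and merged this way, every allocation the buddy system returns is a physically contiguous, size-aligned run of pages.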
Before looking at the VMM boot code, let's look at two functions, virt_to_phys() and phys_to_virt(). As their names suggest, they convert a virtual address to a physical address and a physical address to a virtual address, respectively. Their implementation is very simple: the former calls __pa(address) to convert a virtual address to a physical address, and the latter calls __va(address) to convert a physical address to a virtual address. Let's see what the two macros __pa and __va do:
#define __pa(x) ((unsigned long)(x)-PAGE_OFFSET)
#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
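The two macros can be modelled in user space with 32-bit integers. This sketch assumes 32-bit x86 with the default 3 GB/1 GB split (PAGE_OFFSET = 0xC0000000); sim_pa() and sim_va() are hypothetical names, and like the real macros they are meaningful only for addresses inside the kernel's direct mapping.

```c
#include <stdint.h>

/* Assumed 32-bit x86 default: the kernel's direct mapping starts at 3 GB. */
#define SIM_PAGE_OFFSET 0xC0000000u

/* User-space models of __pa()/__va(): a constant offset, no page table lookup.
 * Only valid for directly mapped (lowmem) addresses. */
static uint32_t sim_pa(uint32_t vaddr) { return vaddr - SIM_PAGE_OFFSET; }
static uint32_t sim_va(uint32_t paddr) { return paddr + SIM_PAGE_OFFSET; }
```

sim_pa(0xC0100000) gives 0x00100000 and sim_va(0x00100000) gives back 0xC0100000: only a constant offset is involved.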
As we can see, they simply subtract or add PAGE_OFFSET, which is defined as 0xC0000000 on 32-bit x86 (default configuration). This raises another question. Anyone who has written a Linux driver knows that after allocating memory with kmalloc() or __get_free_page(), you must call virt_to_phys() to obtain the correct physical address. Why is this step needed? Didn't we just allocate physical memory? Why convert after the allocation is done? And if the returned address is a virtual address, how can the conversion, according to the definition of virt_to_phys() above, be done merely by subtracting PAGE_OFFSET? Doesn't translating a virtual address into a physical address require a page table lookup? With these questions in mind, let's look at the VMM-related boot code.
Searching the kernel boot path from start_kernel() for VMM-related code, the first function of interest is setup_arch(), which calls paging_init() to initialize and map the hardware page tables (the first 8 MB of memory has already been mapped before this point, which is not covered here). paging_init() calls pagetable_init() to map the kernel's physical memory and initialize related structures. pagetable_init() first performs some PAE/PSE/PGE checks and settings, and then calls kernel_physical_mapping_init() to map the kernel's physical memory. In this function we can clearly see that pgd_idx is computed with PAGE_OFFSET as the starting address; that is, the loop that initializes all physical addresses uses PAGE_OFFSET as its starting point. Looking further, after the PMDs are initialized, all address calculations advance from PAGE_OFFSET. This analysis makes it clear that physical memory is mapped into the virtual address space starting at PAGE_OFFSET. With that, all the questions above can be answered: the physical pages allocated by kmalloc() and __get_free_page() are mapped to virtual addresses starting at PAGE_OFFSET, so there is a one-to-one, fixed-offset correspondence between the physical address and the virtual address.
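The fixed mapping set up at boot can be illustrated with a sketch of the loop's core idea. map_lowmem() is a hypothetical, simplified stand-in for kernel_physical_mapping_init(): it assumes non-PAE paging with 4 MB (PSE) pages, so that each page directory entry maps one 4 MB chunk directly, and it writes entries into a plain array instead of real page tables.

```c
#include <stdint.h>

#define SIM_PAGE_OFFSET 0xC0000000u
#define SIM_PGDIR_SHIFT 22                      /* each PGD entry covers 4 MB (non-PAE) */
#define SIM_PGDIR_SIZE  (1u << SIM_PGDIR_SHIFT)
#define SIM_PRESENT     0x001u
#define SIM_PSE         0x080u                  /* PDE bit 7: entry maps a 4 MB page */

/* Install each 4 MB chunk of physical memory at virtual address PAGE_OFFSET + phys.
 * Returns the number of PGD entries filled. */
static unsigned map_lowmem(uint32_t pgd[1024], uint32_t lowmem_bytes)
{
    uint32_t max_direct = 0u - SIM_PAGE_OFFSET; /* 1 GB of kernel virtual space */
    unsigned n = 0;
    if (lowmem_bytes > max_direct)
        lowmem_bytes = max_direct;
    for (uint32_t phys = 0; phys < lowmem_bytes; phys += SIM_PGDIR_SIZE) {
        uint32_t vaddr = SIM_PAGE_OFFSET + phys;
        unsigned idx = vaddr >> SIM_PGDIR_SHIFT; /* starts at 0xC0000000 >> 22 = 768 */
        pgd[idx] = phys | SIM_PSE | SIM_PRESENT;
        n++;
    }
    return n;
}
```

With 896 MB of lowmem (the classic default on 32-bit x86), the direct map occupies page directory entries 768 (0xC0000000 >> 22) through 991, and entry 768 + n maps physical address n × 4 MB. Since the offset is the same for every entry, converting between the two address spaces never needs to consult these tables.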
Because of this mapping, allocating virtual addresses starting at PAGE_OFFSET amounts to allocating physical addresses (within a certain range: from PAGE_OFFSET up to VMALLOC_START, the start of the region from which vmalloc() allocates). This also explains why virt_to_phys() and phys_to_virt() can convert between virtual and physical addresses simply by subtracting or adding PAGE_OFFSET: the mapping is fixed, so no page table lookup is needed. It also answers the question raised at the beginning: kmalloc() and __get_free_page() allocate physical memory but return virtual addresses (awkward as that sounds), and because of this mapping, subtracting PAGE_OFFSET from the address they return gives the real physical address. (Reference: http://linux.chinaunix.net/techdoc/develop/2007/04/17/955506.shtml).
