CPU and Cache of ARM architecture in Linux system-EEWORLD


1) Suspected cause: different processes each map pages to the same physical memory segment, but every process has the same virtual address space layout, so one virtual address corresponds to several page table entries (one per process). If the TLB is not updated, the entry it returns may not be the current process's mapping.

That is, the same virtual address maps to different physical addresses in different processes, which would produce wrong TLB results.

When the Linux kernel starts, it sets the base address of the page global directory, i.e. the translation table base (TTB), used for kernel-mode linear addresses. On a process context switch, the next process's TTB is loaded instead. The kernel's TTB is defined as follows (arch/arm/kernel/head.S):

/*
 * swapper_pg_dir is the virtual address of the initial page table.
 * We place the page tables 16K below KERNEL_RAM_VADDR.  Therefore, we must
 * make sure that KERNEL_RAM_VADDR is correctly set.  Currently, we expect
 * the least significant 16 bits to be 0x8000, but we could probably
 * relax this restriction to KERNEL_RAM_VADDR >= PAGE_OFFSET + 0x4000.
 */

/* Such as 0x80008000 or 0xc0008000 */
#define KERNEL_RAM_VADDR   (PAGE_OFFSET + TEXT_OFFSET)

#define PG_DIR_SIZE        0x4000
#define PMD_ORDER          2

    .globl  swapper_pg_dir
    .equ    swapper_pg_dir, KERNEL_RAM_VADDR - PG_DIR_SIZE

1) The base address of the kernel page global directory, swapper_pg_dir, is therefore of the form 0x80004000 or 0xc0004000.

2) Each process has its own page global directory (task_struct->mm->pgd); its kernel-space part is copied from swapper_pg_dir.

In fact, after this analysis the author rejected the hypothesis, because handling exactly this situation (giving each process its own translations) is one of the main jobs of the MMU. Even so, the following experiment was run, flushing the TLB once at the end of remap_pfn_range():

int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
                    unsigned long pfn, unsigned long size, pgprot_t prot)
{
    pgd_t *pgd;
    unsigned long next;
    unsigned long end = addr + PAGE_ALIGN(size);
    struct mm_struct *mm = vma->vm_mm;
    int err;

    /* (other setup from the kernel version omitted) */

    pfn -= addr >> PAGE_SHIFT;    /* so that pfn + (addr >> PAGE_SHIFT) is the pfn for addr */
    pgd = pgd_offset(mm, addr);   /* first pgd entry covering addr */
    do {
        next = pgd_addr_end(addr, end);
        err = remap_pud_range(mm, pgd, addr, next,
                              pfn + (addr >> PAGE_SHIFT), prot);
        if (err)
            break;
    } while (pgd++, addr = next, addr != end);

    /* Experiment: flush the whole TLB here */
    flush_tlb_all();

    return err;
}

The result matched expectations: the problem remained, and its cause was still not found. The experiment is recorded here anyway.

2 Introduction to the MMU's address translation (TLB) process:

1 On a TLB miss, the MMU walks the page tables. It uses the TTB described above as the base address and the top bits of the virtual address (12 bits, VA[31:20], in the ARMv7 short-descriptor format) as the index, and so locates the first-level descriptor for that virtual address.

2 The first-level descriptor holds the base address of the second-level table. The next index field of the virtual address (VA[19:12] for 4 KB small pages) then selects the second-level descriptor, and so on, until the physical address corresponding to the virtual address is found; the low bits of the virtual address (VA[11:0]) give the offset within the page.

This is only a brief introduction; for details, refer to the ARM Architecture Reference Manual.

[Analysis 4] CPU and L1 cache

1. The L1 cache, also called the internal cache, consists of a 32 KB instruction cache and a 32 KB data cache and is controlled through CP15. The following uses the s2 platform as an example to introduce the cache access process.

For a better understanding, let's first review several cache concepts:

Write-through:

When the CPU issues a write, the data is written to main memory at the same time as the cache, so main memory is always up to date. This is simple, but every write pays the cost of slow main-memory access, which reduces system performance.

Write-back:

When the CPU issues a write, the data is normally written only to the cache and the line's dirty bit is set; the line is written back to main memory only when it is replaced (or explicitly cleaned) while dirty.

Round-robin (rotation) and random replacement:

On a cache miss, the cache controller must pick one of the candidate cache lines to hold the data fetched from main memory. If the dirty bit of the line chosen for replacement is 1, that line is first written back to main memory. The replacement policy decides which line is replaced: round-robin takes the line after the one replaced last time, while random lets the controller pick a line at random.

Cache type:

1 PIPT (physically indexed, physically tagged) is generally used for the D-cache, i.e. the data cache, as opposed to the instruction cache.

This type avoids flushing the cache on a context switch, but because it uses physical addresses, every lookup must first translate the address, which is slower.

2 VIVT (virtually indexed, virtually tagged), the older style of cache

No address translation is needed to detect a hit, but the address mapping changes on a context switch, so the cache must be flushed, which is inefficient.

3 VIPT (virtually indexed, physically tagged), the newer style of cache

While the cache uses the virtual index to find the right set, the TLB translates the virtual address to a physical one in parallel, so the physical address is ready by the time the tags are compared. The latency is not quite as low as VIVT, but the cache does not need to be flushed on a context switch.

To verify the problem, the author tried disabling the cache after mmap. Note that this means disabling, not invalidating: invalidation only clears the valid bits of the cache lines and does not actually turn the cache off.

2: Turn off the L1 cache

1) The reference code below turns off the L1 cache:

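void disable_l1_cache(void)
{
    u32 value = 0;

    flush_cache_all();

    /* Read the Cache Type Register (CTR); here it reads 0x83338003 */
    __asm__ __volatile__("mrc p15, 0, %0, c0, c0, 1" : "=r"(value));

    /* Read the System Control Register (SCTLR, CP15 register 1) */
    __asm__ __volatile__("mrc p15, 0, %0, c1, c0, 0" : "=r"(value));

    /*
     * Experiment: turn off both the I-cache and the D-cache of the L1 cache.
     * bit 12: I-cache enable/disable = 1/0
     * bit 2:  D-cache enable/disable = 1/0
     */
    value &= ~0x1004u;

    /* Write SCTLR back: for mcr, value is an input operand */
    __asm__ __volatile__("mcr p15, 0, %0, c1, c0, 0" : : "r"(value) : "memory");

    flush_cache_all();
    isb();

    /* Read SCTLR again to check whether the mcr above took effect */
    __asm__ __volatile__("mrc p15, 0, %0, c1, c0, 0" : "=r"(value));
    pr_info("SCTLR after disable: 0x%08x\n", value);
}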

Note that the experiment above works through CP15 register 1, the System Control Register (SCTLR).

Conclusion: the cache cannot simply be turned off completely this way. As soon as the D-cache was turned off, the system froze.

2) CP15 register 1 description: this is the register the author uses to turn off the L1 cache.

As disable_l1_cache shows, the L1 caches are switched off through bit 2 (D-cache enable) and bit 12 (I-cache enable), i.e. the mask 0x1004.

3) CP15 register 0: the Cache Type Register.

This part introduces the format of the Cache Type Register, from which the cache type, size, cache line length and so on can be read. It is straightforward; all of the information comes directly from the ARM manual and is reproduced here for reference.

Key fields of the value read in disable_l1_cache (0x83338003):

2. Cache Type Register format:

Bits 25-28 hold the cache type (Ctype). Here the field reads 0b0001, which means the L1 cache is write-back.

The register also gives the size and associativity of each cache: bits 0-11 describe the I-cache and bits 12-23 the D-cache.

Cache associativity field, bits 3-5:

Cache size field, bits 6-9:

Cache line length field, bits 0-1:

The CPU's cache lookup is determined by the cache size, cache line length and associativity.

For how the matching works, see the L2 cache section.

[Analysis 5] CPU and L2cache

1) The L2 cache, also called the external cache, is attached through the AXI bus. The following example walks through its access process.

/* Get L2 cache information (arch/arm/mm/cache-l2x0.c) */
void l2x0_init(void __iomem *base, u32 aux_val, u32 aux_mask)
{
    ...
    /* L2 cache register access */
    aux = readl_relaxed(l2x0_base + L2X0_AUX_CTRL);
    ...
}

L2 cache information, taking the Auxiliary Control Register (AUX_CTRL) as an example:

    L310 cache controller enabled
    l2x0: 16 ways, CACHE_ID 0x410000c8, AUX_CTRL 0x02050000, Cache size: 512k

2) AUX_CTRL = 0x02050000, which in binary is:

0000 0010 0000 0101 0000 0000 0000 0000 (bits 25, 18 and 16 set)

3) Way size, from the register's way-size field:

bits[19:17] = 010: each cache way is 32 KB.

4) Associativity, from bit 16:

bit16 = 1: a 16-way set-associative cache.

5) Associativity is defined as follows:

Fully associative cache: any address can be cached in any cache line, so a lookup must check every cache line.

Direct-mapped cache: each address can be cached in only one specific cache line. A lookup just checks whether the tag stored in that line matches the tag of the address: if so, it is a cache hit; otherwise, a cache miss.

Set-associative cache: fully associative and direct-mapped caches each have strengths and weaknesses. A fully associative cache is slow to search but does not thrash, while a direct-mapped cache is the opposite. Real CPU caches compromise between the two: all cache lines are divided into groups (sets) of n lines each, which is called an n-way set-associative cache.

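6) Analyze the cache hit process in the example above:

First, to help understand the relationship between cache sets, cache ways and cache lines, here is a clearer illustration from the Internet:

With reference to the figure above, analyze the example:

The L2 cache information obtained in this example: 16 ways; way size 32 KB; cache line length 32 bytes; total L2 cache size 512 KB.

16 ways / 32-byte lines / 512 KB <-> 1024 sets (index values), 32-byte cache lines

Address layout: Tag = bits[31:15], Index = bits[14:5], Offset = bits[4:0]

where:

1) With 16 ways, each way holds 512 KB / 16 / 32 B = 1024 cache lines, and within one way the mapping is direct (each index selects exactly one line).

2) 16 ways means the cache is 16-way set-associative: the lines with the same index in the 16 ways form one group, called a cache set, and within a set the placement is fully associative.

3) Take 0x76bb9610 as an example: bits[14:5] give index = 0xb0, which selects the corresponding cache set, and the tag bits[31:15] = 0xed77 are then compared against the 16 lines of that set. (The article calls this a VA; since the L2 controller is physically addressed, strictly it is the physical address that is decomposed this way.)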
