1) Different processes each map pages onto the same physical memory segment, and they may do so at the same virtual address. One virtual address then has a page table entry in every process; if the TLB is not updated on a context switch, the entry it returns may not be the mapping of the current process.
In other words, a stale TLB entry would translate the virtual address to the wrong physical address, causing TLB errors.
When the Linux kernel starts, it sets the base address of the page global directory, i.e. the TTB (translation table base), for the kernel-mode linear address space, and a different TTB is used for each process after a context switch. The initial TTB is defined as follows:
/*
* swapper_pg_dir is the virtual address of the initial page table.
* We place the page tables 16K below KERNEL_RAM_VADDR. Therefore, we must
* make sure that KERNEL_RAM_VADDR is correctly set. Currently, we expect
* the least significant 16 bits to be 0x8000, but we could probably
* relax this restriction to KERNEL_RAM_VADDR >= PAGE_OFFSET + 0x4000.
*/
/* Such as 0x80008000 or 0xc0008000 */
#define KERNEL_RAM_VADDR (PAGE_OFFSET + TEXT_OFFSET)
#define PG_DIR_SIZE 0x4000
#define PMD_ORDER 2
/*
1) The base address of the kernel page global directory swapper_pg_dir is in the form of 0x80004000 or 0xc0004000
2) The contents of the page global directory address (task_struct->pgd) of each process, the kernel state part is copied from the swapper_pg_dir address.
*/
.globl swapper_pg_dir
.equ swapper_pg_dir, KERNEL_RAM_VADDR - PG_DIR_SIZE
In fact, after this analysis the author rejected his own inference, because handling exactly this situation is one of the main jobs of the MMU. Even so, the following experiment was carried out: the TLB was flushed once at the end of remap_pfn_range, after the physical pages had been mapped:
int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
		    unsigned long pfn, unsigned long size, pgprot_t prot)
{
	pgd_t *pgd;
	unsigned long next;
	unsigned long end = addr + PAGE_ALIGN(size);
	struct mm_struct *mm = vma->vm_mm;
	int err;

	/* (vma flag setup and sanity checks of the original function are omitted) */
	pfn -= addr >> PAGE_SHIFT;
	pgd = pgd_offset(mm, addr);	/* first pgd entry covering addr */
	do {
		next = pgd_addr_end(addr, end);
		err = remap_pud_range(mm, pgd, addr, next,
				pfn + (addr >> PAGE_SHIFT), prot);
		if (err)
			break;
	} while (pgd++, addr = next, addr != end);

	/* Experiment: flush the whole TLB once the mapping is installed */
	flush_tlb_all();
	return err;
}
The result was as expected: flushing the TLB made no difference, so the cause of the problem was not found here. The experiment is recorded anyway.
2) Introduction to the MMU translation table walk:
PS:
1. The walk uses the TTB described above as the base address and the top bits of the virtual address (for example, the top 12 bits) as an index to find the physical address of the first-level page directory descriptor for that virtual address.
2. The first-level descriptor holds the base address of the second-level table. The next group of bits of the virtual address indexes that table to find the second-level descriptor, and so on, until the physical address corresponding to the virtual address is found.
This is only a brief introduction. For details, refer to the ARM manual.
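The two lookup steps above can be sketched in C. This is only an illustration, assuming the ARM short-descriptor format with 4 KB small pages (the article does not say which page size is in use); the helper names are hypothetical and the table base passed in is just an example value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: VA[31:20] first-level index (4096 entries),
 * VA[19:12] second-level index (256 entries), VA[11:0] page offset. */
static uint32_t l1_index(uint32_t va)    { return va >> 20; }
static uint32_t l2_index(uint32_t va)    { return (va >> 12) & 0xFFu; }
static uint32_t page_offset(uint32_t va) { return va & 0xFFFu; }

/* Address of the first-level descriptor: table base + index * 4 bytes. */
static uint32_t l1_desc_addr(uint32_t ttb, uint32_t va)
{
    assert((ttb & 0x3FFFu) == 0);          /* 16 KB table, 16 KB aligned */
    return ttb + (l1_index(va) << 2);
}

/* Address of the second-level descriptor. Bits [31:10] of the
 * first-level descriptor hold the second-level table base. */
static uint32_t l2_desc_addr(uint32_t l1_desc, uint32_t va)
{
    return (l1_desc & 0xFFFFFC00u) + (l2_index(va) << 2);
}

/* Final physical address: bits [31:12] of the second-level descriptor
 * give the page frame, the low 12 bits come from the VA. */
static uint32_t phys_addr(uint32_t l2_desc, uint32_t va)
{
    return (l2_desc & 0xFFFFF000u) | page_offset(va);
}
```

Each level only adds a scaled index to a base address, which is why the whole walk is cheap for hardware to perform on a TLB miss.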
[Analysis 4] CPU and L1 cache
1. The L1 cache, also called the internal cache, consists of a 32 KB instruction cache and a 32 KB data cache and is controlled through CP15. The following uses the s2 platform as an example to describe the cache access process.
For a better understanding, let's first review several cache concepts:
Write-through cache:
When the CPU issues a write, the data is written to the cache and to main memory at the same time, so main memory is always up to date. The advantage is simplicity; the drawback is reduced system performance, because main memory access is slow.
Write-back cache:
When the CPU issues a write, the data is normally written only to the cache and the dirty flag of the line is set. The data reaches memory later, when a line whose dirty flag is set is replaced or cleaned.
Round-robin and random replacement policies:
When a cache access misses, the cache controller picks a cache line from the currently valid lines to hold the data fetched from main memory. If the dirty bit of the line chosen for replacement is 1, the line is first written back to main memory. The replacement policy determines which line is replaced: the round-robin policy takes the line following the one replaced last, while the random policy lets the controller choose at random.
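As a rough illustration of the write-back and round-robin behaviour described above, the following sketch models a single cache set in software. The structure names, the 4-line set size and the write-back counter are assumptions made for the example; a real cache controller does all of this in hardware.

```c
#include <assert.h>
#include <stdbool.h>

#define WAYS 4u   /* hypothetical number of lines in one set */

struct cache_line {
    bool valid;
    bool dirty;
    unsigned tag;
};

struct cache_set {
    struct cache_line line[WAYS];
    unsigned next_victim;   /* round-robin pointer */
    unsigned writebacks;    /* dirty lines written back to main memory */
};

/* Returns true on a hit. On a miss the line is filled, evicting a
 * round-robin victim when no invalid line is left. */
static bool cache_access(struct cache_set *s, unsigned tag, bool is_write)
{
    unsigned i, victim = WAYS;

    assert(s != 0);
    for (i = 0; i < WAYS; i++) {
        if (s->line[i].valid && s->line[i].tag == tag) {
            if (is_write)
                s->line[i].dirty = true;   /* write-back: memory untouched for now */
            return true;
        }
        if (!s->line[i].valid && victim == WAYS)
            victim = i;                    /* remember the first free line */
    }
    if (victim == WAYS) {                  /* set is full: rotate */
        victim = s->next_victim;
        s->next_victim = (s->next_victim + 1u) % WAYS;
        if (s->line[victim].dirty)
            s->writebacks++;               /* dirty victim goes to memory first */
    }
    s->line[victim].valid = true;
    s->line[victim].dirty = is_write;
    s->line[victim].tag = tag;
    return false;
}
```

With this model, four writes to four distinct tags fill the set without touching memory; a fifth distinct tag then evicts line 0, and because that line is dirty it costs one write-back. A write-through cache would instead count one memory write per CPU write and never need the dirty bit.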
Cache types:
1. PIPT (physically indexed, physically tagged), generally used for the D-cache, i.e. the data cache as opposed to the instruction cache.
This type avoids flushing the cache on a context switch, but because physical addresses are used, address translation must be performed before every lookup, which is slower.
2. VIVT (virtually indexed, virtually tagged), the old-style cache
No address translation is needed on a cache hit, but the address mapping changes after a context switch, so the cache must be flushed, which is inefficient.
3. VIPT (virtually indexed, physically tagged), the newer cache
While the cache uses the index to find the set, the TLB translates the virtual address to a physical address in parallel, so the physical address is ready by the time the cache compares tags. Although the latency is not as low as with VIVT, the cache does not need to be flushed on a context switch.
To verify the problem, try disabling the cache after mmap. Note that this is not the same as invalidating the cache: invalidation only clears the valid bits of the cache lines and does not actually turn the cache off.
2. Turn off the L1 cache
1) The reference code below turns off the L1 cache:
void disable_l1_cache(void)
{
	u32 value = 0;

	flush_cache_all();

	/* Read the cache type register (0x83338003 on this platform) */
	__asm__ __volatile__(
		"mrc p15, 0, %0, c0, c0, 1" : "=r" (value) : : "cc");

	/* Read CP15 register 1 (the control register) */
	__asm__ __volatile__(
		"mrc p15, 0, %0, c1, c0, 0" : "=r" (value));

	/*
	 * Experiment: turn off the icache and dcache of the L1 cache.
	 * bit 12: icache enable/disable = 1/0
	 * bit 2:  dcache enable/disable = 1/0
	 */
	value &= ~0x1004;

	/* value is an input to mcr, so it needs the "r" input constraint */
	__asm__ __volatile__(
		"mcr p15, 0, %0, c1, c0, 0" : : "r" (value));
	flush_cache_all();
	isb();

	/* Read the register back to check whether the mcr took effect */
	__asm__ __volatile__(
		"mrc p15, 0, %0, c1, c0, 0" : "=r" (value));
}
Note that the experiment above operates on CP15 register 1 (the control register).
Conclusion: the cache cannot be turned off completely. Once the D-cache is turned off, the system freezes.
2) Description of CP15 register 1, which the author uses to turn off the L1 cache.
See disable_l1_cache: the L1 cache is switched off through bit 2 (D-cache) and bit 12 (I-cache) of this register.
3) CP15 register 0: cache type register.
This section introduces the format of the cache type register, from which information such as the cache type, size and cache line length can be read. It is straightforward; all of the information comes directly from the ARM manual and is reproduced here for reference.
The key fields of the value read in disable_l1_cache (0x83338003) are defined as follows:
2. Cache type register format:
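Bits 25-28 are the cache type; here the ctype field reads 0b0001, which means the L1 cache is a write-back cache. Note that this is the pre-ARMv7 layout of the register: bits 29-31 of 0x83338003 read 0b100, which in the ARMv7 architecture marks the ARMv7 format of this register, so the field positions should be checked against the manual for the core in use;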
The size and associativity of the caches can be obtained as well: bits 0-11 and bits 12-23 of the cache type register hold the attributes of the icache and the dcache respectively.
Cache associativity attribute, bits 3-5:
Cache size attribute, bits 6-9:
Cache line length, bits 0-1:
The CPU performs cache matching based on cache size, cache line length and cache associativity.
For the matching method, see the L2 cache analysis.
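The bit positions listed above translate directly into shifts and masks. The sketch below only extracts the raw fields, assuming the layout exactly as stated in the text; the struct and function names are hypothetical, and turning the raw numbers into sizes or line lengths still requires the tables in the ARM manual.

```c
#include <assert.h>
#include <stdint.h>

/* Raw attribute fields of one cache (12 bits in the register). */
struct cache_attr {
    unsigned size;    /* bits 6-9 of the 12-bit field */
    unsigned assoc;   /* bits 3-5 */
    unsigned len;     /* bits 0-1, cache line length */
};

/* Cache type field, bits 25-28 of the register. */
static unsigned ctr_ctype(uint32_t ctr)
{
    return (ctr >> 25) & 0xFu;
}

static struct cache_attr decode_attr(uint32_t field)
{
    struct cache_attr a;

    assert(field <= 0xFFFu);          /* caller passes a 12-bit field */
    a.size  = (field >> 6) & 0xFu;
    a.assoc = (field >> 3) & 0x7u;
    a.len   = field & 0x3u;
    return a;
}

/* icache attributes: bits 0-11; dcache attributes: bits 12-23. */
static struct cache_attr ctr_icache(uint32_t ctr)
{
    return decode_attr(ctr & 0xFFFu);
}

static struct cache_attr ctr_dcache(uint32_t ctr)
{
    return decode_attr((ctr >> 12) & 0xFFFu);
}
```

Calling ctr_ctype(0x83338003) reproduces the ctype value of 0b0001 quoted above.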
[Analysis 5] CPU and L2cache
1) The L2 cache, also called the external cache, is controlled over the AXI bus. The following example describes the cache access process.
/* Get l2-cache information */
void l2x0_init(void __iomem *base, u32 aux_val, u32 aux_mask)
/* l2-cache register access */
readl_relaxed(l2x0_base + L2X0_AUX_CTRL);
/* l2-cache information */
Take the Auxiliary control register as an example:
* L310 cache controller enabled
* l2x0: 16 ways, CACHE_ID 0x410000c8, AUX_CTRL 0x02050000,
* Cache size: 512k ;
2) AUX_CTRL = 0x02050000; in binary this is 0000 0010 0000 0101 0000 0000 0000 0000.
3) Way size, from the register definition:
bit[19:17] = 010 means that each cache way is 32 KB;
4) Associativity, from the register definition:
bit 16 = 1 indicates a 16-way set-associative cache.
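The two fields can be decoded with a few lines of C. This is a sketch under two assumptions: that bit 16 = 0 selects 8 ways, and that each step of the way-size field doubles the size starting from 16 KB at 0b001 (which matches 0b010 = 32 KB above). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* bit 16: 1 = 16 ways (assumed: 0 = 8 ways) */
static unsigned l2_ways(uint32_t aux_ctrl)
{
    return (aux_ctrl & (1u << 16)) ? 16u : 8u;
}

/* bits [19:17]: way size, assumed 0b001 = 16 KB, doubling per step */
static unsigned l2_way_size_kb(uint32_t aux_ctrl)
{
    unsigned field = (aux_ctrl >> 17) & 0x7u;

    assert(field >= 1u && field <= 6u);   /* other encodings not handled here */
    return 8u << field;
}

/* Total cache size = number of ways * size of one way */
static unsigned l2_size_kb(uint32_t aux_ctrl)
{
    return l2_ways(aux_ctrl) * l2_way_size_kb(aux_ctrl);
}
```

For AUX_CTRL = 0x02050000 this gives 16 ways of 32 KB each, i.e. the 512 KB reported in the boot log.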
5) Associativity is defined as follows:
Fully associative cache: any VA can be cached in any cache line, so every cache line must be checked on a lookup.
Direct-mapped cache: a VA can be cached in exactly one cache line. A lookup checks only that line and compares the TAG of the VA with the TAG stored in the line; if they match it is a cache hit, otherwise it is a cache miss.
Set-associative cache: fully associative and direct-mapped caches each have advantages and disadvantages. A fully associative cache is slow to search but does not suffer from thrashing, while a direct-mapped cache is the opposite. Real CPU caches take a compromise between the two: all cache lines are divided into sets of n cache lines each, which is called an n-way set-associative cache.
6) Analysis of the cache hit process in the example above:
First, to help understand the relationship between cache set, cache way and cache line, here is a picture from the Internet:
[Figure: relationship between cache set, cache way and cache line]
With reference to the figure, analyze the example:
The L2 cache information obtained in this example: 16 ways; way size 32 KB; cache line length 32 bytes; total L2 cache size 512 KB.
16 ways / 32 B / 512 KB <-> 1024 index values; cache line = 32 B
 31               15 14           5 4           0
|        TAG        |    index     |  cacheline  |
where:
1) way=16: each way has 512 KB / 16 / 32 B = 1024 cache lines, and each way is direct-mapped.
2) way=16 also means the cache is 16-way set-associative: each set has 16 cache lines, one from each way, and is fully associative within the set. Each such group is called a cache set.
3) Take VA=0x76bb9610 as an example: index=0xb0 selects the corresponding cache set;
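The index calculation for this example can be checked with a short C sketch. It assumes the geometry derived above (32-byte lines, 1024 sets) and power-of-two sizes; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct cache_geom {
    uint32_t line_bytes;   /* cache line length, power of two */
    uint32_t sets;         /* number of cache sets, power of two */
};

struct cache_addr {
    uint32_t tag;
    uint32_t index;
    uint32_t offset;
};

static unsigned log2_u32(uint32_t x)
{
    unsigned n = 0;

    while (x > 1u) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Split an address into tag / set index / byte offset within the line. */
static struct cache_addr split_addr(struct cache_geom g, uint32_t addr)
{
    struct cache_addr a;
    unsigned off_bits, idx_bits;

    /* both sizes must be non-zero powers of two */
    assert(g.line_bytes && (g.line_bytes & (g.line_bytes - 1u)) == 0);
    assert(g.sets && (g.sets & (g.sets - 1u)) == 0);

    off_bits = log2_u32(g.line_bytes);    /* 32 B   -> 5 bits  */
    idx_bits = log2_u32(g.sets);          /* 1024   -> 10 bits */

    a.offset = addr & (g.line_bytes - 1u);
    a.index  = (addr >> off_bits) & (g.sets - 1u);
    a.tag    = addr >> (off_bits + idx_bits);
    return a;
}
```

With line_bytes = 32 and sets = 1024, the address 0x76bb9610 splits into index 0xb0, which agrees with the value in the text. The controller then compares the tag against the 16 lines of that set, one per way, to decide between a hit and a miss.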