Experimental purpose: Enable the MMU, map the SDRAM address space, and light the LEDs through virtual addresses, in order to master the use of the MMU.
Experimental environment and description: Hengyi S3C2410 development board H2410. The H2410 core board carries 64MB of K4S561632 SDRAM (4M*16bit*4BANK) in the address range 0x30000000~0x33FFFFFF. The GPIO registers occupy the address range 0x56000000~0x560000B0.
Experimental idea: After the development board is powered on, the first 4K of NandFlash is automatically copied to SRAM and execution starts at address 0. The program then initializes the memory controller for SDRAM, copies the code located after the first 2K of SRAM to SDRAM (stored at 0x30004000, because the first 16KB of SDRAM holds the page table), sets up the page table, and starts the MMU so that the GPIO registers and SDRAM are mapped to virtual addresses. Finally it jumps to SDRAM (virtual address 0xB0004000), resets the stack pointer and jumps to the entry point of the LED code to light the LEDs.
Knowledge mastery: MMU address translation, memory access permission check, use of the TLB and Cache
1. MMU address translation:
1. First of all, why do we need the MMU? MMU stands for memory management unit. To put it bluntly, think of the tableware in a cafeteria. If all the students ate at the same time there would not be enough, but the cafeteria does not want to buy more (it costs money and takes up space, just like adding memory). Experience shows that the whole school never eats at once, so the cafeteria assigns a few people to manage the tableware (they are the MMU). They hand out tableware so that every student who comes has a set, and they collect the used sets for reuse (this is the mapping between virtual and physical addresses: the memory is the same size, but to any single program it seems inexhaustible). A student who tries to take several sets is refused (this is the memory permission check).
The MMU deals with three kinds of addresses during address translation:
(VA---Virtual Address)---the counter where tableware is handed out (everyone can get a set). The CPU core sees and uses only VA. How a VA corresponds to a PA is of no concern to the CPU core, just as no student cares how many sets exist in total.
(MVA---Modified Virtual Address)---like the holidays, when few people come: sets are only handed out and the used ones are not collected right away, which saves staff. The Caches and the MMU do not see VA; they use MVA to obtain PA.
(PA---Physical Address)---the actual stock of tableware. Real devices see neither VA nor MVA; they are read and written with PA, just as a student ends up holding a real set of tableware at mealtime.
2. The conversion process from virtual address to physical address. ARM uses page tables for the conversion, and the S3C2410 uses at most two levels of page tables. When converting in the form of segments (Section, 1MB), only the first-level page table is used; when converting in the form of pages (Page), two levels are used. Pages come in three sizes: large pages (64KB), small pages (4KB) and tiny pages (1KB). This article explains only the segment conversion process; the conversion of pages is similar.
★ First, the page table base address register (register C2 of coprocessor CP15) holds the address of the first-level page table, so reading it gives the starting position of that table. The first-level page table must be 16KB aligned (so bits [13:0] are 0 and bits [31:14] store the page table base address). The first-level page table uses 4096 descriptors to represent the 4GB space, so each descriptor corresponds to 1MB of virtual addresses and stores either the starting address of the corresponding 1MB of physical space or the address of a second-level page table. MVA[31:20] indexes the first-level page table (bits 31-20 are 12 bits, 2^12=4096, hence 4096 descriptors) and selects one descriptor. Each descriptor occupies 4 bytes.
★ When the lowest two bits of the descriptor are 0b10, the memory is mapped in segment mode. Bits [31:20] are the segment base address: with the lower 20 bits filled with 0, they give the starting address of a 1MB block of physical address space, and MVA[19:0] addresses a byte within that 1MB. Bits [31:20] of the descriptor together with MVA[19:0] therefore form the physical address corresponding to the MVA. In segment mode the conversion from MVA to PA proceeds as follows: ① Bits [31:14] of the page table base address register and MVA[31:20] form a 32-bit address whose lowest two bits are 0, and the MMU uses this address to fetch the segment descriptor; ② Bits [31:20] of the segment descriptor (the segment base address) are combined with MVA[19:0] to form the 32-bit physical address (the PA corresponding to the MVA).
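The two steps of the segment conversion can be sketched in plain C as a host-side model. This is only an illustration of the bit arithmetic described above, not code that runs on the MMU; the function names `l1_descriptor_addr` and `section_to_pa` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Step 1: address of the first-level descriptor for an MVA.
 * ttb is the page table base and must be 16KB aligned. */
static uint32_t l1_descriptor_addr(uint32_t ttb, uint32_t mva)
{
    assert((ttb & 0x3FFFu) == 0);               /* bits [13:0] must be 0 */
    /* TTB[31:14] | MVA[31:20] << 2: each descriptor is 4 bytes */
    return (ttb & 0xFFFFC000u) | ((mva >> 20) << 2);
}

/* Step 2: physical address from a segment descriptor and the MVA. */
static uint32_t section_to_pa(uint32_t desc, uint32_t mva)
{
    assert((desc & 0x3u) == 0x2u);              /* 0b10 = segment descriptor */
    /* descriptor[31:20] is the segment base, MVA[19:0] the offset */
    return (desc & 0xFFF00000u) | (mva & 0x000FFFFFu);
}
```

For example, with the page table at 0x30000000, the descriptor for MVA 0xB0004000 lives at 0x30002C00 (index 0xB00 times 4), and a descriptor whose base is 0x30000000 translates that MVA to PA 0x30004000.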
2. Memory access permission check
The memory access permission check determines whether a block of memory may be read or written. The outcome depends on CP15 register C3 (domain access control), the Domain field of the descriptor, the R/S/A bits of CP15 register C1 and the AP bits of the descriptor. "Domain" determines whether a permission check is performed on a block of memory at all, and "AP" determines how the check is performed. The S3C2410 has 16 domains. Every two bits of CP15 register C3 correspond to one domain (32 bits in total) and indicate whether that domain is subject to the permission check.
The meaning of each two-bit field: 00---no access (any access results in a "Domain fault" exception); 01---client mode (the AP bits of the segment or page descriptor are used for the permission check); 10---reserved (currently equivalent to "no access"); 11---manager mode (no permission check, any access is allowed). The Domain field of a descriptor occupies 4 bits and indicates which of the domains 0-15 the memory belongs to.
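As a small host-side illustration of the two-bit fields, the following C sketch extracts the access mode of a domain from a value of CP15 register C3 and reads the Domain field (bits [8:5]) from a segment descriptor. The enum and function names are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* The four encodings of a two-bit domain field in CP15 register C3. */
enum domain_access {
    DOMAIN_NO_ACCESS = 0,   /* 00: any access raises a Domain fault  */
    DOMAIN_CLIENT    = 1,   /* 01: check the AP bits of descriptors  */
    DOMAIN_RESERVED  = 2,   /* 10: reserved, behaves like no access  */
    DOMAIN_MANAGER   = 3    /* 11: no permission check at all        */
};

/* Access mode of domain 0..15 given the 32-bit register value. */
static enum domain_access domain_access_of(uint32_t dacr, unsigned domain)
{
    assert(domain < 16);
    return (enum domain_access)((dacr >> (domain * 2)) & 0x3u);
}

/* Domain number stored in bits [8:5] of a segment descriptor. */
static unsigned descriptor_domain(uint32_t desc)
{
    return (desc >> 5) & 0xFu;
}
```

Writing 0xFFFFFFFF to the register, as the sample code later in this article does, puts all 16 domains in manager mode.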
3. TLB and Cache
First of all, both rely on the locality of program accesses and improve performance by placing a small, fast memory in front of a larger, slower one.
1. (TLB---Translation Lookaside Buffers): Converting an MVA to a PA requires extra memory accesses to the page table, which greatly reduces CPU performance, so the TLB was introduced as an improvement. When the CPU issues a virtual address, the MMU first looks in the TLB. If the TLB contains a descriptor that can translate this virtual address, that descriptor is used directly for address conversion and permission check. Otherwise the MMU walks the page table to find the descriptor, performs the address conversion and permission check, and stores the descriptor in the TLB, so that the next access to this virtual address uses the TLB entry directly. When using the TLB, its contents must stay consistent with the page table. This matters most before the MMU is started and after the page table is changed. The general practice is to invalidate the entire TLB before starting the MMU and, when changing the page table, to invalidate the TLB entries for the virtual addresses involved.
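The hit/miss behaviour and the need to invalidate after a page table change can be seen in a toy software model. This is a deliberately simplified sketch: the 8-entry direct-mapped layout, the structure names and the counters are assumptions for illustration and do not describe the real TLB hardware of the ARM920T.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TLB_ENTRIES 8u      /* toy size, not the hardware size */

struct tlb_entry { int valid; uint32_t tag; uint32_t desc; };
struct tlb { struct tlb_entry e[TLB_ENTRIES]; unsigned hits, misses; };

/* Drop every cached descriptor (the counters are kept). */
static void tlb_invalidate_all(struct tlb *t)
{
    memset(t->e, 0, sizeof t->e);
}

/* Translate an MVA through a first-level table of segment descriptors. */
static uint32_t tlb_translate(struct tlb *t, const uint32_t *page_table,
                              uint32_t mva)
{
    uint32_t index = mva >> 20;                 /* MVA[31:20] */
    struct tlb_entry *slot;

    assert(t != NULL && page_table != NULL);
    slot = &t->e[index % TLB_ENTRIES];
    if (slot->valid && slot->tag == index) {
        t->hits++;                              /* descriptor already cached */
    } else {
        t->misses++;                            /* walk the page table */
        slot->valid = 1;
        slot->tag = index;
        slot->desc = page_table[index];
    }
    return (slot->desc & 0xFFF00000u) | (mva & 0x000FFFFFu);
}
```

If a descriptor in `page_table` is rewritten after it has been cached, `tlb_translate` keeps returning the old mapping until `tlb_invalidate_all` is called, which is exactly the inconsistency the text warns about.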
2. (Cache): To speed up program execution, a small, fast memory is placed between main memory and the CPU's general registers. Instructions or data near the address currently being accessed are copied from main memory into it so that the CPU can use them for a while without going back to main memory.
★ Two ways to write data: ① (Write Through)---Whenever the CPU writes to the Cache, the data is also written to main memory, so main memory is always up to date. The advantage is simplicity, but because main memory is slow, writes become slower and occupy the bus. ② (Write Back)---Data is normally written only to the Cache, so the Cache may hold updated data while main memory still holds the old (stale) data. The Cache keeps a flag that marks such lines. Only when a line is swapped out of the Cache, or a "clean" operation is forced, is the updated data written to the corresponding location in main memory, which restores consistency between the Cache and main memory.
★ The Cache supports the following two operations: ① (Clean)---Write the dirty data (modified but not yet written to main memory) in the Cache or Write buffer to main memory. ② (Invalidate)---Mark the cached contents as unusable without writing the dirty data to main memory.
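The difference between write through, write back, clean and invalidate can be made concrete with a toy model of a single cache line in front of one word of main memory. This is only a sketch for illustration; the structure and function names are assumptions and real caches track many lines and addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One cache line caching one word of "main memory". */
struct cache_line { int valid; int dirty; uint32_t data; };

/* Write through: update the line and main memory together. */
static void line_write_through(struct cache_line *l, uint32_t *mem, uint32_t v)
{
    assert(l != NULL && mem != NULL);
    l->valid = 1; l->dirty = 0; l->data = v;
    *mem = v;
}

/* Write back: update only the line and mark it dirty. */
static void line_write_back(struct cache_line *l, uint32_t v)
{
    assert(l != NULL);
    l->valid = 1; l->dirty = 1; l->data = v;
}

/* Clean: push dirty data out to main memory, keep the line valid. */
static void line_clean(struct cache_line *l, uint32_t *mem)
{
    if (l->valid && l->dirty) { *mem = l->data; l->dirty = 0; }
}

/* Invalidate: discard the line; dirty data is lost, not written. */
static void line_invalidate(struct cache_line *l)
{
    l->valid = 0; l->dirty = 0;
}

/* Read: a miss refills the line from main memory. */
static uint32_t line_read(struct cache_line *l, const uint32_t *mem)
{
    if (!l->valid) { l->data = *mem; l->valid = 1; l->dirty = 0; }
    return l->data;
}
```

In this model a `line_write_back` followed by `line_invalidate` loses the written value, while `line_write_back` followed by `line_clean` makes it visible in main memory.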
★ The S3C2410 has built-in (ICaches, instruction cache), (DCaches, data cache) and (Write buffer, write cache). Their operation depends on the C bit (Ctt) and B bit (Btt) in the descriptor.
① (ICaches, instruction cache)---When the system has just been powered on or reset, the contents of ICaches are invalid and ICaches are turned off. Writing 1 to the Icr bit (bit 12 of register 1 in the CP15 coprocessor) starts ICaches, and writing 0 stops them. ICaches are generally used after the MMU is turned on. In that case the C bit of the descriptor indicates whether a block of memory may be cached: if Ctt=1 caching is allowed, otherwise it is not. ICaches can also be used when the MMU is not enabled; all memory from which the CPU fetches instructions is then treated as cacheable. When ICaches are disabled, the CPU must read main memory for every instruction fetch, which gives low performance, so ICaches are usually enabled as early as possible. After ICaches are enabled, every instruction fetch first checks whether the instruction is in ICaches, regardless of whether Ctt is 0 or 1. Finding it is a cache hit; not finding it is a cache miss. An instruction fetch then falls into one of three cases: on a cache hit with Ctt=1, the instruction is taken from ICaches and returned to the CPU; on a cache miss with Ctt=1, the CPU fetches the instruction from main memory and it is stored in the cache; when Ctt=0, the CPU fetches the instruction from main memory.
② (DCaches, data cache)---As with ICaches, when the system has just been powered on or reset, the contents of DCaches are invalid and DCaches are disabled, and the contents of the Write buffer are discarded. Writing 1 to the Ccr bit (bit 2 of register 1 in the CP15 coprocessor) starts DCaches, and writing 0 stops them. The Write buffer is closely integrated with DCaches, and there is no dedicated control bit to start or stop it. Unlike ICaches, DCaches can only be used after the MMU is turned on. When DCaches are turned off, the CPU goes to memory for every data access. After DCaches are turned on, every data read or write first checks whether the data is in DCaches, regardless of whether Ctt is 0 or 1. Finding it is a cache hit; not finding it is a cache miss.
★ When using the Cache, the contents of the Cache and Write buffer must stay consistent with main memory. Two principles apply: ① Clean DCaches so that main memory receives the updated data. ② Invalidate ICaches so that the CPU re-reads main memory when fetching instructions.
When actually writing programs, pay attention to the following points: ① Before turning on the MMU, invalidate ICaches, DCaches and the Write buffer. ② Before turning off the MMU, clean ICaches and DCaches, that is, write the "dirty" data to main memory. ③ If the code changes, invalidate ICaches so that the CPU reads main memory again when fetching instructions. ④ When using DMA on cacheable memory: before data in memory is sent out, clean the cache; when data is read into memory, invalidate the cache. ⑤ Changing the address mapping in the page table also requires careful consideration of the cached contents. ⑥ When turning on ICaches or DCaches, consider whether their contents are consistent with main memory. ⑦ For the I/O address space, do not use the Cache or the Write buffer.
4. Control instructions for MMU, TLB and Cache
In addition to the ARM920T CPU core, the S3C2410 also has several coprocessors that help the main CPU with special functions. Operations on the MMU, TLB and Cache go through the coprocessor CP15. The instruction format is as follows:
<MCR|MRC>{cond} p#, <opcode_1>, Rd, cn, cm{, <opcode_2>}
MRC //Transfer data from the coprocessor to an ARM920T CPU core register
MCR //Transfer data from an ARM920T CPU core register to the coprocessor
{cond} //Execution condition; the instruction executes unconditionally when omitted
p# //Coprocessor number
Rd //Register of the ARM920T CPU core
cn and cm //Registers in the coprocessor
Among them, <opcode_1>, <opcode_2> and cm select the specific coprocessor operation; for CP15, <opcode_1> is always 0.
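To make the fields of the instruction format tangible, the following host-side C sketch packs them into the 32-bit ARM encoding of an unconditional MCR or MRC instruction. The function name `encode_cp_transfer` is hypothetical and the sketch always uses the "always" condition code; it is meant for inspecting the fields, not as an assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Encode an unconditional MCR (is_mrc = 0) or MRC (is_mrc = 1).
 * Layout: cond[31:28] 1110[27:24] opcode_1[23:21] L[20] cn[19:16]
 *         Rd[15:12] p#[11:8] opcode_2[7:5] 1[4] cm[3:0] */
static uint32_t encode_cp_transfer(int is_mrc, unsigned cp, unsigned opc1,
                                   unsigned rd, unsigned cn, unsigned cm,
                                   unsigned opc2)
{
    assert(cp < 16 && opc1 < 8 && rd < 16 && cn < 16 && cm < 16 && opc2 < 8);
    return (0xEu << 28)                 /* cond = always */
         | (0xEu << 24)                 /* coprocessor register transfer */
         | (opc1 << 21)
         | ((is_mrc ? 1u : 0u) << 20)   /* L bit: 1 = MRC, 0 = MCR */
         | (cn << 16)
         | (rd << 12)
         | (cp << 8)
         | (opc2 << 5)
         | (1u << 4)
         | cm;
}
```

With these fields, `mcr p15, 0, r0, c7, c7, 0` encodes as 0xEE070F17 and `mrc p15, 0, r0, c1, c0, 0` as 0xEE110F10.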
Sample code analysis:
Enable the MMU and map virtual addresses 0xA0000000~0xA00FFFFF to physical addresses 0x56000000~0x560FFFFF (the physical address of GPFCON is 0x56000050 and that of GPFDAT is 0x56000054); map virtual addresses 0xB0000000~0xB3FFFFFF to physical addresses 0x30000000~0x33FFFFFF. This example maps addresses in segment mode and uses only the first-level page table. As explained above, the first-level page table uses 4096 descriptors to represent the 4GB space (each descriptor corresponds to 1MB) and each descriptor occupies 4 bytes, so the first-level page table occupies 16KB. The first 16KB of SDRAM stores the first-level page table, so the remaining memory starts at 0x30004000, which corresponds to virtual address 0xB0004000 (hence the code runs at 0xB0004000).
★ Sample code for the main process of program execution.
.text
.global _start
_start:
    bl  disable_watch_dog       @ Turn off WATCHDOG, otherwise the CPU restarts continuously
    bl  mem_control_setup       @ Set up the memory controller to use SDRAM
    ldr sp, =4096               @ Set the stack pointer; C functions are called next and need a stack
    bl  copy_2th_to_sdram       @ Copy the second part of the code to SDRAM
    bl  create_page_table       @ Set up the page table
    bl  mmu_init                @ Start the MMU; from here on the code uses virtual addresses
    ldr sp, =0xB4000000         @ Reset the stack pointer to the top of SDRAM (virtual address)
    ldr pc, =0xB0004000         @ Jump to SDRAM and continue with the second part of the code
halt_loop:
    b   halt_loop
★Set up the page table.
void create_page_table(void)
{
/*
 * Macro definitions for segment descriptors:
 * [31:20] segment base address, [11:10] AP, [8:5] Domain, [3] C, [2] B,
 * [1:0] = 0b10 marks a segment descriptor
 */
#define MMU_FULL_ACCESS     (3 << 10)   /* access permission AP */
#define MMU_DOMAIN          (0 << 5)    /* which domain it belongs to */
#define MMU_SPECIAL         (1 << 4)    /* must be 1 */
#define MMU_CACHEABLE       (1 << 3)    /* cacheable, C bit */
#define MMU_BUFFERABLE      (1 << 2)    /* bufferable, B bit */
#define MMU_SECTION         (2)         /* this is a segment descriptor */
#define MMU_SECDESC         (MMU_FULL_ACCESS | MMU_DOMAIN | MMU_SPECIAL | \
                             MMU_SECTION)
#define MMU_SECDESC_WB      (MMU_FULL_ACCESS | MMU_DOMAIN | MMU_SPECIAL | \
                             MMU_CACHEABLE | MMU_BUFFERABLE | MMU_SECTION)
#define MMU_SECTION_SIZE    0x00100000  /* each segment descriptor covers 1MB */

    unsigned long virtuladdr, physicaladdr;
    /* Start of SDRAM, where the page table is stored */
    unsigned long *mmu_tlb_base = (unsigned long *)0x30000000;

    /*
     * Steppingstone starts at physical address 0, and the first part of the
     * program also runs from address 0. So that it keeps running after the
     * MMU is turned on, map virtual addresses 0~1M to the same physical
     * addresses.
     */
    virtuladdr = 0;
    physicaladdr = 0;
    /*
     * Virtual address bits [31:20] index the first-level page table, hence
     * (virtuladdr >> 20). Bits [31:20] of the segment descriptor hold the
     * physical base of the segment, hence (physicaladdr & 0xFFF00000).
     */
    *(mmu_tlb_base + (virtuladdr >> 20)) = (physicaladdr & 0xFFF00000) |
                                           MMU_SECDESC_WB;

    /*
     * 0x56000000 is the starting physical address of the GPIO registers.
     * GPBCON and GPBDAT are at physical addresses 0x56000010 and 0x56000014.
     * So that the second part of the program can reach them at 0xA0000010
     * and 0xA0000014, map the 1MB of virtual addresses starting at
     * 0xA0000000 to the 1MB of physical addresses starting at 0x56000000.
     * I/O space must not be cached or buffered, so MMU_SECDESC is used.
     */
    virtuladdr = 0xA0000000;
    physicaladdr = 0x56000000;
    *(mmu_tlb_base + (virtuladdr >> 20)) = (physicaladdr & 0xFFF00000) |
                                           MMU_SECDESC;

    /*
     * SDRAM occupies physical addresses 0x30000000~0x33FFFFFF. Map virtual
     * addresses 0xB0000000~0xB3FFFFFF to them: 64MB in total, which takes
     * 64 segment descriptors.
     */
    virtuladdr = 0xB0000000;
    physicaladdr = 0x30000000;
    while (virtuladdr < 0xB4000000)
    {
        *(mmu_tlb_base + (virtuladdr >> 20)) = (physicaladdr & 0xFFF00000) |
                                               MMU_SECDESC_WB;
        virtuladdr += MMU_SECTION_SIZE;
        physicaladdr += MMU_SECTION_SIZE;
    }
}
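The descriptor values that end up in the page table can be checked on a PC by building the same table in an ordinary array instead of at 0x30000000. The following is a sketch under that assumption; `map_sections` and `build_table` are helper names invented for this example, and the flag values repeat the macros used above.

```c
#include <assert.h>
#include <stdint.h>

#define SEC_DESC     ((3u << 10) | (0u << 5) | (1u << 4) | 2u)  /* AP, domain 0, section */
#define SEC_DESC_WB  (SEC_DESC | (1u << 3) | (1u << 2))         /* plus C and B bits */
#define SECTION_SIZE 0x00100000u                                 /* 1MB per descriptor */

/* Fill `count` consecutive segment descriptors starting at va -> pa. */
static void map_sections(uint32_t *table, uint32_t va, uint32_t pa,
                         uint32_t count, uint32_t flags)
{
    uint32_t i;

    /* both addresses must sit on a 1MB boundary */
    assert((va & (SECTION_SIZE - 1u)) == 0 && (pa & (SECTION_SIZE - 1u)) == 0);
    for (i = 0; i < count; i++)
        table[(va >> 20) + i] = ((pa + i * SECTION_SIZE) & 0xFFF00000u) | flags;
}

/* Same three mappings as create_page_table, into a 4096-entry array. */
static void build_table(uint32_t *table)
{
    map_sections(table, 0x00000000u, 0x00000000u, 1,  SEC_DESC_WB); /* Steppingstone */
    map_sections(table, 0xA0000000u, 0x56000000u, 1,  SEC_DESC);    /* GPIO, uncached */
    map_sections(table, 0xB0000000u, 0x30000000u, 64, SEC_DESC_WB); /* 64MB SDRAM */
}
```

After `build_table`, entry 0xA00 holds 0x56000C12 and entries 0xB00 through 0xB3F hold 0x30000C1E through 0x33F00C1E; all other entries are left untouched.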
★ Start MMU.
void mmu_init(void)
{
    unsigned long ttb = 0x30000000;

    __asm__(
        "mov r0, #0\n"
        "mcr p15, 0, r0, c7, c7, 0\n"   /* invalidate ICaches and DCaches */
        "mcr p15, 0, r0, c7, c10, 4\n"  /* drain write buffer on v4 */
        "mcr p15, 0, r0, c8, c7, 0\n"   /* invalidate instruction and data TLB */
        "mov r4, %0\n"                  /* r4 = page table base address */
        "mcr p15, 0, r4, c2, c0, 0\n"   /* set page table base register */
        "mvn r0, #0\n"
        "mcr p15, 0, r0, c3, c0, 0\n"   /* domain access control = 0xFFFFFFFF,
                                         * no permission check is performed */
        /*
         * For the control register, read its value first, modify only the
         * bits of interest, and then write it back
         */
        "mrc p15, 0, r0, c1, c0, 0\n"   /* read the control register */
        /* The lower 16 bits of the control register mean: .RVI ..RS B... .CAM
         * R : cache replacement algorithm, 0 = Random replacement; 1 = Round robin replacement
         * V : location of the exception vector table, 0 = Low addresses = 0x00000000; 1 = High addresses = 0xFFFF0000
         * I : 0 = Disable ICaches; 1 = Enable ICaches
         * R, S : used together with the descriptor in the page table to determine memory access permissions
         * B : 0 = CPU is little endian; 1 = CPU is big endian
         * C : 0 = Disable DCaches; 1 = Enable DCaches
         * A : 0 = no alignment check on data access; 1 = alignment check on data access
         * M : 0 = Disable MMU; 1 = Enable MMU
         */
        /*
         * Clear unneeded bits first, then set them again below if needed
         */
                                        /* .RVI ..RS B... .CAM */
        "bic r0, r0, #0x3000\n"         /* ..11 .... .... .... clear V, I bits */
        "bic r0, r0, #0x0300\n"         /* .... ..11 .... .... clear R, S bits */
        "bic r0, r0, #0x0087\n"         /* .... .... 1... .111 clear B/C/A/M */
        /*
         * Set needed bits
         */
        "orr r0, r0, #0x0002\n"         /* .... .... .... ..1. enable alignment check */
        "orr r0, r0, #0x0004\n"         /* .... .... .... .1.. enable DCaches */
        "orr r0, r0, #0x1000\n"         /* ...1 .... .... .... enable ICaches */
        "orr r0, r0, #0x0001\n"         /* .... .... .... ...1 enable MMU */
        "mcr p15, 0, r0, c1, c0, 0\n"   /* write the modified value back */
        : /* no output */
        : "r" (ttb)
        : "r0", "r4", "memory"          /* r0 and r4 are overwritten above, so
                                         * the compiler must not keep ttb or
                                         * other live values in them */
    );
}
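The clear-then-set sequence applied to the control register can be reproduced on a PC to see which value is finally written. The sketch below mirrors the `bic`/`orr` instructions of `mmu_init` in plain C; the macro names and the function `mmu_control_value` are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in the lower 16 bits of the CP15 control register. */
#define CR_M (1u << 0)   /* MMU enable */
#define CR_A (1u << 1)   /* alignment check */
#define CR_C (1u << 2)   /* DCaches enable */
#define CR_B (1u << 7)   /* big endian */
#define CR_S (1u << 8)   /* system protection */
#define CR_R (1u << 9)   /* ROM protection */
#define CR_I (1u << 12)  /* ICaches enable */
#define CR_V (1u << 13)  /* high exception vectors */

/* Value written back to the control register, given the value read. */
static uint32_t mmu_control_value(uint32_t old)
{
    uint32_t v = old;

    v &= ~(CR_V | CR_I);                 /* bic r0, r0, #0x3000 */
    v &= ~(CR_R | CR_S);                 /* bic r0, r0, #0x0300 */
    v &= ~(CR_B | CR_C | CR_A | CR_M);   /* bic r0, r0, #0x0087 */
    v |= CR_A | CR_C | CR_I | CR_M;      /* the four orr instructions */

    /* bits that were cleared and not set again must stay 0 */
    assert((v & (CR_V | CR_R | CR_S | CR_B)) == 0);
    return v;
}
```

Starting from a register value of 0 the result is 0x1007; bits outside the ones listed (for example the replacement-algorithm bit 14) pass through unchanged.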