Cache Coherence

fish001

1. Configure cache
> Configure L1 Cache:
CACHE_setL1pSize(); CACHE_setL1dSize();

> Configure L2 cache:
By default, L2 cache is disabled at startup and all of L2 is SRAM. If DSP/BIOS is used, L2 cache is enabled automatically; otherwise, enable it by calling the CSL function CACHE_setL2Size().

> Cacheability of external memory
For L1D and L2, the cacheability of each external memory segment is controlled by its MAR bit, which is set by calling the CSL function CACHE_enableCaching(CACHE_MARi). For L1P, external memory is always cacheable, regardless of MAR.
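
As a rough illustration of which MAR bit belongs to which address: on C64x+ each MAR register covers one 16 MB range, so the register index is simply the top 8 bits of the address. The helper names below are hypothetical and not part of CSL; this is only a sketch of the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not CSL functions. Assumes each MAR register
   controls one 16 MB (2^24 byte) range, as on C64x+. */
#define MAR_RANGE_SHIFT 24u
#define MAR_COUNT       256u

/* Index i of the MARi register that covers the given address. */
static unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> MAR_RANGE_SHIFT);
}

/* First address of the 16 MB range controlled by MARi. */
static uint32_t mar_range_base(unsigned index)
{
    assert(index < MAR_COUNT);
    return (uint32_t)index << MAR_RANGE_SHIFT;
}
```

For example, mar_index(0x80000000u) evaluates to 128, so a 16 MB external region starting at 80000000h is governed by MAR128.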

Note: An address range configured as cache must not contain data or code, so it must not be assigned to any section in the linker command file. If you want to use part of L1D or L1P as SRAM, reduce the cache size accordingly.

> Example: C64x+ linker command file
MEMORY
{
L2SRAM : origin = 00800000h length = 001C0000h
CE0 : origin = 80000000h length = 01000000h /* first 16 MB of DDR, configured as cacheable */
}
SECTIONS
{
.cinit > L2SRAM
.text > L2SRAM
.stack > L2SRAM
.bss > L2SRAM
.const > L2SRAM
.data > L2SRAM
.far > L2SRAM
.switch > L2SRAM
.sysmem > L2SRAM
.tables > L2SRAM
.cio > L2SRAM
.external > CE0
}

> Example: C64x+ CSL command sequence to enable caching
[
#include <csl.h>
#include <csl_cache.h>
...

CACHE_enableCaching(CACHE_CE00);
CACHE_setL2Size(CACHE_256KCACHE);
]

2. Cache coherence issues

Coherence has to be maintained for any memory that is shared, cacheable, and modified. In our current simple codec port, memory is shared between the ARM and the DSP, but the two never modify it at the same time, so no coherence maintenance is needed.

On C64x+ DSPs, the cache controller uses a snoop-based hardware cache coherence protocol to automatically keep data accessed by the CPU and by EDMA/IDMA coherent. The mechanism is triggered whenever the DMA issues a read or write. On a DMA read of L2 SRAM that is cached in L1D, the data is forwarded directly from the L1D cache to the DMA without being updated in L2 SRAM; on a DMA write, the data is forwarded to the L1D cache and L2 SRAM is updated.

> Beyond the coherence operations, DMA buffers should be aligned to an L2 cache line boundary and sized as an integer multiple of the cache line size. This can be achieved as follows:
#pragma DATA_ALIGN(InBuffA, CACHE_L2_LINESIZE)
#pragma DATA_ALIGN(InBuffB, CACHE_L2_LINESIZE)
#pragma DATA_ALIGN(OutBuffA, CACHE_L2_LINESIZE)
#pragma DATA_ALIGN(OutBuffB, CACHE_L2_LINESIZE)

unsigned char InBuffA [N*CACHE_L2_LINESIZE];
unsigned char OutBuffA[N*CACHE_L2_LINESIZE];
unsigned char InBuffB [N*CACHE_L2_LINESIZE];
unsigned char OutBuffB[N*CACHE_L2_LINESIZE];
Alternatively, the CSL macro CACHE_ROUND_TO_LINESIZE(cache, element count, element size) can be used in the array definition to round the array size up to the next multiple of the cache line size. The first parameter is the cache type, which can be L1D, L1P, or L2:
unsigned char InBuffA [CACHE_ROUND_TO_LINESIZE(L2, N, sizeof(unsigned char))];
unsigned char OutBuffA[CACHE_ROUND_TO_LINESIZE(L2, N, sizeof(unsigned char))];
unsigned char InBuffB [CACHE_ROUND_TO_LINESIZE(L2, N, sizeof(unsigned char))];
unsigned char OutBuffB[CACHE_ROUND_TO_LINESIZE(L2, N, sizeof(unsigned char))];
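
To make the rounding and alignment rules concrete, here is a minimal host-side sketch. It does not use CSL: round_up_to_line() and is_line_aligned() are hypothetical helpers, and L2_LINE_BYTES is a local stand-in that assumes the 128-byte L2 line size of C64x+ (real code should use CACHE_L2_LINESIZE).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed L2 line size in bytes for C64x+; a local stand-in for the
   CSL constant CACHE_L2_LINESIZE. */
#define L2_LINE_BYTES 128u

/* Round a byte count up to the next whole number of cache lines. */
static size_t round_up_to_line(size_t nbytes, size_t line)
{
    assert(line != 0);
    return ((nbytes + line - 1) / line) * line;
}

/* Nonzero if the pointer starts on a cache-line boundary. */
static int is_line_aligned(const void *p, size_t line)
{
    assert(line != 0);
    return ((uintptr_t)p % line) == 0;
}
```

A 129-byte buffer, for instance, rounds up to 256 bytes (two L2 lines), so no other data can share its last cache line.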

> A line cached in L1D/L1P is not necessarily cached in L2; a line can be evicted from L2 and still remain in the L1P/L1D cache.

3. Changing Cache Configuration at Run-time

> Disabling external memory caching

This is usually not necessary. If you do it, keep the following in mind: when a MAR bit is changed from 1 to 0, external memory addresses that are already cached stay in the cache, so accesses to those addresses still hit. The new MAR setting only takes effect for an external memory access that misses in L2, or when L2 is all SRAM (which can also be regarded as an L2 miss).
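
The behaviour can be pictured with a toy model of a single cache line and a single MAR bit. This is only a sketch of the rule described above, not of the real cache controller; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one cache line plus one MAR bit (hypothetical names). */
typedef struct {
    bool     mar_cacheable; /* MAR bit for the external range */
    bool     line_valid;    /* a copy of some address is held in cache */
    uint32_t line_addr;     /* the address that copy belongs to */
} ToyCache;

/* Returns true on a hit. The MAR bit is consulted only on a miss. */
static bool toy_access(ToyCache *c, uint32_t addr)
{
    assert(c != NULL);
    if (c->line_valid && c->line_addr == addr) {
        return true;            /* hit: the MAR bit is not looked at */
    }
    if (c->mar_cacheable) {     /* miss: MAR decides whether to allocate */
        c->line_valid = true;
        c->line_addr = addr;
    }
    return false;
}

/* Models invalidating the cached copy. */
static void toy_invalidate(ToyCache *c)
{
    assert(c != NULL);
    c->line_valid = false;
}
```

In this model, clearing mar_cacheable after an address has been cached still yields a hit on the next access; only after toy_invalidate() does the access miss and stay uncached.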

> Changing cache sizes at run-time

Take L2 as an example. There are two tasks, A and B. Task A runs best with 64K of L2 SRAM; task B runs best with 32K of L2 cache and 32K of L2 SRAM. Divide the 64K of L2 into two 32K segments. Assume that the second 32K segment holds A's code, some global variables (which must be preserved while B runs), and some of A's variables that are no longer needed after the task switch.

Use DMA to move the code and the global variables that must be preserved to an area of external memory. Once that is done, the cache mode can be switched. The cache controller automatically writeback-invalidates all cache lines before initializing the cache with the new size. Note: Changing the L2 cache size does not cause any evictions from the L1P/L1D cache. The size is changed by calling CACHE_setL2Size().

After task B has run, the system has to switch back to configuration A, so the 32K of L2 cache used by B must become SRAM again. Before that, the 32K of line frames must be written back to external memory and invalidated. The cache controller does this automatically when the cache size is changed. A's code and global variables can then be copied from external memory back to their original location.

The same approach applies to L1P/L1D. The corresponding procedure, linker command file and C code example are as follows:

Procedure:
More Cache (converting SRAM to cache)
1. DMA or copy needed code/data out of SRAM addresses to be converted to cache.
2. Wait for completion of step 1.
3. Increase cache size using CACHE_setL1pSize(), CACHE_setL1dSize(), or CACHE_setL2Size().
Less Cache (converting cache to SRAM)
1. Decrease cache size using CACHE_setL1pSize(), CACHE_setL1dSize(), or CACHE_setL2Size().
2. DMA or copy back any code/data needed.
3. Wait for completion of step 2.
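
The ordering of the two procedures can be sketched with a small host-side model. Everything here is a stand-in: memcpy plays the role of a completed DMA transfer, and stub_set_segment_mode() is a hypothetical placeholder for the CSL size call that overwrites the segment to model the loss of its SRAM contents once it becomes cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SEG_BYTES 64  /* stand-in for the 32K segment, kept small here */

static unsigned char sram_seg[SEG_BYTES];  /* models the L2 segment */
static unsigned char ext_copy[SEG_BYTES];  /* models external memory */
static bool seg_is_cache = false;

/* Hypothetical stand-in for the CSL size call. Once the segment is
   cache, whatever it held as SRAM is no longer available. */
static void stub_set_segment_mode(bool as_cache)
{
    if (as_cache) {
        memset(sram_seg, 0xFF, SEG_BYTES); /* model loss of SRAM contents */
    }
    seg_is_cache = as_cache;
}

/* More cache: copy out, wait, then increase the cache size. */
static void more_cache(void)
{
    assert(!seg_is_cache);
    memcpy(ext_copy, sram_seg, SEG_BYTES); /* steps 1-2: memcpy is synchronous */
    stub_set_segment_mode(true);           /* step 3 */
}

/* Less cache: decrease the cache size first, then copy back and wait. */
static void less_cache(void)
{
    assert(seg_is_cache);
    stub_set_segment_mode(false);          /* step 1 */
    memcpy(sram_seg, ext_copy, SEG_BYTES); /* steps 2-3 */
}
```

Calling more_cache() and then less_cache() restores the original segment contents; swapping the order of the copy and the mode change in either function would lose the data in this model.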

Linker command file:
MEMORY
{
L2_1: o = 00800000h l = 00008000h /* 1st 32K segment: always SRAM */
L2_2: o = 00808000h l = 00008000h /* 2nd 32K segment: Task A - SRAM, Task B - cache */
CE0 : o = 80000000h l = 01000000h /* external memory */
}
SECTIONS
{
.cinit > L2_1
.text > L2_1
.stack > L2_1
.bss > L2_1
.const > L2_1
.data > L2_1
.far > L2_1
.switch > L2_1
.sysmem > L2_1
.tables > L2_1
.cio > L2_1
.sram_state_A > L2_2
.sram_process_A > L2_2
.sram_local_var_A > L2_2
.external > CE0
}