Why does TI C6000 need cache?

Large-capacity memory (such as DRAM) is slow to access, generally much slower than the CPU clock; small-capacity memory (such as SRAM) can be accessed quickly. Many high-performance processors therefore provide a hierarchical memory architecture.

As shown in Figure 3, the left side is a flat memory architecture and the right side is a hierarchical architecture with a two-level cache. In the flat architecture on the left, even though the CPU can run at 600MHz, the on-chip/off-chip memory runs at only 300MHz/100MHz, so the CPU must insert wait states whenever it accesses memory.


Figure 3 Flat and hierarchical memory architectures

Cache operating states

  • Cache hit: the requested program/data is already in the cache, so the instructions/data are delivered to the CPU immediately, with no wait states.

  • Cache miss: the required instructions/data are first fetched through the EMIF. They are stored in the cache at the same time as they are sent to the CPU; the CPU is stalled while the fetch is in progress.

  • Cache flush: the cached contents are cleared.

  • Cache freeze: the cache contents no longer change. On a miss, the instruction packets fetched through the EMIF are not stored in the cache.

  • Cache bypass: the cache contents do not change, and every program/data access goes directly to the memory behind the cache.

C6000 storage architecture

The C6000 series DSPs provide two levels of cache, L1 and L2, between the CPU and the on-chip RAM. Each level is divided into an independent program cache and data cache. L1 is fixed as cache, while L2 can be remapped as ordinary on-chip RAM.

When the CPU accesses a program or data item, it first searches the L1 cache. On a hit, the item is accessed directly; on a miss, the search continues in the L2 cache. On an L2 hit, the item is returned from L2; on an L2 miss, it is fetched from on-chip or off-chip RAM.


Figure 4 Program/data access flow of C6000 CPU

The principle of locality

As shown in Figure 4, the memory hierarchy preserves the CPU's access efficiency only if most accesses are satisfied by the level closest to the CPU. Fortunately, this is guaranteed by the principle of locality, which states that over a relatively short time window a program needs only a relatively small amount of code and data. Locality takes two forms:

  • Spatial locality: when a piece of data is accessed, data adjacent to it is likely to be accessed by subsequent memory operations.

  • Temporal locality: when a memory location is accessed, it is likely to be accessed again in the near future.

Optimizing cache performance

Based on the principle of locality, some basic guidelines for optimizing cache performance can be summarized:

  • Let each function process its data as completely as possible, to improve data reuse.

  • Organize data and code to improve cache hit rates.

  • Partition memory sensibly to balance the program cache against the data cache.

  • Group functions that operate on the same data into one storage area.

Sections [1,6]

The smallest unit of an object file (.obj) is called a section: a block of code or data that occupies contiguous memory. One of the functions of the linker is to relocate sections into the memory map of the target system. All sections can be relocated independently, and the user can place any section into any specified block of target memory.

A COFF file contains three default sections: .text, .data, and .bss. Users can also create, name, and link their own sections, and can further divide each section into subsections.

In C/C++ code, two compiler pragmas can be used to place specific code or data into a named section:

  • CODE_SECTION: places a function's code in a named section.

  • DATA_SECTION: places a data object in a named section.

Stack and Heap[1,6]

The stack (.stack) and heap (.heap) are two storage areas that provide support for the processor runtime.

The stack is storage for automatic variables: it is allocated when needed and released automatically, and is used for temporary data such as local variables and function arguments.

The heap is used for dynamic memory allocation. In memory it sits between the .bss area and the stack. Heap memory is normally allocated and released explicitly by the programmer; if the programmer never releases it, the OS may reclaim it when the program ends. For example, the C function malloc() allocates a region in the heap for storing data.

This post is from DSP and ARM Processors
 
