Architecture Design of Embedded DSP Processor μDSP-EEWORLD

In recent years, China's electronic information industry and market have grown rapidly, and demand for DSP chips continues to rise. Although some integrated circuit design companies develop DSP systems and related products, research on the DSP chips themselves has been limited to pre-research projects at a few universities and research institutes, and no independent intellectual property has yet been established. Developing our own DSP processor designs is therefore of great significance, and architecture design is the soul of processor design: the design of a processor starts with its architecture. DSP architectures have been continuously improved and optimized alongside the development of DSP algorithms and their applications. With the emergence of parallel processing techniques (VLIW, SIMD, superscalar, multiprocessor, etc.), reconfigurable computing, and low-power architecture techniques, new DSP architectures keep appearing. They have steadily raised the performance of DSP processors and brought them into many embedded real-time fields such as communications, automatic control, radar, meteorology, navigation, and robotics, all of which require processors that are both fast and low-power. Therefore, based on a comprehensive survey of recent developments in DSP architecture, and balancing low power consumption, low cost, and high performance, we designed the architecture of μDSP, a 16-bit embedded fixed-point DSP processor, as shown in Figure 1.

The following sections describe the μDSP architecture in detail: its bus structure, pipeline design, special instruction set, addressing modes, control unit, and high-speed operation units.

1. Improved Harvard bus structure

Since DSP processors are mainly used for data-intensive computation, their data throughput is very high. A multiply-accumulate operation in particular needs two or even three operands at the same time. The traditional von Neumann structure, in which instructions and data share one memory and one bus, cannot supply instructions and data fast enough, so DSP processors generally adopt the Harvard bus structure. The Harvard structure uses separate program and data memories so that an instruction fetch and a data access can proceed simultaneously, but this fixed split makes storage inflexible: space left unused in one memory cannot be used by the other, so memory cannot be fully utilized.

To improve performance further, μDSP adopts a modified Harvard structure. It keeps separate program and data memories but allows data to be stored in program memory, and the two memories share a unified address space. Program memory is 32K × 24 bits (instructions are 24 bits wide) and data memory is 32K × 16 bits (data words are 16 bits wide).
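The unified address map can be sketched as follows. The article does not say how the two 32K-word memories are arranged in the shared address space, so this Python sketch assumes a hypothetical flat 16-bit space with program memory at 0x0000 to 0x7FFF and data memory at 0x8000 to 0xFFFF. It also assumes, following the ADSP-21xx convention rather than anything stated here, that a 16-bit data word kept in program memory occupies the upper 16 bits of the 24-bit word. `decode` and `pm_read_data` are illustrative names.

```python
# Hypothetical unified address map for μDSP (assumption: the article only says
# the two memories share one address space, not how they are laid out).
PM_WORDS = 32 * 1024  # program memory: 32K words x 24 bits
DM_WORDS = 32 * 1024  # data memory: 32K words x 16 bits

# Each 32K-word memory needs 15 address bits; the combined 64K-word space
# therefore fits a 16-bit address, with bit 15 selecting the memory.
LOCAL_ADDR_BITS = (PM_WORDS - 1).bit_length()


def decode(addr: int) -> tuple:
    """Map a unified address to (memory name, word offset inside that memory)."""
    if not 0 <= addr < PM_WORDS + DM_WORDS:
        raise ValueError(f"address {addr:#06x} is outside the unified space")
    if addr < PM_WORDS:
        return ("PM", addr)
    return ("DM", addr - PM_WORDS)


def pm_read_data(word24: int) -> int:
    """Return the 16-bit datum held in a 24-bit program memory word.

    Assumption (ADSP-21xx style): the data occupies the upper 16 bits.
    """
    return (word24 >> 8) & 0xFFFF


if __name__ == "__main__":
    print(decode(0x0010), decode(0x8010))  # PM offset 16, then DM offset 16
    print(PM_WORDS * 24 // 8, "bytes of PM,", DM_WORDS * 16 // 8, "bytes of DM")
```

With this layout, program memory holds 96 KB and data memory 64 KB, and any data a program places in spare program memory is reached through the same address decoding.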

μDSP uses four buses: the program memory address bus (PMA), program memory data bus (PMD), data memory address bus (DMA), and data memory data bus (DMD). The memories are synchronous SRAMs accessed in pipelined mode, so each access takes two pipeline stages to complete. For additional flexibility, the program bus can both fetch instructions and access data, and it can reach both program memory and data memory. To reduce the number of program memory accesses, an on-chip instruction cache stores frequently used instructions. The cache is 64 × 43 bits and holds 64 instruction-address pairs; it uses set-associative mapping and a least recently used (LRU) replacement policy.
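The instruction cache policy can be modeled in a few lines. The article fixes only the capacity (64 entries), set-associative mapping, and LRU replacement; the 4-way, 16-set organization and the index/tag split below are assumptions made for illustration, `InstructionCache` is a hypothetical name, and the 43-bit physical entry layout is not modeled.

```python
from collections import OrderedDict
from typing import Optional


class InstructionCache:
    """Set-associative instruction cache with LRU replacement (illustrative).

    64 entries as in the text; the 4-way x 16-set split is an assumption.
    """

    def __init__(self, entries: int = 64, ways: int = 4) -> None:
        if entries % ways:
            raise ValueError("entries must be a multiple of ways")
        self.ways = ways
        self.num_sets = entries // ways
        # One OrderedDict per set, mapping tag -> 24-bit instruction.
        # Order tracks recency: the first item is the least recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def _locate(self, addr: int):
        # Low-order address bits pick the set; the remaining bits form the tag.
        return self.sets[addr % self.num_sets], addr // self.num_sets

    def lookup(self, addr: int) -> Optional[int]:
        """Return the cached instruction on a hit, or None on a miss."""
        cache_set, tag = self._locate(addr)
        if tag not in cache_set:
            return None
        cache_set.move_to_end(tag)  # hit: mark as most recently used
        return cache_set[tag]

    def fill(self, addr: int, instr: int) -> None:
        """Insert an instruction, evicting the set's LRU entry if it is full."""
        cache_set, tag = self._locate(addr)
        if tag in cache_set:
            cache_set.move_to_end(tag)
        elif len(cache_set) == self.ways:
            cache_set.popitem(last=False)  # evict the least recently used entry
        cache_set[tag] = instr & 0xFFFFFF
```

On a miss, the program controller would fetch the instruction over the PMA/PMD buses and call `fill`. With 16 sets, addresses 0, 16, 32, and so on compete for the same four ways, so a fifth such address evicts whichever of them was used least recently.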

This modified Harvard structure improves the efficiency of instruction and data access and also makes better use of memory, because program memory that is not needed for code can hold data.
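The benefit for multiply-accumulate work can be shown with a back-of-the-envelope count for one tap of an FIR filter, which needs an instruction fetch, a coefficient read, and a sample read. This sketch assumes that each bus delivers one word per cycle once pipelined, that coefficients are placed in program memory and samples in data memory, and that the loop instruction hits in the instruction cache. `cycles_per_tap` is a hypothetical helper, not a model of actual μDSP timing.

```python
def cycles_per_tap(architecture: str, cache_hit: bool = True) -> int:
    """Cycles per FIR tap when each bus moves one word per cycle (assumed).

    A tap needs one instruction fetch, one coefficient read, one sample read.
    """
    instr, coef, sample = 1, 1, 1  # memory accesses per tap
    if architecture == "von_neumann":
        # One shared memory and bus: all three accesses are serialized.
        return instr + coef + sample
    if architecture == "harvard":
        # Program bus fetches the instruction; both operands share the data bus.
        return max(instr, coef + sample)
    if architecture == "modified_harvard":
        # Coefficient moved to program memory; on a cache hit the instruction
        # comes from the cache, leaving the program bus free for the coefficient.
        program_bus = coef + (0 if cache_hit else instr)
        data_bus = sample
        return max(program_bus, data_bus)
    raise ValueError(f"unknown architecture: {architecture}")


if __name__ == "__main__":
    for arch in ("von_neumann", "harvard", "modified_harvard"):
        print(arch, cycles_per_tap(arch))
```

Under these assumptions the count drops from three cycles per tap to two with a plain Harvard structure and to one with the modified structure plus cache; a cache miss brings it back to two.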

2. Six-stage pipeline design

Pipeline design is the core of modern processor design. It must weigh many factors, such as how evenly work is divided among the stages, pipeline throughput, and the structural complexity of the pipeline. μDSP uses a six-stage pipeline; the name and function of each stage are as follows:

(1) Look-Ahead Address (LA): The program controller selects, from the available address sources, the address of the instruction entering the pipeline in this cycle and places it on the PMA bus. This stage also resolves bus conflicts. Because the LA and AD stages may both need the PMA bus in the same cycle, on a conflict the program controller checks whether the instruction is in the cache. On a hit, the instruction is taken from the cache in advance so that AD can use the bus; on a miss, AD is given the bus first and the program controller obtains the PMA bus in the next cycle.

(2) Prefetch Address (PA): The instruction address is sent to memory and the fetch begins. Because the memory is a synchronous SRAM with a two-stage pipeline, the instruction is not returned in this stage; the fetch completes at the end of the next cycle.

(3) Fetch Address (FA): The instruction is read from memory over the PMD bus. Because the memory needs two cycles from the moment the address is placed on the bus until the data is available, the fetch started in the previous cycle completes in this cycle.

(4) Address Decode (AD): Part of the instruction is decoded, for example its DAG operations. If the instruction needs a memory operand, the operand's address is placed on the appropriate address bus. The undecoded remainder is passed to the next stage.

(5) Instruction Decode (ID): The rest of the instruction is decoded. This stage also absorbs the wait for the memory operand, since a data access likewise takes two cycles.

(6) Execute (PC): This stage executes the instruction, sets the status flags, and writes the result to the appropriate register.
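The steady-state flow through the six stages, together with the LA/AD conflict rule from stage (1), can be sketched as below. The table assumes one instruction enters per cycle with no stalls. Because PA issues the fetch address and FA receives the instruction, and AD issues a data address whose operand is ready after ID, the two-cycle synchronous SRAM fits the pipeline without extra bubbles. `pipeline_table` and `resolve_pma_conflict` are illustrative names.

```python
STAGES = ["LA", "PA", "FA", "AD", "ID", "PC"]


def pipeline_table(num_instructions: int) -> list:
    """rows[cycle][stage] = label of the instruction in that stage (no stalls)."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for cycle in range(total_cycles):
        row = []
        for stage in range(len(STAGES)):
            i = cycle - stage  # instruction i enters LA in cycle i
            row.append(f"I{i}" if 0 <= i < num_instructions else "--")
        rows.append(row)
    return rows


def resolve_pma_conflict(ad_needs_pma: bool, cache_hit: bool) -> tuple:
    """Return (PMA bus owner this cycle, whether LA must retry next cycle)."""
    if not ad_needs_pma:
        return ("LA", False)  # no conflict: LA drives the fetch address
    if cache_hit:
        return ("AD", False)  # hit: the instruction comes from the cache instead
    return ("AD", True)       # miss: AD goes first, LA gets PMA next cycle


if __name__ == "__main__":
    print("cycle " + " ".join(f"{s:>3}" for s in STAGES))
    for cycle, row in enumerate(pipeline_table(3)):
        print(f"{cycle:>5} " + " ".join(f"{x:>3}" for x in row))
```

The first instruction reaches the execute (PC) stage in cycle 5, after which one instruction completes per cycle; a cache miss during a PMA conflict is the case that costs LA a cycle.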

3. Special instruction system

Processor design starts with the design of the instruction set, and different instruction sets lead to different processor structures. The μDSP instruction set is rich and can implement the functions required by a wide range of DSP algorithms. Generally, four categories of instructions need to be designed: program flow control instructions, data movement instructions, arithmetic instructions, and multifunction instructions. The instruction set must meet the following requirements:

(1) 24-bit instruction width;
(2) High-density instruction encoding;
(3) Provide multi-function instructions so that one instruction can complete multiple operations;
(4) Support double-word instructions;
(5) Provide zero-overhead loop instructions;
(6) Compatible with mainstream DSPs (ADI's ADSP-219x series).
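Requirement (5), zero-overhead looping, is usually met with dedicated loop hardware that compares the program counter against a stored loop-end address and jumps back without executing a branch instruction. The article does not describe the μDSP loop logic at this level, so the following is a hypothetical sketch: a loop stack holding (start, end, count) entries that is consulted on every instruction address. `LoopController` and `run` are illustrative names.

```python
class LoopController:
    """Hypothetical zero-overhead loop logic: a stack of [start, end, count]."""

    def __init__(self) -> None:
        # Nested loops with distinct end addresses; innermost loop on top.
        self.stack = []

    def push_loop(self, start: int, end: int, count: int) -> None:
        if count < 1:
            raise ValueError("loop count must be at least 1")
        self.stack.append([start, end, count])

    def next_pc(self, pc: int) -> int:
        """Address of the next instruction; looping back costs no extra cycle."""
        if self.stack:
            top = self.stack[-1]
            start, end, count = top
            if pc == end:
                if count > 1:
                    top[2] = count - 1  # decrement the loop counter
                    return start        # jump back in hardware
                self.stack.pop()        # last iteration: fall through
        return pc + 1


def run(lc: LoopController, pc: int, stop: int) -> list:
    """Return the sequence of executed addresses until pc reaches stop."""
    trace = []
    while pc != stop:
        trace.append(pc)
        pc = lc.next_pc(pc)
    return trace
```

For a three-instruction body at addresses 10 to 12 repeated three times, the trace is simply 10, 11, 12 three times over: no branch instruction ever occupies a slot in the loop body.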

4. Flexible addressing mode

Because DSP algorithms have distinctive memory access patterns, the ordinary addressing modes of general-purpose processors are not sufficient, so DSP processors generally provide a number of special, flexible addressing modes. μDSP supports six main addressing modes: direct, pre-indexed, post-indexed, circular, bit-reversed, and paged addressing. These modes are implemented by a data address generator (DAG), shown in Figure 2. Because μDSP can access program memory and data memory at the same time, it has two DAGs. They differ as follows: DAG1 can generate only data memory addresses but supports bit reversal, while DAG2 can generate both data memory and program memory addresses but does not support bit reversal. Each DAG has four register groups: index registers (I), modify registers (M), length registers (L), and base registers (B). Each group contains four 16-bit registers that can be read and written over the DMD bus. The I registers hold the actual memory access address and the M registers hold address offsets. The L and B registers are used only for circular addressing: L holds the length of the circular buffer and B holds its start address. The part inside the dotted box in Figure 2 is unique to DAG1.
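The address arithmetic of a DAG can be sketched as follows. The text gives the register groups and the division of labor between DAG1 and DAG2 but not the exact update rules, so the rules below are assumptions in the style commonly used by DSPs such as the ADSP-219x: post-indexed access returns I and then adds M; pre-indexed access returns I + M without updating I; circular addressing wraps the updated index into [B, B + L) whenever L is nonzero; and each I register is paired with the L and B registers of the same number. `DAG`, its methods, and `bit_reverse` are hypothetical names.

```python
MASK16 = 0xFFFF


def bit_reverse(value: int, nbits: int) -> int:
    """Reverse the low nbits of value (e.g. 0b001 -> 0b100 for nbits=3)."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


class DAG:
    """Sketch of one data address generator with I, M, L, B register groups."""

    def __init__(self, bit_reverse_capable: bool) -> None:
        self.I = [0] * 4  # index: current address
        self.M = [0] * 4  # modify: address offset
        self.L = [0] * 4  # length of circular buffer (0 = linear addressing)
        self.B = [0] * 4  # base: first address of circular buffer
        self.bit_reverse_capable = bit_reverse_capable  # True only for DAG1

    def pre_indexed(self, i: int, m: int) -> int:
        """Address I[i] + M[m]; I[i] is left unchanged."""
        return (self.I[i] + self.M[m]) & MASK16

    def post_indexed(self, i: int, m: int, reverse_bits: int = 0) -> int:
        """Return I[i] (optionally bit-reversed), then update I[i] += M[m]."""
        if reverse_bits and not self.bit_reverse_capable:
            raise ValueError("bit-reversed addressing is only on DAG1")
        addr = self.I[i]
        new = addr + self.M[m]
        if self.L[i]:  # circular addressing: wrap into [B, B + L)
            new = self.B[i] + (new - self.B[i]) % self.L[i]
        self.I[i] = new & MASK16
        if reverse_bits:
            addr = bit_reverse(addr, reverse_bits)
        return addr
```

With B = 100, L = 4, and M = 1, successive post-indexed accesses visit 100, 101, 102, 103, 100, ..., which is exactly the buffer walk an FIR delay line needs; with M = -1 the same buffer is walked backwards. Stepping a linear index through 0 to 7 with 3-bit reversal yields the FFT reordering 0, 4, 2, 6, 1, 5, 3, 7.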

5. Powerful control unit

The control unit coordinates all parts of the DSP processor. It is mainly responsible for generating instruction addresses, controlling the pipeline, and handling data dependencies (hazards), exceptions, and interrupts, so that the data path operates correctly. Its task is demanding: without a capable control unit, the processor cannot work properly. By function, the μDSP control unit can be roughly divided into instruction address selection logic, pipeline control logic, loop control logic, and an interrupt controller.
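The instruction address selection logic is essentially a prioritized multiplexer over the possible sources of the next fetch address. The sources and their priority order below are assumptions for illustration (the article only names the logic block), and `select_next_pc` is a hypothetical function; a complete controller would also need to save return addresses and flush the pipeline when a branch or interrupt is taken.

```python
from typing import Optional

PM_ADDR_MASK = 0x7FFF  # 15-bit program memory address (32K words)


def select_next_pc(pc: int,
                   interrupt_vector: Optional[int] = None,
                   branch_target: Optional[int] = None,
                   loop_start: Optional[int] = None) -> int:
    """Pick the next fetch address from the sources active this cycle.

    Assumed priority: interrupt > taken branch > loop-back > sequential.
    A source passed as None is inactive.
    """
    for candidate in (interrupt_vector, branch_target, loop_start):
        if candidate is not None:
            return candidate & PM_ADDR_MASK
    return (pc + 1) & PM_ADDR_MASK
```

Expressing the priority as an ordered tuple keeps the policy in one place, so a different ordering can be tried by reordering a single line.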

6. High-speed operation unit

The operation unit is the execution unit of the DSP processor and the core of every DSP algorithm implementation: all algorithms are built from its basic functions. The other components, such as the control unit and the data paths, serve the operation unit by supplying it with control and with sufficient data. μDSP has three high-speed operation units: an arithmetic logic unit (ALU), a multiply-accumulate unit (MAC), and a shifter.
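A minimal sketch of one MAC step is shown below. The article does not give the accumulator width or the fractional-mode rule, so both are assumptions borrowed from the ADSP-219x style: a 40-bit accumulator (32 product bits plus 8 guard bits) that wraps on overflow, and a left shift by one in fractional (1.15) mode so that the product lands in 1.31 format. `mac`, `to_signed`, and `fir_tap_sum` are hypothetical helpers.

```python
ACC_BITS = 40  # assumed: 32-bit product + 8 guard bits


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(acc: int, x: int, y: int, fractional: bool = True) -> int:
    """Return acc + x*y for 16-bit operands, wrapped to a 40-bit accumulator."""
    x, y = to_signed(x, 16), to_signed(y, 16)
    product = x * y
    if fractional:
        product <<= 1  # 1.15 x 1.15 = 2.30; shift left to get 1.31
    return to_signed(acc + product, ACC_BITS)


def fir_tap_sum(coeffs, samples) -> int:
    """Dot product as an FIR filter computes it: one MAC per tap."""
    acc = 0
    for c, s in zip(coeffs, samples):
        acc = mac(acc, c, s)
    return acc
```

Multiplying 0x8000 by itself, i.e. (-1) × (-1) in 1.15 format, gives 1 << 31, which is just outside the 1.31 range; it is representable only because of the guard bits, which is why DSP accumulators are wider than the product. Accumulating four products of 0.5 × 0.5 reaches the same value.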

Processor design is a very complex task, and architecture design is its soul. The designer must determine the processor's target applications, decide which features matter most for those applications, and then pursue the highest possible performance within the cost budget.

Contributions of this article: it describes the architecture design of μDSP, improves the Harvard bus structure, defines the name and function of each stage of the six-stage pipeline, states the requirements the special instruction set must meet, presents a block diagram of the μDSP architecture, and describes the design of the data address generator in detail.