Digital Signal Processing

As a case study, consider one of the most common functions in the digital world: filtering. Simply put, filtering is the processing of a signal to improve its characteristics. For example, filtering can remove noise or static interference from a signal, thereby improving its signal-to-noise ratio. Why use a microprocessor instead of analog devices to filter signals? Consider the advantages:

The performance of analog filters (and analog circuits in general) depends on environmental factors such as temperature. Digital filters, on the other hand, are essentially unaffected by the environment.

A digital filter is easy to replicate within very tight tolerances, because its performance does not depend on the matching of individual components.

Once an analog filter is manufactured, its characteristics (such as the passband frequency range) are difficult to change. A digital filter implemented on a microprocessor can have its characteristics changed simply by reprogramming it.

Comparison of signal processing methods

Comparison factor       Analog method                          Digital method
Design flexibility      Modify the hardware design or          Change the software settings
                        adjust hardware parameters
Accuracy                Component accuracy                     A/D bit length, computer word
                                                               length, and algorithm
Reliability and         Affected by temperature, humidity,     Essentially unaffected by the
repeatability           noise, electromagnetic fields, etc.    environment
Large-scale             Analog ICs exist, but are few in       DSP devices are small, powerful,
integration             variety, less integrated, and          low-power, consistent, easy to
                        more expensive                         use, and cost-effective
Real-time               Processing is real-time, except for    Determined by the processing
operation               the delay introduced by the circuit    speed of the computer
High-frequency          Can process microwave, millimeter-     Limited by S/H, A/D, and
signal processing       wave, and even light-wave signals      processing speed, per the
                                                               Nyquist criterion

Digital Signal Processors

Classification of microprocessors:

General-purpose processors (GPPs) use the von Neumann architecture, in which programs and data share a single storage space. Examples include 8-bit machines such as the Apple (6502) and NEC PC-8000 (Z80); the 8086/286/386/486/Pentium/Pentium II/Pentium III line; the PowerPC; and 64-bit CPUs (Sun SPARC, DEC Alpha, HP). They divide into Complex Instruction Set Computers (CISC) and Reduced Instruction Set Computers (RISC), and use various methods to improve computing speed: higher clock frequencies, high-speed buses, multi-level caches, coprocessors, and so on.

Single-chip computers, or microcontroller units (MCUs), contain not only the ALU and control unit of a general CPU but also memory (RAM/ROM), registers, a clock, counters, timers, and serial/parallel ports; some also include A/D and D/A converters. Examples include the Intel MCS-48/51/96(98) and Motorola HC05/HC11 families.

DSPs adopt the Harvard architecture, storing program and data separately, and take a series of measures to guarantee digital signal processing speed, such as special optimization for the FFT.

Simple comparison between MCU and DSP

                          MCU                    DSP
                          low-end   high-end     low-end   high-end
Instruction cycle (ns)    600       40           50        5
Multiply-add time (ns)    1900      80           50        5
US$/MIPS                  1.5       0.5          0.15      0.1

Comparison of DSP processors and general-purpose processors

Consider an example of digital signal processing, such as a finite impulse response (FIR) filter. In mathematical terms, an FIR filter is a series of dot products.
It takes an input vector and a vector of coefficients, multiplies the coefficients by a sliding window of input samples, and then adds all the products together to form one output sample. Operations like this are repeated constantly in digital signal processing, so devices designed for the purpose must provide specialized support for them. This need is what drove the divergence of DSP devices from general-purpose processors (GPPs):

1 Support for intensive multiplication operations

GPPs are not designed for multiplication-intensive tasks; even some modern GPPs require multiple instruction cycles for a single multiplication. DSP processors use specialized hardware to implement single-cycle multiplication. They also add accumulator registers to handle sums of many products. The accumulator register is usually wider than the other registers; the extra bits, called guard bits, prevent overflow. To take full advantage of this dedicated multiply-accumulate hardware, almost all DSP instruction sets include an explicit MAC instruction.

2 Memory architecture

Traditionally, GPPs use the von Neumann memory architecture, in which a single memory space is connected to the processor core by one set of buses (one address bus and one data bus). A MAC operation typically requires four memory accesses, and therefore takes at least four instruction cycles. Most DSPs instead use the Harvard architecture, which divides the memory space in two: one space for program and one for data, each with its own set of buses to the processor core, so that both can be accessed simultaneously. This arrangement doubles the processor's memory bandwidth and, more importantly, supplies the processor core with data and instructions at the same time. With this layout, DSPs are able to implement single-cycle MAC instructions.
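The FIR dot product and MAC accumulation described above can be sketched in C. This is an illustrative model, not any particular DSP's API; the function and variable names are invented for the example:

```c
#include <stddef.h>

/* One output sample of an n-tap FIR filter: the dot product of the
 * coefficient vector h[] with a window of input samples x[].  On a
 * DSP, each loop iteration maps onto a single MAC instruction; the
 * wide accumulator 'acc' models the guard bits that keep the running
 * sum of products from overflowing. */
long fir_sample(const int *h, const int *x, size_t n)
{
    long acc = 0;                  /* wide accumulator with guard bits */
    for (size_t i = 0; i < n; i++)
        acc += (long)h[i] * x[i];  /* multiply-accumulate (MAC) */
    return acc;
}
```

Note that each iteration needs a coefficient fetch and a sample fetch; on a Harvard-architecture DSP these can occur in the same cycle over separate buses, which is what makes the single-cycle MAC possible.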
A complication is that today's typical high-performance GPPs actually contain two on-chip caches, one for data and one for instructions, each directly connected to the processor core to speed up access at run time. Physically, this dual on-chip memory and bus structure is almost identical to the Harvard architecture. Logically, however, there is an important difference. A GPP uses control logic to decide which data and instruction words reside in the on-chip cache; the programmer does not specify this (and may not even know it). A DSP, in contrast, uses multiple on-chip memories and multiple sets of buses to guarantee several memory accesses per instruction cycle, and the programmer explicitly controls which data and instructions are stored in on-chip memory. Programmers must arrange their code so that the processor can use its dual buses effectively.

In addition, DSP processors rarely have data caches, because typical DSP data comes as a stream: once the processor has used a data sample in its computation, the sample is discarded and rarely reused.

3 Zero-overhead loops

If you know one common feature of DSP algorithms, namely that most of the processing time is spent executing small loops, it is easy to understand why most DSPs have dedicated hardware for zero-overhead loops. In a zero-overhead loop, the processor spends no time checking the value of the loop counter, branching back to the top of the loop, or decrementing the counter; dedicated loop hardware handles all of this. In contrast, GPP loops are implemented in software, although some high-performance GPPs use branch-prediction hardware to achieve almost the same effect as hardware-supported zero-overhead loops.

4 Fixed-point computation

Most DSPs use fixed-point arithmetic instead of floating-point.
Although DSP applications must pay close attention to numerical accuracy, which would be much easier to maintain with floating-point, low cost is also very important for DSPs, and fixed-point machines are cheaper (and faster) than comparable floating-point machines. To preserve numerical accuracy without resorting to floating-point hardware, DSP processors support saturation, rounding, and shifting in both the instruction set and the hardware.

5 Specialized addressing modes

DSP processors often support specialized addressing modes that are useful for common signal processing operations and algorithms: for example, modulo (circular) addressing, useful for implementing digital filter delay lines, and bit-reversed addressing, useful for FFTs. These highly specialized addressing modes are rarely needed on GPPs and are implemented there only in software.

6 Predictable execution time

Most DSP applications (such as cell phones and modems) are hard real-time applications in which all processing must be completed within a specified time. This requires the programmer to determine exactly how much processing time each sample needs, or at least the worst-case time. If you plan to use a low-cost GPP for real-time signal processing, predicting execution time will probably not be a problem, because low-cost GPPs have a relatively straightforward structure and predictable execution times. Most real-time DSP applications, however, require processing power that low-cost GPPs cannot provide. Here the advantage of a DSP over a high-performance GPP is that even on DSPs that use caches, the programmer (not the processor) decides which instructions go into the cache, so it is easy to determine whether instructions will be read from cache or from memory. DSPs generally do not use dynamic features such as branch prediction and speculative execution.
It is therefore completely straightforward to predict the execution time of a given piece of code, which lets the programmer determine the chip's performance limits.

7 Fixed-point DSP instruction sets

Fixed-point DSP instruction sets are designed with two goals in mind: to let the processor complete multiple operations per instruction cycle, improving per-cycle computational efficiency, and to minimize the memory required to store DSP programs (a particular concern in cost-sensitive DSP applications, where memory significantly affects overall system cost). To achieve these goals, DSP instruction sets usually let the programmer specify several parallel operations in a single instruction: for example, a MAC operation together with one or two data moves. In the typical case, a single instruction contains all the operations needed to compute one tap of an FIR filter. The price of this efficiency is an instruction set that is neither intuitive nor easy to use compared with a GPP instruction set. GPP programmers usually do not care whether the processor's instruction set is easy to use, because they generally write in high-level languages such as C or C++. Unfortunately for DSP programmers, the majority of DSP applications are written in assembly language (or at least their critical portions are optimized in assembly). There are two reasons for this. First, most widely used high-level languages, such as C, are not well suited to describing typical DSP algorithms. Second, the complexity of DSP architectures (multiple memory spaces, multiple buses, irregular instruction sets, and highly specialized hardware) makes it difficult to write efficient compilers for them. Even when a compiler is used to translate C source into DSP assembly code, a heavy optimization task remains.
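The saturating fixed-point arithmetic that these instruction sets expose as single instructions can be sketched in C using the common Q15 format. The function name and the choice of Q15 are illustrative assumptions, not a specific processor's instruction:

```c
#include <stdint.h>

/* Saturating Q15 multiply.  Q15 stores values in [-1, 1) as 16-bit
 * integers scaled by 2^15.  Multiplying two Q15 numbers yields a
 * Q30 product; shifting right by 15 restores Q15.  The one overflow
 * case is (-1) * (-1) = +1, which is not representable: saturation
 * logic clamps it to the largest Q15 value instead of wrapping. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * b) >> 15;   /* Q30 -> Q15 */
    if (p > INT16_MAX) p = INT16_MAX;     /* saturate instead of wrap */
    if (p < INT16_MIN) p = INT16_MIN;
    return (int16_t)p;
}
```

On a fixed-point DSP, the multiply, shift, and saturation happen in hardware within one instruction; a GPP must emulate the clamping with the explicit comparisons shown here.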
Typical DSP applications have heavy computational demands and strict cost constraints, so program optimization is essential (at least for the most time-critical parts of the program). A key factor in DSP selection is therefore whether enough programmers are available who can work comfortably with that DSP's instruction set.

8 Development tool requirements

Because DSP applications demand highly optimized code, most DSP vendors provide development tools to help programmers with optimization. For example, most vendors provide processor simulation tools that accurately model the processor's activity on every instruction cycle, which is very useful both for ensuring real-time operation and for optimizing code. GPP vendors usually do not provide such tools, mainly because GPP programmers do not need this level of detail. The lack of cycle-accurate GPP simulation tools is a serious problem for DSP application developers: since it is almost impossible to predict the number of cycles a high-performance GPP needs for a given task, it is impossible to tell how a change will affect the code's performance.
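As a concrete illustration of the bit-reversed addressing mentioned in point 5 above, which DSP address generators produce in hardware but a GPP must compute in software, the index permutation used to reorder FFT data can be sketched as follows (the function name is illustrative):

```c
#include <stdint.h>

/* Software emulation of bit-reversed addressing: reverse the low
 * 'bits' bits of index i.  An FFT of length 2^bits reads or writes
 * its data in this permuted order.  A DSP address generator emits
 * these indices at no cycle cost; a GPP needs this explicit loop. */
uint32_t bit_reverse(uint32_t i, unsigned bits)
{
    uint32_t r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1);   /* shift the low bit of i into r */
        i >>= 1;
    }
    return r;
}
```

For an 8-point FFT (bits = 3), index 6 (binary 110) maps to 3 (binary 011), which is exactly the reordering the radix-2 FFT butterfly requires.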