DSP RISC Development of Single Processor VoIP
This post was last edited by Aguilera on 2018-3-20 23:49

The continuous integration of services into the IP network environment has prompted the industry to develop a range of innovative Voice over IP (VoIP) terminal products, including IP phones, business and home VoIP gateways, and wireless IP phones. The market has also begun to integrate voice functions into other IP systems, such as handheld devices (PDAs), automobiles, and GPS (Global Positioning System) devices. Some systems with higher voice-channel density still require traditional multiprocessor designs with separate, dedicated RISC and DSP cores. However, a growing number of designs face cost, power, and complexity constraints that are best met with a single-processor architecture. In addition, competitive pressure and short time-to-market windows have left system designers urgently needing a complete single-processor VoIP platform that spares them the challenge of integrating separate processors.

A single-processor VoIP design can help a company meet its overall goals for cost, power, efficiency, and time to market. However, replacing a standalone DSP with a single component still presents performance challenges. Voice processing algorithms impose significant signal processing loads. These include ITU-T-compliant voice codecs for voice compression and decompression, line echo cancellation, voice activity detection (VAD), and comfort noise generation (CNG). In addition, the processor core must handle telephony functions such as DTMF, dial tone generation, caller ID, and quality of service (QoS). It must also run user interface functions (display, keypad, ringtones, etc.) and the APIs that connect to external applications. Because real-time accuracy is critical in voice applications, developers cannot simply port existing DSP code to a standard RISC core and expect optimal performance.
A successful single-core VoIP system requires both DSP-oriented enhancements built into the RISC hardware and innovative software practices that fully exploit the processor's capabilities. This article explores how HelloSoft uses the ARM9E(tm) family of processor cores and their DSP enhancements to achieve these goals.

DSP Enhancements in the Core

To build a viable single-processor VoIP platform, you must first select a RISC core that can handle the various signal processing functions. HelloSoft's reference design uses the ARM926EJ-S(tm) processor core, whose DSP extensions are built directly into the RISC processor's architecture. These enhancements include single-cycle 16x16 and 32x16 multiply-accumulate (MAC) operations. They also include saturating arithmetic (saturating add, saturating double-and-add, and saturating subtract) and the Count Leading Zeros (CLZ) instruction. With these instructions, developers can quickly build stable control loops and bit-exact algorithms for advanced signal processing systems such as voice codecs and echo cancellers. CLZ speeds up normalization in fixed-point arithmetic and division (as shown in Figure 1).
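To make the semantics of these instructions concrete, here is a minimal portable C model of them. It is in the same spirit as the ITU-T basic operators (such as L_add and norm_l) that bit-exact codec reference code is written against. The function names are hypothetical, and the code only mimics what the ARM instructions (QADD, QSUB, QDADD, SMLABB, SMLAWB, CLZ) compute. The sticky Q overflow flag is not modeled, and on the real core each operation is a single instruction rather than a function call.

```c
#include <stdint.h>
#include <assert.h>

/* Clamp a 64-bit intermediate to the signed 32-bit range (saturation). */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* QADD-style saturating add. */
int32_t qadd(int32_t a, int32_t b) { return sat32((int64_t)a + b); }

/* QSUB-style saturating subtract. */
int32_t qsub(int32_t a, int32_t b) { return sat32((int64_t)a - b); }

/* QDADD-style: saturate 2*b first, then saturating-add it to a. */
int32_t qdadd(int32_t a, int32_t b) { return qadd(a, sat32(2 * (int64_t)b)); }

/* SMLABB-style 16x16 MAC on the bottom halves of x and y.
   Like the hardware, the accumulation wraps on overflow (the 64->32 bit
   conversion wraps on common compilers). */
int32_t smlabb(int32_t acc, int32_t x, int32_t y)
{
    int64_t sum = (int64_t)acc + (int16_t)x * (int16_t)y;
    return (int32_t)sum;
}

/* SMLAWB-style 32x16 MAC: keep the top 32 bits of the 48-bit product
   (arithmetic right shift assumed, as on GCC/Clang), then accumulate. */
int32_t smlawb(int32_t acc, int32_t x, int32_t y)
{
    int64_t prod = (int64_t)x * (int16_t)y;
    return (int32_t)((int64_t)acc + (prod >> 16));
}

/* Portable count-leading-zeros; returns 32 for 0, matching ARM CLZ. */
int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    while (!(x & 0x80000000u)) { x <<= 1; n++; }
    return n;
}

/* Normalize a positive value so bit 30 is its top set bit (a Q31 mantissa).
   Returns the shift count; on ARM9E this is one CLZ plus a shift. */
int normalize_q31(int32_t x, int32_t *mant)
{
    assert(x > 0);
    int shift = clz32((uint32_t)x) - 1;
    *mant = x << shift;   /* result stays within int32 range, so no overflow */
    return shift;
}
```

Saturation is what keeps fixed-point control loops stable. An overflowing accumulator clips to the nearest extreme instead of wrapping to the opposite sign.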
The DSP extensions avoid major changes to the core's proven five-stage pipeline and Harvard memory architecture, which keeps their hardware cost to a minimum. They add no new registers or processor states. Only a few blocks are added to the ARM9E data path: a high-speed 32x16 multiplier, a CLZ block, and two saturation blocks. As a result, the ARM926EJ-S core performs on par with other ARM9 cores, which reach clock speeds above 220 MHz in a 0.13-micron process. The ARM9E DSP extensions are also compatible with those in other ARM families, such as the ARM10E(tm) and ARM11(tm) series. This gives companies a solid foundation for building high-performance, low-power, single-processor VoIP systems. It also gives them design flexibility and a migration path to newer cores.

DSP Software Efficiency Through Hand Optimization

Developing efficient VoIP code is not just a matter of porting existing DSP algorithms to a RISC core. DSP functions are inherently processor-dependent and must be written in assembly language to make full use of the hardware. Therefore, in addition to using the ARM9E DSP extensions, HelloSoft hand-wrote its VoIP functions to make full use of the underlying ARM9E processor's resources. The result is a system that needs only 17 MHz of processor load for the G.729AB codec and only 15 MHz for G.168 line echo cancellation with a 16 ms tail. Implementing voice processing algorithms on a dedicated DSP is easier, because today's DSP hardware usually has enough pipeline throughput to mask some software inefficiency.
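A rough capacity estimate follows from the figures above. The sketch assumes that the 17 MHz (G.729AB) and 15 MHz (G.168, 16 ms) loads are per-channel figures, and it takes the 220 MHz clock quoted for ARM9 cores. The reserve percentage for the OS, protocol stacks, tones, and UI is an arbitrary assumption, and the helper name is hypothetical. The estimate ignores memory stalls, so it is an upper bound, not a measured result.

```c
#include <assert.h>

/* Hypothetical helper: number of full voice channels that fit in a core's
   cycle budget. reserve_pct is the share of the clock held back for
   non-DSP work (OS, signaling stack, UI); all loads are in MHz. */
int channels_supported(int core_mhz, int reserve_pct,
                       int codec_mhz, int echo_mhz)
{
    assert(reserve_pct >= 0 && reserve_pct < 100);
    int per_channel = codec_mhz + echo_mhz;          /* load of one channel */
    assert(per_channel > 0);
    int usable = core_mhz * (100 - reserve_pct) / 100;
    return usable / per_channel;                     /* whole channels only */
}
```

With no reserve, 220 / (17 + 15) gives 6 channels. Holding back an assumed 30% leaves 154 MHz, or 4 channels.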
Because a DSP executes multiple operations in parallel from a single instruction, software designers need not spend much time on loop ordering and timing or on how much data is loaded. In contrast, implementing VoIP functions on a DSP-enhanced RISC processor requires a thorough understanding of key hardware issues. These include data flow, loop timing, instruction ordering across loops, and data-load efficiency. A distinctive advantage of the ARM9E family is that its 32x16 MAC can combine 32-bit data in one register with either of two independent 16-bit operands packed into another register. The 32x16 MAC provides a compatible environment for the many DSP functions built on traditional 16-bit arithmetic. It also improves data-load efficiency and makes effective use of the processor's registers. Compared with other 32-bit RISC architectures, software that uses the 32x16 MAC in ARM9E devices can improve overall data-load efficiency by a factor of four.

Although RISC implementations need more program memory than traditional DSPs, VoIP designs built on a single ARM9E processor do not require large amounts of on-chip memory to support their DSP functions. Developers can significantly reduce the overall memory and power costs of a single-core ARM926EJ-S implementation by using low-cost memory resources, such as external SRAM and smaller on-chip caches. For example, the 8 KB instruction and data caches in the reference design provide ample processing bandwidth for two standard VoIP channels. HelloSoft's voice algorithms reduce data-load overhead by 35% to 40%. They achieve this by restructuring their processing loops to improve data availability and reuse, which in turn makes more efficient use of the ARM9E's 16-bit MAC resources.
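The load saving from packed 16-bit operands can be sketched in portable C. The example below is a minimal sketch and not HelloSoft's code. It assumes an FIR filter with 32-bit samples and Q15 coefficients stored two per 32-bit word. One word load then feeds two 32x16 MACs: the bottom half in the style of SMLAWB and the top half in the style of SMLAWT. All function names are hypothetical, and the accumulator is widened to 64 bits here for clarity, whereas the hardware accumulates in a 32-bit register.

```c
#include <stdint.h>
#include <assert.h>

/* Pack two Q15 coefficients into one word: c0 in bits 15:0, c1 in bits 31:16. */
uint32_t pack_q15(int16_t c0, int16_t c1)
{
    return (uint32_t)(uint16_t)c0 | ((uint32_t)(uint16_t)c1 << 16);
}

/* 32x16 multiply keeping the top 32 bits of the 48-bit product
   (arithmetic right shift assumed, as on GCC/Clang). */
static int32_t mulw(int32_t x, int16_t c)
{
    return (int32_t)(((int64_t)x * c) >> 16);
}

/* FIR inner product over n samples (n even) with n/2 packed coefficient
   words: each coefficient load is shared by two MACs. */
int32_t fir_32x16(const int32_t *x, const uint32_t *coef_pairs, int n)
{
    assert(n % 2 == 0);
    int64_t acc = 0;
    for (int i = 0; i < n / 2; i++) {
        uint32_t w = coef_pairs[i];                            /* one 32-bit load */
        acc += mulw(x[2 * i],     (int16_t)(w & 0xFFFFu));    /* bottom half */
        acc += mulw(x[2 * i + 1], (int16_t)(w >> 16));        /* top half */
    }
    return (int32_t)acc;
}
```

Compared with loading each 16-bit coefficient separately, this halves the coefficient loads. Loop restructuring that also keeps samples in registers across iterations is where further savings come from.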
HelloSoft's voice algorithms also use precomputed values and data elements for specific calculations to reduce the overall computational load. Another advantage of the ARM9E architecture is its auto-incrementing pointers (post-indexed addressing). Using this feature saves two cycles on each data load. This matters for standard VoIP functions. The G.729AB voice codec, for example, performs up to 10 million MACs per second, and using auto-increment pointers there saves 2 million cycles per second. In addition to separate instruction and data caches, the ARM926EJ-S core implements Tightly Coupled Memories (TCMs). HelloSoft's DSP algorithms make extensive use of the TCMs as scratch RAM for fast access to frequently used data, eliminating cache misses in critical compute-intensive loops.

Cost Reduction and Design Efficiency at the System Level

The DSP subsystem on the ARM9E core implements the voice codecs, echo cancellation, VAD, and other signal processing functions. This lets the HelloSoft reference solution combine all VoIP subsystems on a single processor (as shown in Figure 2). Key elements of this architecture include the DSP subsystem, quality of service (QoS), and call signaling and management. They also include all other high-level system functions, such as the graphical user interface (GUI), platform management, and the IP network interface layer. A single-processor VoIP phone can cut component costs by at least $5 to $10 by eliminating the separate DSP. In addition, developing DSP code, signaling stacks, and OS functions in the same processor environment leads to a simpler, more robust implementation.
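The auto-increment pointers mentioned earlier in this post are usually reached from C through the `*p++` idiom. The sketch below is a generic Q15 dot product with a hypothetical function name. An ARM compiler can map each `*p++` to a single post-indexed load such as `LDRSH r0, [r1], #2`, so no separate ADD is needed to advance the pointer, but whether it does so depends on the compiler and optimization level. The caller is assumed to keep `n` and the input magnitudes small enough that the 32-bit accumulator cannot overflow, as fixed-point DSP code normally guarantees by scaling.

```c
#include <stdint.h>
#include <assert.h>

/* Q15 dot product written with post-incremented pointers.
   Each *x++ / *h++ is a candidate for a post-indexed load on ARM. */
int32_t dot_q15(const int16_t *x, const int16_t *h, int n)
{
    assert(n >= 0);
    int32_t acc = 0;
    while (n-- > 0)
        acc += (int32_t)*x++ * *h++;   /* 16x16 MAC, SMLABB-style */
    return acc;
}
```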
The Memory Management Unit (MMU) built into the ARM926EJ-S core lets the design run operating systems such as Embedded Linux and WinCE. HelloSoft's design runs SIP and RTP protocol stacks on its VoiceOS(tm) architecture. In the reference design, these OS-independent protocol stacks run on an open-source Embedded Linux kernel. This makes efficient use of hardware resources while allowing the infrastructure to be adapted to other OS/RTOS environments, including VxWorks and WinCE. HelloSoft's VoiceOS is a system-level architecture built around a streamlined abstraction layer. That layer integrates the DSP subsystem, protocol stacks, and media processing functions with interfaces to the OS and the ARM9E hardware platform, which simplifies migration to other operating systems and ARM hardware platforms. The VoiceOS abstraction layer can also be extended to support new functions and interfaces and to deliver "voice as a service" in a variety of IP systems.
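The post does not show HelloSoft's RTP stack. As a generic illustration of what the RTP layer does with codec output, the sketch below packs G.729 frames behind a minimal RTP header. The header layout follows RFC 3550 (12-byte fixed header, version 2), and the payload type 18 and 8000 Hz clock follow the static G729 assignment in RFC 3551. One G.729 frame covers 10 ms (80 samples) in 10 bytes. The function names are hypothetical. The sketch omits CSRC lists, header extensions, the 2-byte Annex B SID frames, and socket I/O.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

#define RTP_HDR_LEN        12
#define RTP_PT_G729        18   /* static payload type (RFC 3551) */
#define G729_FRAME_BYTES   10   /* one 10 ms frame at 8 kbit/s */
#define G729_FRAME_SAMPLES 80   /* 10 ms at the 8000 Hz RTP clock */

/* Write a minimal RTP header in network byte order. */
size_t rtp_write_header(uint8_t *buf, uint16_t seq, uint32_t ts,
                        uint32_t ssrc, int marker)
{
    buf[0]  = 0x80;                                  /* V=2, P=0, X=0, CC=0 */
    buf[1]  = (uint8_t)((marker ? 0x80 : 0) | RTP_PT_G729);
    buf[2]  = (uint8_t)(seq >> 8);   buf[3]  = (uint8_t)seq;
    buf[4]  = (uint8_t)(ts >> 24);   buf[5]  = (uint8_t)(ts >> 16);
    buf[6]  = (uint8_t)(ts >> 8);    buf[7]  = (uint8_t)ts;
    buf[8]  = (uint8_t)(ssrc >> 24); buf[9]  = (uint8_t)(ssrc >> 16);
    buf[10] = (uint8_t)(ssrc >> 8);  buf[11] = (uint8_t)ssrc;
    return RTP_HDR_LEN;
}

/* Build one packet carrying nframes speech frames; returns its length. */
size_t rtp_pack_g729(uint8_t *pkt, const uint8_t *frames, int nframes,
                     uint16_t seq, uint32_t ts, uint32_t ssrc)
{
    assert(nframes > 0);
    size_t len = rtp_write_header(pkt, seq, ts, ssrc, 0);
    memcpy(pkt + len, frames, (size_t)nframes * G729_FRAME_BYTES);
    return len + (size_t)nframes * G729_FRAME_BYTES;
}

/* The timestamp advances by the number of samples carried. */
uint32_t rtp_next_ts(uint32_t ts, int nframes)
{
    return ts + (uint32_t)nframes * G729_FRAME_SAMPLES;
}
```

With two frames per packet (20 ms), each packet carries 20 payload bytes behind the 12-byte header, and the timestamp advances by 160 per packet.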
Highly efficient solutions that support ultra-low-cost terminal devices and integrate voice services into other devices will be widely adopted in the VoIP market. These solutions must rely on a single-core VoIP processing platform to help system designers meet tight cost, power, and size constraints while shortening development cycles and time to market.