Design example of voice application solution for handheld devices based on TI's OMAP platform-EEWORLD


The OMAP platform provides a strong foundation for developing voice applications for personal handheld devices. Its low-power architecture combines DSP signal processing for voice with the general system performance of a RISC processor. An open software architecture encourages the development of complementary applications such as voice engines, voice applications and multimedia. Development support, including voice recognizers and prototype applications, helps developers build their own products quickly and shorten time to market. With the OMAP platform, developers can add voice applications easily and flexibly and so take advantage of the growth in personal handheld devices.

The application of voice technology is increasing, giving application developers a rare opportunity to add high-value functions to handheld, mobile and wireless personal devices. Voice on today's personal handheld devices is mostly limited to voice dialing, but technologies have emerged that allow broader development of voice recognition and text-to-speech (TTS) applications. Developers who intend to add voice functions need to be familiar with all aspects of voice technology. These include not only processing and memory requirements, but also how a specific platform architecture and its support can ease development and shorten time to market.

Adding value with voice applications can bring rich potential benefits. According to estimates from various market research firms, the compound annual growth rate for personal handheld devices is expected to reach 20% over the next two years, with total global device shipments reaching 700 million units by 2004. To tap into this huge market with value-added voice applications, developers must turn to underlying technologies that give them high performance, low power consumption, and support that helps them launch new products quickly.

Voice capabilities provide users with a natural way to input and output, and are safer than other forms of I/O, especially when the user is driving. In most applications, voice is an ideal complement to keyboards and displays, not a replacement for them. For example, in very noisy environments, listening and speaking may not be practical, so users may have to rely on keyboard input and display reading. Similarly, users often prefer to enter certain things, such as PIN numbers and passwords, on the keyboard rather than speaking them out loud for others to hear.

Voice dialing is the most commonly used voice technology in today's personal wireless devices. It usually allows hands-free, eyes-free calling, which is particularly important when driving. Voice dialing includes name dialing, in which the user speaks a name from the address book, and number dialing, in which the user speaks a phone number. As shown in Figure 1, other potential voice applications include:

1. Voice email: Includes browsing mailboxes, writing emails using voice input, and listening to emails read out.
2. Information retrieval: Stock prices, headlines, flight information, weather forecasts, etc. can all be retrieved from the Internet via voice. For example, instead of first going to a website and typing in a stock name or browsing a predefined list, a user can command: "My stock quote, Texas Instruments."
3. Personal information management: Allows users to specify appointments, view calendars, add contact information, etc. by voice.
4. Voice browsing: Using a voice program menu, users can surf the web, add voice favorites, and listen to the web page content read out.
5. Voice navigation: A full voice input/output driving system for obtaining navigation guidance in hands-free, eyes-busy conditions.

Speech Technology Issues

Speech systems must meet certain basic requirements to be usable. Obviously, the speech output must be clear and understandable to the user. Automatic speech recognition (ASR) must also support speech that is natural for a given application. What counts as natural varies widely, from simple names and commands spoken word by word to continuous sentences with a large vocabulary. In addition, people's natural speech and pronunciation patterns differ, so the system should be flexible enough to accept different speakers. The recognition engine must be accurate, otherwise users will not use the technology.

The system requirements for speech are processing intensive and may involve large amounts of memory, depending on the vocabulary supported. For server-based applications, wireless bandwidth usage will increase. These factors also affect other system considerations. The higher the MIPS and transmission requirements of the application, the higher the power consumption of a given system, which reduces battery life or requires more frequent charging. Response time may also increase when the application has to use memory external to the processor.
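The link between processing load and battery life described above can be made concrete with a rough estimate. The following is a minimal sketch that assumes a simple linear model in which current draw grows in proportion to the MIPS a speech task consumes. The function names and all numbers are placeholders chosen for illustration; none of them are measurements of any OMAP device.

```python
def average_current_ma(base_ma, task_mips, ma_per_mips, duty_cycle):
    """Average current draw when a speech task runs part of the time.

    Assumes (for illustration only) that the extra current is linear in MIPS.
    duty_cycle is the fraction of time the speech task is active, 0.0 to 1.0.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return base_ma + duty_cycle * task_mips * ma_per_mips


def battery_life_hours(capacity_mah, avg_current_ma):
    """Idealized battery life: capacity divided by average current."""
    if avg_current_ma <= 0:
        raise ValueError("average current must be positive")
    return capacity_mah / avg_current_ma


# Placeholder inputs: compare a light and a heavy speech workload.
light = average_current_ma(base_ma=50.0, task_mips=20.0, ma_per_mips=0.5, duty_cycle=0.2)
heavy = average_current_ma(base_ma=50.0, task_mips=100.0, ma_per_mips=0.5, duty_cycle=0.2)
print(battery_life_hours(1000.0, light), battery_life_hours(1000.0, heavy))
```

Under these placeholder inputs the heavier workload draws more average current and so shortens the estimated battery life, which is the qualitative effect the paragraph describes. A real estimate would need measured figures for the target hardware.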

Certain application trade-offs can help reduce system requirements by dropping functions that a handheld device does not need. A speaker-dependent system that recognizes only a small number of words of discrete speech will require significantly fewer resources than a speaker-independent system that recognizes a large vocabulary and continuous speech. Support for additional languages increases processing requirements and doubles the memory required for an application. Noise and interference immunity are important features, but they increase complexity and memory requirements.

Obviously, developers want to minimize the performance degradation of the basic application when adding capabilities such as speaker independence, continuous speech, a larger vocabulary, and additional language support. Some options help reduce this degradation, such as distributed speech recognition (DSR). DSR splits the recognition task so that the handheld device converts the raw speech into spectral feature vectors while the server performs the recognition itself. This approach, and similar distributed TTS approaches, rely on standardization of processing methods and transmission protocols. Although these technologies are promising, developers still face limited resources for speech applications in personal handheld devices.
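To illustrate the device side of the DSR split, here is a minimal sketch of a front end that turns raw samples into one spectral feature vector per frame. It assumes a simple design (overlapping frames, a Hamming window, a naive DFT, and log energies in equal-width bands) rather than any standardized DSR front end, which would typically use mel-scaled cepstral features and a compression step. All function names and parameter values are hypothetical.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames, dropping a trailing partial frame."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum for bins 0..N/2 (fine for a sketch)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def band_log_energies(spectrum, num_bands):
    """Sum the spectrum into equal-width bands and take the log of each sum."""
    width = len(spectrum) // num_bands
    features = []
    for b in range(num_bands):
        start = b * width
        # The last band absorbs any leftover bins.
        end = len(spectrum) if b == num_bands - 1 else start + width
        features.append(math.log(sum(spectrum[start:end]) + 1e-10))
    return features


def extract_features(samples, frame_len=64, hop=32, num_bands=8):
    """Return one feature vector per frame; these are what a device would send."""
    window = hamming(frame_len)
    vectors = []
    for frame in frame_signal(samples, frame_len, hop):
        windowed = [s * w for s, w in zip(frame, window)]
        vectors.append(band_log_energies(power_spectrum(windowed), num_bands))
    return vectors


# Demo input: a 1 kHz tone sampled at 8 kHz (values chosen only for the demo).
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(256)]
vectors = extract_features(tone)
print(len(vectors), len(vectors[0]))
```

In a DSR arrangement only these compact vectors would cross the wireless link, and the server would run the recognizer on them. The transport and the server side are outside the scope of this sketch.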

Therefore, choosing the right platform for high-performance applications such as speech is as important as carefully designing the application's functions. Such a platform must have strong processing power while achieving a high level of power efficiency, not only in core operations but also in memory accesses. There should be enough MIPS to support multimedia, security, and other complementary applications. It is also important to provide programmability so that new algorithms can be integrated. Finally, the platform must include a software architecture designed to support modular application development, helping developers bring products to market quickly.

OMAP Technology: An Excellent Voice Platform

TI's OMAP platform provides an excellent solution for developing voice applications in personal handheld devices. The dual-core architecture of the OMAP1510 and OMAP5910 processors integrates the power-efficient TMS320C55x digital signal processor (DSP) and a high-performance ARM9 RISC microprocessor. As a result, these OMAP processors provide the arithmetic-intensive signal processing required for voice while also providing the general-purpose performance required for system-level operations. The OMAP710 processor is a highly integrated single-chip solution with a DSP-based GSM/GPRS baseband for wireless communication processing and a dedicated TI-enhanced ARM925 processor for low-power execution of multimedia applications. The OMAP1510, OMAP5910 and OMAP710 processors all support low-end ARM-based voice applications. They are also code compatible, allowing developers to reuse software applications across products for different markets. The OMAP1510 and OMAP5910 add DSP processing capabilities for more intensive voice applications.

Dual-core hardware architecture

The dual-core hardware platform of the OMAP1510 and OMAP5910 is designed to maximize system performance and minimize power consumption. In personal handheld devices, the combination of DSP and RISC cores gives these processors strong performance and power consumption advantages. The RISC core is well suited to control code, such as the user interface, the OS and high-level applications. The DSP, on the other hand, is better suited to the real-time signal processing that voice applications require.

As shown in Figure 2, the OMAP1510 architecture includes on-chip cache memory for both processors, which reduces the average number of accesses to external memory and eliminates the power consumed by unnecessary external accesses. The memory management unit (MMU) of each core provides virtual-to-physical address translation. Low-power operating modes save power during periods when a processor is idle or lightly used.

The OMAP1510 architecture also includes two external memory interfaces and a single internal memory port. These three memory interfaces are completely independent of each other and can be accessed at the same time from either core or from the DMA unit. Each processor has its own peripheral interface, which supports both direct connection to peripheral devices and DMA connection from the processor's DMA unit. On-chip peripherals, including timers, general-purpose I/O, UARTs and watchdog timers, as well as a color LCD controller, support general OS requirements.

The OMAP5910 architecture not only provides on-chip system functions, but also comes with 192 KB of RAM and peripherals such as a USB 1.1 host and client, an MMC/SD card interface, multi-channel buffered serial ports, a real-time clock, GPIO and UARTs, an LCD interface, SPI, uWire and I2S. Like the OMAP1510, the OMAP5910 also includes a built-in inter-processor communication mechanism that provides a transparent interface to the DSP for easier code development.
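The idea of a transparent interface to the DSP can be pictured as a mailbox: code on the RISC side posts a named task with its data and waits for the reply, without knowing how the DSP executes it. The following is a conceptual sketch only. It uses a Python worker thread to play the role of the DSP, and the class `DspMailbox`, its methods, and the `frame_energy` task are all hypothetical names; it does not represent TI's actual inter-processor communication API.

```python
import queue
import threading


class DspMailbox:
    """Hypothetical stand-in for an inter-processor mailbox.

    The caller plays the RISC core; a background thread plays the DSP.
    """

    def __init__(self):
        self._requests = queue.Queue()
        self._handlers = {}
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def register(self, name, handler):
        """Make a signal-processing task available on the 'DSP' side."""
        self._handlers[name] = handler

    def call(self, name, payload, timeout=1.0):
        """Post a task and block until the 'DSP' replies or the timeout expires."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((name, payload, reply))
        status, value = reply.get(timeout=timeout)
        if status == "error":
            raise RuntimeError(value)
        return value

    def _run(self):
        # 'DSP' main loop: take one request at a time and send back a reply.
        while True:
            name, payload, reply = self._requests.get()
            handler = self._handlers.get(name)
            if handler is None:
                reply.put(("error", f"no DSP task named {name!r}"))
                continue
            try:
                reply.put(("ok", handler(payload)))
            except Exception as exc:
                reply.put(("error", str(exc)))


def frame_energy(samples):
    """Example signal-processing task: sum of squared samples."""
    return sum(s * s for s in samples)


mailbox = DspMailbox()
mailbox.register("frame_energy", frame_energy)
print(mailbox.call("frame_energy", [1, 2, 3]))
```

The caller sees an ordinary function call, while the work runs elsewhere. That separation is what lets control code stay on the RISC core and real-time signal processing stay on the DSP.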

Designing Voice Applications for the OMAP Platform

In the OMAP Developer Network, TI is working with several leading third-party developers of voice technologies such as ASR, TTS, DSR and speaker verification. These companies bring their own strengths in the market to OMAP users. At the same time, TI has developed in-house speech recognition software, aimed specifically at small-vocabulary recognition, that makes full use of the dual-core architecture of the OMAP platform. The TI Embedded Speech Recognizer (TIESR) provides the following functions:

1. Speaker-independent command and control
2. Speaker-independent continuous digit recognition
3. Speaker-independent continuous speech recognition
4. Speaker-dependent name dialing, command and control
5. Dynamic grammar and vocabulary functions for applications such as voice browsing
6. Improved noise immunity in noisy environments
7. Optional speaker adaptation for enhanced performance

Voice Application Example

InfoPhone is a typical example of a voice application based on this embedded architecture. It was developed by TI specifically for the wireless field. InfoPhone is a Java application that implements voice functions and can retrieve useful information by voice. TI has developed three prototype voice-based information services for InfoPhone, which provide users with stock quotes, flight information and weather forecasts. Each service has a 50-word vocabulary, and thanks to the dynamic vocabulary function the system can switch seamlessly between vocabularies. The application design keeps keyboard input available while the user is speaking, providing flexibility when the environment is disruptive or the user needs to enter something privately. Figure 3 illustrates the speech recognition architecture in the InfoPhone example.
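The vocabulary switching and keyboard fallback described above can be sketched as follows. This is an illustrative model only, not InfoPhone's code (which the article says is written in Java): the class `InfoPhoneSketch`, its methods, and the sample words are hypothetical, and the recognizer is simulated by passing in already-recognized text.

```python
MAX_WORDS_PER_SERVICE = 50  # per-service vocabulary size quoted in the article


class InfoPhoneSketch:
    """Hypothetical model of per-service vocabularies with a keypad fallback."""

    def __init__(self):
        self._vocabularies = {}
        self._active = None

    def add_service(self, name, words):
        vocabulary = frozenset(w.lower() for w in words)
        if len(vocabulary) > MAX_WORDS_PER_SERVICE:
            raise ValueError(f"{name}: vocabulary exceeds {MAX_WORDS_PER_SERVICE} words")
        self._vocabularies[name] = vocabulary

    def switch_to(self, name):
        """Dynamic vocabulary: only the active service's words are accepted."""
        if name not in self._vocabularies:
            raise KeyError(name)
        self._active = name

    def on_speech(self, utterance):
        """Return (service, words) if every word is in the active vocabulary."""
        if self._active is None:
            return None
        words = utterance.lower().split()
        vocabulary = self._vocabularies[self._active]
        if words and all(w in vocabulary for w in words):
            return (self._active, words)
        return None  # out-of-vocabulary speech is rejected

    def on_keypad(self, text):
        """Keypad input stays valid regardless of the active vocabulary."""
        if self._active is None:
            return None
        return (self._active, text.split())


phone = InfoPhoneSketch()
phone.add_service("stocks", ["quote", "texas", "instruments"])
phone.add_service("weather", ["forecast", "dallas", "today"])
phone.switch_to("stocks")
print(phone.on_speech("Quote Texas Instruments"))
```

Keeping each service's vocabulary small and swapping it in on demand limits what the recognizer has to distinguish at any moment, while the keypad path gives the user a way in when speaking is impractical.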
