Many of today's embedded industrial and automotive systems are built on 8-bit or 16-bit microcontroller architectures. With the advent of new low-power 32-bit architectures, these applications can achieve higher performance, accuracy, and power efficiency. In addition, the increased processing power makes new differentiating product features possible, including advanced control algorithms, GUI displays, voice control, and next-generation interfaces such as capacitive touch sensing. On 8-bit and 16-bit microcontrollers, these tasks usually consume a large share of the available computing resources. Today, powerful microcontrollers with built-in floating-point hardware are beginning to appear, and 32-bit microcontrollers have enough processing power to implement many of these functions.
Evaluating the performance of microcontrollers
Compared with dedicated DSPs, microcontrollers offer the following advantages for signal processing:
(1) Effective loop control;
(2) Rich peripherals;
(3) A single processor structure, instruction set, and development tool chain;
(4) A unified interrupt and task switching environment with homogeneous memory;
(5) A single operating system, with MMU support, that manages both control and signal processing tasks;
(6) Short time to market thanks to a greatly simplified development process;
(7) Popular microcontrollers are easily available and their development tools are low cost.
Whether a microcontroller's performance meets the application requirements is a question engineers need to consider early in project design. One effective method is to evaluate and summarize information from data sheets; another is to run specific performance and power consumption tests on an evaluation board. Both methods have their own disadvantages.
The efficiency difference between 32-bit and 8/16-bit systems is quite large. On a 16-bit processor, a normal 32-bit multiply/accumulate operation requires 4 multiplications and 4 additions. The need to access memory to store intermediate results or release multiple registers further reduces execution efficiency and may slow down other operations. Therefore, on a 16-bit processor, a 32-bit multiplication may require 20 to 40 cycles. The 32-bit UC3C processor only requires a single cycle. In addition, the 32-bit pipeline is wider, so data and instructions can be retrieved from memory faster.
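To make that cost concrete, the sketch below builds an unsigned 32-bit multiply/accumulate out of 16-bit pieces, roughly the way a 16-bit core without a wide multiplier would have to. It is an illustration in Python only; the function name `mac32_from_16bit_parts` is hypothetical, and real cycle counts depend on the instruction set, the compiler, and register pressure.

```python
MASK16 = 0xFFFF

def mac32_from_16bit_parts(acc, a, b):
    """Return acc + a*b for unsigned 32-bit a and b using only 16x16 multiplies."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16

    # Four 16x16 -> 32-bit partial products
    p_ll = a_lo * b_lo
    p_lh = a_lo * b_hi
    p_hl = a_hi * b_lo
    p_hh = a_hi * b_hi

    # Four additions: three to combine the partial products, one to accumulate
    product = p_ll + (p_lh << 16) + (p_hl << 16) + (p_hh << 32)
    return acc + product
```

On real 16-bit hardware each shifted partial product also needs carry handling and spare registers, which is where the extra cycles and memory accesses come from.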
The evaluation followed three steps: (1) characterizing the system by running a range of benchmarks while varying system parameters; (2) interpreting the collected characterization data to establish the system behavior; and (3) using that behavior to decide how to set control parameters so that the system achieves the desired performance.
Characterization
In theory, performance testing is a qualitative or quantitative assessment of the behavior of a system. In practice, the behavior of the system may not be specified in enough detail to define complete quality tests, and creating such tests may be too expensive to justify. A good compromise for characterizing a system is a benchmark: a test, or series of tests, executed in software that produces quantitative data for comparing the characteristics of different systems.
To characterize the microcontroller, a set of performance benchmarks was selected from the EEMBC Auto-Bench suite. These benchmarks help predict the performance of microcontrollers in automotive, industrial, and general-purpose applications. Each benchmark was run for multiple iterations to eliminate the impact of startup code that runs only once at the beginning of each test. One advantage of using this industry-standard benchmark suite is that the resulting data can be compared with test data from other microcontrollers of similar architecture to judge overall system performance.
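The effect of running multiple iterations can be sketched with a simple timing model. The function name and the timings below are hypothetical and only illustrate why a one-time startup cost fades as the iteration count grows; they are not taken from the benchmark suite.

```python
def measured_rate(iterations, startup_s, per_iteration_s):
    """Iterations per second as seen by a timer that also covers one-time startup."""
    total_s = startup_s + iterations * per_iteration_s
    return iterations / total_s

# Hypothetical timings: 5 ms of startup code, 0.1 ms per benchmark iteration
for n in (1, 10, 100, 10_000):
    print(n, round(measured_rate(n, 0.005, 0.0001), 1))
```

With these assumed timings the steady-state rate is 10,000 iterations/s, and the measured rate approaches it only when the iteration count is large.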
The microcontroller tested here is based on the ARM926EJ-S core with a hardware vector floating-point coprocessor and a 32 KB instruction cache (I-cache). The test measures the performance of the floating-point coprocessor and the instruction cache. The Auto-Bench benchmarks were run at different operating frequencies of the microcontroller, and the energy consumed in each benchmark execution was measured using Energy-Bench. Energy-Bench is another EEMBC tool that measures the energy consumed by the processor while the benchmark load is running. The data collected from Energy-Bench shows the energy efficiency of the microcontroller under different loads. With these tools chosen, the next step is to determine the performance of the microcontroller under different operating conditions.
Performance Analysis
To analyze the performance of a microcontroller, it is necessary to determine the overall system response under different conditions. In this test, the goal is to evaluate the floating-point coprocessor and the instruction cache on the NXP microcontroller.
The Auto-Bench benchmark suite was run while varying four parameters: operating frequency, CPU core voltage, instruction cache state, and floating-point coprocessor state.
Figure 1 shows the setup of the Auto-Bench/Energy-Bench test environment. It consists of three parts: the data acquisition system (DAQ), the software development environment, and the test target. The National Instruments DAQ is connected to a PC running Energy-Bench, the power and energy measurement software. The software test environment uses the Keil integrated development tools to compile, download, and run the Auto-Bench benchmarks. By isolating the three power supply voltages supplied to the microprocessor, Energy-Bench can measure the energy consumed during each Auto-Bench benchmark and calculate the total energy consumed in each test.
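The energy calculation over isolated supply rails can be sketched as a sum of voltage times current times sample period. This is a generic approximation written for illustration; it is not a description of how Energy-Bench works internally, and the function names and trace layout are assumptions.

```python
def rail_energy_joules(voltages, currents, sample_period_s):
    """Approximate energy on one rail as the sum of V*I*dt over all samples."""
    if len(voltages) != len(currents):
        raise ValueError("voltage and current traces must have the same length")
    return sum(v * i for v, i in zip(voltages, currents)) * sample_period_s

def total_energy_joules(rails, sample_period_s):
    """Sum the energy of every isolated supply rail.

    rails is a list of (voltages, currents) pairs, one pair per rail.
    """
    return sum(rail_energy_joules(v, i, sample_period_s) for v, i in rails)
```

Dividing the total by the number of benchmark iterations completed in the same window gives an energy-per-iteration figure that can be compared across configurations.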
Auto-Bench was run at 4 different frequencies (13 MHz, 52 MHz, 104 MHz, and 208 MHz) and in combination with other test conditions including turning the floating point coprocessor on or off and turning the instruction cache on or off. The floating point coprocessor was disabled by default, causing the compiler to use software floating point for any case where floating point operations were required.
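The frequency, cache, and coprocessor combinations can be enumerated as a test matrix. The sketch below is one possible way to generate it; the dictionary keys and function name are hypothetical, and core voltage is left out because the article only describes the voltage sweep for the 13 MHz runs.

```python
from itertools import product

FREQUENCIES_MHZ = (13, 52, 104, 208)
AHB_LIMIT_MHZ = 104  # AHB clock ceiling reported for the 208 MHz runs

def build_test_matrix():
    """Yield one dict per frequency / I-cache / FPU combination."""
    for freq, icache, fpu in product(FREQUENCIES_MHZ, (False, True), (False, True)):
        yield {
            "cpu_mhz": freq,
            "ahb_mhz": min(freq, AHB_LIMIT_MHZ),  # AHB follows the CPU clock up to its limit
            "icache": icache,
            "hw_fpu": fpu,  # False means software floating point
        }

configs = list(build_test_matrix())
```

Four frequencies with two cache states and two coprocessor states give 16 configurations per benchmark.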
Much more data was collected than can be presented in this article, but two representative cases show how the collected characterization data determines the performance of the system. Figure 2 graphs the results of EEMBC's finite impulse response (FIR) filter benchmark. Figure 3 graphs the results of EEMBC's basic integer and floating-point benchmark. Both benchmarks were also run at 13 MHz with the CPU core voltage set to 0.9 V and to 1.2 V. When the benchmarks were run with the CPU clock set to 208 MHz, the AHB clock was set to its limit of 104 MHz. At all other test frequencies, the CPU clock and the AHB clock were the same.
Figure 2. EEMBC finite impulse response (FIR) test data results
Figure 3. EEMBC basic integer and floating-point test data results
Floating-point operations work on approximations of real numbers. A computer stores every value in a fixed number of bits, so most real numbers can only be represented approximately, and floating-point results carry rounding errors. Without hardware support, each floating-point operation has to be emulated with many integer instructions, which makes it slow. On a 32-bit machine, a 32-bit word used as an integer covers 0 to 2^32-1 when unsigned and -2^31 to 2^31-1 when signed.
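The rounding behavior and the integer ranges can be checked with a few lines of Python. This is a small illustration using the standard `struct` module to round a value to 32-bit IEEE 754 format; the helper name `to_float32` is hypothetical.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit IEEE 754 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

UINT32_MAX = 2**32 - 1   # largest unsigned 32-bit integer
INT32_MIN = -(2**31)     # smallest signed 32-bit integer
INT32_MAX = 2**31 - 1    # largest signed 32-bit integer

stored = to_float32(0.1)
rounding_error = abs(stored - 0.1)  # small but non-zero: 0.1 has no exact binary form
```

Values such as 0.5 that are exact powers of two survive the round trip unchanged, while 0.1 does not.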
First, consider the instruction cache and the cycles/s graph in Figure 2. The data shows that at all frequencies, the absolute performance of the microcontroller is better when the instruction cache is enabled. Second, although absolute performance with the instruction cache keeps rising as the CPU clock frequency increases, the improvement is not linear. The reader can verify this on the cycles/s/MHz graph. Figure 2 shows that for almost all CPU clock frequencies performance scales linearly at about 100 cycles/s/MHz, except at 208 MHz, where it drops to 60 or 80 cycles/s/MHz, depending on whether the instruction cache is enabled.
The system runs faster with the instruction cache enabled because the CPU executes instructions from the cache and therefore makes fewer accesses to the RAM on the AHB.
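The relationship between the two graphs is a simple normalization. The sketch below uses only the approximate per-MHz figures quoted in the text (about 100, 80, and 60 cycles/s/MHz); the function names are hypothetical and the computed values are rough estimates, not measured data.

```python
def absolute_throughput(cycles_per_s_per_mhz, cpu_mhz):
    """Convert a normalized figure (cycles/s/MHz) back to absolute cycles/s."""
    return cycles_per_s_per_mhz * cpu_mhz

def scaling_efficiency(per_mhz_at_freq, per_mhz_baseline):
    """Fraction of the baseline per-MHz performance retained at another frequency."""
    return per_mhz_at_freq / per_mhz_baseline

at_104 = absolute_throughput(100, 104)     # about 10,400 cycles/s
at_208_high = absolute_throughput(80, 208)  # about 16,640 cycles/s
at_208_low = absolute_throughput(60, 208)   # about 12,480 cycles/s
```

Under these approximate figures, absolute throughput still rises at 208 MHz, but each additional MHz delivers only 60% to 80% of what it delivers at lower frequencies.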
The non-linear performance characteristics are the result of the AHB clock having an upper limit of 104 MHz. When the AHB clock is slower than the CPU clock, the CPU must wait longer to fetch instructions from the RAM on the AHB bus, resulting in a smaller relative performance increase per MHz.
Next, we analyze the impact of the instruction cache on energy consumption. If we only consider the absolute power consumption in Figure 2, we may conclude that turning off the instruction cache can save energy for the entire system. However, the Energy-Bench data shows that when the instruction cache is enabled, the energy consumed per benchmark cycle is actually lower than when the instruction cache is turned off.
A closer look at the Energy graph shows that when the instruction cache is enabled, the energy consumed per cycle at 208 MHz, 1.2 V is even lower than at other operating frequencies. In fact, there is a 10% to 12% improvement. In other words, executing the same benchmark with the instruction cache enabled, running at high speed (208 MHz) for a shorter period of time is more energy efficient than running at a lower speed (52 MHz or 104 MHz) for a longer period of time.
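The reason a faster, higher-power run can still use less energy is that energy per benchmark iteration is average power divided by throughput. The sketch below illustrates this with placeholder numbers chosen only to show the arithmetic; they are assumptions, not measurements from the article.

```python
def energy_per_iteration(avg_power_w, iterations_per_s):
    """Energy in joules spent on one benchmark iteration."""
    return avg_power_w / iterations_per_s

# Placeholder values for illustration only, not measured data
slow = energy_per_iteration(avg_power_w=0.10, iterations_per_s=10_400)
fast = energy_per_iteration(avg_power_w=0.14, iterations_per_s=16_640)
fast_is_cheaper = fast < slow
```

Whenever throughput grows by a larger factor than power, the energy per iteration falls, which is the pattern the Energy-Bench data shows at 208 MHz with the instruction cache enabled.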
The cycles/s graph in Figure 3 shows the performance effect of the integrated floating-point coprocessor quite vividly. At 208 MHz with the instruction cache enabled, the microcontroller runs at about 8,500 cycles/s with software floating-point operations; with the floating-point coprocessor, this value increases to more than 32,500 cycles/s, a performance improvement of more than 280%.
To examine the effect of the floating-point coprocessor on energy consumption, see the energy graph in Figure 3. With software floating-point operations and the instruction cache enabled, the microcontroller consumes approximately 16 J per benchmark cycle at 208 MHz; with the floating-point coprocessor this falls to less than 4 J/cycle, a saving of more than 75% for the same workload.
The cycles/s graph in Figure 2 shows that benchmark performance at 13 MHz is the same for supply voltages of 0.9 V and 1.2 V.
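The percentages quoted for the floating-point coprocessor follow directly from the figures in the text. The short sketch below reproduces the arithmetic; the function names are hypothetical, and the inputs are the approximate values read from Figure 3.

```python
def percent_improvement(before, after):
    """Relative gain of 'after' over 'before', in percent."""
    return (after - before) / before * 100.0

def percent_saving(before, after):
    """Relative reduction from 'before' to 'after', in percent."""
    return (before - after) / before * 100.0

speedup_pct = percent_improvement(8_500, 32_500)  # throughput in cycles/s
speedup_ratio = 32_500 / 8_500                    # the same gain expressed as a factor
energy_saving_pct = percent_saving(16, 4)         # energy per cycle, same unit for both
```

The throughput gain works out to roughly 282%, or a factor of about 3.8, and the energy figures give a 75% saving at exactly 4 units per cycle, so any value below 4 yields more than 75%.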
However, the power graph shows that the power consumption at 1.2 V is about 75% higher than at 0.9 V.
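A difference of this size is what the usual first-order CMOS model would suggest, in which dynamic power scales with the square of the supply voltage at a fixed frequency and load. That model is an assumption brought in here for illustration; the article itself reports only the measured result.

```python
def dynamic_power_ratio(v_high, v_low):
    """Ratio of dynamic power at two core voltages, same frequency and load.

    Assumes the first-order model P ~ C * V^2 * f and ignores static leakage.
    """
    return (v_high / v_low) ** 2

ratio = dynamic_power_ratio(1.2, 0.9)
increase_pct = (ratio - 1.0) * 100.0
```

The model predicts an increase of about 78%, which is in line with the roughly 75% reported from the power graph.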
System control parameters
In the test example, the EEMBC characterization tools were used to determine the performance of the instruction cache and the floating-point coprocessor in the target system. Based on these results, general configuration parameters can be selected that give the best system performance at low energy consumption.
Here are some parameter choices that can control system power utilization and performance in environments like those of the EEMBC Auto-Bench benchmark suite:
(1) Enabling the instruction cache improves performance;
(2) Using the hardware floating-point coprocessor significantly improves computing performance and significantly reduces energy consumption compared with software floating-point emulation;
(3) At 208 MHz with the instruction cache enabled, the energy consumed per benchmark cycle is lower than at lower frequencies;
(4) For low-power operation at 13 MHz, a core voltage of 0.9 V consumes much less power than 1.2 V with no loss of performance.
Beyond these general conclusions, the assessment of system performance rests on data from industry-standard performance and energy benchmarks that are publicly available and independently verified.
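Choices like these can be turned into a simple selection rule: among the configurations that meet a throughput requirement, pick the one with the lowest energy per iteration. The sketch below is one possible formulation; the function name, the record layout, and the sample numbers are all hypothetical placeholders, not measured data.

```python
def pick_configuration(measurements, min_iterations_per_s):
    """Return the lowest-energy configuration that meets the throughput floor, or None."""
    eligible = [m for m in measurements if m["iterations_per_s"] >= min_iterations_per_s]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m["energy_per_iteration"])

# Placeholder numbers for illustration only, not measured data
sample = [
    {"name": "A", "iterations_per_s": 1_000, "energy_per_iteration": 5.0},
    {"name": "B", "iterations_per_s": 4_000, "energy_per_iteration": 3.0},
    {"name": "C", "iterations_per_s": 8_000, "energy_per_iteration": 4.0},
]

choice = pick_configuration(sample, min_iterations_per_s=5_000)
```

In practice the records would come from the benchmark runs for each frequency, voltage, cache, and coprocessor setting.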
Using EEMBC Auto-Bench and Energy-Bench, you can get consistent performance analysis that is easy to demonstrate to others and can be repeated and verified.
Designing an embedded system is often a challenging task, as almost every embedded system has a relatively unique hardware configuration. Specific code often needs to be rewritten for a specific embedded operating system. There are often very strict energy consumption constraints. This article provides a quantitative scientific test method to help embedded engineers consider how to choose a controller suitable for a specific application to build a system. Even if the embedded systems tested vary greatly, solid data can still help system evaluators compare the same performance characteristics.
In the test setup for this article, the EEMBC characterization tool was used to determine the performance of the NXP microcontroller. This performance information was then used to select the best control parameters for the specific operating environment. The test routine quantified the system performance using the instruction cache and floating-point coprocessor of the microcontroller in the evaluation system. The collected characterization data facilitated the definition of system behavior and provided a methodology to select operating parameters to control system performance and energy consumption.
Test results show that the use of hardware vector floating-point units can improve system performance by about 5 times, reduce the amount of code, and lower power consumption.
The hardware floating-point coprocessor VFP9 is a feature of NXP's LPC3000 series based on the ARM926EJ-S core. NXP's low-power 90 nm process technology can achieve this function with very small chip area and extremely low power consumption, making the LPC3000 ARM9 microcontroller very suitable for industrial applications such as medical electronics that require signal processing.