32-bit MCU price 1 USD - Analysis of the characteristics and positioning of Luminary's ARM-based MCU
The history of the microprocessor field is littered with the remains of innovative architectures and brands. Most of them had technical advantages over the mainstream architectures of their day, yet most learned the same harsh lesson: in the market, they were squeezed out by the entrenched ecosystem that had formed around the dominant architectures. The company introduced in this article has an experienced management team that has heeded those lessons while developing its own new microprocessor products based on an innovative architecture.
32-bit MCUs are not sold for the price of an arm/leg
Luminary Micro builds its MCUs around 32-bit ARM CPU cores. Its initial business strategy is to open up the market with the compact Cortex-M3 core, which ARM designed specifically for low cost and low power consumption.
To speed the market development of its new Stellaris series based on the Cortex-M3 core, the company announced on March 26 this year a distributor price of US$1 each in quantities of 10,000, shocking the industry. That is one third lower than the ARM-based MCUs of Philips, until then the low-price leader. Luminary Micro's pricing strategy is changing the perception that 32-bit CPUs are only suitable for high-end systems, and it gives existing 8- and 16-bit designs more reasons to upgrade to a 32-bit architecture.
Another key part of the company's marketing strategy is to expand the Stellaris product range rapidly to meet different customers' needs for on-chip memory capacity and general-purpose I/O ports. To accommodate more I/O ports, new products such as the LM3S301, LM3S310, LM3S316 and LM3S318 have grown to 48 pins, with prices starting at US$2.53. The additional I/O ports also make it possible to integrate multi-channel on-chip A/D converters.
Luminary Micro believes that its ARM-based MCU architecture will be more attractive than other companies' traditional 8-bit architectures. The company's rapid expansion of device categories and good performance have attracted widespread attention in the industry.
The performance of Luminary's Stellaris series products is equivalent to that of Microchip and Freescale 8-bit microcontrollers.
Built-in Cortex core
Although ARM and Luminary Micro are separate companies, they agree on MCU strategy: both are convinced that the Cortex-M3 will replace traditional MCU architectures. The Cortex-M3 implements the ARMv7-M ISA (instruction set architecture). Its use of the Thumb-2 instruction set alone, rather than the complete 32-bit ARM instruction set, caused fierce controversy at first. The Cortex-M3 is designed for low power consumption, small size, short interrupt latency and excellent determinism. To this end, Thumb-2 adds 130 instructions to Thumb, the 16-bit subset of the original 32-bit ARM instruction set that was created to reduce code size significantly. The original ARM processors support multiple CPU modes, and Thumb code frequently switches back and forth between two states to run 32-bit ARM instructions and 16-bit Thumb instructions. Worse, when an exception occurs, the processor must immediately switch to the 32-bit state to handle it, which causes a lot of trouble. In contrast, Thumb-2 handles both 32-bit and 16-bit instructions in a single instruction set, with no switching back and forth between states. The price is that old ARM binaries can no longer run on Thumb-2-only cores like the Cortex-M3. Thumb-2 is compatible with the original Thumb instructions, and programmers can use ARM's unified assembler to convert ARM assembly language code, but C language libraries must be recompiled.
Like the original low-cost ARM7TDMI processor from ten years ago, the Cortex-M3 has no instruction or data caches, because caches introduce the timing uncertainty that real-time control can least tolerate. Yet the Cortex-M3 core is smaller than the ARM7TDMI, requiring only 33K logic gates. Even with the bus interfaces and other logic that the ARM7TDMI does not include, the total is only 60K logic gates. Two optional logic blocks were designed specifically for the Cortex-M3: a memory protection unit (MPU) and an embedded trace macrocell (ETM). Overall, the Cortex-M3 delivers better performance than the ARM7TDMI core.
Figure 2 is the logic block diagram of the Cortex-M3. To meet the determinism required by real-time control, instruction and data caches have been omitted. The memory protection unit (MPU) and embedded trace macrocell (ETM) shown in the figure are optional.
Figure 2 Cortex-M3 logic block diagram
Bringing MCU features to a RISC CPU
An MCU has many special functions that set it apart from a general embedded processor. Consider a simple motor control system. The system must have a feedback channel that tells the MCU, in time, the current position of the motor or the displacement of the device it drives, so that the MCU can respond quickly with the next step. For example, when a robot picks up an egg, the sensor on its fingertip must signal the MCU the moment the finger touches the surface of the egg, so that the robot can tighten its fingers just enough to grab the egg and lift it. If the robot fails to stop the finger motor at the right moment because of a single cache miss or an occasional page fault inside the embedded processor, the robot ends up with an omelette in its hand. To avoid crushing the egg, the Cortex-M3 omits the instruction and data caches that create this uncertainty and relies on on-chip flash memory or SRAM instead.
The control system must also respond to abnormal events as quickly as possible, with a guaranteed response time. Imagine that while the CPU is busy with other tasks, an external interrupt suddenly signals that the egg in the robot's hand is in danger of breaking. The CPU must immediately suspend the current task and stop the motor in time to keep the cracked egg from being crushed further. The Cortex-M3 uses hardware to shorten interrupt latency, that is, the time from receiving the interrupt to entering the handler. On the Cortex-M3, this takes only 12 clock cycles.
In addition, the Cortex-M3 has a nested interrupt controller that eliminates unnecessary stack operations. When another interrupt request is already pending as the current interrupt service routine ends, there is no need to pop the saved state off the stack only to push the same information back immediately. The Cortex-M3 simply passes control to the pending interrupt and leaves the pop until the last service routine has finished. This tail-chaining operation reduces unnecessary memory traffic and speeds up interrupt processing, cutting the latency to 6 clock cycles.
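To put these cycle counts in perspective, the short sketch below converts them to time. It assumes the 20 MHz clock that the article gives for the first devices. The helper name `cycles_to_ns` is illustrative only, and the result ignores any flash wait states or other system-level delays.

```python
def cycles_to_ns(cycles: int, clock_hz: float) -> float:
    """Convert a clock-cycle count to nanoseconds at the given clock rate."""
    return cycles / clock_hz * 1e9


CLOCK_HZ = 20e6        # clock of the initial devices, as stated in the article
ENTRY_CYCLES = 12      # interrupt entry latency quoted for Cortex-M3
TAIL_CHAIN_CYCLES = 6  # latency quoted when interrupts are tail-chained

entry_ns = cycles_to_ns(ENTRY_CYCLES, CLOCK_HZ)
tail_ns = cycles_to_ns(TAIL_CHAIN_CYCLES, CLOCK_HZ)

print(f"interrupt entry: {entry_ns:.0f} ns")
print(f"tail-chained:    {tail_ns:.0f} ns")
```

Under these assumptions the handler starts roughly 600 ns after the interrupt arrives, or about 300 ns when it follows directly on another handler.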
The composition of a $1 32-bit MCU
As a newly founded company, Luminary Micro must seize the market's attention, so it boldly adopted an aggressive pricing strategy: a 32-bit ARM-based MCU for only 1 dollar! The company has not disclosed the die size of these devices. TSMC (Taiwan Semiconductor Manufacturing Company) produces them in a 0.25-micron CMOS process with 5 metal layers and 2 polysilicon layers. The process supports mixed-signal components and embedded flash memory, and offers good gate density and low leakage current. The Cortex-M3 design uses a simple 3-stage pipeline, no cache, and a low clock frequency (20 MHz in the initial devices). ARM has pointed out that in TSMC's 0.18-micron G process the Cortex-M3 could reach a clock frequency of 120 MHz, but a low clock frequency reduces power consumption and packaging costs. Inexpensive wire bonding and plastic SOIC packages are sufficient for such devices. With only 8KB of flash and 2KB of SRAM, the LM3S101 and LM3S102 are clearly aimed at low-end applications, but their performance and memory are more than enough for millions of MCU applications.
Figure 1 Luminary Micro's LM3S3xx series is only sold for $1
Luminary Micro's goal is to expand its product range rapidly, so it decided to combine in-house development with purchased IP (intellectual property). At first, all peripherals were purchased IP, and all effort was concentrated on the design flow and the various aspects of production. The engineering team then focused on functional innovation in the design and development of new products.
Soon, an MPU, a simpler relative of the full MMU (memory-management unit), was added to the Cortex-M3 core of the $1 product. Unlike an MMU, the MPU does not translate virtual addresses into physical addresses. It uses a simple table of region descriptors to define up to 8 mutually isolated regions of physical memory, which can be used to separate code, data, and stacks.
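The isolation of code, data, and stack areas can be pictured as a lookup in a small region table. The following is a toy model under stated assumptions, not ARM's register layout: the names `Region` and `check_access`, the example addresses, and the first-match rule are all hypothetical simplifications chosen for illustration. Only the limit of 8 regions comes from the article.

```python
from dataclasses import dataclass

MAX_REGIONS = 8  # number of isolated regions quoted in the article


@dataclass(frozen=True)
class Region:
    base: int         # first address covered by the region
    size: int         # region length in bytes
    writable: bool    # whether stores are allowed
    executable: bool  # whether instruction fetches are allowed

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def check_access(regions, addr, kind):
    """Return True if an access of kind 'r', 'w' or 'x' to addr is allowed.

    Simplification: the first matching region decides, reads are always
    allowed inside a region, and an address outside every region is refused.
    """
    if len(regions) > MAX_REGIONS:
        raise ValueError("too many regions for this model")
    if kind not in ("r", "w", "x"):
        raise ValueError("kind must be 'r', 'w' or 'x'")
    for region in regions:
        if region.contains(addr):
            if kind == "w":
                return region.writable
            if kind == "x":
                return region.executable
            return True
    return False


# Illustrative layout: 8KB of read-only, executable flash and
# 2KB of writable, non-executable SRAM (addresses are assumptions).
regions = [
    Region(base=0x0000_0000, size=8 * 1024, writable=False, executable=True),
    Region(base=0x2000_0000, size=2 * 1024, writable=True, executable=False),
]

print(check_access(regions, 0x0000_0100, "x"))  # fetch from flash
print(check_access(regions, 0x0000_0100, "w"))  # store to flash
print(check_access(regions, 0x2000_0010, "w"))  # store to SRAM
```

In this model a stray write into the code area, or a jump into the data area, is refused instead of silently corrupting the program, which is the point of keeping code, data, and stacks in separate regions.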
The company used the Cortex-M3 with an MPU to produce 4 new MCUs in the Stellaris series, expanding their general-purpose I/O ports and raising the pin count to 48. A notable feature of the Stellaris I/O design is that all peripherals can be accessed at the same time, whereas on some other companies' parts the I/O pins are shared and must be reconfigured for each use.
Flexible ADC broadens the application field
Besides improving the boundary-scan and debugging logic, the engineering team put a great deal of effort into reworking an off-the-shelf logic module. Although the company had purchased an off-the-shelf mixed-signal ADC IP module, the team eventually abandoned it and designed its own peripheral hardware instead. The innovative flexible ADC has a total of 8 input channels and a total sampling rate of 250,000 samples per second. In addition to sampling external analog sources, it has a dedicated channel for sampling the internal temperature sensor. To free the CPU from the monotonous work of sampling each channel in turn, the ADC also has 4 independently programmable sample sequencers (loop sequence controllers). They can sample a single channel repeatedly (including oversampling), or sample several channels at different rates at the same time, and users can mix these modes freely. The only limitation is that the total sampling rate must not exceed 250,000 samples per second.
Each loop sequence controller of the flexible ADC has multiple trigger sources to choose from. The flexible ADC is particularly suitable for low-cost medical sensor sampling and other industrial controls.
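The single constraint on the flexible ADC, a combined rate of at most 250,000 samples per second, lends itself to a quick budget check. The sketch below is only an illustration: the channel names, output rates, and oversampling factors are invented, and `total_load` and `fits_budget` are hypothetical helpers, not part of any Luminary Micro library.

```python
ADC_MAX_SPS = 250_000  # total sampling rate quoted in the article


def total_load(plan):
    """Return the raw conversions per second that a sampling plan needs.

    plan maps a channel name to (output_rate_sps, oversampling_factor);
    each output sample costs 'oversampling_factor' raw conversions.
    """
    return sum(rate * oversample for rate, oversample in plan.values())


def fits_budget(plan, limit=ADC_MAX_SPS):
    """Check the plan against the ADC's total sampling rate."""
    return total_load(plan) <= limit


# Invented example: three channels sampled at very different rates.
plan = {
    "motor_current": (20_000, 8),  # fast channel, 8x oversampled
    "bus_voltage": (50_000, 1),    # medium rate, no oversampling
    "temperature": (10, 4),        # internal sensor, slow
}

load = total_load(plan)
print(f"load: {load} of {ADC_MAX_SPS} samples/s, fits: {fits_budget(plan)}")
```

A check of this kind shows how oversampling on one fast channel quickly consumes the shared budget, while slow channels such as the temperature sensor cost almost nothing.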
The Stellaris series and its competitors
Luminary Micro is targeting the general market served by distributors. Low clock rates and killer pricing allow the Stellaris series to directly address the 8- and 16-bit application markets. However, as the LM3Sxx series moves upstream, it is likely to encounter fierce competition from low-end 32-bit MCUs.
Looking at the core architectures of 8- and 16-bit MCUs, Luminary Micro faces a small number of architectures offered by a large number of manufacturers. The most important of these is the ubiquitous 8051 architecture, which is produced by many manufacturers, including ADI, Atmel, Dallas, Infineon, NEC, Philips, Renesas, Silicon Labs, STM, TI, and Intel itself. It should be noted that many of these manufacturers have modified the 8051 architecture or moved from it to other architectures.
Freescale is another leading company with its own 8-bit MCU core architecture, with a market share of 14% in 2005. Other companies with their own low-end MCU core architectures include Microchip with the PICmicro MCU, NS with the COP, Renesas with the R8 and M16, and NEC with the 78K.
Of course, not every 8-bit application can move to a 32-bit architecture, as price remains a barrier. Engineers should understand that nothing comes for free. It is necessary to study in depth what 8-bit parts can do and what the potential benefits and drawbacks of switching to a 32-bit architecture are. That is where serious study should begin.
RISC code is no longer so bloated
Historically, RISC architectures did produce bloated code compared with CISC architectures. However, with modern RISC architectures that include a 16-bit instruction subset, the problem has been overcome and even reversed. ARM has announced, as a matter of fact, that the code density of the Cortex-M3 is four times that of the 8051.
The reason for this huge change is also related to the use of modern high-level languages on RISC architectures. The situation has changed through adjustments to the instruction set and register file, and through measures such as more efficient passing of pointers and function parameters. EEMBC's benchmark reports include information on code size, but unfortunately many companies have not made theirs public, so a comprehensive comparison is still difficult. Generally speaking, the code required for the Cortex-M3 will at the very least be no larger than that of other MCUs. And even if switching to 32 bits does one day turn out to carry a penalty here, the move may still be worthwhile.
Comparison of power consumption
The Atmel AT89LP is an improved 8051 architecture that executes most instructions in a single cycle, for a 10x speed improvement. It is a low-power chip: the active current of the whole chip is 1 mA at 3.0V and 1.0MHz, which is equivalent to 3mW / MHz. The net power consumption of the Cortex-M3 core alone is 0.19mW / MHz. Of course, this comparison does not take the difference in process into account. Even so, the figures show that the active power of a 32-bit core is not very large, and that the peripherals, memory, and I/O account for a large proportion of the power consumption.
Luminary Micro says that the active current of the LM3Sxxx series is 35 mA at 3.3V and 20MHz. This is equivalent to 5.8mW / MHz for the whole chip. This figure also reflects the fact that the chip has more peripherals than the AT89LP. A fair comparison would have to be made with the same set of peripherals.
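The mW / MHz figures in this comparison follow from multiplying supply current by supply voltage and dividing by clock frequency. The sketch below reproduces that arithmetic from the numbers quoted in the article. The helper name `mw_per_mhz` is illustrative, and the final estimate of the core's share is only a rough indication, since it assumes the 0.19mW / MHz core figure applies unchanged at 20 MHz and ignores the process differences the article mentions.

```python
def mw_per_mhz(current_ma: float, voltage_v: float, clock_mhz: float) -> float:
    """Active power per MHz: (mA * V) gives mW, divided by the clock in MHz."""
    return current_ma * voltage_v / clock_mhz


# Whole-chip figures quoted in the article.
at89lp = mw_per_mhz(current_ma=1.0, voltage_v=3.0, clock_mhz=1.0)
lm3s = mw_per_mhz(current_ma=35.0, voltage_v=3.3, clock_mhz=20.0)

# Rough share of the Cortex-M3 core in the LM3S whole-chip figure,
# assuming the quoted 0.19 mW/MHz core number applies as-is.
CORE_MW_PER_MHZ = 0.19
core_share = CORE_MW_PER_MHZ / lm3s

print(f"AT89LP whole chip: {at89lp:.1f} mW/MHz")
print(f"LM3S whole chip:   {lm3s:.1f} mW/MHz")
print(f"core share of LM3S power: {core_share:.1%}")
```

On these assumptions the core accounts for only a few percent of the whole-chip figure, consistent with the observation that peripherals, memory, and I/O dominate.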
The power consumption in standby state is another concept. It depends on the number of transistors, device voltage, and process-related transistor leakage. Obviously, a 32-bit CPU structure requires more transistors than an 8-bit one, which inevitably increases the power consumption in standby mode.
The Stellaris series chips use an external 3.3V power supply, which an on-chip voltage regulator converts to 2.5V to supply the internal logic. Luminary Micro must therefore use a low-voltage process to achieve its power consumption improvements, and this inevitably increases leakage. High-end CPUs use advanced design techniques to reduce leakage at low voltages, but cheap low-end CPUs cannot afford such methods. So, generally speaking, 8-bit CPUs have a clear power consumption advantage over 32-bit CPUs in standby mode (including sleep).
Development tools are the main factor
When the technical choice between 8-bit and 32-bit architectures is weighed, development tools often become the deciding factor. Designers tend to choose the parts that offer the best hardware and software tools, alongside considerations of power consumption, price, code efficiency, and so on. This is why Freescale, after a long and difficult period, finally adopted a unified development tool strategy for all of its MCUs. Luminary is betting that designers of 8- and 16-bit MCU systems will want to use the same tools as modern 32-bit architectures to optimize high-level software development. In 2006, total shipments of ARM-based CPUs by manufacturers worldwide reached 2 billion units, and ARM expects the figure to reach 4.5 billion units by 2010. By then, this ubiquity will have grown into a huge ecosystem, and the sound choice for MCU system designers will be to turn to ARM.
About half of Luminary's customers already use ARM microprocessors and also need MCUs in their systems. Existing development tools are available for Luminary's parts, but the company will also invest heavily in support software written specifically for the Stellaris peripherals, creating an extensible API library and a range of sample drivers to help customers get their systems up and running as quickly as possible.
Competition from other 32-bit MCU manufacturers
As long as the Stellaris MCUs meet the power consumption requirements, Luminary Micro will have an advantage over 8- and 16-bit MCU manufacturers. Once those manufacturers begin steering their users to 32-bit MCUs, however, the competition will become fierce. The main battleground should still be performance.
For a start-up, winning even a small share of the $13 billion MCU market counts as a success. In a short period of time, Luminary Micro has created its own Cortex-M3 market and quickly taken the lead. Many powerful companies are eager to catch up, so the company cannot slack off or stand still. (Translated by Liang Heqing from "Microprocessor Report")