ARM Cortex-M7 processor
The latest member of the Cortex-M processor family is the Cortex-M7. This new core has features that can be used to support new embedded technology requirements. It is designed for applications that require high processing performance, real-time responsiveness and energy efficiency. In general, the Cortex-M7 processor includes the following key features:
• High-performance, dual-instruction-issue 6-stage pipeline that can execute up to two instructions per clock cycle;
• 64-bit AXI system bus interface;
• Optional instruction cache (4 to 64KB) and data cache (4 to 64KB), each with optional ECC (error correction code) support;
• Optional 64-bit instruction tightly coupled memory (ITCM) and optional dual 32-bit data TCM (D{0,1}TCM); each TCM memory array supports a custom ECC implementation;
• Optional low-latency AHB peripheral bus interface, allowing fast, deterministic access to peripherals in real-time applications.
Figure 1 ARM Cortex-M7 processor
ARM Cortex-M7 Processor Configuration Options
The microarchitecture of the Cortex-M7 processor differs from that of other cores in the Cortex-M series. It features a 6-stage superscalar pipeline, which significantly improves system performance both by raising architectural performance (fewer cycles per instruction) and by allowing a higher operating frequency. To support the higher instruction and data bandwidth that a superscalar design requires, its key memory interfaces are 64 bits wide. The AXI system bus and the single-cycle ITCM interface are both 64-bit, and the dual 32-bit D-TCM interface can handle two 32-bit transfers or one 64-bit data transfer in a single cycle. Table 1 summarizes the buses in the Cortex-M7 processor microarchitecture, highlighting how the new interfaces compare with those of previous-generation ARM Cortex-M series devices.
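As a rough illustration of the widths quoted above, the peak transfer rate of an interface is its width in bytes multiplied by the clock frequency. The sketch below assumes a hypothetical 240 MHz clock and one full-width transfer per cycle on every interface; neither assumption comes from the article, and real throughput will be lower once wait states and bus contention are taken into account.

```python
# Interface widths as stated in the text above.
INTERFACE_WIDTH_BITS = {
    "AXI system bus": 64,
    "ITCM": 64,
    "D0TCM + D1TCM": 2 * 32,  # two 32-bit transfers or one 64-bit transfer
}

# Hypothetical clock used only for illustration.
ASSUMED_CLOCK_HZ = 240_000_000


def peak_bandwidth_mb_s(width_bits, clock_hz):
    """Upper bound in MB/s, assuming one full-width transfer every cycle."""
    return (width_bits // 8) * clock_hz / 1e6


for name, width in INTERFACE_WIDTH_BITS.items():
    rate = peak_bandwidth_mb_s(width, ASSUMED_CLOCK_HZ)
    print(f"{name}: {rate:.0f} MB/s peak")
```

All three interfaces move 8 bytes per cycle at best, which is why the article describes them together as the 64-bit memory paths.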
The AXI master interface plays an important role in supporting the memory scalability required by many IoT applications. Because new usage models are based on the continuous collection and analysis of data, the ability to add external memory to extend functionality is critical. Alongside the AXI master interface, the TCM interfaces provide an optimal single-cycle path for the real-time operations that control tasks require. To support processor performance levels exceeding 5 CoreMark/MHz, high-performance memory and bus interfaces are essential.
There are several factors to consider when choosing which buses to use in an SoC and how to utilize them, including:
• Which peripherals need to be connected to the AHB peripheral bus on the Cortex-M7 processor to achieve low latency access capabilities?
• Which peripherals need to be accessed by the DMA controller?
• What forms of access control and memory protection are required?
Figure 2 Minimum microcontroller
Table 1 ARM Cortex-M7 bus types and descriptions
For example, in a very simple design, the memory system can be connected to the TCM interfaces and the peripherals to the AHB peripheral interface, as shown in Figure 2. This configuration lets the SoC take advantage of the scalable performance of the Cortex-M7 core while still addressing cost and size constraints. Connecting SRAM to the TCM interfaces, for instance, suits control edge nodes that require real-time performance.
Another configuration option is to connect embedded and/or external memory to the AXI interface and achieve higher performance by using cache memory. Most microcontroller applications contain many small control loops, so the firmware incurs very few cache misses. With a cache-based design, the system may be less deterministic when executing programs from the AXI bus system. However, the exception vector table and interrupt handlers can be placed in SRAM connected to the ITCM interface, so that interrupt handlers still execute with deterministic timing.
The memory scalability, performance, and efficiency benefits of AXI interfaces and caches are key to meeting application requirements. Such configurations offer many benefits that are consistent with IoT applications, such as supporting wireless firmware updates and leveraging large external memories for data storage needs. However, not all use cases require every option, so challenges related to cost, size, and power consumption must be considered.
The design of a memory system can offer a wide variety of configuration options. There are many aspects and factors to consider, including:
• Execution from AXI or TCM interface;
• Cache size (if using AXI);
• How embedded memory access is accelerated, and the bandwidth of flash memory;
• Optional ECC support.
Many different factors can influence the decision, such as the read access speed of the embedded Flash memory, clock speed requirements, and the typical size of the target application and its program flow behavior.
If the embedded memory access speed is close to the required processor speed, the embedded flash can be connected to the ITCM interface with some flash access acceleration. In other cases, using AXI with cache will be more appropriate. If the application needs to execute programs from an external memory controller, the memory controller is usually connected to the AXI interface, which also requires the support of instruction cache and data cache. In some cases, the application may only need to use external memory for data storage. In such cases, an instruction cache is not needed.
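The rule of thumb in this paragraph can be sketched as a small helper. This is only an illustration: the function names, the returned strings, and above all the `max_tcm_wait_states` threshold used to decide what counts as "close to" the processor speed are assumptions made here, not figures from the article or from ARM.

```python
def flash_wait_states(cpu_hz, flash_read_hz):
    """Extra cycles the core waits per flash read (integer Hz inputs)."""
    cycles_per_read = (cpu_hz + flash_read_hz - 1) // flash_read_hz  # ceiling
    return max(0, cycles_per_read - 1)


def suggest_code_path(cpu_hz, flash_read_hz, executes_from_external,
                      max_tcm_wait_states=1):
    """Map the article's guidance onto a choice of instruction path.

    max_tcm_wait_states is a hypothetical threshold for 'flash speed is
    close to processor speed'; tune it for a real design.
    """
    if executes_from_external:
        return "AXI with instruction and data caches"
    if flash_wait_states(cpu_hz, flash_read_hz) <= max_tcm_wait_states:
        return "ITCM with flash acceleration"
    return "AXI with cache"


# Hypothetical operating points.
print(suggest_code_path(100_000_000, 60_000_000, executes_from_external=False))
print(suggest_code_path(240_000_000, 30_000_000, executes_from_external=False))
```

The first call models flash that needs a single wait state, the second a flash that is eight times slower than the core, where a cache on the AXI path hides most of the latency.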
Choosing the cache size depends largely on the properties of the application code. When running program code from embedded memory, both the instruction cache and the data cache are utilized because the program image often contains literal data, lookup tables, or read-only constants along with the instructions. Applications typically have more instruction words than data/constants inside the program image. As program size grows, cache requirements increase, and it is not uncommon for the instruction cache to be larger than the data cache. Conversely, some applications may have small control or DSP loops and may have large amounts of data used as coefficients for calculations. In such cases, a larger D-cache may be more beneficial to system performance than a larger I-cache.
Of course, when optimizing purely for performance, the largest caches give the lowest latency for larger code and data sizes. However, a large cache memory runs at the same speed as the processor, so cache lookups can consume a significant amount of power. In addition, the cache miss rate curve for most applications approaches zero as the cache size increases, which means that beyond a certain point a larger cache no longer improves performance. Fortunately, the configurability of the Cortex-M7 core allows SoC architects to integrate a variety of cache sizes, from no cache at all up to a 64KB instruction cache and a 64KB data cache. With this flexibility, designers can tune the SoC to meet the needs of the target application.
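The flattening miss rate curve can be reproduced with a toy model. The sketch below simulates a direct-mapped cache with 32-byte lines fetching a hypothetical 16 KB control loop word by word; the direct-mapped organization, the footprint, and the iteration count are simplifications chosen here for brevity (the Cortex-M7 caches are set-associative), so the numbers are illustrative only.

```python
LINE_BYTES = 32  # cache line size
WORD_BYTES = 4


def miss_rate(cache_bytes, addresses):
    """Miss rate of a direct-mapped cache over a sequence of byte addresses."""
    num_slots = cache_bytes // LINE_BYTES
    stored_line = [None] * num_slots  # line address currently held per slot
    misses = 0
    total = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        slot = line % num_slots
        if stored_line[slot] != line:
            misses += 1
            stored_line[slot] = line
        total += 1
    return misses / total


def loop_trace(footprint_bytes, iterations):
    """Word-by-word fetches over a code region, repeated like a control loop."""
    for _ in range(iterations):
        for addr in range(0, footprint_bytes, WORD_BYTES):
            yield addr


FOOTPRINT = 16 * 1024  # hypothetical loop size
for size_kb in (4, 8, 16, 32, 64):
    rate = miss_rate(size_kb * 1024, loop_trace(FOOTPRINT, 20))
    print(f"{size_kb:2d} KB cache: miss rate {rate:.4f}")
```

Once the cache is large enough to hold the whole loop, only the first pass misses, and growing the cache further changes nothing in this model.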
Figure 3 Microcontroller with external memory
Figure 4 ARM Cortex-M7 processor dual-core lockstep configuration
In addition to the architectural options, many other features on the Cortex-M7 processor can be configured. For example, the SoC's floating point unit (FPU) function can be configured to have no FPU at all, an FPU with IEEE-754 single-precision floating point operations, or an FPU that supports both IEEE-754 single-precision operations and double-precision operations.
Other configuration features include:
• The number of interrupts and priority levels in the NVIC;
• Memory Protection Unit (MPU) configuration;
• Debug and trace capabilities;
• Functional safety related features (ECC, dual-core lockstep).
Hardware acceleration of floating point operations has several advantages. Most obviously, floating point operations run faster when a hardware floating point unit is present. In addition, memory space can be saved, because hardware support reduces the number and size of the software libraries needed to perform floating point operations. The shorter processing time and smaller memory footprint ultimately improve the energy efficiency of applications, clearing the way for functions that traditionally required more complex embedded systems. The effect on energy efficiency can be substantial, because floating point operations for DSP filters can be accelerated by up to 20 times. Having both single-precision and double-precision floating point options further increases the scalability of the new processors.
As IoT technology continues to expand, the need to address the security and integrity challenges of embedded applications is also growing. In addition to the same error exception handling capabilities and memory protection units as other Cortex-M processors, the Cortex-M7 processor also includes optional TCM memory and cache error correction code (ECC) support. This enables automatic and instant correction of single-bit errors in memory and detection of double-bit errors.
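The correct-one, detect-two behavior described here is what a SECDED (single error correct, double error detect) code provides. The sketch below is a toy-sized example that protects 4 data bits with an extended Hamming code; it illustrates the principle only and is not the code that the Cortex-M7 memories use, which protects wider words.

```python
DATA_POSITIONS = (3, 5, 6, 7)  # Hamming positions that carry data bits


def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED codeword (list of bits).

    Index 0 is the overall parity bit; indices 1..7 are Hamming(7,4) positions.
    """
    code = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> i) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) & 1  # makes the total number of ones even
    return code


def decode(code):
    """Return (nibble, status); nibble is None if the error is uncorrectable."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    parity_error = sum(code) & 1
    fixed = list(code)
    if parity_error:
        # Odd number of flips: treat as a single-bit error at 'syndrome'
        # (syndrome 0 means the overall parity bit itself flipped).
        fixed[syndrome] ^= 1
        status = "corrected"
    elif syndrome:
        return None, "double-bit error"
    else:
        status = "ok"
    nibble = sum(fixed[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return nibble, status


word = encode(0b1011)
word[5] ^= 1          # inject a single-bit upset
print(decode(word))
word[2] ^= 1          # a second upset in the same word
print(decode(word))
```

The first decode recovers the original value; the second reports an uncorrectable error instead of returning wrong data.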
In addition, the Cortex-M7 processor supports a dual-core lockstep configuration option. In this configuration, the core logic is instantiated twice, while the cache and TCM memory arrays are shared rather than duplicated. Because these arrays can be protected by ECC, they do not need a second copy, which greatly reduces the silicon area cost (see Figure 4) while still enabling very robust fault-tolerant system designs.
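The idea behind lockstep can be modeled in a few lines: two copies of the same logic receive the same inputs, and a comparator flags the first step at which their outputs differ. The `step` function, the fault-injection mechanism, and the bit that gets flipped are all hypothetical stand-ins; the sketch says nothing about how the actual hardware comparator is built.

```python
def step(state, sample):
    """One control-loop update; stands in for the duplicated core logic."""
    return (state * 3 + sample) & 0xFFFFFFFF


def run_lockstep(samples, fault_at=None):
    """Run two copies of the logic and compare their outputs every step.

    fault_at optionally flips one bit in the second copy at that step index
    to imitate a transient fault. Returns the index of the first mismatch,
    or None if the two copies never diverged.
    """
    state_a = state_b = 0
    for i, sample in enumerate(samples):
        state_a = step(state_a, sample)
        state_b = step(state_b, sample)
        if i == fault_at:
            state_b ^= 1 << 7  # injected single-bit upset
        if state_a != state_b:
            return i
    return None


inputs = [1, 2, 3, 4]
print(run_lockstep(inputs))              # fault-free run
print(run_lockstep(inputs, fault_at=2))  # run with an injected fault
```

In this model the shared inputs play the role of the shared, ECC-protected memories: only the logic is duplicated, which matches the area argument made above.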
Implementation of Freescale Kinetis KV5x MCU series
An example of an implementation choice of the Cortex-M7 processor is Freescale's newly released Kinetis KV5x MCU family, a scalable MCU product family targeted at motor control and digital power conversion applications. In this SoC, some of the configuration options selected for the Cortex-M7 processor include the integration of a 16KB instruction cache and an 8KB data cache. This SoC uses the 64-bit AXI bus as an access port to the embedded flash memory. The instruction cache and data cache ensure that the control software residing in the embedded memory is accelerated to support the performance levels required by the connected industrial control use cases. In addition to the cache, the Kinetis KV5x MCU family also integrates 64KB of SRAM connected to the ITCM interface and 128KB of SRAM connected to the DTCM interface. This provides the necessary processor local storage to support real-time control operations with the lowest latency memory.