Exploring the ARM Cortex-M7 core: Preparing for tomorrow's IoT-EEWORLD

　　ARM Cortex-M7 processor

　　The latest member of the Cortex-M processor family is the Cortex-M7. This new core has features that can be used to support new embedded technology requirements. It is designed for applications that require high processing performance, real-time responsiveness and energy efficiency. In general, the Cortex-M7 processor includes the following key features:

　　• High-performance, dual-instruction-issue 6-stage pipeline that can execute up to two instructions per clock cycle;

　　• 64-bit AXI system bus interface;

　　• Optional instruction cache (4 to 64KB) and data cache (4 to 64KB), each with optional ECC (error correction code) support;

　　• Optional 64-bit instruction tightly coupled memory (ITCM) and optional dual 32-bit data TCM (D{0,1}TCM); each TCM memory array supports a custom ECC implementation;

　　• Optional low-latency AHB peripheral bus interface, allowing fast, deterministic access to peripherals in real-time applications.

　　 Figure 1 ARM Cortex-M7 processor

　　ARM Cortex-M7 Processor Configuration Options

　　The Cortex-M7 microarchitecture differs from that of the other cores in the Cortex-M series. It implements a 6-stage superscalar pipeline, which raises system performance in two ways: by improving architectural performance (fewer cycles per instruction) and by allowing a higher operating frequency. To meet the higher instruction and data bandwidth that a superscalar design demands, the key memory interfaces are 64 bits wide. The AXI system bus and the single-cycle ITCM interface are both 64-bit, and the dual 32-bit D-TCM interface can handle two 32-bit transfers or one 64-bit transfer in a single cycle. Table 1 summarizes the buses in the Cortex-M7 microarchitecture and compares the new interfaces with those of previous-generation ARM Cortex-M devices.
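
　　As a rough illustration of what these widths mean, the following sketch computes the peak bytes moved per second on each interface. The widths come from the text above; the 200 MHz clock and the one-transfer-per-cycle assumption are hypothetical and describe an upper bound, not a measured figure.

```python
# Interface widths as described in the text (bits per transfer).
INTERFACE_WIDTH_BITS = {
    "AXI": 64,
    "ITCM": 64,
    "D0TCM": 32,
    "D1TCM": 32,
}


def peak_bandwidth_mb_s(width_bits: int, clock_hz: float) -> float:
    """Peak bandwidth in MB/s, assuming one transfer per clock cycle."""
    return (width_bits / 8) * clock_hz / 1e6


def dtcm_combined_bits() -> int:
    """The two 32-bit D-TCM ports together can move 64 bits in one cycle."""
    return INTERFACE_WIDTH_BITS["D0TCM"] + INTERFACE_WIDTH_BITS["D1TCM"]


if __name__ == "__main__":
    clock_hz = 200e6  # assumed core clock, not taken from the article
    for name, width in INTERFACE_WIDTH_BITS.items():
        print(f"{name}: {peak_bandwidth_mb_s(width, clock_hz):.0f} MB/s peak")
    print(f"D-TCM combined width: {dtcm_combined_bits()} bits")
```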

　　The AXI master interface plays an important role in supporting the memory scalability that many IoT applications require. Because new usage models are built on the continuous collection and analysis of data, the ability to add external memory is critical to extending functionality. Alongside the AXI master interface, the TCM interfaces provide single-cycle access for the real-time operations that control tasks require. High-performance memory and bus interfaces are essential to support processor performance levels exceeding 5 CoreMark/MHz.

　　There are several factors to consider when choosing which buses to use in an SoC and how to utilize them, including:

　　• Which peripherals need to be connected to the AHB peripheral bus on the Cortex-M7 processor to achieve low latency access capabilities?

　　• Which peripherals need to be accessed by the DMA controller?

　　• What forms of access control and memory protection are required?

　　 Figure 2 Minimum microcontroller

　　 Table 1 ARM Cortex-M7 bus types and descriptions

　　For example, in a very simple design the memory system can be connected to the TCM interfaces and the peripherals to the AHB peripheral interface, as shown in Figure 2. This configuration lets the SoC take advantage of the scalable performance of the Cortex-M7 core while still addressing cost and size constraints. Connecting SRAM to the TCM interfaces, for instance, suits control-oriented edge nodes that require real-time performance.

　　Another configuration option is to connect embedded and/or external memory to the AXI interface and use cache memory to achieve higher performance. Most microcontroller applications spend much of their time in small control loops, so firmware typically incurs very few cache misses. A cache-based design can, however, make execution from the AXI bus system less deterministic. To restore deterministic behavior for interrupt handling, the exception vector table and the interrupt handlers can be placed in SRAM connected to the ITCM interface.
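
　　The trade-off between average speed and determinism can be sketched with the textbook average-access-time formula. The cycle counts and miss rate below are hypothetical placeholders, not Cortex-M7 figures; the point is only that a cached path has a low average but a much higher worst case, while a single-cycle TCM path has the same cost on every access.

```python
def average_access_cycles(hit_cycles: float, miss_rate: float,
                          miss_penalty_cycles: float) -> float:
    """Average cycles per access: hit time plus the expected miss cost."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_cycles + miss_rate * miss_penalty_cycles


def worst_case_access_cycles(hit_cycles: float,
                             miss_penalty_cycles: float) -> float:
    """Worst case for a cached path: the access misses."""
    return hit_cycles + miss_penalty_cycles


if __name__ == "__main__":
    # Assumed values for illustration only.
    hit, penalty, rate = 1, 10, 0.02
    print("cached average:", average_access_cycles(hit, rate, penalty))
    print("cached worst case:", worst_case_access_cycles(hit, penalty))
    print("TCM (every access):", 1)
```

　　An interrupt handler that must meet a hard deadline has to be budgeted against the worst case, which is why placing it in ITCM-connected SRAM is attractive even when the cached average is nearly as good.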

　　The memory scalability, performance, and efficiency benefits of AXI interfaces and caches are key to meeting application requirements. Such configurations offer many benefits that are consistent with IoT applications, such as supporting wireless firmware updates and leveraging large external memories for data storage needs. However, not all use cases require every option, so challenges related to cost, size, and power consumption must be considered.

　　The design of a memory system can offer a wide variety of configuration options. There are many aspects and factors to consider, including:

　　• Execution from AXI or TCM interface;

　　• Cache size (if using AXI);

　　• How embedded memory access is accelerated, and the bandwidth of flash memory;

　　• Optional ECC support.

　　Many different factors can influence the decision, such as the read access speed of the embedded Flash memory, clock speed requirements, and the typical size of the target application and its program flow behavior.

　　If the embedded memory access speed is close to the required processor speed, the embedded flash can be connected to the ITCM interface with some flash access acceleration. In other cases, using AXI with cache will be more appropriate. If the application needs to execute programs from an external memory controller, the memory controller is usually connected to the AXI interface, which also requires the support of instruction cache and data cache. In some cases, the application may only need to use external memory for data storage. In such cases, an instruction cache is not needed.
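
　　The decision described above can be written down as a small helper. This is only a sketch of the reasoning in the preceding paragraph: the function names, the wait-state formula, and the threshold for "close to the processor speed" (`max_itcm_wait_states`) are assumptions introduced for illustration, not vendor guidance.

```python
import math


def flash_wait_states(cpu_hz: float, flash_max_hz: float) -> int:
    """Extra cycles per flash read when the core clock exceeds the flash speed."""
    return max(0, math.ceil(cpu_hz / flash_max_hz) - 1)


def suggest_memory_config(cpu_hz: float, flash_max_hz: float,
                          code_in_external: bool, data_in_external: bool,
                          max_itcm_wait_states: int = 1) -> dict:
    """Map the rules of thumb from the text onto a configuration suggestion."""
    if code_in_external:
        # Executing from an external memory controller: AXI plus both caches.
        return {"code_path": "AXI", "icache": True, "dcache": True}
    if flash_wait_states(cpu_hz, flash_max_hz) <= max_itcm_wait_states:
        # Flash is close to core speed: ITCM with some flash acceleration.
        # External memory used for data only needs no instruction cache.
        return {"code_path": "ITCM", "icache": False, "dcache": data_in_external}
    # Flash is much slower than the core: AXI with cache is more appropriate.
    return {"code_path": "AXI", "icache": True, "dcache": True}


if __name__ == "__main__":
    print(suggest_memory_config(120e6, 100e6, False, True))
    print(suggest_memory_config(240e6, 30e6, False, False))
```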

　　Choosing the cache size depends largely on the properties of the application code. When running program code from embedded memory, both the instruction cache and the data cache are utilized because the program image often contains literal data, lookup tables, or read-only constants along with the instructions. Applications typically have more instruction words than data/constants inside the program image. As program size grows, cache requirements increase, and it is not uncommon for the instruction cache to be larger than the data cache. Conversely, some applications may have small control or DSP loops and may have large amounts of data used as coefficients for calculations. In such cases, a larger D-cache may be more beneficial to system performance than a larger I-cache.

　　When optimizing purely for performance, larger caches help ensure the lowest latency for larger code and data sizes. However, a large cache memory runs at the same speed as the processor, so cache lookups can consume considerable power. In addition, for most applications the cache miss rate flattens out close to zero as the cache grows, so beyond a certain size a larger cache no longer improves performance. Fortunately, the configurability of the Cortex-M7 core allows SoC architects to integrate a variety of cache sizes, from no cache up to a 64KB instruction cache and a 64KB data cache. With this flexibility, designers can tune the SoC to meet the needs of the target application.
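
　　The diminishing return from larger caches can be seen with a toy simulation. The sketch below models a direct-mapped cache with 32-byte lines, which is a deliberate simplification and not a model of the actual Cortex-M7 cache organization; the 6KB loop and the iteration count are made-up workload parameters.

```python
def miss_rate(addresses, cache_bytes: int, line_bytes: int = 32) -> float:
    """Miss rate of a simplified direct-mapped cache over byte addresses."""
    addresses = list(addresses)
    if not addresses:
        return 0.0
    if cache_bytes == 0:
        return 1.0  # no cache: every access goes to memory
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines  # which memory line each cache slot holds
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        index = line % num_lines
        if tags[index] != line:
            tags[index] = line
            misses += 1
    return misses / len(addresses)


def loop_trace(loop_bytes: int, iterations: int, step: int = 4):
    """Fetch addresses for a loop body of loop_bytes executed repeatedly."""
    return [a for _ in range(iterations) for a in range(0, loop_bytes, step)]


if __name__ == "__main__":
    trace = loop_trace(loop_bytes=6 * 1024, iterations=100)
    for size_kb in (0, 4, 8, 16, 32, 64):
        print(f"{size_kb:2d} KB cache: miss rate {miss_rate(trace, size_kb * 1024):.5f}")
```

　　Once the cache is large enough to hold the whole loop, only the first pass misses, and every further doubling of the cache leaves the miss rate unchanged while still costing area and lookup power.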

　　 Figure 3 Microcontroller with external memory

　　 Figure 4 ARM Cortex-M7 processor dual-core lockstep configuration

　　In addition to the architectural options, many other features of the Cortex-M7 processor can be configured. For example, the floating-point unit (FPU) can be omitted entirely, included with IEEE-754 single-precision support only, or included with both IEEE-754 single-precision and double-precision support.

　　Other configuration features include:

　　• The number of interrupts and priority levels in the NVIC;

　　• Memory Protection Unit (MPU) configuration;

　　• Debug and trace capabilities;

　　• Functional safety related features (ECC, dual-core lockstep).

　　There are many advantages to hardware acceleration of floating point operations. Obviously, the performance of floating point operations can be accelerated when there is a hardware floating point unit. In addition, memory space can be optimized because hardware support reduces the number and associated size of software libraries required to perform floating point operations. The reduced processing time and reduced memory footprint ultimately improve the energy efficiency of applications, clearing the way for functions that traditionally require more complex embedded systems. This advantage is very important for energy efficiency because floating point operations for DSP filters can be accelerated by up to 20 times. Having both single-precision and double-precision floating point options further increases the scalability of the new processors.

　　As IoT technology continues to expand, the need to address the security and integrity challenges of embedded applications is also growing. In addition to the same error exception handling capabilities and memory protection units as other Cortex-M processors, the Cortex-M7 processor also includes optional TCM memory and cache error correction code (ECC) support. This enables automatic and instant correction of single-bit errors in memory and detection of double-bit errors.
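
　　To make "correct single-bit errors, detect double-bit errors" concrete, here is a minimal sketch of a textbook extended Hamming (SEC-DED) code. It is not the ECC scheme used inside the Cortex-M7, whose details are not given in this article; the 8-bit word size in the usage example and all function names are assumptions chosen for illustration.

```python
def encode(data_bits):
    """Encode 0/1 data bits into an extended Hamming (SEC-DED) codeword.

    Index 0 holds the overall parity bit; indices 1..n are Hamming
    positions, with check bits at the powers of two.
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:  # smallest r with 2**r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * (n + 1)
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two: data position
            code[pos] = next(bits)
    p = 1
    while p <= n:  # each check bit covers positions with bit p set
        parity = 0
        for pos in range(1, n + 1):
            if pos & p and pos != p:
                parity ^= code[pos]
        code[p] = parity
        p <<= 1
    code[0] = sum(code[1:]) % 2  # make the total number of ones even
    return code


def decode(code):
    """Return (data_bits, status): 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome <= n:
        code[syndrome] ^= 1  # syndrome 0 means the overall parity bit itself
        status = "corrected"
    else:
        status = "uncorrectable"  # e.g. a double-bit error: detected only
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, status


if __name__ == "__main__":
    word = [1, 0, 1, 1, 0, 0, 1, 0]
    stored = encode(word)
    stored[5] ^= 1  # simulate a single-bit upset in the memory array
    print(decode(stored))
```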

　　In addition, the Cortex-M7 processor supports a dual-core lockstep configuration option. In this configuration the core logic is instantiated twice, while the cache and TCM memory arrays are shared between the two instances. The memory arrays do not need to be duplicated because they can be protected by ECC, which greatly reduces the silicon area cost (see Figure 4) while still enabling very robust fault-tolerant system designs.
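
　　The principle behind lockstep can be sketched in a few lines: two copies of the same logic receive identical inputs, and a comparator flags the first cycle on which their outputs differ. The "core" below is a hypothetical accumulator with an optional injected fault; it only illustrates the comparison idea and says nothing about how the Cortex-M7 comparator is actually built.

```python
def make_core(fault_at=None):
    """Hypothetical core model: accumulates inputs modulo 2**32.

    If fault_at is given, the output on that cycle has one bit flipped
    to mimic a transient hardware fault.
    """
    state = {"acc": 0, "cycle": 0}

    def step(value: int) -> int:
        state["acc"] = (state["acc"] + value) & 0xFFFFFFFF
        out = state["acc"]
        if state["cycle"] == fault_at:
            out ^= 1  # injected single-bit fault on the output
        state["cycle"] += 1
        return out

    return step


def run_lockstep(core_a, core_b, inputs):
    """Feed both cores the same inputs; return the first mismatching cycle or None."""
    for cycle, value in enumerate(inputs):
        if core_a(value) != core_b(value):
            return cycle
    return None


if __name__ == "__main__":
    print(run_lockstep(make_core(), make_core(), [1, 2, 3]))
    print(run_lockstep(make_core(), make_core(fault_at=2), [5, 6, 7, 8]))
```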

　　Implementation of Freescale Kinetis KV5x MCU series

　　An example of an implementation choice of the Cortex-M7 processor is Freescale's newly released Kinetis KV5x MCU family, a scalable MCU product family targeted at motor control and digital power conversion applications. In this SoC, some of the configuration options selected for the Cortex-M7 processor include the integration of a 16KB instruction cache and an 8KB data cache. This SoC uses the 64-bit AXI bus as an access port to the embedded flash memory. The instruction cache and data cache ensure that the control software residing in the embedded memory is accelerated to support the performance levels required by the connected industrial control use cases. In addition to the cache, the Kinetis KV5x MCU family also integrates 64KB of SRAM connected to the ITCM interface and 128KB of SRAM connected to the DTCM interface. This provides the necessary processor local storage to support real-time control operations with the lowest latency memory.
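
　　The configuration choices described here can be captured as a small record and checked against the ranges quoted earlier in the article (caches from 4KB to 64KB, or no cache at all). The class and function names are hypothetical, and the power-of-two restriction on cache sizes is an assumption of this sketch rather than something stated in the text.

```python
from dataclasses import dataclass

CACHE_MIN_KB, CACHE_MAX_KB = 4, 64  # range quoted in the article


@dataclass(frozen=True)
class M7Config:
    icache_kb: int  # 0 means the cache is not implemented
    dcache_kb: int
    itcm_kb: int
    dtcm_kb: int


def _valid_cache(kb: int) -> bool:
    if kb == 0:
        return True
    in_range = CACHE_MIN_KB <= kb <= CACHE_MAX_KB
    power_of_two = (kb & (kb - 1)) == 0  # assumed restriction
    return in_range and power_of_two


def validate(cfg: M7Config) -> list:
    """Return a list of human-readable problems; empty if the config looks sane."""
    problems = []
    if not _valid_cache(cfg.icache_kb):
        problems.append(f"instruction cache size {cfg.icache_kb}KB not allowed")
    if not _valid_cache(cfg.dcache_kb):
        problems.append(f"data cache size {cfg.dcache_kb}KB not allowed")
    if cfg.itcm_kb < 0 or cfg.dtcm_kb < 0:
        problems.append("TCM sizes cannot be negative")
    return problems


# Values taken from the KV5x description above.
KV5X = M7Config(icache_kb=16, dcache_kb=8, itcm_kb=64, dtcm_kb=128)

if __name__ == "__main__":
    print(validate(KV5X))
    print(validate(M7Config(icache_kb=128, dcache_kb=6, itcm_kb=0, dtcm_kb=0)))
```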
