Using asymmetric dual-core MCU to improve system performance

Publisher: 智慧启迪 | Latest update time: 2013-12-30 | Source: 21ic | Keywords: MCU, C2000, Concerto

 1. Background

As industries move toward intelligent systems, embedded products face increasingly stringent requirements for energy consumption and efficiency. In smart grid, industrial and medical applications in particular, the MCU at the core of a product faces several challenges at once. For example, an automated motor system or a distributed industrial system needs more digital signal processing capability to control the motor accurately, along with a growing number of advanced network interfaces (CAN, Ethernet, wireless, etc.) for real-time distributed monitoring and control. As another example, the solar inverter system in Figure 1 needs a DSP engine to implement the DC/AC or DC/DC algorithms, and multiple inverters must be networked over wireless or Ethernet links for intelligent diagnosis and monitoring.

 

 

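Two traditional approaches can meet these requirements. The first uses two separate MCU/DSP devices: one MCU or DSP implements the digital signal processing or control algorithms, and the other MCU implements the network protocol stack, a graphical display interface, and so on. This approach has many drawbacks. Two MCUs increase the PCB area, the reliability and data throughput of the communication between them are limited, power consumption rises significantly, and developers may even need to maintain multiple hardware and software development environments. The second approach uses a single-core MCU/DSP with a higher clock frequency and more on-chip resources, which time-shares data processing with the auxiliary communication or display functions. This approach significantly increases system cost and power consumption. Worst of all, whenever the customer's product needs a new feature, engineers must recalculate the MCU core's resources and the run time required by each task, and more test time is needed, which makes the design hard to extend and maintain.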

The heterogeneous dual-core architecture emerged to address these shortcomings. An asymmetric dual-core MCU can assign different system tasks to different cores, giving a fine-grained division of labor and a good balance of performance, power consumption and cost. The two cores can communicate in several ways, such as a shared memory area or a message area, which are simple to implement. The following sections use the TMS320F28M35H52C, a device in TI's latest Concerto series, as an example to describe the advantages of an asymmetric heterogeneous dual-core MCU and the performance improvement it brings to a system.

2. Features of C2000 Concerto dual-core MCU

The C2000 Concerto series MCU is an innovative heterogeneous dual-core product launched by TI. The Concerto hybrid architecture integrates the industry's best real-time control functions and communication functions into one chip, providing high performance, high efficiency and reliability, thereby achieving real-time control loops and fast communication response with low latency [1]. The following describes its features from the aspects of core, memory architecture, communication peripherals, etc. The functional block diagram of the Concerto series TMS320F28M35H52C is shown in Figure 2.

 

 

The first feature is the pair of high-performance cores. The Concerto series MCU contains two cores: a Cortex-M3 and a C28x. The Cortex-M3 is the core of Concerto's Master subsystem and can run at up to 125 MHz. It is a cost-effective 32-bit ARM core that is widely used in industry, and its performance and stability are well accepted by users. It is well suited to communication and event control. The C28x is a new-generation 32-bit DSP core and is the core of most of TI's existing C2000 products. It can run at up to 150 MHz. The C28x in Concerto includes a floating-point unit (FPU), a VCU coprocessor, and other extensions, which makes it well suited to high-throughput data processing. As the Control subsystem, the C28x is supervised at the system level by the Cortex-M3 Master subsystem.

The second feature is the optimized memory architecture. As shown in Figure 2, the C28x of the TMS320F28M35H52C can use 512KB of Flash memory with ECC, 64KB of ROM, and 36KB of RAM with ECC; the Cortex-M3 can use 512KB of Flash memory with ECC, 64KB of ROM, and 32KB of RAM with ECC [3]. The two cores also share peripherals and memory: a total of 64K bytes of Shared RAM and 4K bytes of Message RAM.

The third is peripherals. As shown in Figure 2, the C28x core of TMS320F28M35H52C can control peripherals optimized for closed-loop control, such as DMA, high-speed ADC (3MSPS), multi-channel high-precision PWM (24-channel PWM and 16-channel high-precision HRPWM), eCAP, eQEP, etc.; the Cortex-M3 core can control multiple serial interfaces, Ethernet, CAN and other industrial communication peripherals. At the same time, the two cores can also share peripherals such as ADC to enhance the flexibility of the entire system.

The last feature is the software architecture. As shown in Figure 3, controlSUITE is a combined resource library, software package and development platform covering all C2000 MCUs. It provides TMS320F28M35H52C developers with peripheral examples, DSP libraries, documentation, and development board information. controlSUITE also provides TI-RTOS, a free, full-featured real-time operating system platform, as shown in Figure 4. TI-RTOS is based on the SYS/BIOS real-time kernel and integrates stable middleware such as a TCP/IP protocol stack, a USB protocol stack, a FAT file system, and IPC multi-core communication components.

 

 

 

 

3. IPC inter-core communication

Communication between the Cortex-M3 and C28x cores serves two main purposes: transferring data, and passing status and control information. IPC (inter-core communication) data transfer needs a reasonably large RAM area, while status and control information only needs a set of status flags. In addition, UART4 on the Cortex-M3 side and SCIA on the C28x side, as well as SSI3 on the Cortex-M3 side and SPIA on the C28x side, are interconnected inside Concerto, so no external wiring is needed. Whether these connections are enabled depends on the Cortex-M3 system configuration.

3.1 Message RAM Memory Area

The TMS320F28M35H52C uses Message RAM for IPC data transfer. As shown in Figure 5, the 2K-byte MTOC Message RAM passes messages from the Master (Cortex-M3) subsystem to the Control (C28x) subsystem, and the 2K-byte CTOM Message RAM passes messages from the Control subsystem to the Master subsystem. Because both subsystems have DMA peripherals, DMA can also read and write Message RAM, which improves system efficiency. Exclusive access to messages is enforced through the read and write permissions of each RAM area. For example, the C28x CPU and DMA have read and write access to the CTOM Message RAM, while the Cortex-M3 CPU and uDMA can only read it. The permissions for the MTOC Message RAM are exactly the reverse.


 

Message RAM only serves as the data buffer for IPC; the transfer itself is coordinated by dedicated control logic. As shown in Figure 6, the Master subsystem and the Control subsystem each use five registers for IPC flow control: IPCACK, IPCSTS, IPCFLG, IPCCLR, and IPCSET. Each register is 32 bits wide and each bit corresponds to one IPC channel, so up to 32 handshake channels are available. Bit0 to Bit3 (4 channels) can trigger an IPC interrupt on the receiving side, while Bit4 to Bit31 (28 channels) must be polled by the receiver's software to find out whether data has arrived in the Message RAM. If the two cores only need to exchange status and control information (such as a semaphore in an RTOS), these registers are sufficient and the Message RAM is not needed.

 

 

The following briefly introduces the operation flow of the IPC module by taking the Master subsystem sending a frame of data to the Control subsystem as an example.

1. Cortex-M3 first writes a frame of data into MTOC Message RAM;

2. Cortex-M3 sets Bit9 of MTOCIPCSET (CM3 mapped memory area), as shown in Figure 6. At this time, Bit9 of MTOCIPCSTS (C28x mapped memory area) will also be set;

3. C28x polls Bit9 of MTOCIPCSTS and finds that Bit9 is set; (If the previous operation is one of Bit0 to Bit3, it will trigger C28x to generate an IPC interrupt)

4. C28x reads the data in MTOC Message RAM. At this time, Cortex-M3 successfully sends a frame of data to C28x.
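
The four steps above can be sketched as a host-side C model. This is only an illustration under stated assumptions: the struct, the function names, and the acknowledge-clears-the-flag behaviour are hypothetical stand-ins for the memory-mapped registers, and real firmware would use the device header files from controlSUITE rather than this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_RAM_BYTES 2048u        /* 2K-byte MTOC Message RAM (from the text) */
#define IPC_INT_MASK  0x0000000Fu  /* Bit0..Bit3 can raise an IPC interrupt */

/* Hypothetical model of the M3 -> C28x direction. */
typedef struct {
    uint32_t flags;                /* MTOCIPCFLG on the M3 side, MTOCIPCSTS on the C28x side */
    uint8_t  msg_ram[MSG_RAM_BYTES];
} ipc_model_t;

/* True when a flag bit interrupts the receiver instead of being polled. */
int ipc_bit_raises_interrupt(unsigned bit)
{
    return bit <= 31u && ((((uint32_t)1u << bit) & IPC_INT_MASK) != 0u);
}

/* Steps 1 and 2 (sender). Returns 0 on success, -1 if busy or too long. */
int m3_send(ipc_model_t *ipc, unsigned bit, const void *data, size_t len)
{
    assert(ipc != NULL && data != NULL);
    if (bit > 31u || len > MSG_RAM_BYTES) return -1;
    uint32_t mask = (uint32_t)1u << bit;
    if (ipc->flags & mask) return -1;   /* previous frame not yet acknowledged */
    memcpy(ipc->msg_ram, data, len);    /* step 1: write the frame */
    ipc->flags |= mask;                 /* step 2: set the bit through MTOCIPCSET */
    return 0;
}

/* Steps 3 and 4 (receiver). Returns bytes copied, or 0 if the flag is clear. */
size_t c28_receive(ipc_model_t *ipc, unsigned bit, void *out, size_t len)
{
    assert(ipc != NULL && out != NULL);
    if (bit > 31u || len > MSG_RAM_BYTES) return 0;
    uint32_t mask = (uint32_t)1u << bit;
    if (!(ipc->flags & mask)) return 0; /* step 3: poll MTOCIPCSTS */
    memcpy(out, ipc->msg_ram, len);     /* step 4: read the frame */
    ipc->flags &= ~mask;                /* assumed: acknowledge through MTOCIPCACK */
    return len;
}
```

With Bit9 the receiver has to poll, as in the walkthrough; choosing Bit0 to Bit3 instead would move the body of `c28_receive` into an interrupt handler.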

3.2 Shared RAM Memory Area

In most cases, the 2K-byte IPC Message RAM area can meet the data communication between the C28x and M3 subsystems. With DMA, the communication efficiency can be further improved. If the user wants to transfer larger blocks of data between the two subsystems at one time, another method is to use Shared RAM memory.

The TMS320F28M35H52C has a 64K-byte Shared RAM area divided into 8 blocks, S0 to S7, of 8K bytes each, as shown in Figure 7. The Cortex-M3 decides whether each Shared RAM block is owned by the C28x or by the M3. For example, after S0 is mapped to the C28x side, the C28x CPU and DMA can read and write S0, while the M3 and uDMA can only read S0; they cannot write to it or fetch instructions from it.

If the Cortex-M3 needs to send 6K bytes of data to the C28x side in one transfer, it can first map Shared RAM block S0 to its own memory space, write the data into S0, and then send a flag to the C28x through IPC to signal that the data is ready to be read.
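
The ownership rule for Shared RAM can be illustrated with a small host-side C model. Everything here is a sketch under assumptions: the type and function names are hypothetical, and ownership is stored in an ordinary array rather than in the device's configuration registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHARED_BLOCKS      8u     /* S0..S7 */
#define SHARED_BLOCK_BYTES 8192u  /* 8K bytes per block */

typedef enum { OWNER_M3, OWNER_C28 } owner_t;

/* Hypothetical model of the 64K-byte Shared RAM and its ownership map. */
typedef struct {
    owner_t owner[SHARED_BLOCKS];
    uint8_t mem[SHARED_BLOCKS][SHARED_BLOCK_BYTES];
} shared_ram_t;

/* Only the Cortex-M3 assigns ownership of a block. */
void m3_assign_block(shared_ram_t *s, unsigned blk, owner_t who)
{
    assert(s != NULL && blk < SHARED_BLOCKS);
    s->owner[blk] = who;
}

/* Writes succeed only for the owning core. Returns 0 on success, -1 if denied. */
int shared_write(shared_ram_t *s, owner_t core, unsigned blk,
                 const void *src, size_t len)
{
    if (blk >= SHARED_BLOCKS || len > SHARED_BLOCK_BYTES) return -1;
    if (s->owner[blk] != core) return -1;   /* the non-owner is read-only */
    memcpy(s->mem[blk], src, len);
    return 0;
}

/* Either core may read any block. Returns 0 on success, -1 on a bad range. */
int shared_read(const shared_ram_t *s, unsigned blk, void *dst, size_t len)
{
    if (blk >= SHARED_BLOCKS || len > SHARED_BLOCK_BYTES) return -1;
    memcpy(dst, s->mem[blk], len);
    return 0;
}
```

In the 6K-byte example, the M3 would call `m3_assign_block(&ram, 0, OWNER_M3)`, fill S0 with `shared_write`, and then raise an IPC flag so that the C28x knows it can call `shared_read`.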

 

 

3.3 IPC Software Driver

The controlSUITE software development kit provides two IPC software driver libraries: the IPC Driver and the IPC_Lite Driver. The IPC_Lite Driver uses only the IPC registers for communication and needs no additional RAM, but it supports only one IPC interrupt service routine (ISR) at a time and cannot queue IPC requests. The IPC_Lite Driver is used as follows:

 

 

 

 

1. The core that initiates the data request first calls a function provided by the IPC_Lite Driver. In this example, the M3 is the initiating core and executes the "IPCLiteMtoCDataRead" function.

• IPC_FLAG2 is the C28 interrupt flag, indicating to the C28 core that a message has arrived.

• IPC_FLAG17 is the response flag used by C28 to indicate to the M3 core that a command has been processed.

• The address of the C28 from which data needs to be read is also passed as a parameter to the C28 core.

• The function is called in a while loop because it returns STATUS_FAIL, without sending anything to the C28, until MtoC IPC interrupt 2 and flag 17 are available. Once they are, the function returns STATUS_PASS.

2. The core that receives the data request parses the command in its IPCCOM register inside the ISR. In this example, the C28 MtoCIPCINT2 ISR sees that the flag is set, parses the command in the MTOCIPCCOM register, and recognizes it as a read-data command.

3. The receiving core then calls the function with the same name as the one called by the initiating core. In this example, the C28 executes IPCLiteMtoCDataRead with IPC_FLAG2 as the interrupt flag parameter and IPC_FLAG17 as the status flag parameter.

4. If the received command is valid, the IPC_Lite driver function will process the read command and acknowledge the status and interrupt flags. If the received command is invalid, only the interrupt flag is acknowledged to release the interrupt for subsequent commands, while the status flag remains set.
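
The request, retry, and acknowledge behaviour described in these steps can be modelled in plain C. This is a sketch under assumptions, not the IPC_Lite API: the function names, the command codes, the numeric values of STATUS_PASS and STATUS_FAIL, and the struct fields are all hypothetical, and the flags live in an ordinary variable instead of hardware registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { STATUS_PASS = 0, STATUS_FAIL = 1 };  /* names from the text; values assumed */
enum { CMD_NONE = 0, CMD_DATA_READ = 1 };   /* hypothetical command codes */

#define FLAG(n)       ((uint32_t)1u << (n))
#define C28_MEM_WORDS 16u

/* Hypothetical model of the M3 -> C28 command channel. */
typedef struct {
    uint32_t flags;                  /* models the MtoC flag bits */
    uint32_t command;                /* models MTOCIPCCOM */
    uint32_t address;                /* index into c28_mem */
    uint32_t reply;                  /* data returned to the M3 */
    uint16_t c28_mem[C28_MEM_WORDS]; /* stand-in for C28 memory */
} lite_model_t;

/* Initiating core: fails while either flag is still in use. */
int lite_request(lite_model_t *ipc, uint32_t command, uint32_t address,
                 uint32_t int_flag, uint32_t status_flag)
{
    assert(ipc != NULL);
    if (ipc->flags & (int_flag | status_flag)) return STATUS_FAIL;
    ipc->command = command;
    ipc->address = address;
    ipc->flags |= int_flag | status_flag;
    return STATUS_PASS;
}

/* Receiving core's ISR body: parse the command, then acknowledge. */
void lite_c28_isr(lite_model_t *ipc, uint32_t int_flag, uint32_t status_flag)
{
    assert(ipc != NULL);
    if (ipc->command == CMD_DATA_READ && ipc->address < C28_MEM_WORDS) {
        ipc->reply = ipc->c28_mem[ipc->address];
        ipc->flags &= ~(int_flag | status_flag); /* valid: acknowledge both flags */
    } else {
        ipc->flags &= ~int_flag;                 /* invalid: status flag stays set */
    }
}
```

On the initiating side, the retry loop from step 1 would look like `while (lite_request(&ipc, CMD_DATA_READ, addr, FLAG(2), FLAG(17)) != STATUS_PASS) { }`. A status flag that remains set after the interrupt flag clears tells the initiator that the command was rejected.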

The IPC Driver creates a circular buffer in the Message RAM, so multiple IPC commands can be queued and then processed one by one, and several IPC interrupt service routines can be supported at the same time. The cost is that the IPC Driver needs more RAM. Unlike the IPC_Lite Driver, the IPC Driver also requires some additional setup in the M3 and C28 projects.

The first step is to add the IPC circular buffer and pointer sections to the CTOM and MTOC Message RAM in the linker command files (.cmd) of the M3 and C28 projects. This is shown below:

 

 

In the second step, at least one volatile global tIpcController variable (for C28-M3 IPC interrupt) must be defined and initialized in the application source code, as shown below:

 

 

 

 

 


1. The core that initiates the data request first calls a command function provided by the IPC Driver. In this example, the M3 is the initiating core and executes the "IPCMtoCSetBits" function.

• g_sIpcController1 is a variable of type tIpcController that controls the communication between the M3 and C28 IPC interrupt channels.

• SETMASK_16BIT is a 16-bit mask indicating the bit fields that should be set. IPC_LENGTH_16_BITS indicates that the data object of the command operation is 16-bits.

• The function is configured to allow blocking ("ENABLE BLOCKING"), which means it waits until the M3 "Put" buffer has a free slot. If the function is configured to disallow blocking ("DISABLE BLOCKING"), it immediately returns STATUS_FAIL when the "Put" buffer is full and does not send a message to the C28. If the "Put" buffer has a free slot, the function returns STATUS_PASS and the message is sent to the C28.

2. The core that receives the data request calls the IpcGet function in its ISR, reading each message into the sMessage structure, for as long as the "Get" buffer contains messages. The tIpcController variable used on the C28 side is bound to the same M3-C28 IPC interrupt channel as the tIpcController used to send the command on the M3 side.

3. Even if the receiving core has not yet acknowledged the IPC interrupt flag, the initiating core can keep sending messages, because the tIpcController variable queues them in the "Put" buffer (which is the "Get" buffer of the receiving core). The ISR of the receiving core keeps fetching and processing messages until the "Get" buffer is empty.
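
The queueing behaviour described above is essentially a circular buffer with one producer and one consumer. The following C sketch shows the idea under stated assumptions: the names, the message layout, the 4-slot depth, and the STATUS values are hypothetical and are not taken from the IPC Driver source. Only the non-blocking case is modelled; a blocking put would simply retry until a slot frees up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { STATUS_PASS = 0, STATUS_FAIL = 1 };  /* names from the text; values assumed */

#define IPC_BUF_SLOTS 4u                    /* hypothetical queue depth */

typedef struct {
    uint32_t command;
    uint32_t address;
    uint32_t data;
} ipc_msg_t;

/* One direction of the queue: the sender's "Put" buffer is the
 * receiver's "Get" buffer. Indices run freely and wrap modulo 65536. */
typedef struct {
    ipc_msg_t slot[IPC_BUF_SLOTS];
    uint16_t  put_idx;   /* advanced only by the sender */
    uint16_t  get_idx;   /* advanced only by the receiver */
} ipc_ring_t;

static unsigned ring_count(const ipc_ring_t *r)
{
    return (uint16_t)(r->put_idx - r->get_idx);
}

/* Non-blocking put: fails immediately when the buffer is full. */
int ring_put(ipc_ring_t *r, const ipc_msg_t *m)
{
    assert(r != NULL && m != NULL);
    if (ring_count(r) == IPC_BUF_SLOTS) return STATUS_FAIL;
    r->slot[r->put_idx % IPC_BUF_SLOTS] = *m;
    r->put_idx++;
    return STATUS_PASS;
}

/* Get: fails when the buffer is empty. */
int ring_get(ipc_ring_t *r, ipc_msg_t *out)
{
    assert(r != NULL && out != NULL);
    if (ring_count(r) == 0u) return STATUS_FAIL;
    *out = r->slot[r->get_idx % IPC_BUF_SLOTS];
    r->get_idx++;
    return STATUS_PASS;
}

/* Receiver ISR body: drain the queue; returns the number of messages handled. */
unsigned ring_drain(ipc_ring_t *r, uint32_t *last_data)
{
    ipc_msg_t msg;
    unsigned handled = 0;
    while (ring_get(r, &msg) == STATUS_PASS) {
        *last_data = msg.data;   /* stand-in for real command processing */
        handled++;
    }
    return handled;
}
```

Because each index is written by only one side, the sender can keep queuing messages while the receiver is still working through earlier ones, which is the property step 3 relies on.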

4. Division of tasks between the Cortex-M3 and C28x cores

The advantage of the Cortex-M3 subsystem lies in its ability to process transactions and manage communication peripherals, while the C28x core subsystem has superior performance in real-time control and data processing. Therefore, in a system, it is crucial to reasonably allocate the transactions processed by the two subsystems and optimize the configuration of resources. On the one hand, the Concerto-based system should maximize the use of the DSP and real-time control advantages of C28x and give full play to the advantages of the closed-loop system composed of ADC, PWM, and C28x; on the other hand, the human-machine interface, communication protocol stack, file system, etc. should be run on the Cortex-M3 subsystem side as much as possible. The following two application cases discuss how to improve system efficiency through reasonable task division.

4.1 PV Inverter Network Node

The main function of the photovoltaic inverter is to invert the DC power output by the photovoltaic panel into 110V/220V AC power, and finally connect it to the grid or transmit it off-grid to the power-consuming equipment. In a high-power photovoltaic power generation network topology, there are often many photovoltaic inverters, which need to be monitored, and the control center needs to observe the working status of each photovoltaic inverter in real time. Therefore, the functions of the photovoltaic inverter network node mainly include DC/AC inverter and network connection. As shown in Figure 9, the C28x subsystem (running at 100MHz) completes the MPPT and DC/AC inversion algorithms. There are many ways to connect to the network, and the commonly used methods include Ethernet, RS485 or CAN. The Cortex-M3 subsystem (100 MHz) of the TMS320F28M35H52C has interfaces such as Ethernet, RS485 and CAN, and supports a variety of wired and wireless connection functions.

 

 

Figure 8 Solar HV DC-AC Kit

For the C28x subsystem, the state machine design concept is used to distinguish different system states. Different states represent different operating modes, and other tasks can take corresponding actions according to specific operating modes. For example, the following 5 different operating modes can be used.

• Power On Mode: After power-up the system enters Power On Mode. The Cortex-M3 boot program in the F28M35H52C1 starts first, while the C28x control subsystem and the analog subsystem are held in reset; the M3 master subsystem must release them from reset. The M3 master subsystem also sets the clock frequencies of the M3 and C28x cores. Because the ratio between the M3 and C28x clock frequencies must be an integer, the M3/C28x frequencies can only be set to 60/60 MHz, 75/150 MHz, or 100/100 MHz. Once the clock frequencies are set, the M3 master subsystem configures the peripheral resources and GPIOs of the whole chip and determines which GPIOs the C28x control subsystem may configure. In this system, the M3/C28x frequencies are set to 75/150 MHz. When all initialization is complete, the system automatically enters Standby Mode.

• Standby Mode: All PWM and relays are turned off. The system waits for a start command and also detects if an error occurs.

• Soft Start Mode: Upon receiving the start command, the system enters soft start mode, PWM and relays are turned on. If the start is successful and no errors occur, the system automatically enters normal inverter mode.

• Normal Inverter Mode: In this mode, the system outputs power. If no error occurs and no shutdown command is received, the system will remain in this mode.

• Fault Mode: If an error occurs, such as bus overvoltage, the system immediately enters Fault Mode. All PWM outputs are blocked and the output relays are opened. The fault state can be cleared by a key press or through the GUI. After it is cleared, the system returns to Standby Mode.
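
The five modes listed above map naturally onto an enum and a transition function. The sketch below is one possible encoding, with hypothetical names throughout. Two behaviours are assumptions that the text does not spell out: a shutdown command in Normal Inverter Mode returns the system to Standby Mode, and a fault can only be cleared once the fault condition itself has gone away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    MODE_POWER_ON,
    MODE_STANDBY,
    MODE_SOFT_START,
    MODE_NORMAL_INVERTER,
    MODE_FAULT
} inv_mode_t;

/* Inputs sampled once per state-machine tick (hypothetical names). */
typedef struct {
    bool init_done;      /* clocks, peripherals and GPIO configured */
    bool start_cmd;      /* start command received */
    bool stop_cmd;       /* shutdown command received */
    bool soft_start_ok;  /* soft start finished without error */
    bool fault;          /* e.g. bus overvoltage detected */
    bool fault_clear;    /* key press or GUI request to clear the fault */
} inv_inputs_t;

/* Returns the mode for the next tick. */
inv_mode_t inverter_next_mode(inv_mode_t mode, const inv_inputs_t *in)
{
    assert(in != NULL);
    switch (mode) {
    case MODE_POWER_ON:
        return in->init_done ? MODE_STANDBY : MODE_POWER_ON;
    case MODE_STANDBY:
        if (in->fault) return MODE_FAULT;
        return in->start_cmd ? MODE_SOFT_START : MODE_STANDBY;
    case MODE_SOFT_START:
        if (in->fault) return MODE_FAULT;
        return in->soft_start_ok ? MODE_NORMAL_INVERTER : MODE_SOFT_START;
    case MODE_NORMAL_INVERTER:
        if (in->fault) return MODE_FAULT;
        return in->stop_cmd ? MODE_STANDBY : MODE_NORMAL_INVERTER; /* assumed target */
    case MODE_FAULT:
        return (in->fault_clear && !in->fault) ? MODE_STANDBY : MODE_FAULT;
    }
    return MODE_FAULT;   /* unknown mode: fail safe */
}

/* PWM and relays are on only while starting or running. */
bool inverter_pwm_enabled(inv_mode_t mode)
{
    return mode == MODE_SOFT_START || mode == MODE_NORMAL_INVERTER;
}
```

Keeping the transition logic free of hardware access means other tasks can call `inverter_pwm_enabled` or inspect the current mode without depending on the control loop.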

 

 

Figure 9 C28x-side program system state machine

 

 

Figure 10 Concerto ADC block diagram

The Concerto series has two 12-bit ADC modules. Each ADC module contains two sample-and-hold circuits and supports simultaneous or sequential sampling. There are also 3 analog comparators with 10-bit DACs. The analog input range is 0V~3.3V (internal reference) or ratiometric to VREFHI/VREFLO (external reference).

Figure 11 shows the detailed ADC configuration. Both the Cortex-M3 and C28x cores of the TMS320F28M35H52C can access the ADC result registers, and the two ADC modules share four analog inputs. This allows critical signals to be sampled redundantly and cross-checked for safety, which improves system reliability.
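
One way to use the shared inputs is to sample a critical signal on both ADC modules and compare the two results. The C sketch below illustrates such a plausibility check. The function names and the tolerance parameter are hypothetical, and the millivolt conversion assumes an ideal 12-bit transfer function over the 0V~3.3V internal-reference range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADC_MAX_COUNT   4095u   /* 12-bit result */
#define ADC_FULL_SCALE  4096u   /* assumed ideal transfer function */
#define ADC_VREF_MV     3300u   /* internal reference range, 0V~3.3V */

/* True when both conversions are valid 12-bit values and agree within tol counts. */
bool adc_results_agree(uint16_t adc1, uint16_t adc2, uint16_t tol)
{
    if (adc1 > ADC_MAX_COUNT || adc2 > ADC_MAX_COUNT) return false;
    uint16_t diff = (uint16_t)(adc1 > adc2 ? adc1 - adc2 : adc2 - adc1);
    return diff <= tol;
}

/* Converts a 12-bit result to millivolts (integer arithmetic, rounds down). */
uint32_t adc_count_to_mv(uint16_t count)
{
    assert(count <= ADC_MAX_COUNT);
    return ((uint32_t)count * ADC_VREF_MV) / ADC_FULL_SCALE;
}
```

If `adc_results_agree` returns false for a signal such as the bus voltage, the control code could treat it as an error and move the state machine into Fault Mode.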

4.2 Power Line Carrier Communication PLC Smart Home Gateway

A smart home gateway connects the smart appliances in a home into a wired or wireless network for centralized management. As shown in Figure 10, the C28x (running at 150MHz) of the TMS320F28M35H52C mainly runs the OFDM physical-layer algorithm for power line carrier communication (PLC). The Cortex-M3 (75MHz) runs the TCP/IP protocol stack for Ethernet access. It can also optionally connect a GPRS module through the UART interface, or drive a TFT color display for the user interface through the EBI external bus.


 

5. Summary

The Concerto C2000 heterogeneous dual-core MCU combines the C28x DSP core with the ARM Cortex-M3 core, demonstrating powerful performance in efficient data processing, data communication, and event management. The two subsystems of C28x and Cortex-M3 have clear division of labor, and the IPC module cleverly implements real-time and efficient inter-core communication. In terms of software, the controlSUITE development platform provides a variety of components, including TCP/IP protocol stack, IPC driver, USB protocol stack, FAT file system, etc., which can help users develop innovative products faster.
