The STM32MP15x is a heterogeneous SoC that combines Cortex-M4 (CM4) and Cortex-A7 (CA7) cores. The CM4 handles real-time tasks, while the CA7 runs Linux and takes care of more complex computation and task processing. Data collected by the CM4 is usually passed to the CA7 for processing, so an inter-core communication mechanism is required, and that mechanism needs both hardware and software support.
Hardware:
1) IPCC: generates the interrupt signals between the cores
2) Shared memory: holds the exchanged data
Software:
1) CM4 side: OpenAMP
2) CA7 Linux side: RPMsg framework (Mailbox, Remoteproc, RPMsg, VirtIO)
For the basic principles of inter-core communication, please read the relevant wiki documentation:
https://wiki.stmicroelectronics.cn/stm32mpu/wiki/Exchanging_buffers_with_the_coprocessor
This article discusses problems that may be encountered in inter-core communication, based on several real cases:
- How to increase the RPMsg buffer size
- How to increase the number of RPMsg channels
- RPMsg communication deadlock
- "No more buffer in queue" during dual-core communication, after which communication stops
2. How to increase the RPMsg buffer size
2.1. Default Configuration
First, let's look at the default RPMsg-related configuration in the current code:
CA7: virtio_rpmsg_bus.c
#define MAX_RPMSG_NUM_BUFS (512)
#define MAX_RPMSG_BUF_SIZE (512)
CM4: rpmsg_virtio.h
#define RPMSG_BUFFER_SIZE (512)
Reserved memory configuration, taken from the following Linux device tree:
stm32mp15xx-dkx.dtsi:
vdev0vring0: vdev0vring0@10040000 {
compatible = "shared-dma-pool";
reg = <0x10040000 0x1000>;
no-map;
};
vdev0vring1: vdev0vring1@10041000 {
compatible = "shared-dma-pool";
reg = <0x10041000 0x1000>;
no-map;
};
vdev0buffer: vdev0buffer@10042000 {
compatible = "shared-dma-pool";
reg = <0x10042000 0x4000>;
no-map;
};
Number of channels:
This value is defined in a header file of the CM4 firmware and is passed to the CA7 Linux side through the firmware resource table.
CM4: openamp_conf.h
#define VRING_NUM_BUFFS 16
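From the above configuration, there are currently 16 RPMsg channels in total (strictly speaking, 16 buffers per vring), and each one is 512 bytes. MAX_RPMSG_NUM_BUFS is only an upper bound on the Linux side: Linux allocates twice the vring size in buffers (half for RX, half for TX), capped at that limit, so VRING_NUM_BUFFS is what actually determines the count.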
Each message exchange involves two parts: a descriptor in the vring, which tells the other core where the message is stored and how long it is, and the message itself (a 16-byte RPMsg header followed by the actual data) in the buffer pool.
TX vring: vdev0vring0, reserved 0x1000 (4 KB)
RX vring: vdev0vring1, reserved 0x1000 (4 KB)
Buffer pool: vdev0buffer, reserved 0x4000 (16 KB)
2.1.1. How to calculate the vring size
The vring size is computed by a vring_size() helper on each side:
Linux side: include/uapi/linux/virtio_ring.h
static inline unsigned vring_size(unsigned int num, unsigned long align)
{
return ((sizeof(struct vring_desc) * num + sizeof(__virtio16) * (3 + num)
+ align - 1) & ~(align - 1))
+ sizeof(__virtio16) * 3 + sizeof(struct vring_used_elem) * num;
}
CM4 (OpenAMP): virtio_ring.h
static inline int vring_size(unsigned int num, unsigned long align)
{
int size;
size = num * sizeof(struct vring_desc);
size += sizeof(struct vring_avail) + (num * sizeof(uint16_t)) + sizeof(uint16_t);
size = (size + align - 1) & ~(align - 1);
size += sizeof(struct vring_used) +
(num * sizeof(struct vring_used_elem)) + sizeof(uint16_t);
return size;
}
With a 16-byte vring alignment, this formula gives 438 bytes for 16 channels. However, Linux reserved-memory regions must be page-aligned (4096 bytes), so 0x1000 is the minimum reservation for each vring.
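To check the numbers on a host PC, the OpenAMP formula can be restated with the standard split-ring structure layouts from the virtio specification. This sketch assumes a vring alignment of 16 bytes, which is the value that reproduces the 438-byte figure above; page_align() is a helper added only for illustration.

```c
#include <stdint.h>

/* Split-virtqueue layouts from the virtio specification. */
struct vring_desc {          /* 16 bytes per descriptor */
    uint64_t addr;           /* physical address of the buffer */
    uint32_t len;            /* buffer length in bytes */
    uint16_t flags;
    uint16_t next;
};

struct vring_avail {         /* 4-byte header followed by ring[] */
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[];
};

struct vring_used_elem {     /* 8 bytes per used entry */
    uint32_t id;
    uint32_t len;
};

struct vring_used {          /* 4-byte header followed by ring[] */
    uint16_t flags;
    uint16_t idx;
    struct vring_used_elem ring[];
};

/* Same arithmetic as the OpenAMP vring_size(); align must be a power of two. */
static unsigned long vring_size(unsigned int num, unsigned long align)
{
    unsigned long size;

    size = num * sizeof(struct vring_desc);
    size += sizeof(struct vring_avail) + num * sizeof(uint16_t) + sizeof(uint16_t);
    size = (size + align - 1) & ~(align - 1);
    size += sizeof(struct vring_used) + num * sizeof(struct vring_used_elem)
            + sizeof(uint16_t);
    return size;
}

/* Round up to the 4 KB page granularity used for Linux reserved memory. */
static unsigned long page_align(unsigned long size)
{
    return (size + 4095UL) & ~4095UL;
}
```

With this alignment, vring_size(16, 16) returns 438 and vring_size(64, 16) returns 1686; page_align() rounds both up to 0x1000, which is why one page per vring is enough in either configuration.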
2.1.2. How to calculate vdev0buffer size
vdev0buffer_size = buffer_size * number_of_buffers * 2
The factor of 2 is needed because the TX and RX directions each have their own set of buffers. For the default configuration, vdev0buffer_size = 512 * 16 * 2 = 16 KB (0x4000), which matches the device tree above. Note that the number of RPMsg channels (VRING_NUM_BUFFS, the number of entries in each vring) is also the maximum number of messages that can be queued in each direction between the M4 and the A7.
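The formula corresponds to how, as far as we understand it, the Linux driver of this generation (virtio_rpmsg_bus.c, rpmsg_probe()) sizes the pool: twice the vring size in buffers, capped at MAX_RPMSG_NUM_BUFS. The sketch below restates that rule so the configurations in this article can be checked; the function names are illustrative, not kernel API.

```c
#include <stddef.h>

/* Mirrors the Linux constant: an upper bound, not the count actually used. */
#define MAX_RPMSG_NUM_BUFS 512u

/* Buffers Linux allocates: two per vring entry (RX + TX), capped at the limit.
 * vring_num is VRING_NUM_BUFFS from the CM4 firmware. */
static unsigned int rpmsg_num_bufs(unsigned int vring_num)
{
    if (vring_num < MAX_RPMSG_NUM_BUFS / 2)
        return vring_num * 2;
    return MAX_RPMSG_NUM_BUFS;
}

/* Bytes that must fit in vdev0buffer: number of buffers times buffer size. */
static size_t vdev0buffer_size(unsigned int vring_num, size_t buf_size)
{
    return (size_t)rpmsg_num_bufs(vring_num) * buf_size;
}
```

For example, vdev0buffer_size(16, 512) and vdev0buffer_size(4, 2048) both give 0x4000, while vdev0buffer_size(64, 256) gives 0x8000.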
With the current configuration, a single message can occupy at most 512 bytes, and because the 16-byte RPMsg header is stored in the same buffer, the usable payload is 496 bytes. Larger data is either truncated or rejected, depending on the sending side (the Linux RPMsg core, for example, returns -EMSGSIZE). If a single frame exceeds this limit and you do not want to split it and send it in batches, consider increasing the buffer size.
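If splitting is acceptable, the sender can cut each frame into buffer-sized pieces itself. This is a minimal sketch: send_fn is a hypothetical hook standing in for rpmsg_send() (or whatever the application uses), the receiver is assumed to know how to reassemble the pieces, and buf_size must be larger than the 16-byte header.

```c
#include <stddef.h>
#include <stdint.h>

/* RPMsg header stored at the start of every buffer (16 bytes, no padding). */
struct rpmsg_hdr {
    uint32_t src;
    uint32_t dst;
    uint32_t reserved;
    uint16_t len;
    uint16_t flags;
    uint8_t data[];
};

/* Largest payload that fits in one buffer of buf_size bytes. */
static size_t rpmsg_max_payload(size_t buf_size)
{
    return buf_size - sizeof(struct rpmsg_hdr);
}

/* Hypothetical send hook standing in for rpmsg_send(); returns 0 on success. */
typedef int (*send_fn)(void *ctx, const uint8_t *data, size_t len);

/* Sends one large frame as several buffer-sized messages instead of truncating it.
 * Returns the number of messages sent, or -1 if any send fails. */
static int send_in_batches(send_fn send_cb, void *ctx, const uint8_t *frame,
                           size_t len, size_t buf_size)
{
    size_t chunk = rpmsg_max_payload(buf_size);
    int count = 0;

    for (size_t off = 0; off < len; off += chunk) {
        size_t n = (len - off < chunk) ? len - off : chunk;
        if (send_cb(ctx, frame + off, n) != 0)
            return -1;
        count++;
    }
    return count;
}

/* Host-side sink for trying the splitter: records each chunk length. */
struct chunk_log {
    size_t lens[16];
    int n;
};

static int log_chunk(void *ctx, const uint8_t *data, size_t len)
{
    struct chunk_log *rec = ctx;

    (void)data; /* contents are not inspected */
    if (rec->n >= 16)
        return -1;
    rec->lens[rec->n++] = len;
    return 0;
}
```

With 512-byte buffers, a 1200-byte frame goes out as three messages of 496, 496 and 208 bytes.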
2.2. How to change the buffer size to 2048 bytes
First, from the formula vdev0buffer_size = buffer_size * number_of_buffers * 2,
the reserved memory required for vdev0buffer is 2048 * 16 * 2 = 64 KB, so vdev0buffer alone would occupy the entire MCU SRAM3 (64 KB), as shown in the figure below. It would not even fit, because the two vrings already use the first 8 KB of SRAM3.
The CM4 has only 384 KB of SRAM in total (excluding retention RAM), so this memory overhead is very large. One option is to reduce the number of channels to 4: the memory required is then 2048 * 4 * 2 = 16 KB, the same as the default. The trade-off is that at most 4 RPMsg channels can be created, corresponding to 4 rpmsg-tty device nodes, and the buffer pool can hold only 4 frames in each direction.
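The memory-fit argument can be checked mechanically. This sketch assumes MCU SRAM3 spans 64 KB starting at 0x10040000 and takes the vdev0buffer base address 0x10042000 from the device tree above; the macro and function names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MCU SRAM3 window on STM32MP15x: 64 KB starting at 0x10040000. */
#define SRAM3_BASE     0x10040000u
#define SRAM3_SIZE     0x10000u

/* vdev0buffer base from the device tree; the two vrings sit below it. */
#define VDEV0BUF_BASE  0x10042000u

/* Returns true if a vdev0buffer of pool_size bytes still ends inside SRAM3. */
static bool vdev0buffer_fits_sram3(uint32_t pool_size)
{
    uint32_t end = VDEV0BUF_BASE + pool_size;   /* one past the last byte */

    return end <= SRAM3_BASE + SRAM3_SIZE;
}
```

Under these assumptions at most 0xE000 (56 KB) is available: 2048 * 16 * 2 = 0x10000 does not fit, while 2048 * 4 * 2 = 0x4000 and 256 * 64 * 2 = 0x8000 both do.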
The modifications involved are as follows:
Channel size:
CA7: virtio_rpmsg_bus.c
#define MAX_RPMSG_NUM_BUFS (512)
#define MAX_RPMSG_BUF_SIZE (2048)
CM4: rpmsg_virtio.h
#define RPMSG_BUFFER_SIZE (2048)
The reserved-memory configuration remains unchanged. Device tree example:
stm32mp15xx-dkx.dtsi:
vdev0vring0: vdev0vring0@10040000 {
compatible = "shared-dma-pool";
reg = <0x10040000 0x1000>;
no-map;
};
vdev0vring1: vdev0vring1@10041000 {
compatible = "shared-dma-pool";
reg = <0x10041000 0x1000>;
no-map;
};
vdev0buffer: vdev0buffer@10042000 {
compatible = "shared-dma-pool";
reg = <0x10042000 0x4000>;
no-map;
};
Channel number modification:
CM4: openamp_conf.h
#define VRING_NUM_BUFFS 4
3. How to increase the number of channels to 64
In this example, the channel size is also reduced to 256 bytes.
From the previous section, we already know how to calculate the sizes of vdev0buffer and the vrings. Using vdev0buffer_size = buffer_size * number_of_buffers * 2, the required vdev0buffer_size is 256 * 64 * 2 = 32768 bytes = 0x8000. For the vrings, 16 channels need 438 bytes, so 64 channels need roughly four times as much, about 1752 bytes (0x6D8); the exact vring_size() result is 1686 bytes (0x696). Since reserved memory is allocated in 4 KB pages (0x1000) and either value is below 0x1000, the vring reservations can remain unchanged.
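Plugging num = 64 and an assumed 16-byte alignment into the Linux vring_size() formula from section 2.1.1 (16-byte descriptors, 2-byte ring entries, 8-byte used elements) gives the exact figures:

$$
\begin{aligned}
\text{vring\_size}(64, 16) &= \left\lceil \frac{16 \cdot 64 + 2\,(3 + 64)}{16} \right\rceil \cdot 16 + 2 \cdot 3 + 8 \cdot 64 \\
&= \left\lceil \frac{1158}{16} \right\rceil \cdot 16 + 518 = 1168 + 518 = 1686 = \texttt{0x696} < \texttt{0x1000} \\
\text{vdev0buffer\_size} &= 256 \cdot 64 \cdot 2 = 32768 = \texttt{0x8000}
\end{aligned}
$$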
Modify as follows:
CA7: virtio_rpmsg_bus.c
#define MAX_RPMSG_NUM_BUFS (512)
#define MAX_RPMSG_BUF_SIZE (256)
CM4: rpmsg_virtio.h
#define RPMSG_BUFFER_SIZE (256)
The vring reservations remain unchanged; only vdev0buffer grows to 0x8000. Device tree example:
stm32mp15xx-dkx.dtsi:
vdev0vring0: vdev0vring0@10040000 {
compatible = "shared-dma-pool";
reg = <0x10040000 0x1000>;
no-map;
};
vdev0vring1: vdev0vring1@10041000 {
compatible = "shared-dma-pool";
reg = <0x10041000 0x1000>;
no-map;
};
vdev0buffer: vdev0buffer@10042000 {
compatible = "shared-dma-pool";
reg = <0x10042000 0x8000>;
no-map;
};
Channel number modification:
CM4: openamp_conf.h
#define VRING_NUM_BUFFS 64
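Because these values live in the CM4 firmware, the Linux kernel and the device tree, nothing checks that they agree. One option is a standalone check file, for example compiled in CI, that restates them and fails the build on a mismatch. VDEV0BUFFER_SIZE is a hypothetical macro for the vdev0buffer reg size; the other names are the ones used above. The power-of-two check reflects that split vrings require a power-of-two size.

```c
#include <stddef.h>

/* Values restated from the three places that must agree (64 x 256 configuration). */
#define VRING_NUM_BUFFS     64       /* CM4: openamp_conf.h */
#define RPMSG_BUFFER_SIZE   256      /* CM4: rpmsg_virtio.h */
#define MAX_RPMSG_BUF_SIZE  256      /* CA7: virtio_rpmsg_bus.c */
#define VDEV0BUFFER_SIZE    0x8000   /* device tree vdev0buffer size (hypothetical macro) */

_Static_assert(RPMSG_BUFFER_SIZE == MAX_RPMSG_BUF_SIZE,
               "CM4 and CA7 must use the same RPMsg buffer size");
_Static_assert((VRING_NUM_BUFFS & (VRING_NUM_BUFFS - 1)) == 0,
               "vring size must be a power of two");
_Static_assert((size_t)VRING_NUM_BUFFS * 2 * RPMSG_BUFFER_SIZE <= VDEV0BUFFER_SIZE,
               "vdev0buffer is too small for 2 * VRING_NUM_BUFFS buffers");
```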
4. RPMsg communication deadlock case analysis
Symptom: a deadlock occurs during RPMsg communication, which interrupts communication and leads to a system restart.
4.1. Reference log snippet:
From the above log snippet, we can see two call paths into virtio_rpmsg_trysend:
1) vfs_write
2) __irq_svc
That is, RPMsg messages are sent both from user space (through a write() call) and from a peripheral interrupt handler.
When the problem occurs, a virtio_rpmsg_trysend call triggered from user space is executing rpmsg_send_offchannel_raw. At that moment a peripheral interrupt calls virtio_rpmsg_trysend again, and when it reaches get_a_tx_buf, its mutex_lock call on the tx_lock mutex blocks. The interrupted rpmsg_send_offchannel_raw call has not finished and still holds the mutex, but it cannot resume until the interrupt handler returns, so neither side can make progress: a deadlock. Calls to virtio_rpmsg_trysend therefore need proper mutual exclusion.
This problem is caused by an unsound application design. virtio_rpmsg_trysend takes a mutex, and mutex_lock() may sleep, so it must not be called from interrupt context. If a user program is issuing RPMsg sends at the same time, the unpredictable timing of interrupts means that, at a high enough interrupt rate, an interrupt-triggered virtio_rpmsg_trysend call will sooner or later collide with one triggered by the user program, and with a mutex involved, a deadlock is to be expected. Developers should avoid calling virtio_rpmsg_trysend from interrupt handlers and restructure the send path to avoid this contention.
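A common way out is to keep the interrupt handler short and move the actual send to thread context; in a kernel driver this is typically a work item (INIT_WORK()/schedule_work()) or a threaded interrupt handler (request_threaded_irq()). The sketch below is only a user-space analogy of that pattern with hypothetical names: the "interrupt" side pushes data into a lock-free single-producer/single-consumer ring, and a worker drains it and performs every send under one mutex that plays the role of tx_lock.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_LEN 8u   /* must be a power of two */

/* SPSC ring: the interrupt side only pushes and never takes a lock. */
struct msg_queue {
    uint32_t items[QUEUE_LEN];
    atomic_uint head;   /* advanced by the producer (interrupt side) */
    atomic_uint tail;   /* advanced by the consumer (worker thread) */
};

/* Interrupt-safe push: no mutex, no sleeping. Returns false if the ring is full. */
static bool queue_push_from_irq(struct msg_queue *q, uint32_t v)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head - tail == QUEUE_LEN)
        return false;
    q->items[head % QUEUE_LEN] = v;
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

/* Stand-in for the driver's TX mutex and for the real send call. */
static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;
static uint32_t sent[64];
static size_t n_sent;

static void send_locked(uint32_t v)
{
    pthread_mutex_lock(&tx_lock);   /* allowed here: we are in thread context */
    if (n_sent < 64)
        sent[n_sent++] = v;         /* a real system would call the RPMsg send here */
    pthread_mutex_unlock(&tx_lock);
}

/* Worker side: drains everything queued so far. Returns the number of sends. */
static size_t queue_drain(struct msg_queue *q)
{
    size_t n = 0;
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);

    while (tail != head) {
        uint32_t v = q->items[tail % QUEUE_LEN];
        atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
        send_locked(v);
        tail++;
        n++;
    }
    return n;
}
```

The design choice is that the only shared state touched from the interrupt path is the ring indexes, so user-space writes and interrupt-originated data all end up serialized through the same thread-context send path.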
5. "No more buffer in queue" during dual-core communication
Symptom: after RPMsg has been communicating normally for some time, the CA7 Linux kernel continuously prints the log "No more buffer in queue" and communication stops. Even reloading the CM4 firmware does not recover it; only a system restart does. This problem occurs only in OPENSTLINUX_V1.0 (kernel v4.19) and has been fixed in later versions.
The above log is printed by the Linux kernel. From this log alone, we can only see that no free buffer is left in the current buffer queue. To get more detailed logs for further analysis, enable dynamic debug for the Linux modules involved in communication with the CM4 coprocessor:
CA7 Linux side
Board $> echo -n 'file stm32_rproc.c +p' > /sys/kernel/debug/dynamic_debug/control
Board $> echo -n 'file remoteproc*.c +p' > /sys/kernel/debug/dynamic_debug/control
Board $> echo -n 'file stm32-ipcc.c +p' > /sys/kernel/debug/dynamic_debug/control
In addition, set the log level on the CM4 side to LOGDEBUG to output more debug information.
Rerunning the test produces the following logs:
CA7 Linux side log:
CM4 side log:
During normal communication, both the IPCC TX and RX interrupts are triggered as expected. After the problem occurs, the CA7 Linux log shows only TX interrupts and no RX interrupt, and the CM4 log shows that interrupt activity has stopped, with the last interrupt never processed. Since the interrupt signals between CM4 and CA7 are generated by the IPCC, we can conclude that the communication failure is caused by the IPCC being in an abnormal state.

We therefore focused on the interrupt handling code. In the CA7 Linux IPCC driver (stm32-ipcc.c), the interrupt handlers stm32_ipcc_rx_irq and stm32_ipcc_tx_irq call stm32_ipcc_set_bits to update IPCC registers without any mutual exclusion. stm32_ipcc_set_bits is the core register-update helper and is called from many places; it performs a read-modify-write, so two unsynchronized callers can overwrite each other's update and leave the IPCC registers in an inconsistent state. The result is what the logs show: the last interrupt was never processed on the CM4 side, so the CA7 side never received another RX interrupt. Protecting these register updates with a spinlock solves the problem.
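The lost update can be reproduced with a plain variable standing in for the IPCC register. This is a user-space sketch with hypothetical names and bit masks: racy_set_bits() replays the bad interleaving of two unprotected read-modify-write updates, and locked_update() shows the fix, with a pthread mutex standing in for the spinlock (in the kernel driver, spin_lock_irqsave() is the usual tool because interrupt handlers must not sleep).

```c
#include <pthread.h>
#include <stdint.h>

/* Simulated IPCC mask register: a plain variable, not the real hardware. */
static volatile uint32_t ipcc_reg;

/* Replays the bad interleaving of two unprotected read-modify-write updates. */
static uint32_t racy_set_bits(uint32_t mask_a, uint32_t mask_b)
{
    ipcc_reg = 0;
    uint32_t seen_a = ipcc_reg;   /* context A reads */
    uint32_t seen_b = ipcc_reg;   /* context B reads before A writes */
    ipcc_reg = seen_a | mask_a;   /* A writes its bit */
    ipcc_reg = seen_b | mask_b;   /* B writes a stale value: A's bit is lost */
    return ipcc_reg;
}

/* Fixed helper: the whole read-modify-write runs under one lock. */
static pthread_mutex_t ipcc_lock = PTHREAD_MUTEX_INITIALIZER;

static void locked_update(uint32_t set, uint32_t clr)
{
    pthread_mutex_lock(&ipcc_lock);
    ipcc_reg = (ipcc_reg & ~clr) | set;
    pthread_mutex_unlock(&ipcc_lock);
}

/* Each thread repeatedly sets and then clears its own bit. */
static void *toggler(void *arg)
{
    uint32_t mask = *(const uint32_t *)arg;

    for (int i = 0; i < 100000; i++) {
        locked_update(mask, 0);   /* set own bit */
        locked_update(0, mask);   /* clear it again */
    }
    return NULL;
}

/* Runs two togglers concurrently; with the lock the register must end at 0. */
static uint32_t run_locked(uint32_t mask_a, uint32_t mask_b)
{
    pthread_t ta, tb;

    ipcc_reg = 0;
    pthread_create(&ta, NULL, toggler, &mask_a);
    pthread_create(&tb, NULL, toggler, &mask_b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return ipcc_reg;
}
```

racy_set_bits(1u << 0, 1u << 16) ends with only bit 16 set, while run_locked() always returns 0 because no update can be overwritten by a stale read.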