Feiling Embedded OK1043A-C DPDK Environment Experience
The FET1043A-C core board, launched by Feiling Embedded in June this year, is built around NXP's QorIQ LS1043A processor, which has four ARMv8-A Cortex-A53 cores clocked at 1.6 GHz and combines low power consumption with high energy efficiency. Its four SerDes lanes run at up to 10 Gbps and support a variety of flexible configurations. The matching baseboard designed by Feiling is built to make full use of the LS1043A's network performance: it provides one 10G port and six Gigabit ports. Together with the DPAA1 acceleration engine inside the processor and 2 GB of DDR4 memory, this makes it a very capable networking platform.
With such excellent hardware, is the traditional Linux kernel still a good match? The answer is no.
There are several reasons for this:
Interrupt processing. When a large number of packets arrive from the network, frequent hardware interrupt requests are generated. These hardware interrupts can preempt lower-priority soft interrupts or system calls that are already running. If this happens often, it incurs a high performance overhead.
Memory copy. Normally, a packet travels from the network card to the application as follows: the data is transferred from the network card to a buffer allocated by the kernel, by DMA or similar means, and is then copied from kernel space to user space. In the Linux kernel protocol stack, this time-consuming copy accounts for as much as 57.1% of the entire packet-processing flow.
Context switching. Hardware interrupts and soft interrupts arrive frequently and may preempt system calls at any time, which generates a lot of context-switching overhead. In addition, in a multi-threaded server design, scheduling between threads also causes frequent context switches. The cost of lock contention is likewise a serious problem.
Loss of locality. Mainstream processors today are multi-core, which means that the processing of a single packet may span several CPU cores. For example, a packet may be received by an interrupt on cpu0, processed in kernel mode on cpu1, and processed in user mode on cpu2. Moving across cores in this way easily invalidates CPU caches and destroys locality. On a NUMA architecture it also causes cross-node memory access, which greatly affects performance.
Memory management. The memory page size of a traditional server is 4 KB. To speed up memory access and avoid misses in the address-translation cache (the TLB), the number of entries in the mapping table can be increased, but this slows down the CPU's lookups.
From the problems above, we can see that the kernel itself is a major bottleneck. The obvious solution is to find a way to bypass it. After much research by earlier developers, DPDK has stood out among the many solutions.
“Knowledge gained from books is shallow; one must practice to truly understand.” Let’s experience DPDK through an example.
First, to use the DPDK environment, you need to replace the device tree so that the network interfaces are handed over to user mode. The device tree file you need is:
OK10xx-linux-fs/flexbuild/build/linux/linux/arm64/fsl-ls1043a-rdb-usdpaa.dtb
Copy fsl-ls1043a-rdb-usdpaa.dtb to the root directory of the development board and use the following command to replace the device tree:
mv /run/media/mmcblk0p2/fsl-ls1043a-rdb-sdk.dtb /run/media/mmcblk0p2/fsl-ls1043a-rdb-sdk.dtb.bak
cp /fsl-ls1043a-rdb-usdpaa.dtb /run/media/mmcblk0p2/boot
ln -s /run/media/mmcblk0p2/boot/fsl-ls1043a-rdb-usdpaa.dtb /run/media/mmcblk0p2/boot/fsl-ls1043a-rdb-sdk.dtb
reboot
After the development board has restarted, enter: ifconfig fm1-mac1
If the message "Device not found" is displayed, the replacement was successful.
After testing DPDK, restore the default configuration as follows:
cp /run/media/mmcblk0p2/fsl-ls1043a-rdb-sdk.dtb.bak /run/media/mmcblk0p2/fsl-ls1043a-rdb-sdk.dtb
reboot
Once the network interfaces are in user mode, how do we use them? How do we use TCP/UDP? Don't worry: to use TCP or UDP with DPDK, you need to port a protocol stack to DPDK. At this introductory stage, let's first try the Layer 2 forwarding test routine included with DPDK.
The Layer 2 forwarding network topology is shown in the following figure:
Port2 and Port3 of the OK1043A-C platform (corresponding to fm1-mac3 and fm1-mac4) forward data between the Linux Host and the OK1012A-C. You can replace the Linux Host and the OK1012A-C with other network devices.
Configure OK1043A-C:
l2fwd -c 0xf -n 1 -- -p 0xc -q 1 --no-mac-updating
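The hexadecimal masks in this command are easier to read once decoded. In the DPDK l2fwd sample, -c is the core mask and -p is the port mask; -n sets the number of memory channels and -q the number of ports handled per core. The sketch below is only an illustration: decode_mask is a hypothetical helper, not part of DPDK, that lists which bit positions are set in a mask.

```shell
#!/bin/sh
# decode_mask MASK: print the bit positions that are set in a hex mask.
# Hypothetical helper for illustration only; not shipped with DPDK.
decode_mask() {
    mask=$(( $1 ))
    bit=0
    out=""
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            # Append the bit index, space-separated.
            out="${out:+$out }$bit"
        fi
        mask=$(( mask >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "$out"
}

decode_mask 0xf   # core mask given to -c: prints "0 1 2 3"
decode_mask 0xc   # port mask given to -p: prints "2 3"
```

So 0xf runs the application on all four Cortex-A53 cores, and 0xc selects ports 2 and 3, which matches Port2 and Port3 in the topology above.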
Configure OK1012A-C:
ifconfig eth0 192.168.1.200
tcpdump -i eth0 -vv -n -e
Configure the Linux Host:
ifconfig eth0 192.168.1.120
sudo modprobe pktgen
echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
echo "dst_mac 6e:56:7d:85:ce:4d" > /proc/net/pktgen/eth0
echo "dst 192.168.1.200" > /proc/net/pktgen/eth0
echo "pkt_size 64" > /proc/net/pktgen/eth0
echo "count 1000000" > /proc/net/pktgen/eth0
echo "start" > /proc/net/pktgen/pgctrl
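If you repeat this test with different parameters, the pktgen steps can be wrapped in a small script. The following is a minimal sketch and not part of the original procedure: pg_write, pg_configure and the PGDIR variable are hypothetical names introduced here, and the script assumes it is run as root with the pktgen module already loaded.

```shell
#!/bin/sh
# Directory holding the pktgen control files.
# PGDIR is a hypothetical override so the script can be dry-run elsewhere.
PGDIR=${PGDIR:-/proc/net/pktgen}

# pg_write FILE COMMAND: send one command to a pktgen control file and
# report which command failed instead of continuing silently.
pg_write() {
    if ! printf '%s\n' "$2" > "$PGDIR/$1"; then
        echo "pktgen: '$2' -> $1 failed" >&2
        return 1
    fi
}

# pg_configure DEV DST_MAC DST_IP SIZE COUNT: the same settings as in the
# commands above, passed as parameters. Stops at the first failure.
pg_configure() {
    pg_write kpktgend_0 "add_device $1" &&
    pg_write "$1" "dst_mac $2" &&
    pg_write "$1" "dst $3" &&
    pg_write "$1" "pkt_size $4" &&
    pg_write "$1" "count $5"
}

# Usage (as root):
#   pg_configure eth0 6e:56:7d:85:ce:4d 192.168.1.200 64 1000000
#   pg_write pgctrl start
#   cat "$PGDIR/eth0"    # pktgen reports the result of the run in this file
```

Chaining the writes with && means a typo in one setting stops the run before any packets are sent.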
We have the Linux Host send 1 million 64-byte packets to test how well DPDK on the OK1043A-C forwards them.
The serial console output of the OK1043A-C shows that DPDK forwarded every packet it received. If you look carefully, you will also notice that the CPU load stays high while DPDK is forwarding data. This is because the application keeps polling in user space to check whether there are packets to process.
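One way to observe this polling load yourself is to sample a core's counters in /proc/stat twice and compare them. The sketch below is an illustration under stated assumptions: cpu_busy_pct is a hypothetical helper, and it counts the idle and iowait columns as idle time and everything else as busy.

```shell
#!/bin/sh
# cpu_busy_pct SAMPLE1 SAMPLE2: given two "cpuN ..." lines taken from
# /proc/stat at different times, print the integer percentage of
# non-idle time between them. Hypothetical helper for illustration.
cpu_busy_pct() {
    # Columns after the label: user nice system idle iowait irq softirq steal
    read -r label u1 n1 s1 i1 w1 q1 sq1 st1 rest <<EOF
$1
EOF
    read -r label u2 n2 s2 i2 w2 q2 sq2 st2 rest <<EOF
$2
EOF
    total1=$(( u1 + n1 + s1 + i1 + ${w1:-0} + ${q1:-0} + ${sq1:-0} + ${st1:-0} ))
    total2=$(( u2 + n2 + s2 + i2 + ${w2:-0} + ${q2:-0} + ${sq2:-0} + ${st2:-0} ))
    idle1=$(( i1 + ${w1:-0} ))
    idle2=$(( i2 + ${w2:-0} ))
    dt=$(( total2 - total1 ))
    di=$(( idle2 - idle1 ))
    # No elapsed ticks between the samples: report 0 rather than divide by zero.
    if [ "$dt" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( 100 * (dt - di) / dt ))
}

# Usage on the board while l2fwd is running (cpu0 is just an example core):
#   a=$(grep '^cpu0 ' /proc/stat); sleep 1; b=$(grep '^cpu0 ' /proc/stat)
#   cpu_busy_pct "$a" "$b"
```

A core that is busy polling should report a value close to 100 even when no traffic is flowing.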