"Run Linux Kernel (2nd Edition) Volume 2: Debugging and Case Analysis" - Use of spin locks and deadlock detection
This post was last edited by maskmoo on 2024-3-25 09:56
In Linux systems, a soft lockup is a state in which a CPU is stuck in kernel code for a long time, and it is often caused by incorrect lock usage in the kernel. The spin lock is a common synchronization mechanism in the Linux kernel, used to protect shared resources from concurrent access. This article explores the root causes of soft lockups, analyzes the locking problem behind one, and uses an experiment to demonstrate spin lock usage and an incorrect practice that leads to a soft lockup.
Introduction to Linux spin locks:
A spin lock is a lightweight synchronization primitive that uses busy waiting to protect critical sections. When the lock is already held, the thread trying to acquire it does not sleep; it keeps checking the lock in a loop on its CPU until the lock is released. This is efficient when critical sections are short, but long spin waits waste CPU time and degrade system performance.
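To make the busy-waiting idea concrete, here is a minimal userspace sketch built on C11 atomics. It is not the kernel's spinlock_t implementation; the names demo_spinlock, demo_spin_lock, demo_spin_unlock and demo_run_two_workers are made up for this illustration.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace spin lock; NOT the kernel's spinlock_t. */
typedef struct {
	atomic_flag locked;
} demo_spinlock;

static demo_spinlock demo_lock = { ATOMIC_FLAG_INIT };
static long shared_counter;

static void demo_spin_lock(demo_spinlock *l)
{
	/* Busy-wait: retry test-and-set until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		;
}

static void demo_spin_unlock(demo_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

static void *demo_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++) {
		demo_spin_lock(&demo_lock);
		shared_counter++;		/* short critical section */
		demo_spin_unlock(&demo_lock);
	}
	return NULL;
}

/* Runs two workers that share the counter and returns its final value. */
static long demo_run_two_workers(void)
{
	pthread_t a, b;

	shared_counter = 0;
	pthread_create(&a, NULL, demo_worker, NULL);
	pthread_create(&b, NULL, demo_worker, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return shared_counter;
}
```

Because every increment happens while the lock is held, no update is lost. The waiting thread burns CPU inside the while loop the whole time, which is why the critical section has to stay short.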
Summary of experimental process:
- QEMU loads the Linux system and logs in: Load the Linux system through QEMU and perform the experimental operations after logging in.
- Compile the experimental code: Compile the code required for the experiment, including the kernel module that contains the spin lock.
- Load the kernel module and trigger a soft lockup: After loading the kernel module, let the system run and observe the soft lockup being triggered.
- Deadlock detection: Enable the Lockdep feature and recompile the Linux kernel image to detect and locate the point where the deadlock occurs.
1 Start QEMU+runninglinuxkernel
cd /home/rlk/rlk/runninglinuxkernel_5.0
./run_rlk_arm64.sh run
After QEMU boots the Linux system, log in with user name benshushu and password 123.
2 Compile and load the experimental code
Compile the reference code for the experiment:
cd /mnt/rlk_lab/rlk_basic/chapter_10_lock/lab1
make
3 Load the kernel module.
su
sudo insmod spinlock-nest.ko
After a while, the system's watchdog reports a soft lockup. In Linux, a soft lockup means that some kernel code has kept a CPU busy for a long time without giving it up, so other tasks on that CPU cannot be scheduled. Common causes include:
- Infinite loop in the kernel: A logic or programming error sends kernel code into an infinite loop, so it occupies the CPU without ever releasing it.
- Interrupt handling: An interrupt handler that runs for too long takes up CPU time that other tasks need, so they cannot get execution time.
- Incorrect lock usage in the kernel: If a spin lock is used incorrectly, for example when it is never released or is acquired again by the thread that already holds it, the waiting thread spins on the CPU indefinitely.
- Hardware issues: Hardware failure or instability, such as interrupts that are not triggered or handled correctly, can keep an interrupt handler running for a long time.
The soft lockup in this experiment is caused by incorrect lock usage, as the module code that follows shows.
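The watchdog idea can be modeled in a few lines of C. This is only a simplified sketch of the principle (a timestamp that is refreshed whenever the CPU manages to run a watchdog task, plus a periodic check for staleness), not the kernel's actual detector. All names are hypothetical, and the threshold is whatever the caller chooses.

```c
#include <stdint.h>

/* Simplified model of a soft-lockup check; all names are hypothetical. */
struct demo_watchdog {
	uint64_t last_touch_sec;	/* last time the watchdog task got to run */
	uint64_t threshold_sec;		/* silence longer than this counts as stuck */
};

/* Called whenever the CPU manages to schedule the watchdog task. */
static void demo_watchdog_touch(struct demo_watchdog *wd, uint64_t now_sec)
{
	wd->last_touch_sec = now_sec;
}

/*
 * Called from a periodic timer. Returns the number of seconds the CPU
 * has been stuck, or 0 if the last touch is recent enough.
 */
static uint64_t demo_watchdog_check(const struct demo_watchdog *wd,
				    uint64_t now_sec)
{
	uint64_t stuck = now_sec - wd->last_touch_sec;

	return stuck > wd->threshold_sec ? stuck : 0;
}
```

In this model, a thread that spins forever with preemption disabled never lets the watchdog task run, so the timestamp goes stale and the periodic check eventually reports the CPU as stuck.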
The code implementation of spinlock-nest.c is shown below. This code is a simple Linux kernel module that uses spin locks and the memory allocation functions in the kernel. The main logic includes:
- A static spin lock and global variables are defined for synchronization and storage of allocated pages respectively.
- A nest_lock function is implemented, which acquires the spin lock, tries to allocate memory pages, then frees the pages and releases the spin lock. The spin lock operation is incorrect: the function acquires hack_spinA a second time while it is still holding it.
- A lockdep_thread function is implemented, which runs as a kernel thread and calls the nest_lock function in a loop.
- The lockdep_thread thread is started in the module initialization function, and an error is returned if the thread creation fails.
- The lock_thread thread is stopped in the module exit function.
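#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/gfp.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/freezer.h>

static DEFINE_SPINLOCK(hack_spinA);
static struct page *page;
static struct task_struct *lock_thread;

/* Deliberately buggy: this function exists to trigger the soft lockup. */
static int nest_lock(void)
{
	int order = 5;

	spin_lock(&hack_spinA);		/* first acquisition succeeds */
	/* GFP_KERNEL allocations may sleep, which is not allowed under a spin lock */
	page = alloc_pages(GFP_KERNEL, order);
	if (!page) {
		printk("cannot alloc pages\n");
		return -ENOMEM;		/* returns with hack_spinA still held */
	}

	/* BUG: the same thread takes the same lock again and spins forever */
	spin_lock(&hack_spinA);
	msleep(10);			/* never reached */
	__free_pages(page, order);
	spin_unlock(&hack_spinA);
	spin_unlock(&hack_spinA);

	return 0;
}

/* Kernel thread body: calls nest_lock() every 10 ms until asked to stop. */
static int lockdep_thread(void *nothing)
{
	set_freezable();
	set_user_nice(current, 0);

	while (!kthread_should_stop()) {
		msleep(10);
		nest_lock();
	}
	return 0;
}

static int __init my_init(void)
{
	lock_thread = kthread_run(lockdep_thread, NULL, "lockdep_test");
	if (IS_ERR(lock_thread)) {
		printk("create kthread fail\n");
		return PTR_ERR(lock_thread);
	}
	return 0;
}

static void __exit my_exit(void)
{
	kthread_stop(lock_thread);
}

MODULE_LICENSE("GPL");
module_init(my_init);
module_exit(my_exit);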
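The module above calls spin_lock(&hack_spinA) twice in nest_lock before any unlock. The following userspace sketch shows why the second call can never succeed, and one possible corrected shape. It is an analogy only: an atomic_flag stands in for the kernel spin lock, malloc stands in for alloc_pages, a trylock is used so the demonstration returns instead of hanging, and every demo_ name is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for hack_spinA and page. */
static atomic_flag demo_spin_a = ATOMIC_FLAG_INIT;
static void *demo_page;

/* Returns true if the lock was free and is now held by the caller. */
static bool demo_spin_trylock(atomic_flag *l)
{
	return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void demo_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Mirrors the module's mistake: try to take the same lock twice. */
static bool demo_nested_second_acquire_succeeds(void)
{
	bool first = demo_spin_trylock(&demo_spin_a);
	bool second = demo_spin_trylock(&demo_spin_a);	/* already held */

	if (first)
		demo_spin_unlock(&demo_spin_a);
	return second;
}

/* One possible fix: allocate outside the lock, lock once, unlock once. */
static int demo_nest_lock_fixed(size_t size)
{
	void *p = malloc(size);		/* may block, so do it before locking */

	if (!p)
		return -1;		/* no lock is held on the error path */

	while (!demo_spin_trylock(&demo_spin_a))
		;			/* busy-wait until the lock is free */
	demo_page = p;			/* short critical section */
	demo_page = NULL;
	demo_spin_unlock(&demo_spin_a);

	free(p);			/* release the memory outside the lock */
	return 0;
}
```

The second trylock fails because nothing can release the lock between the two calls. A blocking acquire at that point would spin forever, which is the situation the watchdog reports in the kernel.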
4 Deadlock Detection
Deadlock detection relies on the kernel's Lockdep feature, which is controlled by kernel configuration options such as CONFIG_PROVE_LOCKING and CONFIG_DEBUG_LOCKDEP. The Linux image running in QEMU therefore needs to be reconfigured and recompiled.
Go back to the Ubuntu host, change to the kernel source directory, and add the following options to the kernel configuration file arch/arm64/configs/debian_defconfig:
cd /home/rlk/rlk/runninglinuxkernel_5.0
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_LOCKDEP=y
sudo ./run_debian_arm64.sh build_kernel
sudo ./run_rlk_arm64.sh update_rootfs
After the build completes, restart the QEMU virtual machine, then compile and load the test module again. (In my environment, updating the rootfs failed with various strange problems; I eventually solved it by cloning a fresh copy of the repository.)
git clone https://e.coding.net/benshushu/runninglinuxkernel_5.0/runninglinuxkernel_5.0.git
With Lockdep enabled, the kernel log shows the deadlock path and the call stack at the point where it occurs.
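The principle behind this kind of report can be sketched as bookkeeping on the locks a thread currently holds. The code below is a toy userspace model with hypothetical names. The real Lockdep works on lock classes and a dependency graph and is far more elaborate, but the recursive acquisition in nest_lock is the kind of pattern that a held-lock check like this can flag before the thread actually hangs.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_HELD 8

/* Hypothetical per-thread record of the locks currently held. */
struct demo_held_locks {
	const void *held[DEMO_MAX_HELD];
	size_t depth;
};

/* Returns true if taking 'lock' now would re-acquire a lock already held. */
static bool demo_would_recurse(const struct demo_held_locks *h,
			       const void *lock)
{
	for (size_t i = 0; i < h->depth; i++)
		if (h->held[i] == lock)
			return true;
	return false;
}

/*
 * Records an acquisition. Returns false and records nothing if the
 * acquisition would be recursive or the table is full.
 */
static bool demo_track_acquire(struct demo_held_locks *h, const void *lock)
{
	if (demo_would_recurse(h, lock) || h->depth == DEMO_MAX_HELD)
		return false;
	h->held[h->depth++] = lock;
	return true;
}

/* Removes the most recent record of 'lock', if there is one. */
static void demo_track_release(struct demo_held_locks *h, const void *lock)
{
	for (size_t i = h->depth; i > 0; i--) {
		if (h->held[i - 1] == lock) {
			for (size_t j = i - 1; j + 1 < h->depth; j++)
				h->held[j] = h->held[j + 1];
			h->depth--;
			return;
		}
	}
}
```

A checker like this can report the problem at the moment of the second acquisition and point at the offending call, instead of waiting for the watchdog to notice a stuck CPU.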
Summary:
Understanding the characteristics and applicable scenarios of spin locks, as practiced in this experiment, is crucial to avoiding problems such as soft lockups. I hope this article gives readers some ideas and methods for understanding and solving soft lockup problems.