Analyzing a kernel deadlock with crash: a worked example
Recommended reading: Kernel deadlock debugging method
The article above covered how to debug deadlocks with lockdep. But what if the affected system was built without lockdep, or no relevant log is available? This article shows how to locate a deadlock with the crash tool. (trace32 is an alternative that can be more convenient; see https://blog.csdn.net/forever_2015/article/details/77434580.)
Preliminary analysis using crash
A system usually freezes because a critical thread is stuck in the UNINTERRUPTIBLE state (shown as UN by crash). So the first step is to use the ps command in crash to list the threads in that state. The -u option restricts the output to user tasks, leaving out kernel threads:
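When the ps output is long, it can help to save it to a file and filter it offline. The following is a minimal Python sketch of that idea, assuming the usual crash ps column layout (PID, PPID, CPU, TASK, ST, %MEM, VSZ, RSS, COMM). The function name and the sample rows are made up for illustration and are not taken from the dump analyzed here.

```python
def uninterruptible_tasks(ps_output):
    """Return (pid, task_struct address, command) for every task in the UN state."""
    tasks = []
    for line in ps_output.splitlines():
        # Active tasks are prefixed with ">", so strip that marker first.
        fields = line.lstrip("> ").split(None, 8)
        if len(fields) < 9 or not fields[0].isdigit():
            continue  # header line or anything that is not a task row
        pid, _ppid, _cpu, task, state, *_rest, comm = fields
        if state == "UN":
            tasks.append((int(pid), task, comm))
    return tasks


# HYPOTHETICAL sample rows, only meant to show the assumed column layout.
SAMPLE = """\
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
>     0      0   0  ffffff8008e12340  RU   0.0       0      0  [swapper/0]
    100      1   2  ffffffc012345000  UN   0.5  123456   7890  example_daemon
    200      1   1  ffffffc012346000  IN   0.1   65432   2100  other_task
    300      2   3  ffffffc012347000  UN   0.0       0      0  [example_kthread]
"""

for pid, task, comm in uninterruptible_tasks(SAMPLE):
    print(pid, task, comm)
```

The TASK column is worth keeping, because the task_struct address can be passed to other crash commands later in the analysis.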
The bt command prints the call stack of a thread. Let's look at the most critical of the UN threads above, the watchdog thread:
The call stack shows that the thread is blocked in proc_pid_cmdline_read(). The corresponding code is:
static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
                                     size_t _count, loff_t *pos)
{
        ......
        tsk = get_proc_task(file_inode(file));
        if (!tsk)
                return -ESRCH;
        mm = get_task_mm(tsk);          /* mm of the task whose cmdline is read */
        put_task_struct(tsk);
        ......
        down_read(&mm->mmap_sem);       /* the watchdog thread blocks here */
        ......
}
At this point the function must acquire the mmap_sem lock of the target thread's mm, but the lock is held by another thread, so down_read() blocks.
Deriving the read-write lock address
To find out which thread holds the lock, we first need the lock's address, which can be deduced from the assembly. Use the dis command to disassemble proc_pid_cmdline_read():
The call to down_read() is at 0xffffff99a680aaa0, and its first argument, passed in x0, is the sem lock, as the prototype shows:
void __sched down_read(struct rw_semaphore *sem)
Both x0 and x28 hold the value of sem. The offset of the mmap_sem member within mm_struct is 104 (0x68), and sem is x21 plus that offset, so x21 holds the address of the mm_struct. The whatis command shows the declaration of the structure:
So knowing either x21 or x28 is enough to get the addresses of both mm and its mmap_sem lock.
When a function is called, the callee saves any callee-saved registers it is going to modify (x19 to x28 on arm64) in its own stack frame, so these two registers can be recovered from the frames of down_read() and the functions it goes on to call:
In other words, any of those functions that uses x21 or x28 will have saved the register's previous value in its stack frame.
Start with the innermost function, down_read():
It does not touch x21 or x28, so move on to the disassembly of rwsem_down_read_failed():
x21 is used in this function, and its previous value is saved at an offset of 32 bytes in the rwsem_down_read_failed() stack frame. The sp of rwsem_down_read_failed() is 0xffffffd6d9e4bcb0, so the save slot is at sp + 32 = 0xffffffd6d9e4bcd0. Use the rd command to read the saved x21 value at 0xffffffd6d9e4bcd0:
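The slot lookup described above can also be sketched in Python: scan a function prologue for the store that saves a given register, then add the offset to the frame's sp. This is only a rough illustration. The helper name and the prologue text are hypothetical (the article's real disassembly is not reproduced here, and the 96-byte frame size is made up), and the regular expression only handles sp-relative str/stp stores in the form gdb-style disassembly usually prints them.

```python
import re

# Matches e.g. "stp x21, x22, [sp,#32]", "str x23, [sp, #48]" or
# the pre-indexed form "stp x29, x30, [sp,#-96]!".
_STORE = re.compile(
    r"\b(stp|str)\s+(x\d+),\s*(?:(x\d+),\s*)?\[sp(?:,\s*#(-?\d+))?\](!?)"
)


def find_save_offset(disassembly, reg):
    """Return the offset from the callee's sp at which reg is saved, or None."""
    for line in disassembly.splitlines():
        m = _STORE.search(line)
        if not m:
            continue
        op, first, second, imm, pre_index = m.groups()
        # With pre-indexing ("!") sp is adjusted first, so the store lands at
        # offset 0 of the new sp.
        base = 0 if pre_index else int(imm or 0)
        regs = [first] if op == "str" else [first, second]
        for slot, name in enumerate(regs):
            if name == reg:
                return base + 8 * slot  # stp stores two consecutive 8-byte slots
    return None


# HYPOTHETICAL prologue, written only to match the offset quoted in the article.
PROLOGUE = """\
stp x29, x30, [sp,#-96]!
mov x29, sp
stp x19, x20, [sp,#16]
stp x21, x22, [sp,#32]
"""

FRAME_SP = 0xffffffd6d9e4bcb0  # sp of rwsem_down_read_failed(), from the article
offset = find_save_offset(PROLOGUE, "x21")
print(offset, hex(FRAME_SP + offset))  # 32 0xffffffd6d9e4bcd0
```

The resulting address is the one to pass to rd. Stores that go through the frame pointer (x29) instead of sp are not handled by this sketch.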
Use the struct command to view this mm_struct:
The owner field here points to the task_struct of the thread that owns this mm_struct:
The address of the sem lock is therefore 0xffffffd76e349a00 + 0x68 = 0xffffffd76e349a68, so:
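The address arithmetic works in both directions, which is why knowing either x21 (the mm) or x28 (the sem) is enough. A small Python sketch, with hypothetical helper names and the offset and addresses quoted in the article:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
MMAP_SEM_OFFSET = 0x68  # offset of mmap_sem in mm_struct on this kernel (104)


def member_address(struct_addr, offset):
    """Address of a member, given the address of its containing structure."""
    return (struct_addr + offset) & MASK64


def container_address(member_addr, offset):
    """Address of the containing structure, given the address of a member."""
    return (member_addr - offset) & MASK64


mm = 0xffffffd76e349a00                      # saved x21 value read with rd
sem = member_address(mm, MMAP_SEM_OFFSET)
print(hex(sem))                              # 0xffffffd76e349a68
print(hex(container_address(sem, MMAP_SEM_OFFSET)))  # 0xffffffd76e349a00
```

The offset depends on the kernel version and configuration, so it should always be taken from the dump being analyzed rather than reused from another build.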
This tells us that the watchdog thread blocked while reading the proc node of process 1651, because that process's mmap_sem lock is held by another thread. So which thread holds it?
Finding the thread that holds the read-write lock
To continue, use the search command with the -t option to look for the lock address in the kernel stacks of all threads in the system:
A lock address is normally kept in a register, and registers are saved to the stack across nested function calls. So any thread whose stack contains the lock address (0xffffffd76e349a68) is likely to be either holding the lock or waiting for it.
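What such a stack search does can be illustrated with a few lines of Python: walk a raw stack image in 8-byte steps and report every slot that holds the lock address. This is a sketch of the idea, not of how crash implements it. It assumes a little-endian 64-bit kernel, and the stack base and contents below are synthetic.

```python
import struct


def scan_stack(stack_bytes, base_addr, value):
    """Return the address of every aligned 64-bit word equal to value."""
    hits = []
    for off in range(0, len(stack_bytes) - 7, 8):
        (word,) = struct.unpack_from("<Q", stack_bytes, off)  # little-endian u64
        if word == value:
            hits.append(base_addr + off)
    return hits


LOCK = 0xffffffd76e349a68           # mmap_sem address derived in the article
BASE = 0xffffffc000001000           # HYPOTHETICAL stack base, for illustration
# Synthetic stack: the lock address appears in the second and fourth slots.
stack = struct.pack("<4Q", 0x0, LOCK, 0x1234, LOCK)

print([hex(addr) for addr in scan_stack(stack, BASE, LOCK)])
```

A hit only means that the value was spilled to that stack at some point, so every match still has to be checked against the thread's call stack.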
Of the 20 threads found here, 19 are the waiting threads mentioned above, so the remaining one is most likely the thread that holds the lock:
View the call stack of this thread:
In thread 2124, the lock address is stored at 0xffffffd6d396b8b0, which lies within the stack frame of handle_mm_fault(). A value in that frame was saved there by handle_mm_fault() on entry, so the function that took the lock must be one of its callers, further up the call chain.
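Mapping a stack address to the frame that contains it is a simple range lookup over the frame addresses that bt prints. The Python sketch below assumes each frame spans from its own sp up to the sp of its caller. The stack bounds are the ones bt reports for thread 2124 in this article, but the per-frame sp values and the helper name are hypothetical placeholders.

```python
from bisect import bisect_right


def owning_frame(frames, stack_top, addr):
    """Return the function whose frame contains addr, or None if off the stack.

    frames: (function, sp) pairs sorted by ascending sp (innermost call first).
    """
    sps = [sp for _, sp in frames]
    if not frames or addr < sps[0] or addr >= stack_top:
        return None
    # The owning frame is the last one whose sp is <= addr.
    return frames[bisect_right(sps, addr) - 1][0]


STACK_TOP = 0xffffffd6d396be70
FRAMES = [
    # HYPOTHETICAL sp values, not taken from the real dump.
    ("innermost_frame", 0xffffffd6d396b4b0),
    ("handle_mm_fault", 0xffffffd6d396b880),
    ("do_page_fault", 0xffffffd6d396b920),
]

print(owning_frame(FRAMES, STACK_TOP, 0xffffffd6d396b8b0))  # handle_mm_fault
```

The same lookup applies to any other value found on a stack, such as a mutex address.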
Let's look at do_page_fault() first:
The code does take mmap_sem there, as a reader. A held read lock can still block other readers, typically because the rwsem stops admitting new readers once a writer is queued behind the current holder. So we can conclude that the read lock held by 2124 is what blocks the 19 threads, including watchdog.
Next, we need to find out why thread 2124 holds the lock for so long without releasing it.
Deadlock
The call stack shows that thread 2124 is waiting for the result of a fuse request, and fuse requests are handled by the sdcard process.
The log does show an sdcard-related thread, 2767, in the UNINTERRUPTIBLE state:
Thread 2767 is waiting for the mutex at 0xffffffd6948f4090.
The task and pid of the mutex owner are:
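One detail to keep in mind when reading the owner field: on kernels from 4.10 onward, mutex.owner is a word that combines the owner's task_struct pointer with three flag bits in its low bits, so the flags have to be masked off before the value is used as a task address. Whether this applies to the kernel analyzed here is an assumption, since the article does not state the version; on older kernels the field is a plain task_struct pointer. A minimal Python sketch with a made-up owner value:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
MUTEX_FLAGS = 0x07  # low three bits of mutex.owner are flags on 4.10+ kernels


def mutex_owner_task(owner_word):
    """Strip the flag bits to recover the owner's task_struct address."""
    return owner_word & ~MUTEX_FLAGS & MASK64


def mutex_flags(owner_word):
    """Return only the flag bits (bit 0 means the mutex has waiters)."""
    return owner_word & MUTEX_FLAGS


OWNER_WORD = 0xffffffc012345001  # HYPOTHETICAL value, not from the real dump
print(hex(mutex_owner_task(OWNER_WORD)), mutex_flags(OWNER_WORD))
```

The masked address can then be passed to the struct command to read the owner's pid and comm fields from its task_struct.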
First, use the bt command to find the stack range of the owner, thread 2124, which is 0xffffffd6d396b4b0 to 0xffffffd6d396be70:
Searching that stack turns up the mutex:
The mutex address is found at 0xffffffd6d396bc40, which lies in the stack frame of __generic_file_write_iter().
So the lock must have been taken by a caller of __generic_file_write_iter(), most likely ext4_file_write_iter(). Check its source code:
Now the picture is clear: thread 2124 is waiting for thread 2767 to handle its fuse request, while thread 2767 is blocked on a mutex held by thread 2124. The two threads are deadlocked on each other.
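The conclusion can be double-checked by writing the findings down as a wait-for graph and looking for a cycle. The Python sketch below uses a plain depth-first search. The graph encodes only the relationships established in this article, and the function name is hypothetical.

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None."""
    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Reached a node on the current path again: that closes a cycle.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in waits_for.get(node, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in waits_for:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


WAITS_FOR = {
    "watchdog": ["2124"],  # blocked on mmap_sem, which 2124 holds as a reader
    "2124": ["2767"],      # waiting for the fuse request handled by 2767
    "2767": ["2124"],      # waiting for mutex 0xffffffd6948f4090 owned by 2124
}

print(find_cycle(WAITS_FOR))  # ['2124', '2767', '2124']
```

The watchdog thread is not part of the cycle. It is a victim that waits on a lock held inside it, which is why the freeze first shows up there.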
This article only covers how to locate the deadlock. How to fix it in the modules involved is not discussed here for reasons of space.