###1. Problem

I am debugging a hang in ocfs2 (a cluster filesystem in the Linux kernel). The hang is triggered by an ocfs2 test case named "mmap_truncate". Naturally, I first looked up the process and dumped its kernel stack:

    # pgrep -a mmap_truncate
    24250 mmap_truncate -c 20 -s 300 /mnt/ocfs2//mmaptruncate.txt
    # cat /proc/24250/stack
    call_rwsem_down_write_failed+0x13/0x20
    ocfs2_truncate_file+0x127/0x6c0 [ocfs2]
    ocfs2_setattr+0x698/0xa90 [ocfs2]
    ocfs2_inode_unlock+0x33/0x80 [ocfs2]
    notify_change+0x1ae/0x380
    do_truncate+0x5e/0x90
    do_sys_ftruncate.constprop.11+0x108/0x160
    entry_SYSCALL_64_fastpath+0x12/0x6d
How do I find the exact source line where a function is stuck? For example, at:

    ocfs2_truncate_file+0x127
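Each frame in the stack dump has the form symbol+offset/size, optionally followed by the module name in brackets. As a minimal sketch (parse_frame is a hypothetical helper, not part of any tool used in this post), the following Python splits a frame into its parts and shows that 0x127 is byte 295 of a function that is 0x6c0 (1728) bytes long:

```python
import re

# Matches frames such as "ocfs2_truncate_file+0x127/0x6c0 [ocfs2]".
# The module suffix is optional (built-in kernel symbols have none).
FRAME_RE = re.compile(
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frame(frame):
    """Return (symbol, offset, size, module) for one stack frame line.

    Hypothetical helper for illustration; offset and size are converted
    from hex to int, module is None when the frame has no [module] suffix.
    """
    m = FRAME_RE.search(frame)
    if m is None:
        raise ValueError(f"not a stack frame: {frame!r}")
    return m["symbol"], int(m["offset"], 16), int(m["size"], 16), m["module"]


print(parse_frame("ocfs2_truncate_file+0x127/0x6c0 [ocfs2]"))
# → ('ocfs2_truncate_file', 295, 1728, 'ocfs2')
print(parse_frame("do_sys_ftruncate.constprop.11+0x108/0x160"))
```

The decimal offset is the number to look for in disassembly output, since crash prints instruction offsets in decimal.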
###2. Howto
- Install the kernel debuginfo and debugsource packages, etc. I'm using SUSE Linux:

      # zypper install kernel-default-debuginfo kernel-default-debugsource
- On SLES, also install the ocfs2 kernel module debuginfo package; on openSUSE this is not needed:

      # zypper in ocfs2-kmp-default-debuginfo
- Uncompress vmlinux-$(uname -r).gz:

      # gunzip /boot/vmlinux-4.4.19-60-default.gz
- Invoke crash to analyze the live system. It is worth reading man crash first for an overview:

      # crash /boot/vmlinux-4.4.19-60-default
- Check the state of the ocfs2 module with mod (see help mod for more info). In this output, "(not loaded)" refers to the module's debugging symbols rather than to the module itself:

      crash> mod
      ffffffffa04d8440  ocfs2_stackglue      20480  (not loaded)  [CONFIG_KALLSYMS]
      ffffffffa04ecf40  ocfs2_nodemanager   225280  (not loaded)  [CONFIG_KALLSYMS]
      ffffffffa05e1c80  ocfs2              1019904  (not loaded)  [CONFIG_KALLSYMS]
- Load the debugging symbols for the ocfs2 module:

      crash> mod -s ocfs2
           MODULE       NAME     SIZE  OBJECT FILE
      ffffffffa05e1c80  ocfs2  1019904  /lib/modules/4.4.19-60-default/kernel/fs/ocfs2/ocfs2.ko
- Disassemble ocfs2_truncate_file with source line numbers, and search the output for offsets near 0x127 == 295. The offset in the stack trace is a return address, so the blocked call is the instruction just before it:

      crash> dis -l ocfs2_truncate_file
      ...
      /usr/src/debug/kernel-default-4.4.19/linux-4.4/linux-obj/../fs/ocfs2/file.c: 483
      0xffffffffa0545717:  mov    0x18(%rsp),%rax
      0xffffffffa054571c:  lea    0x278(%r12),%rsi
      0xffffffffa0545724:  lea    0x230(%rax),%rdi
      0xffffffffa054572b:  callq  0xffffffffa0568480
      ...
- Jump to that line of the source code:

      # vi /usr/src/debug/kernel-default-4.4.19/linux-4.4/linux-obj/../fs/ocfs2/file.c +483
  And we got it:

      down_write(&OCFS2_I(inode)->ip_alloc_sem);
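The manual search through the disassembly can also be scripted. The sketch below uses a hypothetical helper (source_line_for is not part of crash) that scans dis -l style text for the last instruction located before a given return offset and reports the source line annotated for it. The <ocfs2_truncate_file+N> annotations in SAMPLE are assumed values: the disassembly listed in this post does not carry them, so they were derived here on the assumption that the 5-byte callq returns to offset 295 (0x127).

```python
import re

# "path: line" marker lines that `dis -l` prints before a group of instructions.
SRC_RE = re.compile(r"^(?P<path>/\S+): (?P<line>\d+)$")
# Instruction lines such as "0xffff... <func+270>:  mov ...".
# The "+N" part is optional because the first instruction is shown as "<func>".
INSN_RE = re.compile(
    r"^0x[0-9a-f]+ <(?P<sym>\w+)(?:\+(?P<off>\d+))?>:\s+(?P<insn>.*)$"
)


def source_line_for(dis_output, ret_offset):
    """Return ((path, line), instruction) for the last instruction whose
    offset is below ret_offset, or None if there is no such instruction."""
    current = None  # most recent "path: line" marker seen
    best = None
    for raw in dis_output.splitlines():
        line = raw.strip()
        m = SRC_RE.match(line)
        if m:
            current = (m["path"], int(m["line"]))
            continue
        m = INSN_RE.match(line)
        if m and int(m["off"] or 0) < ret_offset:
            best = (current, m["insn"])
    return best


# Assumed sample: the +270/+275/+283/+290 offsets are reconstructed, not
# copied from real crash output.
SAMPLE = """\
/usr/src/debug/kernel-default-4.4.19/linux-4.4/linux-obj/../fs/ocfs2/file.c: 483
0xffffffffa0545717 <ocfs2_truncate_file+270>:  mov    0x18(%rsp),%rax
0xffffffffa054571c <ocfs2_truncate_file+275>:  lea    0x278(%r12),%rsi
0xffffffffa0545724 <ocfs2_truncate_file+283>:  lea    0x230(%rax),%rdi
0xffffffffa054572b <ocfs2_truncate_file+290>:  callq  0xffffffffa0568480
"""

location, instruction = source_line_for(SAMPLE, 0x127)
print(location[1], instruction)
```

Under these assumptions the helper picks the callq annotated with file.c line 483, which matches the down_write() call found by hand.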
Thanks to Yu Peng (于鹏) from Amazon for the help!