How to analyze a vmcore with crash: basic approach, case 1
Published: 2019-06-27


View the kernel log with dmesg

[2493382.671020] systemd-shutdown[1]: Sending SIGKILL to PID 28975 (docker-containe).
[2493382.671078] systemd-shutdown[1]: Sending SIGKILL to PID 29015 (systemd).
[2493420.208723] EXT4-fs (nvme0n1p1): sb orphan head is 140906170
[2493420.209198] sb_info orphan list:
[2493420.209663]   inode nvme0n1p1:140906170 at ffff88490edabfb8: mode 100666, nlink 0, next 149423507
[2493420.210129]   inode nvme0n1p1:149423507 at ffff8801b99391a8: mode 100666, nlink 0, next 17567381
[2493420.210583]   inode nvme0n1p1:17567381 at ffff8806d4a26998: mode 100744, nlink 0, next 17570510
[2493420.211050]   inode nvme0n1p1:17570510 at ffff886387f82ef8: mode 100644, nlink 0, next 17570503
[2493420.211508]   inode nvme0n1p1:17570503 at ffff886a1f15bfb8: mode 100644, nlink 0, next 241700498
[2493420.211966]   inode nvme0n1p1:241700498 at ffff8877481800e8: mode 100644, nlink 0, next 243138756
[2493420.212431]   inode nvme0n1p1:243138756 at ffff88761ad10518: mode 100644, nlink 0, next 241565954
[2493420.212900]   inode nvme0n1p1:241565954 at ffff8870d64bbfb8: mode 100755, nlink 0, next 241566333
[2493420.213366]   inode nvme0n1p1:241566333 at ffff88721ae74c48: mode 100644, nlink 0, next 241050093
[2493420.213833]   inode nvme0n1p1:241050093 at ffff887704958948: mode 100755, nlink 0, next 241567324
[2493420.214545] ------------[ cut here ]------------
[2493420.219336] kernel BUG at fs/ext4/super.c:879!  <<<====== source file and line of the BUG
[2493420.223948] invalid opcode: 0000 [#1] SMP
[2493420.228133] Modules linked in: kpatch_D751550(OE) kpatch_D631237(OE) unix_diag(E) af_packet_diag(E) netlink_diag(E) dccp_diag(E) dccp(E) tcp_diag(E) udp_diag(E) inet_diag(E) [last unloaded: aisqos_hotfixes]
[2493420.246846] CPU: 58 PID: 1 Comm: systemd-shutdow Tainted: G        W  OE K 4.9.79-009.ali3000.alios7.x86_64 #1
[2493420.257009] Hardware name: Inventec     AliServer Thor01-2U             /TB800G4-G1      , BIOS A1.20 03/06/2018
[2493420.267339] task: ffff887e45918000 task.stack: ffffc90000014000
[2493420.273425] RIP: 0010:[] [] ext4_put_super+0x36f/0x3c0 [ext4]  <<<======= function and offset of the BUG
[2493420.282593] RSP: 0018:ffffc90000017de8 EFLAGS: 00010206
[2493420.288079] RAX: ffff88490edabf50 RBX: ffff887e43299000 RCX: 00000001949b336d
[2493420.295384] RDX: 0000000000000000 RSI: 0000000000000206 RDI: 0000000000000206
[2493420.302682] RBP: ffffc90000017e18 R08: 00000000000081a4 R09: 0000000000000000
[2493420.309988] R10: 0000000000000cb8 R11: 0000000000001e92 R12: ffff887e43299278
[2493420.317293] R13: ffff887e43298800 R14: ffff887e43299278 R15: ffffffffa034ff88
[2493420.324598] FS: 00007f3241ccf840(0000) GS:ffff887e78480000(0000) knlGS:0000000000000000
[2493420.332850] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2493420.338767] CR2: 00007f5e1372fbd0 CR3: 00000004daa52000 CR4: 00000000007606f0
[2493420.346065] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2493420.353361] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[2493420.360660] PKRU: 55555554
[2493420.363536] Stack:
[2493420.365721] 9cbae75a00000000 ffff887e43298800 ffffffffa034a5e0 ffff887e3818c7b8
[2493420.373365] 0000000000000000 ffff887e45918bb0 ffffc90000017e38 ffffffff81244aaf
[2493420.380991] 0000000000000083 ffff887e357b8680 ffffc90000017e58 ffffffff81244e37
[2493420.388617] Call Trace:
[2493420.391239] [] generic_shutdown_super+0x6f/0x100
[2493420.397676] [] kill_block_super+0x27/0x70
[2493420.403508] [] deactivate_locked_super+0x43/0x70
[2493420.409945] [] deactivate_super+0x5a/0x60
[2493420.415770] [] cleanup_mnt+0x3f/0x90
[2493420.421169] [] __cleanup_mnt+0x12/0x20
[2493420.426733] [] task_work_run+0x80/0xa0
[2493420.432306] [] exit_to_usermode_loop+0xaa/0xb0
[2493420.438572] [] syscall_return_slowpath+0xaa/0xb0
[2493420.445011] [] entry_SYSCALL_64_fastpath+0xc3/0xc5
[2493420.451623] Code: 60 04 00 00 48 8b 80 e0 00 00 <0f> 0b 49 c7 c7 88 ff 34 a0 49 8b
[2493420.459829] RIP [] ext4_put_super+0x36f/0x3c0 [ext4]
[2493420.466633] RSP
crash>

The dmesg log gives us two ways to locate the code that hit the BUG:

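1. [2493420.219336] kernel BUG at fs/ext4/super.c:879!
2. [2493420.273425] RIP: 0010:[] [] ext4_put_super+0x36f/0x3c0 [ext4]
   Here 0x36f is the byte offset of the faulting instruction from the entry of ext4_put_super, and 0x3c0 is the total size of the function in bytes (it is a length, not a base address).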
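To make the `symbol+offset/size` notation concrete, here is a minimal Python sketch that splits such a string into its parts. The helper name `parse_symbol_offset` and the regular expression are illustrative assumptions for this article; they are not part of crash or the kernel.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE [module]" form printed in a kernel oops.
# The pattern and the function name are illustrative, not a crash/kernel API.
_SYM_RE = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>[^\]]+)\])?"
)

def parse_symbol_offset(text):
    """Return (symbol, offset, size, module), or None if nothing matches."""
    m = _SYM_RE.search(text)
    if m is None:
        return None
    return (m.group("sym"), int(m.group("off"), 16),
            int(m.group("size"), 16), m.group("mod"))

sym, off, size, mod = parse_symbol_offset("ext4_put_super+0x36f/0x3c0 [ext4]")
print(sym, hex(off), hex(size), mod)

# The faulting instruction has to lie inside the function body.
assert off < size
```

The final assertion is a cheap sanity check: an offset larger than the function size would mean the line was misread.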

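Using method 2, convert the offset to decimal, which is the form crash uses later in the bt output (ext4_put_super+879). Note that this is a byte offset into the function, not a source line number; the fact that it equals line 879 of fs/ext4/super.c in this case is a coincidence.

(gdb) p 0x36f
$11 = 879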
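The same conversion can be done outside gdb. A minimal sketch in Python, using only the offset quoted from the oops line:

```python
offset_hex = "0x36f"              # offset taken from the RIP line in dmesg
offset_dec = int(offset_hex, 16)  # parse as base 16
print(offset_dec)                 # prints 879

# Converting back gives the original hexadecimal form.
assert hex(offset_dec) == offset_hex
```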

Disassemble the function and find that location:

crash> dis -l ext4_put_super

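Viewing source code in crash

crash can display source code itself, provided you first load the debug data for the module in question. For example:

Load the ext4 module:

crash> mod -s ext4
crash> mod  <<---- list all modules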

Line 879:

crash> l *ext4_put_super+0x36f
0xffffffffa031a8df is in ext4_put_super (fs/ext4/super.c:879).
874              * isn't empty.  The on-disk one can be non-empty if we've
875              * detected an error and taken the fs readonly, but the
876              * in-memory list had better be clean by this point. */
877             if (!list_empty(&sbi->s_orphan))
878                     dump_orphan_list(sb, sbi);
879             J_ASSERT(list_empty(&sbi->s_orphan));
880
881             sync_blockdev(sb->s_bdev);
882             invalidate_bdev(sb->s_bdev);
883             if (sbi->journal_bdev && sbi->journal_bdev != sb->s_bdev) {
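The J_ASSERT on line 879 fires because the in-memory orphan list is not empty, and the "sb_info orphan list" lines in dmesg are what dump_orphan_list printed just before it. As a rough sketch (the `parse_orphans` helper and its regular expression are assumptions made for this example, and only the first three log lines are copied in), those lines can be parsed and the `next` chain checked like this:

```python
import re

# A few lines copied from the dmesg output earlier in this article.
LOG = """\
EXT4-fs (nvme0n1p1): sb orphan head is 140906170
  inode nvme0n1p1:140906170 at ffff88490edabfb8: mode 100666, nlink 0, next 149423507
  inode nvme0n1p1:149423507 at ffff8801b99391a8: mode 100666, nlink 0, next 17567381
  inode nvme0n1p1:17567381 at ffff8806d4a26998: mode 100744, nlink 0, next 17570510
"""

ENTRY_RE = re.compile(
    r"inode \S+:(?P<ino>\d+) at (?P<addr>[0-9a-f]+): "
    r"mode (?P<mode>\d+), nlink (?P<nlink>\d+), next (?P<next>\d+)"
)

def parse_orphans(log):
    """Return one dict per orphan inode line found in the log text."""
    entries = []
    for m in ENTRY_RE.finditer(log):
        entries.append({
            "ino": int(m.group("ino")),
            "addr": int(m.group("addr"), 16),
            "mode": int(m.group("mode"), 8),   # the mode is printed in octal
            "nlink": int(m.group("nlink")),
            "next": int(m.group("next")),
        })
    return entries

orphans = parse_orphans(LOG)

# Each entry's "next" field should name the inode printed on the following line.
for cur, nxt in zip(orphans, orphans[1:]):
    assert cur["next"] == nxt["ino"]

print(len(orphans), "orphan inodes parsed")
```

Every entry has nlink 0, meaning these are unlinked files that were still on the orphan list when the filesystem was being unmounted.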

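Only once we have found the exact code can we go on to analyze why it crashed. For example: what were the actual values of the function's arguments (which may be pointers to structs)?

Print the stack with bt

In the bt output, the line [exception RIP: ext4_put_super+879] shows that the fault occurred 879 bytes (0x36f) into ext4_put_super. crash prints this offset in decimal; it is the same offset as in the dmesg RIP line, and it maps to fs/ext4/super.c:879 as shown above.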

crash> bt
PID: 1      TASK: ffff887e45918000  CPU: 58  COMMAND: "systemd-shutdow"
 #0 [ffffc90000017a58] machine_kexec at ffffffff810603e8
 #1 [ffffc90000017ab8] __crash_kexec at ffffffff811211cd
 #2 [ffffc90000017b80] __crash_kexec at ffffffff811212a5
 #3 [ffffc90000017b98] crash_kexec at ffffffff811212eb
 #4 [ffffc90000017bb8] oops_end at ffffffff81030905
 #5 [ffffc90000017be0] die at ffffffff81030ddb
 #6 [ffffc90000017c10] do_trap at ffffffff8102df02
 #7 [ffffc90000017c60] do_error_trap at ffffffff8102e2d9
 #8 [ffffc90000017d20] do_invalid_op at ffffffff8102e830
 #9 [ffffc90000017d30] invalid_op at ffffffff8171b63e
    [exception RIP: ext4_put_super+879]
    RIP: ffffffffa031a8df  RSP: ffffc90000017de8  RFLAGS: 00010206
    RAX: ffff88490edabf50  RBX: ffff887e43299000  RCX: 00000001949b336d
    RDX: 0000000000000000  RSI: 0000000000000206  RDI: 0000000000000206
    RBP: ffffc90000017e18   R8: 00000000000081a4   R9: 0000000000000000
    R10: 0000000000000cb8  R11: 0000000000001e92  R12: ffff887e43299278
    R13: ffff887e43298800  R14: ffff887e43299278  R15: ffffffffa034ff88
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffffc90000017de0] ext4_put_super at ffffffffa031a91c [ext4]
#11 [ffffc90000017e20] generic_shutdown_super at ffffffff81244aaf
#12 [ffffc90000017e40] kill_block_super at ffffffff81244e37
#13 [ffffc90000017e60] deactivate_locked_super at ffffffff81244f73
#14 [ffffc90000017e80] deactivate_super at ffffffff8124547a
#15 [ffffc90000017e98] cleanup_mnt at ffffffff81264b2f
#16 [ffffc90000017eb0] __cleanup_mnt at ffffffff81264bc2
#17 [ffffc90000017ec0] task_work_run at ffffffff810a7b50
#18 [ffffc90000017f00] exit_to_usermode_loop at ffffffff810032ba
#19 [ffffc90000017f30] syscall_return_slowpath at ffffffff81003baa
#20 [ffffc90000017f50] entry_SYSCALL_64_fastpath at ffffffff8171a783
    RIP: 00007f3241195c47  RSP: 00007fffb3db5438  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 0000560b87fbd920  RCX: 00007f3241195c47
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000560b87fbdd10
    RBP: 0000560b87fbda00   R8: 0000000000000000   R9: 00007f32410e416d
    R10: 0000000000000021  R11: 0000000000000246  R12: 0000560b87fbdd10
    R13: 00007fffb3db5538  R14: 00007fffb3db5523  R15: 0000000000000000
    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
crash>
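The exception frame gives the absolute RIP, so subtracting the offset should land on the entry of ext4_put_super. A small check in Python, using only numbers that appear in the bt output:

```python
rip = 0xffffffffa031a8df   # "RIP:" in the exception frame of bt
offset = 879               # from "[exception RIP: ext4_put_super+879]" (decimal)

entry = rip - offset       # start address of ext4_put_super
print(hex(entry))          # prints 0xffffffffa031a570

# The decimal offset in bt is the same value as the hex offset in dmesg.
assert offset == 0x36f
```

The result, 0xffffffffa031a570, is the address of the first instruction listed by `dis -l ext4_put_super` later in this article, which confirms that the offset and the RIP agree.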

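Disassemble the caller and the callee

Once we have pinned down the failing source line, the next step is to examine the arguments that were passed in and the structs they point to.

First, look at the prototype of ext4_put_super: static void ext4_put_super(struct super_block *sb). It takes a single argument, a pointer to a struct super_block. So what is the value of the sb pointer? It must have been passed in by the caller, generic_shutdown_super.

The key question is therefore: when generic_shutdown_super called ext4_put_super (ffffffff81244aaf is the return address of that call), what pointer value did it pass?

First, disassemble generic_shutdown_super and find address ffffffff81244aaf: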

crash> dis -l generic_shutdown_super
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 436
0xffffffff81244aa0:  mov 0x30(%r12),%rax
0xffffffff81244aa5:  test %rax,%rax
0xffffffff81244aa8:  je 0xffffffff81244aaf
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 437
0xffffffff81244aaa:  mov %rbx,%rdi  <=== after this, rbx and rdi hold the same value
0xffffffff81244aad:  callq *%rax  <=== the next function is called here
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/include/linux/compiler.h: 243
0xffffffff81244aaf:  mov 0x608(%rbx),%rax
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 439
0xffffffff81244ab6:  lea 0x608(%rbx),%rdx
0xffffffff81244abd:  cmp %rax,%rdx
0xffffffff81244ac0:  jne 0xffffffff81244b1f
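The address recorded in the backtrace is the return address, that is, the instruction immediately after the call, not the call itself. A short Python check using only addresses from the listing and from the Call Trace (the variable names are just labels for this sketch):

```python
call_insn = 0xffffffff81244aad       # address of "callq *%rax"
next_insn = 0xffffffff81244aaf       # the instruction right after it
ret_addr_in_bt = 0xffffffff81244aaf  # frame #11 in bt

# The call instruction occupies the gap between the two addresses.
call_len = next_insn - call_insn
assert call_len == 2

# A call pushes the address of the following instruction as the return address.
assert call_insn + call_len == ret_addr_in_bt

# The Call Trace shows generic_shutdown_super+0x6f, so the entry is 0x6f bytes earlier.
func_entry = ret_addr_in_bt - 0x6f
print(hex(func_entry))
```

This is why the argument has to be read from the instruction just before ffffffff81244aaf: the `mov %rbx,%rdi` at ffffffff81244aaa is what sets up the first argument.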

Next, disassemble ext4_put_super. Its prologue pushes a number of registers onto the stack:

crash> dis -l ext4_put_super
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 824
0xffffffffa031a570:  nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa031a575:  push %rbp
0xffffffffa031a576:  mov %rsp,%rbp
0xffffffffa031a579:  push %r15  <=== 1st register pushed
0xffffffffa031a57b:  push %r14  <=== 2nd register pushed
0xffffffffa031a57d:  push %r13  <=== 3rd register pushed
0xffffffffa031a57f:  push %r12  <=== 4th register pushed
0xffffffffa031a581:  mov %rdi,%r13
0xffffffffa031a584:  push %rbx  <=== 5th register pushed (rbx still holds the caller's value here, and the caller copied rbx into rdi just before the call, so this saved value is the pointer passed as the first argument of ext4_put_super)
0xffffffffa031a585:  sub $0x8,%rsp
0xffffffffa031a589:  mov 0x460(%rdi),%rbx
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 826
0xffffffffa031a590:  mov 0xe0(%rbx),%r14
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 830
0xffffffffa031a597:  callq 0xffffffffa03133f0
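
Dump the raw stack contents of each frame with bt -f and match them against the pushes above:

crash> bt -f
#10 [ffffc90000017de0] ext4_put_super at ffffffffa031a91c [ext4]
    ffffc90000017de8: 9cbae75a00000000 (slot reserved by sub $0x8,%rsp)  ffff887e43298800 (5th pushed register, rbx)
    ffffc90000017df8: ffffffffa034a5e0 (4th pushed register, r12)  ffff887e3818c7b8 (3rd pushed register, r13)
    ffffc90000017e08: 0000000000000000 (2nd pushed register, r14)  ffff887e45918bb0 (1st pushed register, r15)
    ffffc90000017e18: ffffc90000017e38 ffffffff81244aaf (these two are the saved rbp and the return address into generic_shutdown_super)
#11 [ffffc90000017e20] generic_shutdown_super at ffffffff81244aaf
    ffffc90000017e28: 0000000000000083 ffff887e357b8680
    ffffc90000017e38: ffffc90000017e58 ffffffff81244e37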
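
The mapping between the prologue and the stack words can be written out mechanically. The following Python sketch assumes the push order read from the ext4_put_super disassembly (rbp, r15, r14, r13, r12, rbx, then `sub $0x8,%rsp`); the slot labels are names chosen for this example only.

```python
# Stack words of frame #10 from "bt -f", lowest address first.
base = 0xffffc90000017de8
words = [
    0x9cbae75a00000000, 0xffff887e43298800,
    0xffffffffa034a5e0, 0xffff887e3818c7b8,
    0x0000000000000000, 0xffff887e45918bb0,
    0xffffc90000017e38, 0xffffffff81244aaf,
]

# The stack grows downward, so reading upward from the stack pointer
# yields the pushes in reverse order, followed by the return address.
slots = ["reserved by sub $0x8", "saved rbx", "saved r12", "saved r13",
         "saved r14", "saved r15", "saved rbp", "return address"]

frame = {name: (base + 8 * i, value)
         for i, (name, value) in enumerate(zip(slots, words))}

for name, (addr, value) in frame.items():
    print(f"{addr:#018x}  {value:#018x}  {name}")

# The caller's rbx was copied into rdi before the call, so it is the sb pointer.
sb = frame["saved rbx"][1]
print("sb =", hex(sb))
```

As a cross-check, the same value ffff887e43298800 also appears in R13 in the exception register dump, which is consistent with the `mov %rdi,%r13` at the top of ext4_put_super.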

Interpret that address as a struct super_block:

crash> struct super_block ffff887e43298800
struct super_block {
  s_list = {
    next = 0xffffffff81cb3db0,  <======= this also confirms that address ffff887e43298800 really is a struct super_block
    prev = 0xffff887e43968800
  },
  s_dev = 271581185,
  s_blocksize_bits = 12 '\f',
  s_blocksize = 4096,
  s_maxbytes = 17592186040320,
  s_type = 0xffffffffa03589c0,
  s_op = 0xffffffffa034a5e0,
  dq_op = 0xffffffffa034a720,
  s_qcop = 0xffffffff81843f60,
  s_export_op = 0xffffffffa034a580,
  s_flags = 805371904,
  s_iflags = 1,
  s_magic = 61267,
  s_root = 0x0,
  s_umount = {
    count = {
      counter = -4294967295
    },
    wait_list = {
      next = 0xffff887e43298878,
      prev = 0xffff887e43298878
    },
    wait_lock = {
      raw_lock = {
        val = {
          counter = 0
        }
      }
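
A few of these fields can be decoded to double-check that the struct belongs to the filesystem seen in dmesg. The sketch below is in Python and rests on two facts not stated in the original text: the in-kernel dev_t layout (a 20-bit minor number, per include/linux/kdev_t.h) and the ext4 magic number 0xEF53.

```python
# Field values copied from the struct super_block dump.
s_dev = 271581185
s_magic = 61267
s_blocksize_bits = 12
s_blocksize = 4096

MINORBITS = 20                        # in-kernel dev_t: major above 20 minor bits
major = s_dev >> MINORBITS
minor = s_dev & ((1 << MINORBITS) - 1)
print("major", major, "minor", minor)

EXT4_SUPER_MAGIC = 0xEF53             # superblock magic shared by ext2/ext3/ext4
print("magic", hex(s_magic))
assert s_magic == EXT4_SUPER_MAGIC

# The block size must be 2 to the power of s_blocksize_bits.
assert 1 << s_blocksize_bits == s_blocksize
```

The magic number matches ext4, and the device decodes to major 259, minor 1. Major 259 is the extended block device major, which would fit the nvme0n1p1 partition named in the log, although that mapping is an inference here rather than something the dump states directly.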

References

Reposted from: https://www.cnblogs.com/muahao/p/9925629.html
