When memory usage is so high that free memory drops below the zone watermark, allocations in the network receive path start failing frequently and you may not even be able to ssh into the machine. If the box then reboots, or you trigger a crash via sysrq and capture a coredump, the dump can be used to find the root cause.
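For the second path to be available, kdump and magic SysRq need to be prepared in advance. A minimal sketch of the relevant knobs (package and service names vary by distribution):
# allow the full set of magic SysRq functions
sysctl -w kernel.sysrq=1
# with the kdump service enabled, this panics the kernel and captures a vmcore for crash(8)
echo c > /proc/sysrq-trigger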
crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM  4042945      15.4 GB         ----
         FREE    20622      80.6 MB    0% of TOTAL MEM
         USED  4022323      15.3 GB   99% of TOTAL MEM
       SHARED     7499      29.3 MB    0% of TOTAL MEM
      BUFFERS       59       236 KB    0% of TOTAL MEM
       CACHED    10497        41 MB    0% of TOTAL MEM
         SLAB    59464     232.3 MB    1% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP        0            0         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE        0            0    0% of TOTAL SWAP

 COMMIT LIMIT  2021472       7.7 GB         ----
    COMMITTED  4495745      17.1 GB  222% of TOTAL LIMIT
Looking at memory first: there is basically no free memory left.
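The watermarks mentioned above derive from min_free_kbytes, which can be read straight out of the dump; a quick hedged check (the variable name is the standard mainline one):
crash> p min_free_kbytes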
Now break down where the memory went:
crash> ps -G > ps.txt
sed -i "s/>/ /g" ps.txt
Add up how much memory the processes are using in total:
cat ps.txt | awk '{sum+=$8} END {print sum}'
15939684
Process RSS alone has essentially filled the machine (15939684 KB is about 15.2 GB). Note: this way of counting is inaccurate on machines that use a lot of shared memory! One alternative is cat ps.txt | awk '{print $8}' | sort -n | uniq | awk '{sum+=$1} END {print sum}', which counts only one process per distinct RSS value, but that is not accurate either; you can only try both and see which comes closer.
Find the processes using the most memory:
cat ps.txt | awk '{print $8,$9,$1}' | sort -nr | more
10616 top 9961
10616 top 6856
10616 top 5672
10616 top 5565
10616 top 20218
10616 top 19449
All of them are top?
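As a quick sanity check, a hedged awk sketch over the same ps.txt that aggregates RSS by command name (column 8 is RSS in KB, column 9 is the command):
cat ps.txt | awk '{rss[$9]+=$8} END {for (c in rss) print rss[c], c}' | sort -nr | head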
crash> ps | grep UN | wc -l
760
crash> ps -l | grep UN
[ 99026812502124] [UN] PID: 28627 TASK: ffff880003aecb00 CPU: 7 COMMAND: "top"
[ 99026812502058] [UN] PID: 19049 TASK: ffff880022d68000 CPU: 6 COMMAND: "top"
[ 99026812488690] [UN] PID: 18404 TASK: ffff880002b28000 CPU: 7 COMMAND: "top"
You can see that many of these top processes are in the UN (uninterruptible, D) state, so the OOM killer cannot get rid of them.
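To confirm that the D-state tasks are all stuck on the same path, crash's foreach can dump every UN task's backtrace in one pass; a sketch (the output is large, so redirect it to a file and grep it afterwards):
crash> foreach UN bt > un_bt.txt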
A small aside: for OOM, you can also look up a task's oom_score_adj in the coredump to confirm whether the OOM killer is allowed to kill it, like this:
crash> task -R signal pid
PID: pid TASK: ffff881850ef8000 CPU: 47 COMMAND: "xxxxx"
signal = 0xffff882d0eaa2a80,
crash> signal_struct.oom_score_adj_min 0xffff882d0eaa2a80
oom_score_adj_min = -998
A task like this at -998 obviously cannot be OOM-killed!
Back to top: take a look at its stack:
crash> bt ffff880003aecb00
PID: 28627 TASK: ffff880003aecb00 CPU: 7 COMMAND: "top"
#0 [ffff8803e2ee5bd0] __schedule at ffffffff815f3776
#1 [ffff8803e2ee5c38] schedule_preempt_disabled at ffffffff815f40f9
#2 [ffff8803e2ee5c48] __mutex_lock_slowpath at ffffffff815f22d1
#3 [ffff8803e2ee5ca0] mutex_lock at ffffffff815f1c47
#4 [ffff8803e2ee5cb8] lookup_slow at ffffffff815ec2dd
#5 [ffff8803e2ee5cf0] link_path_walk at ffffffff811890a3
#6 [ffff8803e2ee5d90] path_openat at ffffffff8118acde
#7 [ffff8803e2ee5e38] do_filp_open at ffffffff8118c1dd
#8 [ffff8803e2ee5f10] do_sys_open at ffffffff8117aa07
#9 [ffff8803e2ee5f70] sys_open at ffffffff8117ab2e
#10 [ffff8803e2ee5f80] system_call_fastpath at ffffffff816cdfb9
RIP: 00007f27f41522d0 RSP: 00007ffc6fc96dc8 RFLAGS: 00010246
RAX: 0000000000000002 RBX: ffffffff816cdfb9 RCX: 00007f27f4645d87
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f27f484b840
RBP: 0000000000000349 R8: 0000000000000000 R9: 00007f27f4646677
R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff8117ab2e
R13: ffff8803e2ee5f78 R14: 0000000001c72368 R15: 00000000016e8ff0
ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b
So it is waiting on a lock?
Next question: is it merely slow, or is it completely stuck?
crash> task -R sched_info 28627
PID: 28627 TASK: ffff880003aecb00 CPU: 7 COMMAND: "top"
sched_info = {
pcount = 770201,
run_delay = 24190351955485,
last_arrival = 99026812502124,
last_queued = 0
},
crash> dmesg | grep sysrq
[149186.504014] IP: [<ffffffff813939a6>] sysrq_handle_crash+0x16/0x20
It looks like it has been stuck for a very long time: last_arrival is about 99027 seconds, while dmesg shows the sysrq crash was triggered at about 149186 seconds, so this task has not run for roughly 50,000 seconds (about 14 hours). Effectively, it is stuck for good.
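crash can also do this time arithmetic for you: ps -m translates a task's last-run timestamp into days/hours/minutes relative to the dump. A sketch (the exact columns depend on the crash version):
crash> ps -m 28627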
Now work out which lock it is waiting on, by digging through the stack (skip this part if you are not interested):
crash> dis mutex_lock
0xffffffff815f1c30 <mutex_lock>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff815f1c35 <mutex_lock+5>: push %rbp
0xffffffff815f1c36 <mutex_lock+6>: mov %rsp,%rbp
0xffffffff815f1c39 <mutex_lock+9>: push %rbx
Then the relevant part of lookup_slow, which computes the mutex argument right before calling mutex_lock:
0xffffffff815ec2ba <lookup_slow+16>: mov %rdi,%rbx
0xffffffff815ec2bd <lookup_slow+19>: push %rcx
0xffffffff815ec2be <lookup_slow+20>: mov 0x8(%rdi),%r13
0xffffffff815ec2c2 <lookup_slow+24>: mov 0x30(%rdi),%rdi
0xffffffff815ec2c6 <lookup_slow+28>: cmp 0x30(%r13),%rdi
0xffffffff815ec2ca <lookup_slow+32>: je 0xffffffff815ec2ce <lookup_slow+36>
0xffffffff815ec2cc <lookup_slow+34>: ud2
0xffffffff815ec2ce <lookup_slow+36>: add $0x98,%rdi
0xffffffff815ec2d5 <lookup_slow+43>: mov %rsi,%r12
0xffffffff815ec2d8 <lookup_slow+46>: callq 0xffffffff815f1c30 <mutex_lock>
The raw stack contents of the mutex_lock frame:
#3 [ffff8803e2ee5ca0] mutex_lock at ffffffff815f1c47
ffff8803e2ee5ca8: ffff8803e2ee5e40 ffff8803e2ee5ce8
ffff8803e2ee5cb8: ffffffff815ec2dd
From the disassembly, the lock being taken is the directory inode's i_mutex, at offset 0x98 inside the inode; working back through the stack gives the inode address, so the lock can be inspected:
crash> inode.i_mutex 0xffff88042ec30040
i_mutex = {
  count = {
    counter = -124
  },
  wait_lock = {
    {
      rlock = {
        raw_lock = {
          lock = 0 '\000',
          spinners = 0 '\000'
        }
      }
    }
  },
  wait_list = {
    next = 0xffff88000da31e80,
    prev = 0xffff880008d59c50
  },
  owner = 0xffff88000f04e400,
  spin_mlock = 0x0
}
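As a hedged cross-check of the addresses: the 0x98 seen in the lookup_slow disassembly should be the offset of i_mutex inside struct inode, and adding it to the inode address gives the mutex itself. Both can be computed inside crash (assuming these options behave as in mainstream crash versions):
crash> struct inode.i_mutex -o
crash> eval 0xffff88042ec30040 + 0x98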
Now look at the lock's owner:
crash> bt 0xffff88000f04e400
PID: 24156 TASK: ffff88000f04e400 CPU: 2 COMMAND: "ps"
#0 [ffff880006259848] __schedule at ffffffff815f3776
#1 [ffff8800062598b0] __cond_resched at ffffffff8107f026
#2 [ffff8800062598c8] _cond_resched at ffffffff815f405a
#3 [ffff8800062598d8] wait_iff_congested at ffffffff81131f78
#4 [ffff880006259938] __alloc_pages_nodemask at ffffffff8111d8b8
#5 [ffff880006259a68] alloc_pages_vma at ffffffff8115b9ea
#6 [ffff880006259ad0] handle_pte_fault at ffffffff8113d26c
#7 [ffff880006259b58] handle_mm_fault at ffffffff8113e2cf
#8 [ffff880006259bc8] __do_page_fault at ffffffff815f93a8
#9 [ffff880006259cd0] do_page_fault at ffffffff815f977e
#10 [ffff880006259ce0] do_async_page_fault at ffffffff815f8e48
#11 [ffff880006259cf0] async_page_fault at ffffffff815f5c58
[exception RIP: filldir+99]
RIP: ffffffff8118e513 RSP: ffff880006259da8 RFLAGS: 00050246
RAX: 0000000000000000 RBX: ffff880006259f38 RCX: 0000000000000142
RDX: 0000000002262fd8 RSI: ffff880006259e83 RDI: ffff880006259f38
RBP: ffff880006259dd0 R8: 00000000024dd1ce R9: 0000000000000004
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000002262ff0
R13: 0000000000000018 R14: 0000000000000002 R15: 0000000000000004
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
#12 [ffff880006259dd8] proc_fill_cache at ffffffff811e43a8
#13 [ffff880006259e48] proc_pid_readdir at ffffffff811e4d6e
#14 [ffff880006259ec8] proc_root_readdir at ffffffff811e0e1f
#15 [ffff880006259ef0] vfs_readdir at ffffffff8118e3a0
#16 [ffff880006259f30] sys_getdents at ffffffff8118e7c5
#17 [ffff880006259f80] system_call_fastpath at ffffffff816cdfb9
RIP: 00007fca6481ee35 RSP: 00007ffeeef0fa98 RFLAGS: 00010216
RAX: 000000000000004e RBX: ffffffff816cdfb9 RCX: 0000000000000000
RDX: 0000000000008000 RSI: 00000000022623b0 RDI: 0000000000000004
RBP: ffffffffffffff00 R8: 00000000022623b0 R9: 0000000000000001
R10: 00007ffeeef0fb50 R11: 0000000000000246 R12: 0000000000000000
R13: 00000000022623b0 R14: 000000000000000d R15: 0000000002262380
ORIG_RAX: 000000000000004e CS: 0033 SS: 002b
So the owner of the lock these tasks are waiting on is a ps process, and it is asleep because it cannot get memory: it page-faulted while filling the user's dirent buffer, fell into direct reclaim in __alloc_pages_nodemask, and is being throttled in wait_iff_congested.
Check its scheduling info: is it merely slow, or completely stuck?
crash> task -R sched_info 24156
PID: 24156 TASK: ffff88000f04e400 CPU: 2 COMMAND: "ps"
sched_info = {
pcount = 225134,
run_delay = 49050541888909,
last_arrival = 149186548046662,
last_queued = 149186552018854
},
It is not completely stuck: last_arrival (about 149186 seconds) is right around the time of the sysrq crash, so it is still being scheduled. But its memory allocations are so slow that it never makes enough progress to release the lock.
How many tasks are waiting on this lock?
crash> list -l mutex_waiter.list -s mutex_waiter.task 0xffff88000da31e80 | wc -l
1498
The list command prints two lines per waiter here (the list node plus the task member), so 749 processes are waiting on this lock, which matches the number of UN processes (760) fairly well.
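If you also want to see which tasks the waiters are, the task pointers can be pulled out of the same list; a hedged sketch that relies on crash piping command output to the host shell, as already done above with wc:
crash> list -l mutex_waiter.list -s mutex_waiter.task 0xffff88000da31e80 | awk '/task =/ {print $3}' | head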
So the picture so far: while the machine was short on memory and busy reclaiming, a ps process was started; it could not allocate pages, had to go into direct reclaim itself, and therefore could not release a lock it was holding (the directory i_mutex shown above). Meanwhile a large number of top processes were started; they block on the lock held by ps and go into D state. Because they are in D state, the OOM killer cannot kill them to free memory, which makes reclaim for ps even slower still: a vicious cycle.
So the next step is to work out how all of these processes were started in the first place.
From the ps output, the top processes were all spawned by timeout processes.
For example, look at the earliest-started top process that is stuck in D state:
[ 98752998549794] [UN] PID: 18094 TASK: ffff88000202cb00 CPU: 1 COMMAND: "top"
PID: 18094 TASK: ffff88000202cb00 CPU: 1 COMMAND: "top"
RUN TIME: 1 days, 05:04:31
START TIME: 44515
UTIME: 22426
STIME: 62680
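To confirm how it was started, crash's ps -p prints a task's parental hierarchy; a sketch (if the cron/timeout theory is right, the chain should show something like crond -> timeout -> top):
crash> ps -p 18094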
Logging in to the machine, the crontab did contain very densely scheduled timeout jobs. This was reported back to the user to confirm whether it matched expectations; after the user investigated on the business side, the problem was resolved.
In addition, once memory usage is properly limited, it is worth considering raising min_free_kbytes to about 1% of total memory, so that receiving network packets does not keep hitting allocation failures whenever free memory dips below the watermark.
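A hedged sketch of that tuning (values are illustrative; the setting should also be persisted in /etc/sysctl.conf):
# total memory in kB, from /proc/meminfo
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
# raise min_free_kbytes to roughly 1% of total memory
sysctl -w vm.min_free_kbytes=$((total_kb / 100))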