Notes on a Server Error and Its Fix

A colleague recently noticed that restarting a service on a production server produced the following error:
[root@server ~]# service sshd restart
Redirecting to /bin/systemctl restart sshd.service
Error: No space left on device
[root@server ~]# systemctl restart dhcpd.service
Error: No space left on device
[root@server ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 472G 195G 278G 42% /
devtmpfs 126G 0 126G 0% /dev
tmpfs 126G 0 126G 0% /dev/shm
tmpfs 126G 4.1G 122G 4% /run
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/sda1 1014M 166M 849M 17% /boot
tmpfs 26G 0 26G 0% /run/user/0
Taken literally, the error says the device is out of space. Generally there are two common causes for this: either the filesystem has run out of data blocks, or it has run out of inodes. Confident in that guess, I ran the commands to check both:
[root@server bin]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 472G 195G 278G 42% /
devtmpfs 126G 0 126G 0% /dev
tmpfs 126G 0 126G 0% /dev/shm
tmpfs 126G 4.1G 122G 4% /run
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/sda1 1014M 166M 849M 17% /boot
tmpfs 26G 0 26G 0% /run/user/0
[root@server bin]# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/centos-root 247431168 1169533 246261635 1% /
devtmpfs 33000182 748 32999434 1% /dev
tmpfs 33004461 1 33004460 1% /dev/shm
tmpfs 33004461 1514 33002947 1% /run
tmpfs 33004461 16 33004445 1% /sys/fs/cgroup
/dev/sda1 524288 340 523948 1% /boot
tmpfs 33004461 1 33004460 1% /run/user/0
Hmm, not quite what I expected: both disk space and inodes look plentiful. Checking the system logs (messages, dmesg, the SEL) showed no disk-related errors either, so this did not look like a hardware problem. Searching for the error message pointed elsewhere: inotify.
As one of the referenced answers puts it: "By default, Linux only allocates 8192 watches for inotify, which is ridiculously low. And when it runs out, the error is also No space left on device, which may be confusing if you aren't explicitly looking for this issue."
The details are documented in man 7 inotify (an excerpt of the man page is included in the appendix at the end of this post).
[root@server ~]# sysctl fs.inotify
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 8192
[root@server ~]# cat /proc/sys/fs/inotify/max_user_watches
8192
So the current "upper limit on the number of watches that can be created per real user ID" is indeed at the default of 8192.

Counting the watches actually in use, below, shows the count already above that default limit, which explains the error:
[root@server ~]# find /proc/*/fd -user "$USER" -lname anon_inode:inotify \
-printf '%hinfo/%f\n' 2>/dev/null \
| xargs cat | grep -c '^inotify'
8557
How the command works: it first finds all open file descriptors created by inotify_init*(2), then looks into the corresponding /proc/PID/fdinfo/FD file for information about the watch descriptors added to each of them with inotify_add_watch(2) (see the proc(5) man page under /proc/[pid]/fdinfo/ for a description of the inotify-specific entries).
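As a self-contained illustration (the fdinfo content below is a fabricated sample, not taken from this server), each watch added with inotify_add_watch(2) shows up as one "inotify wd:..." line, which is exactly what the grep -c '^inotify' step counts:

```shell
# Fabricated sample of what /proc/PID/fdinfo/FD looks like for an inotify fd;
# one "inotify wd:..." line appears per watch descriptor.
sample=$(mktemp)
cat > "$sample" <<'EOF'
pos:	0
flags:	00
mnt_id:	15
inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:11a1000020542153
inotify wd:1 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
EOF
watch_count=$(grep -c '^inotify' "$sample")
echo "$watch_count"   # prints 2: two watches on this descriptor
rm -f "$sample"
```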
The same approach can report the watch-descriptor count for each /proc/PID/fdinfo/FD individually, which lets us pinpoint the offending process and file:
[root@server ~]# for i in `find /proc/*/fd -user "$USER" -lname \
anon_inode:inotify -printf '%hinfo/%f\n' 2>/dev/null`; \
do echo -e "$i \t `cat $i|grep -c '^inotify'`"; done
/proc/17810/fdinfo/11 2
/proc/17825/fdinfo/3 3
/proc/17825/fdinfo/8 4
/proc/17847/fdinfo/6 2
/proc/17873/fdinfo/6 1
/proc/17879/fdinfo/3 1
/proc/17880/fdinfo/3 1
/proc/18341/fdinfo/5 3
/proc/18882/fdinfo/7 1
/proc/19235/fdinfo/9 5
/proc/1/fdinfo/10 1
/proc/1/fdinfo/14 4
/proc/1/fdinfo/15 4
/proc/1/fdinfo/17 4
/proc/57300/fdinfo/4 8630
/proc/7143/fdinfo/3 2
/proc/9380/fdinfo/7 11
[root@server ~]# cat /proc/57300/cmdline
python xxx.py
PID 57300 is clearly the culprit, holding 8630 watches by itself, and its command line identifies the script responsible.
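A variant of the loop above (a sketch, not from the original investigation) ranks descriptors by watch count so the heaviest consumer surfaces first; run as root, it covers every user's processes:

```shell
# Rank inotify fds by watch count, highest first (requires Linux /proc).
top=$(find /proc/[0-9]*/fdinfo -type f 2>/dev/null | while read -r f; do
        n=$(grep -c '^inotify' "$f" 2>/dev/null)
        [ "${n:-0}" -gt 0 ] && printf '%s\t%s\n' "$n" "$f"
      done | sort -rn | head)
printf '%s\n' "$top"
```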
Since the script identified above is a critical production job that cannot simply be stopped, the fix here is to raise fs.inotify.max_user_watches instead.
Edit /etc/sysctl.conf, add the line fs.inotify.max_user_watches = 81920, and reload the settings:
[root@server ~]# sysctl -p
fs.inotify.max_user_watches = 81920
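One reason raising this limit is considered safe: each registered watch pins a fixed amount of unswappable kernel memory, commonly quoted as up to roughly 1080 bytes on 64-bit kernels (an assumed figure, not from the original post; the exact size varies by kernel version), and it is only consumed for watches actually in use. A quick back-of-envelope check:

```shell
# Worst-case kernel memory if all 81920 watches were in use,
# assuming ~1080 bytes per watch on a 64-bit kernel.
watches=81920
bytes_per_watch=1080
worst_case_mb=$(( watches * bytes_per_watch / 1024 / 1024 ))
echo "${worst_case_mb} MB"   # prints "84 MB"
```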
Querying the inotify settings again:
[root@server ~]# sysctl fs.inotify
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 81920
[root@server ~]# cat /proc/sys/fs/inotify/max_user_watches
81920
Restarting the service again verifies the fix; the error is gone:
[root@server etc]# service sshd restart
Redirecting to /bin/systemctl restart sshd.service
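To catch this earlier next time, the usage query from above can be wrapped into a periodic check against the configured limit (a sketch; the 80% threshold is an arbitrary choice):

```shell
# Warn when this user's inotify watch usage passes 80% of the per-user limit.
limit=$(cat /proc/sys/fs/inotify/max_user_watches)
used=$(find /proc/*/fd -user "$(id -un)" -lname anon_inode:inotify \
         -printf '%hinfo/%f\n' 2>/dev/null | xargs -r cat 2>/dev/null \
         | grep -c '^inotify')
if [ "$used" -gt $(( limit * 80 / 100 )) ]; then
    echo "WARNING: inotify watches at ${used}/${limit}"
fi
```

Dropped into cron or a monitoring agent, this would have flagged the growing watch count long before services started failing to restart.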
References:
https://serverfault.com/questions/708001/error-no-space-left-on-device-when-starting-stopping-services-only
https://unix.stackexchange.com/questions/498393/how-to-get-the-number-of-inotify-watches-in-use
Appendix: man 7 inotify (excerpt)

NAME
inotify - monitoring file system events
DESCRIPTION
The inotify API provides a mechanism for monitoring file system events. Inotify can be used to monitor individual files, or to monitor directories. When a
directory is monitored, inotify will return events for the directory itself, and for files inside the directory.
...
/proc interfaces
The following interfaces can be used to limit the amount of kernel memory consumed by inotify:
/proc/sys/fs/inotify/max_queued_events
The value in this file is used when an application calls inotify_init(2) to set an upper limit on the number of events that can be queued to the corresponding inotify instance. Events in excess of this limit are dropped, but an IN_Q_OVERFLOW event is always generated.
/proc/sys/fs/inotify/max_user_instances
This specifies an upper limit on the number of inotify instances that can be created per real user ID.
/proc/sys/fs/inotify/max_user_watches
This specifies an upper limit on the number of watches that can be created per real user ID.
...