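A few days ago the monitoring system sent an alert email reporting that free space on the MySQL backup disk had dropped below 20%. I logged in over SSH and found the cause: too many full-backup copies were being retained. I therefore deleted the older full backups by hand. Surprisingly, several minutes later disk usage was still just as high, so I started troubleshooting.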
[root@bogon bak]# df -HT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
ext4 28G 24G 3.1G 89% /
tmpfs tmpfs 4.2G 2.2G 2.0G 52% /dev/shm
/dev/sda1 ext4 500M 42M 432M 9% /boot
/dev/mapper/bak_vg0-bak_lv0
ext4 2.2T 2.2T 0 100% /bak
/dev/sr0 iso9660 3.9G 3.9G 0 100% /media/RHEL-6.8 Server.x86_64
[root@bogon bak]# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/VolGroup-lv_root
1738080 208630 1529450 13% /
tmpfs 1007800 279 1007521 1% /dev/shm
/dev/sda1 128016 39 127977 1% /boot
/dev/mapper/bak_vg0-bak_lv0
131072000 20 131071980 1% /bak
/dev/sr0 0 0 0 - /media/RHEL-6.8 Server.x86_64
[root@bogon bak]# du -sh /bak/
1.0T /bak/
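The output above shows the symptom: df reports /bak as 100% full with 2.2T used, while du finds only 1.0T under the directory. df counts blocks allocated in the file system, whereas du only walks names that still exist in the directory tree, so a file that was removed while a process still holds it open takes up space that du cannot see. The following is a minimal sketch that reproduces this on a Linux system; the temporary directory and the file name demo.dmp are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: reproduce a "deleted but still open" file on Linux.
# The temporary directory and file name are hypothetical, for illustration only.
set -eu

workdir=$(mktemp -d)
demo_file="$workdir/demo.dmp"

# Open the file on descriptor 3 and write 1 MiB through it.
exec 3>"$demo_file"
head -c 1048576 /dev/zero >&3

# Remove the name. The inode and its blocks stay allocated while fd 3 is open.
rm -f "$demo_file"

# The name is gone, so a directory walk such as du no longer sees the file ...
if [ -e "$demo_file" ]; then name_state="present"; else name_state="gone"; fi

# ... but the data is still reachable through the open descriptor.
held_bytes=$(stat -L -c %s "/proc/$$/fd/3")
link_target=$(readlink "/proc/$$/fd/3")

echo "name: $name_state"
echo "bytes still held: $held_bytes"
echo "fd 3 -> $link_target"

# Closing the descriptor is what finally releases the space.
exec 3>&-
rmdir "$workdir"
```

The link target printed for fd 3 ends in "(deleted)", which is the same tag lsof attaches to such files.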
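lsof is often called the Swiss Army knife of Unix/Linux. It shows which files are opened by which processes. Because lsof needs to read kernel memory and various files, it must be run as root or by a sudo user who is allowed to run the command. Note: in Unix/Linux everything is a file, so "files" here also includes the file descriptors that correspond to hardware devices, TCP/UDP ports, and so on.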
[root@bogon bak]# lsof | less
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
init 1 root cwd DIR 253,0 4096 2 /
init 1 root rtd DIR 253,0 4096 2 /
init 1 root txt REG 253,0 150352 913981 /sbin/init
init 1 root mem REG 253,0 66432 1305647 /lib64/libnss_files-2.12.so
init 1 root DEL REG 253,0 1305631 /lib64/libc-2.12.so
init 1 root DEL REG 253,0 1305602 /lib64/libgcc_s-4.4.7-20120601.so.1.#prelink#.e8mfMB
init 1 root DEL REG 253,0 1305659 /lib64/librt-2.12.so
init 1 root DEL REG 253,0 1305655 /lib64/libpthread-2.12.so.#prelink#.CEboiG
init 1 root DEL REG 253,0 1305681 /lib64/libdbus-1.so.3.4.0.#prelink#.ACI0Ue
Filtering the lsof output for the backup disk shows 4 entries in which the file held open by a process is marked deleted:
[root@bogon bak]# lsof | grep /bak
bash 1692 root cwd DIR 253,2 4096 2 /bak
su 1718 root cwd DIR 253,2 4096 2 /bak
bash 5190 root cwd DIR 253,2 4096 2 /bak
lsof 5332 root cwd DIR 253,2 4096 2 /bak
grep 5333 root cwd DIR 253,2 4096 2 /bak
lsof 5334 root cwd DIR 253,2 4096 2 /bak
bash 16311 oracle cwd DIR 253,2 4096 2 /bak
bash 24371 root cwd DIR 253,2 0 64618497 /bak/dumpbak (deleted)
su 29121 root cwd DIR 253,2 0 64618497 /bak/dumpbak (deleted)
oracle 29334 oracle 25w REG 253,2 1078 64618500 /bak/dumpbak/import.log (deleted)
oracle 29336 oracle 35u REG 253,2 1014089334784 64618498 /bak/dumpbak/expdp-20190302.dmp (deleted)
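The (deleted) entries in the listing above are the files whose names were removed while a process still had them open. As a rough cross-check, their sizes can be added up and compared with the gap between df and du. The sketch below does this with awk on the two regular-file lines copied from the listing; the column positions are an assumption that holds for the default lsof layout shown here (no TID column).

```shell
#!/usr/bin/env bash
# Sketch: total the bytes held by deleted regular files in lsof-style output.
# The two sample lines are copied from the listing above; the column positions
# assume the default lsof layout shown there (no TID column).
set -eu

sample='oracle 29334 oracle 25w REG 253,2 1078 64618500 /bak/dumpbak/import.log (deleted)
oracle 29336 oracle 35u REG 253,2 1014089334784 64618498 /bak/dumpbak/expdp-20190302.dmp (deleted)'

# Field 5 is TYPE, field 7 is SIZE/OFF, and the last field is the "(deleted)" tag.
held_total=$(printf '%s\n' "$sample" |
  awk '$5 == "REG" && $NF == "(deleted)" { sum += $7 } END { printf "%.0f\n", sum }')

echo "bytes held by deleted files: $held_total"
```

For this sample the total is 1014089335862 bytes, roughly 1.0 TB, nearly all of it the removed expdp dump. On a live system the same awk filter could be fed from `lsof | grep /bak`; note that a file opened by several processes would then be counted once per process.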
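Kill the still-running processes that hold the deleted files (in effect they were already "zombies", that is, stale processes doing no useful work):
[root@bogon bak]# kill -9 24371 29121 29334 29336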
Check again; the deleted entries are gone:
[root@bogon bak]# lsof | grep /bak
su 1718 root cwd DIR 253,2 4096 2 /bak
bash 5190 root cwd DIR 253,2 4096 2 /bak
lsof 5344 root cwd DIR 253,2 4096 2 /bak
grep 5345 root cwd DIR 253,2 4096 2 /bak
lsof 5346 root cwd DIR 253,2 4096 2 /bak
bash 16311 oracle cwd DIR 253,2 4096 2 /bak
[root@bogon bak]# df -HT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
ext4 28G 24G 3.1G 89% /
tmpfs tmpfs 4.2G 2.2G 2.0G 52% /dev/shm
/dev/sda1 ext4 500M 42M 432M 9% /boot
/dev/mapper/bak_vg0-bak_lv0
ext4 2.2T 1.3T 723G 64% /bak
/dev/sr0 iso9660 3.9G 3.9G 0 100% /media/RHEL-6.8 Server.x86_64
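Killing the holders was acceptable here because those processes were stale. When the process holding a deleted file has to keep running, a commonly used alternative on Linux is to truncate the file through its entry under /proc/PID/fd. The sketch below illustrates the idea with the script itself acting as the holder; the path held.log is hypothetical. Applied to the case above, the corresponding path would have been /proc/29336/fd/35, but that is given only as an illustration and was not what was actually done.

```shell
#!/usr/bin/env bash
# Sketch: release the space of a deleted-but-open file without killing its holder.
# Here the script itself plays the holder; all paths are hypothetical.
set -eu

workdir=$(mktemp -d)
exec 3>"$workdir/held.log"          # the holder keeps fd 3 open
head -c 1048576 /dev/zero >&3       # 1 MiB written through the descriptor
rm -f "$workdir/held.log"           # name removed, space still allocated

fd_path="/proc/$$/fd/3"
size_before=$(stat -L -c %s "$fd_path")

# Truncate the open file through its /proc entry; the holder keeps running.
: > "$fd_path"

size_after=$(stat -L -c %s "$fd_path")
echo "before: $size_before bytes, after: $size_after bytes"

exec 3>&-
rmdir "$workdir"
```

Truncation discards the file's contents, so it only suits data that was meant to be deleted anyway. A holder that did not open the file in append mode keeps its old write offset, so later writes may produce a sparse file.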