
Troubleshooting a Failure When Adding a Ceph OSD

Author: 用户9314062
Published 2022-05-20
From the column: LINUX开源玩家

While adding OSDs to a Ceph cluster over the past few days, I ran into a failure; here is how I handled it.

Symptom:

Running the OSD-creation command reports an error:

# ceph-volume lvm prepare --data /dev/sdv --block.wal /dev/nvme0n1p1 --block.db /dev/nvme0n1p7
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new bd07ea2f-9e65-46e2-92b0-42ce1c9796f6
 stderr: 2020-03-24 02:00:36.122857 7f0b0c1ba700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2020-03-24 02:00:36.122869 7f0b0c1ba700 -1 monclient: ERROR: missing keyring, cannot use cephx for authentication
 stderr: 2020-03-24 02:00:36.122870 7f0b0c1ba700  0 librados: client.bootstrap-osd initialization error (2) No such file or directory
 stderr: [errno 2] error connecting to the cluster
-->  RuntimeError: Unable to create a new OSD id

Manually creating and deleting files under /var/lib/ceph/bootstrap-osd/ worked fine, and restarting all the Ceph services made no difference.

Checking the other nodes for comparison: every node that could still create OSDs had this file, and on most of those nodes the md5 checksums matched (one node's md5 differed):

$ ansible ceph1 -uroot -m shell -a 'md5sum /var/lib/ceph/bootstrap-osd/ceph.keyring'
node5 | SUCCESS | rc=0 >>
5d77927663a4869c757854dd56d9238e  /var/lib/ceph/bootstrap-osd/ceph.keyring

node1 | SUCCESS | rc=0 >>  
10cd904c67e2bf03adbb9647382b4d09  /var/lib/ceph/bootstrap-osd/ceph.keyring

node3 | SUCCESS | rc=0 >>  
10cd904c67e2bf03adbb9647382b4d09  /var/lib/ceph/bootstrap-osd/ceph.keyring

node2 | SUCCESS | rc=0 >>  
10cd904c67e2bf03adbb9647382b4d09  /var/lib/ceph/bootstrap-osd/ceph.keyring

node4 | FAILED | rc=1 >>
md5sum: /var/lib/ceph/bootstrap-osd/ceph.keyring: No such file or directory

node6 | FAILED | rc=1 >>
md5sum: /var/lib/ceph/bootstrap-osd/ceph.keyring: No such file or directory

On every node that had the file, the key value inside was identical:

$ ansible ceph1 -uroot -m shell -a 'cat /var/lib/ceph/bootstrap-osd/ceph.keyring'
node3 | SUCCESS | rc=0 >>
[client.bootstrap-osd]
    key = 1AQDzPXdcTh0BDRAA4prnwky9uGO3iVuiZtqsKQ==

node1 | SUCCESS | rc=0 >>
[client.bootstrap-osd]
    key = 1AQDzPXdcTh0BDRAA4prnwky9uGO3iVuiZtqsKQ==

node5 | SUCCESS | rc=0 >>
[client.bootstrap-osd]
    key = 1AQDzPXdcTh0BDRAA4prnwky9uGO3iVuiZtqsKQ==
    caps mon = "allow profile bootstrap-osd"

node2 | SUCCESS | rc=0 >>
[client.bootstrap-osd]
    key = 1AQDzPXdcTh0BDRAA4prnwky9uGO3iVuiZtqsKQ==

node4 | FAILED | rc=1 >>
cat: /var/lib/ceph/bootstrap-osd/ceph.keyring: No such file or directory

node6 | FAILED | rc=1 >>
cat: /var/lib/ceph/bootstrap-osd/ceph.keyring: No such file or directory
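
The odd md5 on node5 is explained by its extra caps line: cephx authentication only needs the key value, but any extra byte changes the checksum. A quick local demonstration with a placeholder key (nothing here touches the cluster):

```shell
# Two keyrings with the same placeholder key; the second carries an extra
# "caps" line, as node5 does above. The checksums differ, the key does not.
printf '[client.bootstrap-osd]\n\tkey = AQplaceholderKey==\n' > /tmp/kr_a
printf '[client.bootstrap-osd]\n\tkey = AQplaceholderKey==\n\tcaps mon = "allow profile bootstrap-osd"\n' > /tmp/kr_b
md5sum /tmp/kr_a /tmp/kr_b
```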

Copy the file over from a working node:

# scp /var/lib/ceph/bootstrap-osd/ceph.keyring node004:/var/lib/ceph/bootstrap-osd/ceph.keyring
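
Copying from a peer works because the bootstrap-osd key is cluster-wide, not per-node. As an alternative sketch, the same keyring can be regenerated from the cluster's auth database, assuming the node has a usable admin keyring:

```shell
# Alternative fix (sketch; assumes /etc/ceph/ceph.client.admin.keyring works
# on this node): re-export the bootstrap-osd key from the cluster auth database.
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
# Match the ownership and permissions the ceph tools expect.
chown ceph:ceph /var/lib/ceph/bootstrap-osd/ceph.keyring
chmod 600 /var/lib/ceph/bootstrap-osd/ceph.keyring
```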

Run the command again; this time it goes through cleanly:

# ceph-volume lvm prepare --data /dev/sdv --block.wal /dev/nvme0n1p1 --block.db /dev/nvme0n1p7 
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 12221f9e-06ae-456a-9e47-2408de97ac6f
Running command: vgcreate --force --yes ceph-90f542ea-bd2b-4fbf-bf74-91550627e9d1 /dev/sdv
 ......
 ......
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 19 --monmap /var/lib/ceph/osd/ceph-19/activate.monmap --keyfile - --bluestore-block-wal-path /dev/nvme0n1p1 --bluestore-block-db-path /dev/nvme0n1p7 --osd-data /var/lib/ceph/osd/ceph-19/ --osd-uuid 12221f9e-06ae-456a-9e47-2408de97ac6f --setuser ceph --setgroup ceph      
--> ceph-volume lvm prepare successful for: /dev/sdv

Once the command succeeds, the OSD's data directory is mounted automatically:

# df
Filesystem           1K-blocks    Used Available Use% Mounted on
......
tmpfs                132026216      48 132026168   1% /var/lib/ceph/osd/ceph-19
......

Because the OSD has not been activated yet, its status is down:

# ceph osd tree
ID  CLASS WEIGHT   TYPE NAME         STATUS REWEIGHT PRI-AFF
-1       61.85667 root default
-3       18.19307     host ynode001
  0   hdd  3.63860         osd.0         up  1.00000 1.00000
  1   hdd  3.63860         osd.1         up  1.00000 1.00000
......
 19              0 osd.19              down        0 1.00000

Activate it:

# cat /var/lib/ceph/osd/ceph-19/fsid
12221f9e-06ae-456a-9e47-2408de97ac6f
# ceph-volume lvm activate 19 12221f9e-06ae-456a-9e47-2408de97ac6f
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-19
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-90f542ea-bd2b-4fbf-bf74-91550627e9d1/osd-block-12221f9e-06ae-456a-9e47-2408de97ac6f --path /var/lib/ceph/osd/ceph-19
.......
Running command: systemctl enable --runtime ceph-osd@19
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@19.service → /lib/systemd/system/ceph-osd@.service.
Running command: systemctl start ceph-osd@19
 --> ceph-volume lvm activate successful for osd ID: 19

At this point the OSD's status is "up", and it has automatically begun syncing data.
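
To confirm the new OSD and follow the rebalance, the usual status commands apply (shown as a sketch; they need the live cluster):

```shell
# Verify osd.19 is up and watch recovery/backfill progress.
ceph osd tree | grep -w 'osd\.19'
ceph -s    # cluster summary; check the recovery/backfill lines
ceph -w    # stream cluster events live (Ctrl-C to exit)
```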

Originally published 2020-03-25 on the WeChat public account LINUX开源玩家.
