Starting with the Luminous (L) release, Ceph greatly reduced the operational complexity of the cluster and added a number of commands to safeguard data. Newcomers often ignore the state of the cluster's PGs when removing an OSD, which can ultimately end in data loss, so the following commands were added upstream:
- ceph osd ok-to-stop: Checks whether it looks like PGs will remain available even if the specified OSD(s) are stopped.
- ceph osd safe-to-destroy: Checks whether it is safe to destroy an OSD. This does various checks to ensure there is no data on the OSD(s), no unfound objects, stuck peering, and so forth.
Run these commands before removing an OSD; their output tells you whether the removal can proceed without endangering data.
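As a minimal pre-flight sketch (the OSD_ID variable and the abort logic are my own, on the assumption that both commands exit non-zero when their check fails):

OSD_ID=0   # hypothetical target OSD
ceph osd ok-to-stop osd.${OSD_ID} || { echo "osd.${OSD_ID} is not ok to stop"; exit 1; }
ceph osd safe-to-destroy osd.${OSD_ID} || { echo "osd.${OSD_ID} is not safe to destroy"; exit 1; }
# reaching this point means both checks passed and the removal may proceed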
For the removal itself, two kinds of operation are provided: ceph osd destroy, used when replacing a failed disk, and ceph osd purge, which removes the OSD completely. The official descriptions:
- ceph osd destroy: zap info about an OSD but keep its ID in place (with a 'destroyed' flag) so that it can be recreated with a replacement device.
- ceph osd purge: zap everything about an OSD, including the ID.
The following real-world walkthrough shows how to remove osd.0. Before removing it, run the ok-to-stop and safe-to-destroy commands introduced above, and let their output decide whether the removal may go ahead.
[root@demo cephuser]# ceph osd ok-to-stop osd.0
OSD(s) 0 are ok to stop without reducing availability, provided there are no other concurrent failures or interventions. 0 PGs are likely to be degraded (but remain available) as a result.
[root@demo cephuser]# ceph osd safe-to-destroy osd.0
OSD(s) 0 are safe to destroy without reducing data durability.
It is also best to confirm the OSD's status and related data before removing anything:
[root@demo cephuser]# ceph osd tree # confirm the OSD's STATUS and related info
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0 root default
0 hdd 0 osd.0 up 1.00000 1.00000
[root@demo cephuser]# ceph osd crush ls osd.0 # confirm osd.0's CRUSH info
osd.0
[root@demo cephuser]# ceph auth get osd.0 # fetch osd.0's cephx keyring
exported keyring for osd.0
[osd.0]
key = AQCgFytbZ3J8HxAAFYL5i36b0D3OIoJpnwZ4Uw==
caps mgr = "allow profile osd"
caps mon = "allow profile osd"
caps osd = "allow *"
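If you want a restore point, the keyring shown above can first be saved to a file; -o is the standard ceph CLI output-file option, and the file name here is my own choice:

[root@demo cephuser]# ceph auth get osd.0 -o osd.0.keyring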
If you are replacing the disk behind the existing osd.0, run the destroy operation. You will find that only osd.0's keyring is removed, while its ID and CRUSH entry remain:
[root@demo cephuser]# systemctl stop ceph-osd@0
[root@demo cephuser]# ceph osd destroy osd.0 --yes-i-really-mean-it
destroyed osd.0
[root@demo cephuser]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0 root default
0 hdd 0 osd.0 destroyed 1.00000 1.00000 # STATUS is now destroyed
[root@demo cephuser]# ceph osd crush ls osd.0
osd.0
[root@demo cephuser]# ceph auth get osd.0
Error ENOENT: failed to find osd.0 in keyring
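To finish the replacement, the new disk can then be prepared against the destroyed ID so the OSD comes back with minimal rebalancing. A sketch, assuming the replacement device shows up as /dev/sdb and that your ceph-volume build supports the --osd-id flag:

# hypothetical replacement device; reuse the destroyed ID 0
ceph-volume lvm create --bluestore --osd-id 0 --data /dev/sdb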
If instead you want osd.0 gone for good, run the purge operation. You will find that everything associated with osd.0 is removed:
[root@demo cephuser]# systemctl stop ceph-osd@0
[root@demo cephuser]# ceph osd purge osd.0 --yes-i-really-mean-it
purged osd.0
[root@demo cephuser]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0 root default
[root@demo cephuser]# ceph osd crush ls osd.0
Error ENOENT: node 'osd.0' does not exist
[root@demo cephuser]# ceph auth get osd.0
Error ENOENT: failed to find osd.0 in keyring
Finally, if you need to wipe the data left on osd.0's old data disk, run the following:
[root@demo cephuser]# ceph-volume lvm zap /dev/sdb --destroy
--> Zapping: /dev/sdb
--> Unmounting /var/lib/ceph/osd/ceph-0
Running command: umount -v /var/lib/ceph/osd/ceph-0
stderr: umount: /var/lib/ceph/osd/ceph-0 (tmpfs) unmounted
--> Destroying volume group ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa because --destroy was given
Running command: vgremove -v -f ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa
stderr: Removing ceph--3d2c442b--c663--48d5--bb18--098cb5f307fa-osd--block--7449a599--585a--4caf--8452--9b5facab3df3 (253:2)
stderr: Archiving volume group "ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa" metadata (seqno 21).
stderr: Releasing logical volume "osd-block-7449a599-585a-4caf-8452-9b5facab3df3"
stderr: Creating volume group backup "/etc/lvm/backup/ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa" (seqno 22).
stdout: Logical volume "osd-block-7449a599-585a-4caf-8452-9b5facab3df3" successfully removed
stderr: Removing physical volume "/dev/sdb" from volume group "ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa"
stdout: Volume group "ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa" successfully removed
--> Destroying physical volume /dev/sdb because --destroy was given
Running command: pvremove -v -f /dev/sdb
stderr: Wiping internal VG cache
Wiping cache of LVM-capable devices
stdout: Labels on physical volume "/dev/sdb" successfully wiped.
Running command: wipefs --all /dev/sdb
Running command: dd if=/dev/zero of=/dev/sdb bs=1M count=10
--> Zapping successful for: /dev/sdb
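As a quick sanity check (same /dev/sdb as above), blkid should now find no signatures on the device, i.e. the command prints nothing:

[root@demo cephuser]# blkid /dev/sdb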
When using ceph-volume, if you want to place the db and wal on dedicated SSD partitions, you must create those partitions by hand in advance (ceph-volume is expected to gain automatic partitioning later; for now it is manual). Take creating the wal and db for osd.1 as an example.
Create the partitions with sgdisk, specifying each partition's partuuid and a label:
sgdisk --new=0:0:+100M --change-name=1:osd-1-wal --partition-guid=1:4fbd7e29-9d25-41b8-afd0-062c0ceff051 --mbrtogpt -- /dev/sdc
sgdisk --new=0:0:+100M --change-name=2:osd-1-db --partition-guid=2:4fbd7e29-9d25-41b8-afd0-062c0ceff052 --mbrtogpt -- /dev/sdc
The benefit of the labels is purely operational: one glance tells you what each partition is for. The result:
[root@demo cephuser]# blkid
/dev/sda1: UUID="a1e4eaa2-fd83-44e9-937e-a1b360c1c707" TYPE="xfs"
/dev/sda2: UUID="UYtArS-hwDi-nNkc-N78D-6EZX-cIXW-PSpbgq" TYPE="LVM2_member"
/dev/mapper/centos-root: UUID="7065f959-f667-4253-8a90-d49afcf19a29" TYPE="xfs"
/dev/mapper/centos-swap: UUID="bbdef107-d9f9-4470-860e-9d2f55821c4c" TYPE="swap"
/dev/sdc1: PARTLABEL="osd-1-wal" PARTUUID="4fbd7e29-9d25-41b8-afd0-062c0ceff051"
/dev/sdc2: PARTLABEL="osd-1-db" PARTUUID="4fbd7e29-9d25-41b8-afd0-062c0ceff052"
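The same GUID and name can also be read back with sgdisk (partition 1 of /dev/sdc from the example above):

[root@demo cephuser]# sgdisk --info=1 /dev/sdc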
When initializing the OSD with ceph-volume, it is best to reference these partitions by partuuid; the LV tags ceph.wal_device and ceph.db_device will then be recorded as your stable by-partuuid paths:
[root@demo cephuser]# ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/disk/by-partuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff052 --block.wal /dev/disk/by-partuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff051
[root@demo cephuser]# /usr/sbin/lvs --noheadings --readonly --separator=" " -o lv_tags,lv_path,lv_name,vg_name,lv_uuid
/dev/centos/root root centos fq7hUc-BtCA-AeIk-VvBD-Mxad-7uix-ZtK4I6
/dev/centos/swap swap centos r7oGD8-Tf15-v4ir-eOR3-LeDg-qqrf-gblNWI
ceph.block_device=/dev/ceph-e6f9ba96-7323-4a2f-b854-f343d088eb8d/osd-block-9fe04363-fb0b-498d-9354-274513ef7407,ceph.block_uuid=WBQohD-RX3l-iIac-z5a4-2v74-toc7-agsQv5,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed9,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.db_device=/dev/disk/by-partuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff052,ceph.db_uuid=4fbd7e29-9d25-41b8-afd0-062c0ceff052,ceph.encrypted=0,ceph.osd_fsid=9fe04363-fb0b-498d-9354-274513ef7407,ceph.osd_id=0,ceph.type=block,ceph.vdo=0,ceph.wal_device=/dev/disk/by-partuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff051,ceph.wal_uuid=4fbd7e29-9d25-41b8-afd0-062c0ceff051 /dev/ceph-e6f9ba96-7323-4a2f-b854-f343d088eb8d/osd-block-9fe04363-fb0b-498d-9354-274513ef7407 osd-block-9fe04363-fb0b-498d-9354-274513ef7407 ceph-e6f9ba96-7323-4a2f-b854-f343d088eb8d WBQohD-RX3l-iIac-z5a4-2v74-toc7-agsQv5
If instead you initialize with a name like /dev/sdc1, then after a reboot the recorded ceph.wal_device and ceph.db_device may no longer match the actual devices, since kernel device names can change. The OSD does look devices up by uuid at activation time (via the get_osd_device_path method, see the code below), but to keep the tag data consistent the partuuid form is still recommended.
[root@demo cephuser]# ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdc1 --block.wal /dev/sdc2
[root@demo cephuser]# /usr/sbin/lvs --noheadings --readonly --separator=" " -o lv_tags,lv_path,lv_name,vg_name,lv_uuid
/dev/centos/root root centos fq7hUc-BtCA-AeIk-VvBD-Mxad-7uix-ZtK4I6
/dev/centos/swap swap centos r7oGD8-Tf15-v4ir-eOR3-LeDg-qqrf-gblNWI
ceph.block_device=/dev/ceph-2850c361-dc1c-434b-9689-f73798bb514e/osd-block-471be0c0-905b-4836-a5a1-dc8d5a3845d3,ceph.block_uuid=Eidmyr-axIE-2E3F-5Jj5-N4tN-TbiH-l0jrKm,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed9,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.db_device=/dev/sdc1,ceph.db_uuid=d0fe82cf-34cb-4b95-b70c-52f37a86b333,ceph.encrypted=0,ceph.osd_fsid=471be0c0-905b-4836-a5a1-dc8d5a3845d3,ceph.osd_id=0,ceph.type=block,ceph.vdo=0,ceph.wal_device=/dev/sdc2,ceph.wal_uuid=d0577828-e8df-48f0-888a-3a8ee183322c /dev/ceph-2850c361-dc1c-434b-9689-f73798bb514e/osd-block-471be0c0-905b-4836-a5a1-dc8d5a3845d3 osd-block-471be0c0-905b-4836-a5a1-dc8d5a3845d3 ceph-2850c361-dc1c-434b-9689-f73798bb514e Eidmyr-axIE-2E3F-5Jj5-N4tN-TbiH-l0jrKm
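For reference, the stable by-partuuid symlink always resolves to whatever kernel name the partition currently carries (paths taken from the earlier example, where this GUID belongs to /dev/sdc1):

[root@demo cephuser]# readlink -f /dev/disk/by-partuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff051
/dev/sdc1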
The source that resolves the db and wal paths from the uuid is shown below:
# ceph-12.2.5/src/ceph-volume/ceph_volume/devices/lvm/activate.py
# called when an OSD is activated
def activate_bluestore(lvs, no_systemd=False):
    ...
    db_device_path = get_osd_device_path(osd_lv, lvs, 'db', dmcrypt_secret=dmcrypt_secret)
    wal_device_path = get_osd_device_path(osd_lv, lvs, 'wal', dmcrypt_secret=dmcrypt_secret)

# resolves the final backing device path
def get_osd_device_path(osd_lv, lvs, device_type, dmcrypt_secret=None):
    """
    ``device_type`` can be one of ``db``, ``wal`` or ``block`` so that
    we can query ``lvs`` (a ``Volumes`` object) and fallback to querying the uuid
    if that is not present.
    Return a path if possible, failing to do that a ``None``, since some of these devices
    are optional
    """
    osd_lv = lvs.get(lv_tags={'ceph.type': 'block'})
    is_encrypted = osd_lv.tags.get('ceph.encrypted', '0') == '1'
    logger.debug('Found block device (%s) with encryption: %s', osd_lv.name, is_encrypted)
    uuid_tag = 'ceph.%s_uuid' % device_type
    # the ceph.db_uuid / ceph.wal_uuid lv tag determines the final device path
    device_uuid = osd_lv.tags.get(uuid_tag)
    if not device_uuid:
        return None
    device_lv = lvs.get(lv_uuid=device_uuid)
    if device_lv:
        if is_encrypted:
            encryption_utils.luks_open(dmcrypt_secret, device_lv.lv_path, device_uuid)
            return '/dev/mapper/%s' % device_uuid
        return device_lv.lv_path
    else:
        # this could be a regular device, so query it with blkid
        physical_device = disk.get_device_from_partuuid(device_uuid)
        if physical_device and is_encrypted:
            encryption_utils.luks_open(dmcrypt_secret, physical_device, device_uuid)
            return '/dev/mapper/%s' % device_uuid
        return physical_device or None
    return None
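In practice you rarely need to trace this code path by hand: ceph-volume can render the same LV tag metadata in a friendlier form, which is usually enough to verify where the db and wal ended up:

[root@demo cephuser]# ceph-volume lvm list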