Ceph Maintenance
This MOP covers maintenance activities related to Ceph.
Table of Contents
- Generic Commands
- Replace failed OSD
1. Generic Commands
Check OSD Status
To check the current status of OSDs, execute the following:
nccli osd-maintenance check_osd_status
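If a broader view of cluster health is needed, the nccli output can be cross-checked with standard Ceph status commands; the sketch below assumes the ceph CLI is reachable from the same utility container.
ceph -s          # overall cluster health and a summary of OSDs up/in
ceph osd stat    # OSD map epoch and count of OSDs up/in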
OSD Removal
To purge OSDs that are in the down state, execute the following:
nccli osd-maintenance osd_remove
OSD Removal By OSD ID
To purge a specific down OSD by its OSD ID, execute the following:
nccli osd-maintenance remove_osd_by_id --osd-id <OSDID>
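For reference, a single down OSD can also be removed with the native Ceph CLI; a minimal sketch, assuming direct ceph access from the utility container and that <OSDID> is already marked down and out, is:
ceph osd purge <OSDID> --yes-i-really-mean-it    # removes the OSD entry, its CRUSH map entry, and its auth key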
Reweight OSDs
To adjust OSD CRUSH weights in the CRUSH map of a running cluster, execute the following:
nccli osd-maintenance reweight_osds
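If a single OSD's weight has to be adjusted by hand rather than through nccli, the underlying native Ceph operation is shown below; osd.12 and the weight 1.0 are placeholder values for illustration only.
ceph osd crush reweight osd.12 1.0    # set the CRUSH weight of osd.12 to 1.0 in the CRUSH map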
2. Replace failed OSD
If a drive has failed, follow the procedure below. All commands should be run from the utility container.
Capture the failed OSD's ID by looking for OSDs with status down:
nccli ceph osd tree
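To narrow the output to failed OSDs only, the tree output can be filtered for the down status; this relies on standard shell tools in the utility container, not on an nccli option.
nccli ceph osd tree | grep -w down    # print only rows whose STATUS column is down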
Remove the OSD from the cluster, replacing <OSD_ID> with the failed OSD ID captured above:
nccli osd-maintenance osd_remove_by_id --osd-id <OSD_ID>
Remove the failed drive and replace it with a new one without bringing down the node.
Once the new drive is in place, delete the affected OSD pod, which will be in an Error or CrashLoopBackOff state. Replace <pod_name> with the failed OSD pod's name.
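If the failing pod's name is not known, it can be looked up first; this assumes standard kubectl access and that the OSD pods run in the ceph namespace, as in the delete command below.
kubectl get pods -n ceph | grep -E 'Error|CrashLoopBackOff'    # list OSD pods currently in a failed state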
kubectl delete pod <pod_name> -n ceph
Once the pod is deleted, Kubernetes re-creates a new pod for the OSD. When the pod is up, the OSD is added back to the Ceph cluster with a weight of 0, so it must be reweighted:
nccli osd-maintenance reweight_osds
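After reweighting, the replacement OSD should report a non-zero CRUSH weight and begin receiving placement groups. Assuming the ceph CLI is available from the utility container, this can be verified with:
ceph osd tree    # the new OSD should be up with a non-zero WEIGHT
ceph osd df      # per-OSD weight, capacity usage, and PG count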