Running ceph -s shows that the cluster status is HEALTH_WARN.
One PG is in the following state: active+clean+scrubbing+deep+repair
Run ceph health detail to see the detailed reason and identify which PG is abnormal.
[root@cvknode01 ~]# ceph health detail
HEALTH_WARN Degraded data redundancy: 1 pg repair
PG_DEGRADED Degraded data redundancy: 1 pg repair
pg 2.81 is active+clean+scrubbing+deep+repair, acting [13,30,6]
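When several PGs are reported, the IDs can be pulled out of the health detail text instead of being read by eye. The following is a minimal sketch, assuming the per-PG lines keep the "pg <id> is <state>" layout shown above; pgs_from_health_detail is a hypothetical helper name, not a Ceph command.

```shell
# Print the ID of every PG listed in `ceph health detail` output read from stdin.
# pgs_from_health_detail is a hypothetical helper, not part of Ceph.
pgs_from_health_detail() {
    # Per-PG lines have the form: pg <pgid> is <state>, acting [...]
    awk '$1 == "pg" && $3 == "is" { print $2 }'
}

# Usage on a cluster node:
#   ceph health detail | pgs_from_health_detail
```

For the output above this prints 2.81, which is the ID passed to the query command in the next step.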
Query the PG using the ID identified by the command above.
The recovery_state section of the ceph pg 2.81 query output is as follows.
"recovery_state": [
    {
        "name": "Started/Primary/Active",
        "enter_time": "2025-05-14 14:32:52.349278",
        "might_have_unfound": [
            {
                "osd": "6",
                "status": "already probed"
            },
            {
                "osd": "7",
                "status": "not queried"
            },
            {
                "osd": "30",
                "status": "already probed"
            }
        ],
        "recovery_progress": {
            "backfill_targets": [],
            "waiting_on_backfill": [],
            "last_backfill_started": "MIN",
            "backfill_info": {
                "begin": "MIN",
                "end": "MIN",
                "objects": []
            },
            "peer_backfill_info": [],
            "backfills_in_flight": [],
            "recovering": [],
            "pg_backend": {
                "pull_from_peer": [],
                "pushing": []
            }
        },
        "scrub": {
            "scrubber.epoch_start": "4346",
            "scrubber.active": true,
            "scrubber.state": "NEW_CHUNK",
            "scrubber.start": "2:8115ce2a:::rbd_data.1.2778318e1e8a0.000000000002b3a8:0",
            "scrubber.end": "2:8115ce2a:::rbd_data.1.2778318e1e8a0.000000000002b3a8:0",
            "scrubber.subset_last_update": "0'0",
            "scrubber.deep": true,
            "scrubber.seed": 4294967295,
            "scrubber.waiting_on": 0,
            "scrubber.waiting_on_whom": []
        }
    },
    {
        "name": "Started",
        "enter_time": "2025-05-14 14:32:51.349996"
    }
],
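The might_have_unfound list shows that osd.6 and osd.30 have already been probed and are normal, while osd.7 is "not queried". osd.7 is also absent from the PG's current acting set [13,30,6], which suggests that the disk behind osd.7 previously experienced a fault.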
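The per-OSD probe status can also be summarized straight from the query output. The sketch below assumes the pretty-printed layout shown above (one key per line); unfound_probe_status is a hypothetical helper name, and a real JSON parser would be more robust if one is available on the node.

```shell
# Summarize the might_have_unfound entries from `ceph pg <pgid> query` output on stdin.
# unfound_probe_status is a hypothetical helper, not part of Ceph.
# Assumption: the JSON is pretty-printed with one key per line.
unfound_probe_status() {
    awk -F'"' '
        # Enter the list, unless it is empty and closed on the same line.
        /"might_have_unfound"/   { inside = ($0 !~ /\]/); next }
        # The first closing bracket ends the list.
        inside && /\]/           { inside = 0 }
        inside && $2 == "osd"    { osd = $4 }
        inside && $2 == "status" { print "osd." osd ": " $4 }
    '
}

# Usage on a cluster node:
#   ceph pg 2.81 query | unfound_probe_status
```

Any OSD whose status is not "already probed" is a candidate for a hardware check. The host that carries it can be looked up with ceph osd find 7.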
Check HDM on the host where that disk is located; it confirms that there is indeed a disk fault alarm.
On-site personnel contacted the storage team, who investigated, confirmed the issue, and recommended replacing the disk as soon as possible.
For now, wait for the storage cluster to finish the repair automatically. Once recovery completes, ceph -s will report HEALTH_OK. Replace the disk that raised the alarm as soon as possible.
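Waiting for the repair can be left to a small polling loop rather than rerunning ceph -s by hand. This is a minimal sketch, assuming ceph health prints a line that starts with HEALTH_OK once the cluster has recovered; wait_for_health_ok and its two parameters are hypothetical, not part of Ceph.

```shell
# Poll `ceph health` until it reports HEALTH_OK or the poll limit is reached.
# wait_for_health_ok is a hypothetical helper, not part of Ceph.
#   $1: seconds between polls (default 60)
#   $2: maximum number of polls (default 120)
wait_for_health_ok() {
    interval="${1:-60}"
    max_polls="${2:-120}"
    i=0
    while [ "$i" -lt "$max_polls" ]; do
        # The status line starts with HEALTH_OK, HEALTH_WARN or HEALTH_ERR.
        case "$(ceph health 2>/dev/null)" in
            HEALTH_OK*) return 0 ;;
        esac
        i=$((i + 1))
        sleep "$interval"
    done
    return 1   # still not healthy after max_polls checks
}

# Usage on a cluster node:
#   wait_for_health_ok 60 120 && echo "cluster is healthy"
```

The poll limit keeps the loop from running forever if the repair stalls, in which case the PG should be queried again.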