UIS hyper-converged E0750P11: PG shows active+clean+scrubbing+deep+repair


2025-05-22 14:39:39 Published


zhiliao_OaCcf

Problem Description

Running ceph -s shows the cluster health status as a warning (HEALTH_WARN).

One PG is in the following state: active+clean+scrubbing+deep+repair

Process Analysis

Run ceph health detail to see the detailed cause and identify which PG is abnormal.

[root@cvknode01 ~]# ceph health detail

HEALTH_WARN Degraded data redundancy: 1 pg repair

PG_DEGRADED Degraded data redundancy: 1 pg repair

pg 2.81 is active+clean+scrubbing+deep+repair, acting [13,30,6]
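When several PGs are affected, the PG ID, state flags, and acting set can be pulled out of the ceph health detail text with a small script. The following is a minimal sketch, assuming the output lines have the same shape as the one shown above; the function name parse_pg_lines and the regular expression are illustrative, not part of Ceph.

```python
import re

# Matches lines such as:
#   pg 2.81 is active+clean+scrubbing+deep+repair, acting [13,30,6]
# (pattern is an assumption based on the output shown in this case)
PG_LINE = re.compile(r"pg\s+(\S+)\s+is\s+([\w+]+),\s+acting\s+\[([\d,\s]*)\]")


def parse_pg_lines(health_detail: str):
    """Return (pgid, state_flags, acting_osds) for every 'pg ... is ...' line."""
    results = []
    for line in health_detail.splitlines():
        match = PG_LINE.search(line)
        if match is None:
            continue  # summary lines such as HEALTH_WARN are skipped
        pgid, state, acting = match.groups()
        osds = [int(x) for x in acting.split(",") if x.strip()]
        results.append((pgid, state.split("+"), osds))
    return results


# Sample text copied from the output in this case.
sample = """HEALTH_WARN Degraded data redundancy: 1 pg repair
PG_DEGRADED Degraded data redundancy: 1 pg repair
pg 2.81 is active+clean+scrubbing+deep+repair, acting [13,30,6]"""

for pgid, flags, osds in parse_pg_lines(sample):
    print(pgid, flags, osds)
```

The PG ID returned here (2.81) is the value passed to ceph pg <pgid> query in the next step.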

Query the PG using the ID identified by the command above.

The relevant part of the ceph pg 2.81 query output is as follows.

"recovery_state": [
    {
        "name": "Started/Primary/Active",
        "enter_time": "2025-05-14 14:32:52.349278",
        "might_have_unfound": [
            {
                "osd": "6",
                "status": "already probed"
            },
            {
                "osd": "7",
                "status": "not queried"
            },
            {
                "osd": "30",
                "status": "already probed"
            }
        ],
        "recovery_progress": {
            "backfill_targets": [],
            "waiting_on_backfill": [],
            "last_backfill_started": "MIN",
            "backfill_info": {
                "begin": "MIN",
                "end": "MIN",
                "objects": []
            },
            "peer_backfill_info": [],
            "backfills_in_flight": [],
            "recovering": [],
            "pg_backend": {
                "pull_from_peer": [],
                "pushing": []
            }
        },
        "scrub": {
            "scrubber.epoch_start": "4346",
            "scrubber.active": true,
            "scrubber.state": "NEW_CHUNK",
            "scrubber.start": "2:8115ce2a:::rbd_data.1.2778318e1e8a0.000000000002b3a8:0",
            "scrubber.end": "2:8115ce2a:::rbd_data.1.2778318e1e8a0.000000000002b3a8:0",
            "scrubber.subset_last_update": "0'0",
            "scrubber.deep": true,
            "scrubber.seed": 4294967295,
            "scrubber.waiting_on": 0,
            "scrubber.waiting_on_whom": []
        }
    },
    {
        "name": "Started",
        "enter_time": "2025-05-14 14:32:51.349996"
    }
]

The might_have_unfound list shows that osd.6 and osd.30 are normal (already probed), while osd.7 is marked "not queried". osd.7 is also absent from the current acting set [13,30,6], which suggests that its disk experienced an anomaly earlier.
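The same check can be scripted against the JSON returned by ceph pg <pgid> query. This is a minimal sketch, assuming the might_have_unfound entries sit under recovery_state as in the output above; the helper name unprobed_peers is hypothetical, and the sample data contains only values taken from this case.

```python
import json


def unprobed_peers(pg_query: dict):
    """List (osd, status) pairs whose status is not 'already probed'."""
    flagged = []
    for state in pg_query.get("recovery_state", []):
        for peer in state.get("might_have_unfound", []):
            if peer.get("status") != "already probed":
                flagged.append((peer["osd"], peer["status"]))
    return flagged


# Trimmed sample built from the query output shown in this case.
sample = json.loads("""
{
  "recovery_state": [
    {
      "name": "Started/Primary/Active",
      "might_have_unfound": [
        {"osd": "6", "status": "already probed"},
        {"osd": "7", "status": "not queried"},
        {"osd": "30", "status": "already probed"}
      ]
    },
    {"name": "Started"}
  ]
}
""")

print(unprobed_peers(sample))  # [('7', 'not queried')]
```

An OSD flagged this way is only a lead: the disk state still has to be confirmed on the host, as done in the next step.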

Check the HDM of the host where that disk is located; it confirms that a disk fault alarm is indeed present.

On-site personnel contacted the storage team, who investigated, confirmed the fault, and recommended replacing the disk as soon as possible.
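To find which host holds a given OSD before opening its HDM, the ceph osd find <id> command can be used. The sketch below is only an illustration: it assumes the command returns JSON containing a crush_location object with a host field, which should be verified on the deployed Ceph version. The helper name osd_host and the hostname in the sample are hypothetical.

```python
import json


def osd_host(osd_find_json: str):
    """Return the CRUSH host of an OSD from 'ceph osd find <id>' JSON, or None."""
    info = json.loads(osd_find_json)
    # Field names are an assumption about the output shape; check them locally.
    return info.get("crush_location", {}).get("host")


# Hypothetical sample; the hostname is made up for illustration only.
sample = '{"osd": 7, "crush_location": {"host": "example-node", "root": "default"}}'

print(osd_host(sample))
```

On a live cluster the input would come from running ceph osd find 7 and capturing its standard output.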

Solution

For now, wait for the storage cluster to complete the repair automatically. Once recovery finishes, ceph -s reports the health status as OK. Replace the disk that raised the early-warning alarm as soon as possible.
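Waiting for the repair can be automated with a simple polling loop. This is a minimal sketch, assuming the ceph CLI is available on the node and that ceph health prints a summary beginning with HEALTH_OK once the cluster is healthy; the function names, the 60-second interval, and the one-hour timeout are illustrative choices, not recommendations from the storage team.

```python
import subprocess
import time


def ceph_health() -> str:
    """Run 'ceph health' and return its summary line, e.g. 'HEALTH_OK'."""
    result = subprocess.run(
        ["ceph", "health"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def wait_for_health_ok(get_health=ceph_health, interval=60.0, timeout=3600.0,
                       sleep=time.sleep) -> bool:
    """Poll until health is HEALTH_OK; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if get_health().startswith("HEALTH_OK"):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)  # wait before asking the cluster again
```

Calling wait_for_health_ok() on a cluster node blocks until the warning clears or the timeout is reached. The get_health and sleep parameters exist so the loop can be exercised without a real cluster.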
