Case study: Unified Platform heketi pod in CrashLoopBackOff state at a site

2025-04-30 19:22:06 Published

Problem Description

Running the command kubectl get pod -A -owide | grep -v Run on the Unified Platform backend shows that the heketi pod is in the CrashLoopBackOff state, as shown in the following figure.

Next, running the command kubectl logs -f heketi-xxx -n glusterfs-example (where heketi-xxx is the heketi pod name) returns an error indicating that the digits file does not exist, as shown in the following figure.
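
The first check above can be narrowed to pods that are actually crash-looping. The following is a minimal sketch, not part of the original case: the function name find_crashloop_pods is hypothetical, and it assumes the default column layout of kubectl get pod -A (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: print "namespace/name" for every pod whose STATUS
# column is CrashLoopBackOff.
# Assumption: default `kubectl get pod -A` columns, where STATUS is field 4.
find_crashloop_pods() {
  kubectl get pod -A --no-headers |
    awk '$4 == "CrashLoopBackOff" { print $1 "/" $2 }'
}

# Example follow-up (pod name is a placeholder, as in the case text):
#   kubectl logs -n glusterfs-example heketi-xxx
```

Filtering on the STATUS field avoids listing pods in other non-Running states, which grep -v Run would also show.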

Process Analysis

First, run the command kubectl get pod -A | grep gluster to obtain the gluster pod names. Then run the command kubectl exec -it gluster-xxx -n glusterfs-example bash (where gluster-xxx is a gluster pod name) to log in to any gluster pod. Next, run the command gluster volume heal heketidbstorage info. The output reports heketi.db as "Is in split-brain", as shown in the following figure:
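
The split-brain check above can be reduced to a filter over the heal info output. This is a sketch under an assumption: affected entries appear as lines of the form "<path> - Is in split-brain". The function name list_split_brain_entries is hypothetical.

```shell
# Hypothetical helper: read `gluster volume heal <volume> info` output on
# stdin and print each path flagged as split-brain, once.
# Assumption: flagged lines look like "<path> - Is in split-brain".
list_split_brain_entries() {
  awk '/ - Is in split-brain/ { sub(/ - Is in split-brain.*/, ""); print }' |
    sort -u
}

# Example, run inside a gluster pod:
#   gluster volume heal heketidbstorage info | list_split_brain_entries
```

Because each brick reports its own entries, the same path can appear several times in the raw output; sort -u collapses the duplicates.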

In Unified Platform versions earlier than E0613P08, the heketidbstorage volume may enter split-brain after a server power-off or network fluctuation, which leaves the heketi database in an abnormal state.

Solution

(1) Temporary mitigation: Log in to any gluster pod and run the command gluster volume heal heketidbstorage split-brain latest-mtime /heketi.db to repair the file. After the repair, run the command gluster volume heal heketidbstorage info and confirm that it returns the following normal response:

(2) Permanent solution: Upgrade the Unified Platform to version E0613P08 or later.
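
The temporary mitigation in (1) can be sketched as a single repair-and-verify step. This is an illustration only: the function name heal_heketidb is hypothetical, it is meant to run inside a gluster pod, and it assumes that a remaining split-brain entry still shows the text "Is in split-brain" in the heal info output.

```shell
# Hypothetical helper: resolve the split-brain on /heketi.db by keeping the
# copy with the latest modification time, then verify the result.
# Assumption: unresolved entries still contain "Is in split-brain".
heal_heketidb() {
  vol="${1:-heketidbstorage}"

  # Repair command taken from the case text.
  gluster volume heal "$vol" split-brain latest-mtime /heketi.db || return 1

  # Fail if any entry is still reported as split-brain.
  if gluster volume heal "$vol" info | grep -q 'Is in split-brain'; then
    echo "split-brain still present on $vol" >&2
    return 1
  fi
  echo "no split-brain entries on $vol"
}
```

The latest-mtime policy keeps the most recently modified replica, so it is worth confirming that this copy is the one to keep before running the repair.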
