Yangtze Memory SE005 Drive: Firmware Issue in Version YM120104

Published: 2025-06-26 14:54:38

Problem Description

A customer using Yangtze Memory SE005 drives with firmware (FW) version YM120104 experienced a high failure rate. The failures manifested as individual drives dropping out of the RAID array.

Process Analysis

Example of the failure:

(1) On May 4, 2024, the drive in front slot 0 (F00) hit a command timeout. The RAID controller attempted to reset the drive but failed, so F00 was dropped and the RAID array became degraded.

2024-05-04 11:04:01 PDIndex(Front:0)----Command timeout on PD 08(e0xfc/s0), CDB: 28 00 07 3b f0 00 00 02 00 00

2024-05-04 11:04:01 PDIndex(Front:0)----PD 08(e0xfc/s0) Path 4433221100000000 reset (Type 03)

2024-05-04 11:04:42 PDIndex(Front:0)----Removed: PD 08(e0xfc/s0)

2024-05-04 11:04:42 PDIndex(Front:0)----Diagnostics failed for PD 08(e0xfc/s0)

2024-05-04 11:05:15 PCIe slot:1---LDDevno:0 change to Degraded.

2024-05-04 11:08:13 Drive Fault

(2) On May 10, 2024, the faulty drive was replaced, the rebuild succeeded, and the RAID array returned to the OPTIMAL state.

2024-05-10 11:17:39 PDIndex(Front:0)----Inserted: PD 08(e0xfc/s0)

2024-05-10 11:17:44 PDIndex(Front:0)----Rebuild automatically started on PD 08(e0xfc/s0)

2024-05-10 11:17:56 Drive Presence

2024-05-10 11:18:00 Rebuild/Remap in progress

2024-05-10 11:18:07 The Front HardDisk in slot 0 has been replaced,SN from YMD1480JA214610C97 to YMD1480JA2149103W7

2024-05-10 11:42:11 Rebuild complete on VD 00/0---CtrlIndex(1)

2024-05-10 11:42:11 State change on VD 00/0 from DEGRADED(2) to OPTIMAL(3)---CtrlIndex(1)

2024-05-10 11:42:11 VD 00/0 is now OPTIMAL---CtrlIndex(1)

(3) The SDS logs show a normal drive-drop sequence and rule out other components in the link, such as the RAID controller and the backplane. The issue is therefore strongly related to the drive itself.
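The drop sequence in (1) and the replacement in (2) can be checked mechanically when many logs need review. Below is a minimal Python sketch, not a vendor tool: it assumes the controller log wording shown in the excerpts above (the regular expressions and event names are hypothetical), checks that a command timeout is followed, in order, by a reset, a removal, and array degradation, and then derives the timings and the serial-number change.

```python
import re
from datetime import datetime

# Assumption: message wording is copied from the log excerpts in this case;
# other controller or firmware versions may word these events differently.
PD = r"PD (\d+\([^)]*\))"  # e.g. "PD 08(e0xfc/s0)"
EVENT_PATTERNS = [
    ("timeout", re.compile(r"Command timeout on " + PD)),
    ("reset", re.compile(PD + r" Path \S+ reset")),
    ("removed", re.compile(r"Removed: " + PD)),
    ("diag_failed", re.compile(r"Diagnostics failed for " + PD)),
    ("degraded", re.compile(r"change to Degraded")),
    ("rebuild_start", re.compile(r"Rebuild automatically started on " + PD)),
    ("sn_change", re.compile(r"SN from (\S+) to (\S+)")),
    ("rebuild_done", re.compile(r"Rebuild complete on VD")),
]
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$")


def parse_events(log_text):
    """Return (timestamp, event_name, captured_groups) for recognised lines."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines or lines without a leading timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        for name, pattern in EVENT_PATTERNS:
            hit = pattern.search(m.group(2))
            if hit:
                events.append((ts, name, hit.groups()))
                break  # first matching pattern wins
    return events


def is_drop_signature(events):
    """True if timeout, reset, removal and degradation occur in that order."""
    names = iter(name for _, name, _ in events)
    # `step in names` consumes the iterator, so this is an in-order subsequence test.
    return all(step in names for step in ("timeout", "reset", "removed", "degraded"))


def first(events, name):
    """Timestamp of the first event called `name`, or None."""
    return next((ts for ts, n, _ in events if n == name), None)


def summarize(events):
    """Derive the drop timing, rebuild duration and serial-number change."""
    old_sn, new_sn = next((g for _, n, g in events if n == "sn_change"), (None, None))
    timeout, removed = first(events, "timeout"), first(events, "removed")
    start, done = first(events, "rebuild_start"), first(events, "rebuild_done")
    return {
        "drop_signature": is_drop_signature(events),
        "timeout_to_removal_s": (removed - timeout).total_seconds() if timeout and removed else None,
        "rebuild_s": (done - start).total_seconds() if start and done else None,
        "old_sn": old_sn,
        "new_sn": new_sn,
    }


SAMPLE = """\
2024-05-04 11:04:01 PDIndex(Front:0)----Command timeout on PD 08(e0xfc/s0), CDB: 28 00 07 3b f0 00 00 02 00 00
2024-05-04 11:04:01 PDIndex(Front:0)----PD 08(e0xfc/s0) Path 4433221100000000 reset (Type 03)
2024-05-04 11:04:42 PDIndex(Front:0)----Removed: PD 08(e0xfc/s0)
2024-05-04 11:04:42 PDIndex(Front:0)----Diagnostics failed for PD 08(e0xfc/s0)
2024-05-04 11:05:15 PCIe slot:1---LDDevno:0 change to Degraded.
2024-05-10 11:17:39 PDIndex(Front:0)----Inserted: PD 08(e0xfc/s0)
2024-05-10 11:17:44 PDIndex(Front:0)----Rebuild automatically started on PD 08(e0xfc/s0)
2024-05-10 11:18:07 The Front HardDisk in slot 0 has been replaced,SN from YMD1480JA214610C97 to YMD1480JA2149103W7
2024-05-10 11:42:11 Rebuild complete on VD 00/0---CtrlIndex(1)
2024-05-10 11:42:11 State change on VD 00/0 from DEGRADED(2) to OPTIMAL(3)---CtrlIndex(1)
"""

if __name__ == "__main__":
    print(summarize(parse_events(SAMPLE)))
```

For the excerpts in this case, the sketch reports a 41 s gap between the command timeout and the removal of PD 08, and a rebuild time of 1467 s (24 min 27 s).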

FW issue description:

Yangtze Memory SE005 SSD, firmware YM120104: a design defect in the drive firmware. When an abnormal RAM read occurs, the Assert function handles it improperly, which locks up the drive and causes it to fail.

Detailed failure mode: The Longsys SSD firmware (FW) defines many Asserts, each corresponding to a different abnormal operating state and handled selectively. Asserts exist to stop program execution from degrading further under abnormal conditions. Assert 245 is the FW handler invoked when an error is detected during DDR/SRAM read/write verification. After saving the Assert Info, the FW enters while(1), an infinite loop that prevents it from continuing execution or responding to host commands. Any host commands (CMDs) still outstanding on the read/write I/O path between the drive and the RAID card at that moment time out (CMD TimeOut). The SSD also cannot receive (Rx) or execute any new host commands, so the drive drops out of the array.
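To make the failure mode concrete, here is a toy Python model of the sequence described above. It is not Longsys or YMTC firmware: the class, the method names, and the one-command-at-a-time timing are hypothetical. Setting the `hung` flag stands in for the `while(1)` loop, so the command that triggers Assert 245 and every command after it never complete, which the host sees as CMD TimeOut.

```python
from dataclasses import dataclass, field


# Toy model only: names and behaviour are hypothetical illustrations of the
# Assert 245 failure mode, not vendor firmware code.
@dataclass
class ToyFirmware:
    hung: bool = False  # stands in for the FW spinning in while(1)
    assert_info: list = field(default_factory=list)
    completed: list = field(default_factory=list)

    def assert_245(self, detail):
        # Save Assert Info, then "enter while(1)": nothing is serviced afterwards.
        self.assert_info.append(("Assert 245", detail))
        self.hung = True

    def service(self, cmd, ram_verify_ok=True):
        """Try to execute one host command; return True if it completed."""
        if self.hung:
            return False  # FW is looping: no Rx, no execution, no completion
        if not ram_verify_ok:
            self.assert_245(f"DDR/SRAM verify error while handling {cmd}")
            return False
        self.completed.append(cmd)
        return True

    def respond_to_reset(self):
        # Assumption: a FW stuck in while(1) cannot answer a reset either,
        # consistent with the failed reset in the May 4 log.
        return not self.hung


def run_host(fw, commands, fault_at):
    """Issue commands in order, injecting a RAM verify error at index fault_at.

    Returns the commands that never complete, i.e. those the host times out.
    """
    timed_out = []
    for i, cmd in enumerate(commands):
        if not fw.service(cmd, ram_verify_ok=(i != fault_at)):
            timed_out.append(cmd)
    return timed_out


if __name__ == "__main__":
    fw = ToyFirmware()
    cmds = [f"READ#{i}" for i in range(6)]
    print("timed out:", run_host(fw, cmds, fault_at=2))
    print("reset answered:", fw.respond_to_reset())
```

Because the hung state is permanent in this model, the reset check also fails, which matches the failed reset in the May 4 log. Whether a real controller reset or a power cycle can break the loop is not stated in this case.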

Solution

1. Replace failed drives with new drives running FW version YM120105.

2. Upgrade the FW of drives that have not yet failed to version YM120105.
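For a larger fleet, the two rules above can be applied per drive. A minimal Python sketch, assuming a hypothetical `Drive` inventory record and that the firmware string is taken from the `Firmware Version:` line printed by `smartctl -i` (drives behind a RAID controller may need a controller-specific `-d` option, and the model string your tools report may not be literally `SE005`).

```python
from dataclasses import dataclass

AFFECTED_FW = "YM120104"
FIXED_FW = "YM120105"


@dataclass
class Drive:
    # Hypothetical inventory record; fill it from your own asset data.
    slot: str
    model: str
    serial: str
    firmware: str
    failed: bool


def firmware_from_smartctl(info_text):
    """Return the value of the 'Firmware Version:' line in `smartctl -i` output."""
    for line in info_text.splitlines():
        if line.startswith("Firmware Version:"):
            return line.split(":", 1)[1].strip()
    return None


def action_for(drive):
    """Map one drive to the remediation described in this case."""
    if drive.model != "SE005":  # assumption: model matched by this literal string
        return "not affected"
    if drive.failed:
        return f"replace with a new SE005 on {FIXED_FW}"
    if drive.firmware == AFFECTED_FW:
        return f"upgrade firmware to {FIXED_FW}"
    if drive.firmware == FIXED_FW:
        return "no action"
    return "review manually: firmware not covered by this case"


if __name__ == "__main__":
    fleet = [
        Drive("F00", "SE005", "SN-A", "YM120104", True),
        Drive("F01", "SE005", "SN-B", "YM120104", False),
        Drive("F02", "SE005", "SN-C", "YM120105", False),
    ]
    for d in fleet:
        print(d.slot, d.serial, action_for(d))
```

Firmware versions other than YM120104 and YM120105 are flagged for manual review rather than assumed safe, because this case only covers those two versions.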
