The customer reported a RAID controller failure. After replacing the RAID controller, the issue recurred, and the same alarm message appeared in the logs.
Log analysis:
#\dump_info\LogDump\maintenance_log
2024-12-06 00:41:48 INFO : SVR-0000000,Collecting physical drive log from OOB started.
2024-12-06 00:42:05 INFO : SVR-0000000,Collecting physical drive log from OOB ended.
2024-12-06 22:30:35 ERROR: SVR-0072002,RAID Card1 heartbeat abnormal asserted(0 to 1)
2024-12-06 22:33:13 ERROR: SVR-0080006,RAID controller (RAID Card1) communication loss - Asserted
2024-12-06 22:33:37 ERROR: SVR-0072002,RAID Card1 heartbeat abnormal deasserted(1 to 0)
2024-12-06 22:34:36 ERROR: SVR-0072002,RAID Card1 heartbeat abnormal asserted(0 to 1)
2024-12-06 22:38:05 ERROR: SVR-0080006,RAID controller (RAID Card1) communication loss - Asserted
2024-12-06 22:58:36 ERROR: SVR-0072002,RAID Card1 heartbeat abnormal deasserted(1 to 0)
2024-12-06 22:59:36 ERROR: SVR-0072002,RAID Card1 heartbeat abnormal asserted(0 to 1)
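The heartbeat events above alternate between asserted and deasserted, which is easier to read as outage intervals. The following is a minimal sketch of how the maintenance_log excerpt could be paired up that way. The line format is inferred only from the entries shown here, and the names LINE_RE, parse_line, and heartbeat_outages are hypothetical helpers, not part of any vendor tool.

```python
import re
from datetime import datetime

# Line format assumed from the excerpt, e.g.:
# 2024-12-06 22:30:35 ERROR: SVR-0072002,RAID Card1 heartbeat abnormal asserted(0 to 1)
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+)\s*: (SVR-\d+),(.*)$"
)

SAMPLE_LOG = """\
2024-12-06 00:41:48 INFO : SVR-0000000,Collecting physical drive log from OOB started.
2024-12-06 22:30:35 ERROR: SVR-0072002,RAID Card1 heartbeat abnormal asserted(0 to 1)
2024-12-06 22:33:13 ERROR: SVR-0080006,RAID controller (RAID Card1) communication loss - Asserted
2024-12-06 22:33:37 ERROR: SVR-0072002,RAID Card1 heartbeat abnormal deasserted(1 to 0)
2024-12-06 22:34:36 ERROR: SVR-0072002,RAID Card1 heartbeat abnormal asserted(0 to 1)
2024-12-06 22:58:36 ERROR: SVR-0072002,RAID Card1 heartbeat abnormal deasserted(1 to 0)
2024-12-06 22:59:36 ERROR: SVR-0072002,RAID Card1 heartbeat abnormal asserted(0 to 1)
"""


def parse_line(line):
    """Return (timestamp, level, event_code, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts, level, code, msg = m.groups()
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), level, code, msg


def heartbeat_outages(lines, code="SVR-0072002"):
    """Pair each 'asserted' heartbeat event with the next 'deasserted' one."""
    outages = []
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None or parsed[2] != code:
            continue
        ts, _, _, msg = parsed
        # Check 'deasserted' first, because it contains the substring 'asserted'.
        if "deasserted" in msg:
            if start is not None:
                outages.append((start, ts))
                start = None
        elif "asserted" in msg and start is None:
            start = ts
    if start is not None:
        outages.append((start, None))  # still asserted at the end of the excerpt
    return outages


outages = heartbeat_outages(SAMPLE_LOG.splitlines())
for begin, end in outages:
    duration = "ongoing" if end is None else f"{int((end - begin).total_seconds())} s"
    print(begin, "->", end, duration)
```

For the excerpt above this yields two closed intervals (182 s and 1440 s) and a third that is still asserted when the excerpt ends, which matches a link that keeps dropping rather than a single one-off event.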
#\dump_info\LogDump\app_debug_log_all
A return value of 0x1001 from GetCtrlPhyConnectionsInfo indicates that communication with the RAID controller was disrupted:
2024-12-13 00:42:01 StorageMgnt ERROR: sml_lsi.c(13839): smlib: LSI:GetCtrlInfo failed, CtrlId = 0, return 0x1001
2024-12-13 00:42:01 StorageMgnt ERROR: sml_lsi.c(14127): smlib: LSI:GetCtrlPhyConnectionsInfo failed, CtrlId = 0, return 0x1001
2024-12-13 00:42:02 StorageMgnt ERROR: sml_lsi.c(13839): smlib: LSI:GetCtrlInfo failed, CtrlId = 0, return 0x1001
2024-12-13 00:42:02 StorageMgnt ERROR: sml_lsi.c(14127): smlib: LSI:GetCtrlPhyConnectionsInfo failed, CtrlId = 0, return 0x1001
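The failures in app_debug_log_all above can be tallied per function and return code to see how consistently 0x1001 is returned. This is a rough sketch that assumes the line layout seen in the excerpt; FAIL_RE and count_failures are hypothetical names introduced for illustration.

```python
import re
from collections import Counter

# Matches the failure pattern seen in the excerpt, e.g.:
# "... smlib: LSI:GetCtrlInfo failed, CtrlId = 0, return 0x1001"
FAIL_RE = re.compile(r"LSI:(\w+) failed, CtrlId = (\d+), return (0x[0-9A-Fa-f]+)")

SAMPLE_DEBUG_LOG = """\
2024-12-13 00:42:01 StorageMgnt ERROR: sml_lsi.c(13839): smlib: LSI:GetCtrlInfo failed, CtrlId = 0, return 0x1001
2024-12-13 00:42:01 StorageMgnt ERROR: sml_lsi.c(14127): smlib: LSI:GetCtrlPhyConnectionsInfo failed, CtrlId = 0, return 0x1001
2024-12-13 00:42:02 StorageMgnt ERROR: sml_lsi.c(13839): smlib: LSI:GetCtrlInfo failed, CtrlId = 0, return 0x1001
2024-12-13 00:42:02 StorageMgnt ERROR: sml_lsi.c(14127): smlib: LSI:GetCtrlPhyConnectionsInfo failed, CtrlId = 0, return 0x1001
"""


def count_failures(lines):
    """Count failed calls, keyed by (function name, controller id, return code)."""
    counts = Counter()
    for line in lines:
        m = FAIL_RE.search(line)
        if m:
            func, ctrl_id, ret = m.groups()
            counts[(func, int(ctrl_id), ret.lower())] += 1
    return counts


failures = count_failures(SAMPLE_DEBUG_LOG.splitlines())
for (func, ctrl_id, ret), n in sorted(failures.items()):
    print(f"{func} (CtrlId {ctrl_id}) returned {ret}: {n} time(s)")
```

In the excerpt every failed call targets CtrlId 0 and returns 0x1001, with no other return codes present.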
Based on the above information, Huawei factory analysis concludes that the RAID card printed a surprise down fault, followed by repeated initialization failures of the RAID card firmware.
Because the fault recurred after the RAID controller was replaced, replace the motherboard to restore a working RAID card link. It is also advisable to bring a spare RAID card.