MCX6 NIC: single-machine bidirectional traffic test fails to reach the theoretical 200G performance

Network Topology

Server: 4500 G6 (the issue is largely independent of the server model)
NIC model: IB-MCX653105A-HDAT-200Gb-1P

Problem Description

The hardware (HW) configurations of the competitor's server and our server were compared. They differ in processor and NIC model.
In addition, two servers directly connected through their NICs normally reach about 90% of the theoretical performance in a throughput test.
However, the command used in this test targets the local loopback address, so the result depends more on local kernel performance.

Process Analysis

The issue persisted after the processor was replaced with the same 5520 model used by the competitor, and also with models 6330, 5380, and 5420. Disabling cores to raise the single-core turbo frequency made performance even worse.
Subsequent comparison of NIC firmware parameter settings revealed a discrepancy in the MAX_ACC_OUT_READ parameter between normal servers and problematic servers. Normal servers had a value of 44, while underperforming servers had a value of 32.
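
A comparison like the one described above can be sketched in shell. This is a minimal sketch, not the method actually used in this case: the function name compare_params and the file names are hypothetical, and it assumes each parameter line in a saved mlxconfig query dump has the form "NAME VALUE" with leading whitespace.

```shell
# Hypothetical helper (not part of mlxconfig): print every parameter whose
# value differs between two saved query dumps.
# Assumption: parameter lines look like "   NAME   VALUE" (exactly two fields).
# Output format: NAME value_in_first_file value_in_second_file
compare_params() {
    awk '
        NF == 2 && $1 ~ /^[A-Z][A-Z0-9_]*$/ {
            # First file: remember the value for each parameter name.
            if (FNR == NR) { good[$1] = $2; next }
            # Second file: report parameters present in both with different values.
            if (($1 in good) && good[$1] != $2) print $1, good[$1], $2
        }
    ' "$1" "$2"
}
```

For example, after saving the query output of a normal server to good.txt and of an underperforming server to bad.txt, running compare_params good.txt bad.txt would list only the parameters that differ between the two.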

NVIDIA's official website also provides tuning guidance for this MCX6 NIC, recommending that the default value be changed from 32 to 44. Original link: https://docs.nvidia.com/grace-perf-tuning-guide/optimizing-io.html

After this parameter is modified, performance can exceed 200G.



Solution

How to set the parameter (run from within the OS; <device> and <value> are placeholders):
1. sudo mlxconfig -d <device> query | grep MAX_ACC_OUT_READ ----- Check whether this parameter is exposed and what its value is
2. sudo mlxconfig -d <device> set ADVANCED_PCI_SETTINGS=1 ----- If the parameter is not listed, run this command to expose it
3. sudo mlxconfig -d <device> set MAX_ACC_OUT_READ=<value> ----- Set the parameter value (44 on the normal servers in this case)

Note: This parameter requires a restart to take effect.
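
The decision between steps 1 to 3 above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names read_max_acc_out_read and next_step are hypothetical, and the query output is assumed to list the parameter as "MAX_ACC_OUT_READ VALUE" on one line. The helper only reads text from standard input and does not change any firmware setting.

```shell
# Hypothetical helper: print the MAX_ACC_OUT_READ value found in query output
# read from stdin. Prints nothing if the parameter is not listed.
read_max_acc_out_read() {
    awk '$1 == "MAX_ACC_OUT_READ" { print $2; exit }'
}

# Hypothetical helper: given query output on stdin and the wanted value as $1,
# print which step of the procedure applies next.
next_step() {
    current=$(read_max_acc_out_read)
    if [ -z "$current" ]; then
        echo "expose"   # parameter not listed: set ADVANCED_PCI_SETTINGS=1 first
    elif [ "$current" != "$1" ]; then
        echo "set"      # parameter listed with a different value: set it
    else
        echo "ok"       # parameter already has the wanted value
    fi
}
```

A possible use is sudo mlxconfig -d <device> query | next_step 44, where <device> is a placeholder for the NIC device and 44 is the value seen on the normal servers in this case.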
