MAC Drift Case in Multi-level M-LAG Networking

2023-12-13 17:07:50 Published

Network Topology

Devices and Versions: S10508X R7596P10

Networking: Two S10508X switches and two S6800 switches form a multi-level M-LAG network. The two S6800 switches are access layer switches that provide pure Layer 2 forwarding. The two S10508X switches are the gateway devices. The two M-LAG levels are interconnected through the M-LAG group (BAGG301). A single-homed device is attached downstream of one access layer S6800. The overall network is shown in the following figure:


Problem Description

While testing services on the new multi-level M-LAG network, we found that the MAC address of the single-homed device was learned on the uplink M-LAG interface of S6800-1. Further investigation showed that the MAC address of the single-homed device was drifting on both S6800-1 and S6800-2. On S6800-1, it drifted between the uplink M-LAG interface and the single-homed port Ten1/0/3. On S6800-2, it drifted between the uplink M-LAG interface and the peer-link interface.

Process Analysis

Under normal circumstances, S68-1 should learn the MAC address of the single-homed terminal on the interface that connects to it, Ten1/0/3. S68-2 should learn the MAC address of the terminal attached to S68-1 on its peer-link interface, as shown by the green star in the figure below. On site, however, the uplink M-LAG interfaces of both S68-1 and S68-2 learned the MAC address of the single-homed terminal (as shown by the orange star in the figure below). This suggested that the two S68 devices were receiving packets on the uplink port whose source MAC address was that of the single-homed terminal.



Therefore, we turned our attention to the two upstream S105X switches. On the S105X switches, the MAC address and ARP entry of the single-homed terminal were learned correctly on the M-LAG interface. We then collected packet statistics matched on the source MAC address on both S105X switches. On S105X-1, packets whose source MAC address was that of the single-homed terminal were received only on the M-LAG interface BAGG301. On S105X-2, however, such packets were received from the peer-link and then sent out of the M-LAG interface toward the S68 switches, as shown in the figure below:



S68-2 received packets carrying the terminal's source MAC address on its uplink M-LAG interface and learned the MAC address on that interface. It then synchronized the entry to S68-1 through the rlink packet, so S68-1 also learned the MAC address of the single-homed terminal on its uplink M-LAG interface. This is how the MAC drift occurred on the two S68 devices on site.
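The learning sequence described above can be reproduced with a toy model. The following Python sketch is only an illustration, not the switches' actual implementation. The MAC address, the `UPLINK` label, and the `peer_view` synchronization rule (an entry learned on an M-LAG interface is installed on the same M-LAG interface of the peer, and any other entry points to the peer-link) are assumptions chosen to match the behavior described in this case.

```python
from collections import defaultdict

TERMINAL = "0000-0000-0001"   # hypothetical MAC of the single-homed terminal
UPLINK = "uplink-MLAG"        # placeholder name for the uplink M-LAG interface
PEER_LINK = "peer-link"


class Switch:
    """Toy Layer 2 switch that learns source MACs and records port moves."""

    def __init__(self, name):
        self.name = name
        self.mac_table = {}              # MAC -> port
        self.moves = defaultdict(list)   # MAC -> [(old_port, new_port), ...]

    def learn(self, mac, port):
        old = self.mac_table.get(mac)
        if old is not None and old != port:
            self.moves[mac].append((old, port))   # a MAC move (drift)
        self.mac_table[mac] = port

    def drift_ports(self, mac):
        """Return the set of ports the MAC has moved between."""
        return {p for move in self.moves[mac] for p in move}


def peer_view(port):
    """Port the M-LAG peer installs for a synchronized entry (assumed rule)."""
    return port if port == UPLINK else PEER_LINK


def simulate(rounds=2):
    s1, s2 = Switch("S68-1"), Switch("S68-2")
    for _ in range(rounds):
        # 1. The terminal sends a frame: S68-1 learns it locally and syncs it.
        s1.learn(TERMINAL, "Ten1/0/3")
        s2.learn(TERMINAL, peer_view("Ten1/0/3"))
        # 2. The looped-back frame arrives on the S68-2 uplink and is synced.
        s2.learn(TERMINAL, UPLINK)
        s1.learn(TERMINAL, peer_view(UPLINK))
    return s1, s2


if __name__ == "__main__":
    for sw in simulate():
        print(sw.name, sorted(sw.drift_ports(TERMINAL)))
```

In this model S68-1 moves the entry between Ten1/0/3 and the uplink M-LAG interface, and S68-2 moves it between the peer-link and the uplink M-LAG interface, which matches the drift interfaces observed on site.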

So why did S105X-2 send packets carrying the terminal's source MAC address back out of the M-LAG interface? Investigation of the underlying layer showed that, during initial debugging on site, the peer-link interface had occasionally been flapped manually. During the flapping, an error occurred while the two M-LAG devices synchronized M-LAG interface status over the peer-link, so one of the devices recorded incorrect M-LAG interface information. In simple terms, S105X-2 did not correctly record the M-LAG interface information of S105X-1. As a result, when S105X-2 received over the peer-link a packet that had entered S105X-1 on BAGG301, it forwarded the packet out of its own BAGG301 again.
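The effect of the lost peer state can be sketched with a simplified flooding rule. The code below is an assumption-based illustration of the general M-LAG isolation principle (a frame received from the peer-link is not sent out of an M-LAG interface whose member on the peer is known to be up), not the actual forwarding logic of the S10500X. The function name `egress_ports` and its data structures are hypothetical, and the missing record on S105X-2 is modeled as an empty state table.

```python
PEER_LINK = "peer-link"


def egress_ports(ingress, ports, peer_member_up):
    """Return the ports a flooded frame is sent out of (simplified model).

    ports:          dict of local port name -> True if it is an M-LAG interface
    peer_member_up: dict of M-LAG interface -> True if this device has recorded
                    that the peer's member of that M-LAG interface is up
    """
    out = []
    for port, is_mlag in ports.items():
        if port == ingress:
            continue   # never send a frame back out of its ingress port
        if ingress == PEER_LINK and is_mlag and peer_member_up.get(port, False):
            continue   # the peer serves this M-LAG interface itself
        out.append(port)
    return out


# Hypothetical port set of S105X-2: one M-LAG interface plus the peer-link.
S105X_2_PORTS = {"BAGG301": True, PEER_LINK: False}

# Correct state: S105X-2 knows BAGG301 is up on S105X-1, so nothing is sent.
healthy = egress_ports(PEER_LINK, S105X_2_PORTS, {"BAGG301": True})

# Faulty state: the record for the peer's BAGG301 is missing, so the frame
# received from the peer-link is sent back down BAGG301 toward the S68 layer.
stale = egress_ports(PEER_LINK, S105X_2_PORTS, {})

if __name__ == "__main__":
    print("healthy:", healthy)
    print("stale:  ", stale)
```

With the correct record the frame is suppressed on BAGG301; with the record missing it is forwarded out of BAGG301, which is the behavior seen on S105X-2.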

Solution

Temporary workaround: Flap BAGG301 on S105X-1 (shut it down and bring it back up) to trigger the M-LAG interface information to be synchronized to S105X-2 again.

Solution: The R762X version of the S10500X series switches resolves this known issue. Upgrade to the corresponding version.

 

Tip No. 1:

When deploying devices, contact TAC or the local office to confirm the latest recommended version, so that known version issues do not delay the deployment.

Tip No. 2:

Have you noticed a weakness in the multi-level M-LAG design used in this case? The two M-LAG levels are interconnected through the M-LAG group, but the links are connected straight through in a square shape rather than cross-connected. Under normal circumstances, this is not a problem. However, if one of the devices fails, the peer-link must carry a large amount of traffic, which may cause congestion and packet loss. As shown in the figure below, if S105X-1 fails, all traffic can only go down to S105X-2 and then to S68-2:



A more reasonable design is to cross-connect the two M-LAG levels, so that the load can still be shared even if one device fails:
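The difference between the two layouts can also be checked with a small path computation. The Python sketch below is illustrative only. The link lists are assumptions inferred from the description in this case (in the square layout S105X-1 connects only to S68-1 and S105X-2 only to S68-2), and `shortest_path` and `uses_peer_link` are hypothetical helper names.

```python
from collections import deque

PEER_LINKS = [("S105X-1", "S105X-2"), ("S68-1", "S68-2")]
# Square layout: each gateway connects to only one access switch (assumed).
SQUARE = PEER_LINKS + [("S105X-1", "S68-1"), ("S105X-2", "S68-2")]
# Cross-connected layout: each gateway connects to both access switches.
CROSS = SQUARE + [("S105X-1", "S68-2"), ("S105X-2", "S68-1")]


def shortest_path(links, src, dst, failed=()):
    """Breadth-first search for the shortest path, skipping failed devices."""
    adj = {}
    for a, b in links:
        if a in failed or b in failed:
            continue
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def uses_peer_link(path):
    """True if any hop of the path crosses a peer-link."""
    hops = set(zip(path, path[1:]))
    return any((a, b) in hops or (b, a) in hops for a, b in PEER_LINKS)


if __name__ == "__main__":
    for name, links in (("square", SQUARE), ("cross", CROSS)):
        path = shortest_path(links, "S105X-2", "S68-1", failed={"S105X-1"})
        print(name, path, "peer-link used:", uses_peer_link(path))
```

With S105X-1 failed, the square layout reaches S68-1 only through S68-2 and the access layer peer-link, while the cross-connected layout reaches S68-1 directly from S105X-2.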



