Problem background:
When accessing a web page hosted on a Huawei Cloud server through Wi-Fi, the page freezes and cannot be opened. Accessing other web pages on the Internet in the same environment works normally.
Switching between local forwarding and centralized forwarding made no difference. Adjusting the TCP MSS on the VLAN interface (int vlan) of the user network segment also had no effect. However, accessing the web page from a wired PC works normally.
The fault was reproduced repeatedly on site, and it also occurred with a V5 FAT AP. The web page opened quickly only when every VLAN on the AP was set to VLAN1.
Two experiments were conducted on site:
1. The V5 FAT AP was configured entirely in VLAN1, and the POE switch port was set to access mode in VLAN670. The wireless terminals could open the web pages normally.
2. The V5 FAT AP was configured entirely in VLAN670, and the POE switch port was set to trunk mode. The wireless terminals experienced lag when opening the web pages.
Packet captures were taken on both the server and PC sides for both experiments.
Let's take a look at the HTTP process under normal conditions.
The test page loads three images. Each image is requested and displayed successfully, and the entire process from opening the web page to completing the download takes about 0.2 seconds.
Now let's take a look at the interaction in the VLAN670 environment.
Only the image 1112.png downloaded successfully. The other images did not receive an "OK" reply from the server and took as long as 1 minute to complete.
The TCP interaction for image 1111.png is clearly affected. Let's use Wireshark to follow the TCP stream and see where the problem lies. First, let's take a look at the normal situation.
The TCP packet transmissions appear to be smooth with no retransmissions or delays in the normal situation.
In trunk mode:
The start of the exchange looks similar: the server sends a 1300-byte packet. However, after the terminal ACKs it, the server sends nothing more for about 0.04 seconds.
When the server resumes at 0.049 seconds, it sends only a single TCP packet, whereas at least six TCP packets are needed to complete the interaction normally. In the previous interaction, the server took less than 0.03 seconds to complete the first image request.
The packet capture taken on the terminal's network card shows a similar phenomenon.
After the terminal replies, it takes about 0.05 seconds for the server's next packet to reach the terminal.
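The stall described above can also be located programmatically rather than by eye. The following is a minimal sketch, assuming the capture has already been exported to a list of (timestamp, sender) pairs. The function name `server_stalls`, the 0.03-second threshold, and the sample timestamps are illustrative assumptions, not values taken from the actual capture files.

```python
from typing import List, Tuple

# One entry per packet: (timestamp in seconds, "client" or "server").
Packet = Tuple[float, str]


def server_stalls(packets: List[Packet], threshold: float = 0.03) -> List[Tuple[float, float]]:
    """Return (client_packet_time, delay) for every client packet that is
    followed by a server packet arriving more than `threshold` seconds later."""
    stalls = []
    for (t_prev, src_prev), (t_next, src_next) in zip(packets, packets[1:]):
        if src_prev == "client" and src_next == "server":
            delay = t_next - t_prev
            if delay > threshold:
                stalls.append((t_prev, delay))
    return stalls


# Hypothetical timeline shaped like the trunk-mode capture:
# the server sends data, the terminal ACKs, and the server resumes late.
sample = [
    (0.000, "server"),
    (0.001, "client"),
    (0.050, "server"),
]

for ack_time, delay in server_stalls(sample):
    print(f"server silent for {delay:.3f} s after client packet at {ack_time:.3f} s")
```

Running the same check on both the server-side and terminal-side captures helps confirm that the delay originates before the packet leaves the server rather than somewhere along the path.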
The analysis so far suggests that the server is slow to respond, but the root cause still needs further investigation. At this point, nothing in the captures clearly ties the issue to the VLAN tag.
Finally, through the joint efforts of the vendors, the root cause of the Wi-Fi issue was found.
The acknowledgment (ACK) packets sent over Wi-Fi were only 54 bytes long, below the 64-byte Ethernet minimum frame size. In contrast, the ACK packets sent over the wired connection are padded to 64 bytes by the switch. The 54-byte packets trigger a compatibility problem in the cloud gateway, which is a performance bug on the server side. The server vendor will optimize the gateway to address this issue.
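The 54-byte figure matches the size of a TCP ACK with no payload and no header options. The sketch below works through that arithmetic, assuming an untagged Ethernet II frame with a 20-byte IPv4 header and a 20-byte TCP header. The constant and function names are illustrative. Note that capture tools usually do not show the 4-byte frame check sequence (FCS), so a frame padded to the 64-byte minimum typically appears as 60 bytes in a capture.

```python
ETH_HEADER = 14         # destination MAC + source MAC + EtherType
IPV4_HEADER = 20        # IPv4 header without options (assumed)
TCP_HEADER = 20         # TCP header without options (assumed)
FCS = 4                 # frame check sequence, usually not shown in captures
MIN_FRAME_ON_WIRE = 64  # Ethernet minimum frame size, FCS included


def bare_ack_size() -> int:
    """Length of a payload-free TCP ACK as a capture shows it (FCS excluded)."""
    return ETH_HEADER + IPV4_HEADER + TCP_HEADER


def padding_needed(captured_len: int) -> int:
    """Padding bytes required to bring a frame up to the Ethernet minimum."""
    return max(0, MIN_FRAME_ON_WIRE - FCS - captured_len)


ack = bare_ack_size()
pad = padding_needed(ack)
print(f"unpadded ACK: {ack} bytes")
print(f"padding needed: {pad} bytes")
print(f"padded frame on the wire: {ack + pad + FCS} bytes")
```

Comparing the frame length of the same ACK in the wireless and wired captures is a quick way to check whether padding was applied before the packet reached the cloud gateway.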