Mellanox Driver Ignoring Vlans

Recently I spent more hours than I want to admit fixing a server with a Mellanox ConnectX-6 Lx card, where I could not get OpenShift to pass traffic to VMs. I created Linux bridges just like I normally do, and traffic reached the host, but all of it was being delivered to the primary Linux interface and not to the sub interfaces on the specific VLANs where I needed it. Other systems with other network cards did not have this issue. After some trial and error, finding specific kernel messages, and solving it, I wanted to make a quick post in case anyone else runs into this.

All of this analysis assumes a trunk port connected to an interface on Linux (or a bonded interface). Say you have a standard interface in Linux on the native VLAN (for example eno1), and you then add a sub interface for tagged traffic, eno1.10. The Mellanox mlx5 driver will, in hardware, ignore your VLAN tag and just send the traffic to the main interface.

The smoking gun that helped me find the answer was in the dmesg kernel logs. Search dmesg for “mlx5” with dmesg | grep mlx5, and you may see the following:

mlx5_core 0000:0b:00.1: mlx5e_fs_set_rx_mode_work:843:(pid 156): S-tagged traffic will be dropped while C-tag vlan stripping is enabled

This line (see https://github.com/oracle/linux-uek/issues/20) let me find online discussion of this kernel bug and of ways to resolve it.
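If you want to script that log check, here is a minimal sketch. The function name find_mlx5_vlan_warning is made up for this example; it only filters whatever log text you pipe into it for the warning string quoted above, so the exact wording may differ on other kernel versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Mellanox tooling): reads kernel log
# text on stdin and prints only the mlx5 lines that carry the
# S-tag/C-tag warning quoted in this post.
# Exits non-zero when no such line is found, like grep itself.
find_mlx5_vlan_warning() {
    grep -E 'mlx5.*S-tagged traffic will be dropped'
}

# Usage on a live system (reading the kernel log may require root):
#   dmesg | find_mlx5_vlan_warning
```

No output and a non-zero exit status just means the message was not found in the text you gave it; it does not prove the card is behaving.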

The Mellanox card thinks the frames are double-tagged and will drop the tags on incoming data. The card does this in hardware. You can confirm that the 802.1Q kernel module is loaded and that VLAN filtering is disabled in the kernel, but neither will matter. If you change a setting with something like ethtool -K <interface> rx-vlan-offload off, ethtool will report the setting as off. That report is correct as far as the kernel is concerned, because the Mellanox firmware is doing the filtering, not the Linux kernel. When you tcpdump the interface you will see weird results, because you are capturing traffic AFTER the firmware has dropped the header.
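To see what the kernel itself reports for the VLAN offload features, a small filter over ethtool -k output is enough. This is a sketch: show_vlan_features is a hypothetical name, and the feature names it matches (rx-vlan-offload, tx-vlan-offload, rx-vlan-filter and similar) are the ones ethtool normally prints, which may vary by driver and version.

```shell
#!/bin/sh
# Hypothetical helper: reads `ethtool -k <interface>` output on stdin and
# prints only the feature lines whose names start with rx-vlan or tx-vlan.
show_vlan_features() {
    grep -E '^[[:space:]]*[rt]x-vlan'
}

# Usage on a live system (eno1 is just the example name from this post):
#   ethtool -k eno1 | show_vlan_features
```

Keep in mind that this only shows the kernel's view. As described above, the reported values can say "off" while the traffic still arrives without its tag.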

The only fix I found is to move every interface that has an IP address off the main interface. Do not use the native VLAN to carry traffic. (Probably a good idea anyway, but this system got into this state because a port was migrated FROM access TO trunk, and I originally tried to do that with minimal interruption.)

Once you move the IP address onto its own sub interface and reboot, data will start flowing to your VMs. The kernel module will reload and not attempt to remove the “outer” VLAN tag from the “double tagged” packets. This issue was harder to solve because it only triggers when the module loads, which means you have to reboot to find out whether a change actually fixed it. I saw a handful of other people mention this issue in bug reports, and that set me on the correct track.
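A quick way to spot leftover addresses on an untagged interface is to filter the output of ip -o -4 addr show. This is only a sketch under a naming assumption: it treats any interface whose name contains a dot (like eno1.10) as a VLAN sub interface, which is a convention and not something the kernel enforces. The function name ipv4_on_untagged is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `ip -o -4 addr show` output on stdin and prints
# the names of interfaces that hold an IPv4 address but do not look like
# VLAN sub interfaces. Assumes sub interfaces are named <parent>.<vlan-id>
# (for example eno1.10). Loopback is skipped.
ipv4_on_untagged() {
    awk '{ sub(/@.*/, "", $2) }               # drop any "@parent" suffix
         $2 != "lo" && $2 !~ /\./ { print $2 }' | sort -u
}

# Usage on a live system:
#   ip -o -4 addr show | ipv4_on_untagged
```

Bridges and other untagged devices will show up in this list too, so read the result by hand. The names that matter here are the physical or bonded interface on the trunk port.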

Another part of the challenge was that the system was a blade server with an internal “dumb switch”, and I was never sure what that switch was doing with the tagged packets. In the end, not much, but it added complexity to the problem.

Update from the future: I have written more about networking in OpenShift and this issue in a new post, “Step-By-Step Setting Up Networking for Virtualization on OpenShift 4.19 for a Homelab”, on BuildingTents.
