Mellanox Driver Ignoring VLANs

Recently I spent more hours than I want to admit fixing a server with a Mellanox ConnectX-6 Lx card, where I could not get OpenShift to pass traffic to VMs. I was creating bridges just as I normally do, and traffic to the main interface was working. After a lot of trial and error I found the cause, so I wanted to make a quick post in case anyone else runs into this.

All of this assumes a trunk port connected to an interface on Linux (or to a bonded interface). Suppose you have a standard Linux interface on the native VLAN (for example eno1), and you then add a sub-interface for tagged traffic, eno1.10. The Mellanox mlx5 driver will, in hardware, ignore your VLAN tag and send the traffic to the main interface instead.
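To make the problem layout concrete, here is a small Python sketch that flags it: a parent interface that holds an IP address while also having VLAN sub-interfaces. The `Iface` structure and `risky_parents` helper are hypothetical; on a real host you would fill them in from your own inventory or from `ip` output, and the addresses below are documentation ranges, not values from the server in this post. The same check applies if the parent is a bond rather than eno1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Iface:
    """Hypothetical, minimal view of a Linux interface for this sketch."""
    name: str
    parent: Optional[str] = None  # set on VLAN sub-interfaces, e.g. "eno1" for "eno1.10"
    addresses: List[str] = field(default_factory=list)  # CIDRs assigned to this interface


def risky_parents(ifaces: List[Iface]) -> List[str]:
    """Return parent interfaces that carry an IP address and also have VLAN children.

    This is the layout described above: the native-VLAN address sits on the
    parent (eno1) while tagged traffic is expected on a child (eno1.10).
    """
    by_name = {i.name: i for i in ifaces}
    parents = {i.parent for i in ifaces if i.parent}
    return sorted(p for p in parents if p in by_name and by_name[p].addresses)


if __name__ == "__main__":
    layout = [
        Iface("eno1", addresses=["192.0.2.10/24"]),                      # untagged, native VLAN
        Iface("eno1.10", parent="eno1", addresses=["198.51.100.10/24"]),  # tagged VLAN 10
    ]
    print(risky_parents(layout))  # ['eno1']
```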

One way to check whether your card is doing this is to search dmesg for “mlx5” (dmesg | grep mlx5). You may see the following:

mlx5_core 0000:0b:00.1: mlx5e_fs_set_rx_mode_work:843:(pid 156): S-tagged traffic will be dropped while C-tag vlan stripping is enabled

(https://github.com/oracle/linux-uek/issues/20)
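If you have several hosts to check, the same grep can be scripted. This Python sketch looks for the warning quoted above and returns the PCI address of each affected port. The regex is built from that single log line, so it assumes the wording is the same on your kernel and driver version; `find_strip_warnings` and `read_dmesg` are hypothetical helper names.

```python
import re
import subprocess
from typing import List

# Pattern built from the mlx5 warning quoted above; the exact wording may
# differ between kernel/driver versions (assumption).
STRIP_WARNING = re.compile(
    r"mlx5_core (?P<pci>\S+): .*S-tagged traffic will be dropped "
    r"while C-tag vlan stripping is enabled"
)


def read_dmesg() -> str:
    """Return the kernel ring buffer; may need root if kernel.dmesg_restrict=1."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout


def find_strip_warnings(dmesg_text: str) -> List[str]:
    """Return PCI addresses (in order of first appearance) of mlx5 ports that logged the warning."""
    hits: List[str] = []
    for line in dmesg_text.splitlines():
        match = STRIP_WARNING.search(line)
        if match and match.group("pci") not in hits:
            hits.append(match.group("pci"))
    return hits


if __name__ == "__main__":
    sample = (
        "mlx5_core 0000:0b:00.1: mlx5e_fs_set_rx_mode_work:843:(pid 156): "
        "S-tagged traffic will be dropped while C-tag vlan stripping is enabled"
    )
    print(find_strip_warnings(sample))  # ['0000:0b:00.1']
```

On a live host, `find_strip_warnings(read_dmesg())` does the equivalent of the grep above.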

The Mellanox card is concerned about double-tagged (S-tagged) packets, so it strips the VLAN tag from incoming data, and it does this in hardware. You can confirm that the 8021q kernel module is loaded and that VLAN filtering is disabled, but neither will matter. If you change settings with something like ethtool -K <interface> rx-vlan-offload off, ethtool will report the setting as off, but the driver configured this at init time, so the settings you change afterwards are ignored. The only fix I found was to move all IP addresses off the main (parent) interface.
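If you want to record what ethtool reports across hosts, the output of `ethtool -k <interface>` is easy to parse. This is a minimal Python sketch; `offload_state` is a hypothetical helper, and the sample text is illustrative rather than captured from a ConnectX-6 Lx. Keep the point above in mind: a reported "off" here does not prove the hardware has stopped stripping tags.

```python
from typing import Tuple


def offload_state(ethtool_k_output: str, feature: str = "rx-vlan-offload") -> Tuple[bool, bool]:
    """Parse `ethtool -k <iface>` output and return (enabled, fixed) for one feature.

    Feature lines look like "rx-vlan-offload: off" or "tx-checksum-ipv4: off [fixed]";
    the "Features for <iface>:" header never matches a feature name.
    """
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name == feature:
            tokens = value.split()
            enabled = bool(tokens) and tokens[0] == "on"
            return enabled, "[fixed]" in tokens
    raise KeyError(f"{feature} not found in ethtool -k output")


if __name__ == "__main__":
    # Illustrative output only; run `ethtool -k eno1` for real values.
    sample = "Features for eno1:\nrx-vlan-offload: off\ntx-vlan-offload: on\n"
    print(offload_state(sample))  # (False, False)
```

On a live host you could pass it `subprocess.run(["ethtool", "-k", "eno1"], capture_output=True, text=True).stdout`.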

Once you move the IP address off the main interface onto its own sub-interface and reboot, traffic will start flowing to your VMs.
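Here is a minimal sketch of that move using iproute2 commands, generated from Python so the plan can be reviewed before anything runs. `move_ip_to_vlan` is a hypothetical helper, and the VLAN id and address are placeholders. It assumes the switch delivers that network tagged on the trunk: an address that lived on the native VLAN arrives untagged and will not reach a sub-interface. `ip` changes are also not persistent across the reboot, so the lasting change belongs in whatever manages the node's network configuration, and any routes that depended on the old address (such as a default route) have to be re-added on the sub-interface.

```python
from typing import List


def move_ip_to_vlan(parent: str, vlan_id: int, cidr: str) -> List[str]:
    """Build iproute2 commands that move `cidr` from `parent` onto `parent.vlan_id`.

    The commands are returned, not executed, so they can be reviewed first.
    Running them over a session that uses this address will disconnect it.
    """
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN id must be 1-4094, got {vlan_id}")
    child = f"{parent}.{vlan_id}"
    return [
        f"ip link add link {parent} name {child} type vlan id {vlan_id}",
        f"ip link set dev {child} up",
        f"ip addr del {cidr} dev {parent}",  # parent stays up but carries no IP
        f"ip addr add {cidr} dev {child}",
    ]


if __name__ == "__main__":
    for cmd in move_ip_to_vlan("eno1", 10, "192.0.2.10/24"):
        print(cmd)
```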
