There comes a point in every successful, growing infrastructure where implementation outgrows the original design. Fair enough; there's only so much one can plan for (especially in this industry), and managing a network so it can flex and change with business needs is what separates the veterans from the novices (so to speak).

Our DC infrastructure hit this point fairly recently. We have a number of db servers split into shards. Each shard is designed with a master/slave replication topology, with one of the slaves reserved as an analysis machine. The analysis machines were originally meant to perform some very limited and specific work, but of course that scope expanded and evolved as time progressed. Eventually we needed to upgrade the analysis hardware, but since there was only one analysis machine per shard, it would be hard to keep them down for the length of time the work required.

The decision was made to move a copy of the analysis database onto a db slave, and to migrate the analysis IP to the slave as a virtual IP. In theory, this would be simple: machines from various VLANs would continue to connect to the "analysis machines" at the same IP addresses they always had, but would now be temporarily served by the slaves.

Creating the virtual aliases was easy. We could use ifconfig to add an alias interface like so:

# ifconfig bond0:1 10.2.2.59 netmask 255.255.255.0
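For what it's worth, the modern iproute2 equivalent treats this as a secondary address rather than a separate interface; the `label` argument is optional and only exists so that legacy tools like ifconfig will still display the alias (a sketch, assuming the same addressing as above):

```shell
# Add 10.2.2.59 as a secondary address on bond0; the label makes
# it visible to legacy net-tools as the alias bond0:1
ip addr add 10.2.2.59/24 dev bond0 label bond0:1

# Confirm both addresses now appear on the interface
ip addr show dev bond0
```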

The trouble now came with the routes. We added several routes to the machine in our usual manner.

route add -net 10.3.3.0/24 gw 10.2.2.1 dev bond0
route add -net 10.4.4.0/24 gw 10.2.2.1 dev bond0
route add -net 10.5.5.0/24 gw 10.2.2.1 dev bond0
route add -net 10.6.6.0/24 gw 10.2.2.1 dev bond0

But when anyone tried to connect from these subnets, their connections would time out.

As a test, I tried connecting to the primary IP over telnet:

$ telnet 10.2.2.52 3306

Immediately I was presented with a MySQL banner. Yet the same attempt against the virtual IP would simply time out. We verified the routes, just to make sure everything looked appropriate.
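The same comparison can be scripted with a bounded timeout so it fails fast instead of hanging on the broken address (a hypothetical check; hosts and ports as in the examples above):

```shell
# Probe both IPs with a 3-second timeout; the primary answers,
# while the virtual IP silently drops the connection attempt
nc -z -w 3 10.2.2.52 3306 && echo "primary: reachable"
nc -z -w 3 10.2.2.59 3306 || echo "virtual: timed out"
```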

$ /sbin/route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref   Use Iface
10.2.2.0        *               255.255.255.0   U     0      0     0   bond0
10.3.3.0        10.2.2.1        255.255.255.0   UG    0      0     0   bond0
10.4.4.0        10.2.2.1        255.255.255.0   UG    0      0     0   bond0
10.5.5.0        10.2.2.1        255.255.255.0   UG    0      0     0   bond0
10.6.6.0        10.2.2.1        255.255.255.0   UG    0      0     0   bond0
default         10.2.2.1        0.0.0.0         UG    0      0     0   bond0

Then we took packet captures on the network to see where our traffic was going. Packets were indeed reaching the db slaves, but none were coming back. This smelled an awful lot like a routing issue.
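Something along these lines is what we ran on the slave (a sketch; the interface and filter will vary with your setup):

```shell
# Watch MySQL traffic involving the virtual IP; inbound SYNs
# showed up in the capture, but no replies ever left the box
tcpdump -ni bond0 host 10.2.2.59 and port 3306
```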

As a simple step, we removed our routes, and added them back using the virtual designation as a test.

route add -net 10.3.3.0/24 gw 10.2.2.1 dev bond0:1
route add -net 10.4.4.0/24 gw 10.2.2.1 dev bond0:1
route add -net 10.5.5.0/24 gw 10.2.2.1 dev bond0:1
route add -net 10.6.6.0/24 gw 10.2.2.1 dev bond0:1

The result here was peculiar. Examining the routing table again, we received the same output as before:

$ /sbin/route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref   Use Iface
10.2.2.0        *               255.255.255.0   U     0      0     0   bond0
10.3.3.0        10.2.2.1        255.255.255.0   UG    0      0     0   bond0
10.4.4.0        10.2.2.1        255.255.255.0   UG    0      0     0   bond0
10.5.5.0        10.2.2.1        255.255.255.0   UG    0      0     0   bond0
10.6.6.0        10.2.2.1        255.255.255.0   UG    0      0     0   bond0
default         10.2.2.1        0.0.0.0         UG    0      0     0   bond0

We tried testing connectivity anyway, but unfortunately the result was the same. We next tried removing the routes and re-adding them with the iproute2 tools. Again, this was unsuccessful. The primary interface continued to work, and the virtual interface acted as a virtual black hole. There was a bit of head-scratching next, and quite a bit of reading, but we eventually did get to the bottom of it.
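That iproute2 attempt looked roughly like this (a sketch from memory; note that iproute2 won't even accept an alias name like bond0:1 as a device, since aliases aren't real network devices):

```shell
# Replace the net-tools routes with iproute2 equivalents
ip route del 10.3.3.0/24
ip route add 10.3.3.0/24 via 10.2.2.1 dev bond0
# ...repeated for 10.4.4.0/24, 10.5.5.0/24, and 10.6.6.0/24
```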

You see, there are a few nasty elements at play here. Back in the golden days of Unix, multiple IPs per interface weren't even a thing, so when the net-tools suite (ifconfig, route, netstat, etc.) was written, support for them didn't exist. Once they became desirable, the idea of virtual interfaces was monkey-patched on top of the existing infrastructure. This was problematic in its own right, but was compounded by later efforts to actually correct it.

The iproute2 suite was eventually written to work more integrally with a completely redesigned network subsystem in the Linux kernel. This subsystem is far more flexible and advanced than the original model, and as a result much of the administration and support for advanced networking features became easier and more reliable. To facilitate this migration, the old net-tools have been deprecated; in the meantime, they have mostly been patched to map, behind the scenes, onto the new network subsystem. Sadly, this leaves a bunch of legacy components in a sorry state. The new networking subsystem has no real concept of virtual interfaces (it thinks in terms of multiple addresses per interface), so the old aliases don't map properly. As a result, trying to use the iproute2 utils with VIPs can have unexpected results (often in the form of virtual interfaces falling back to the primary interface definition: i.e., bond0:1 becomes bond0).

This whole debacle goes one step further and exposes broken behaviour in the original net-tools. As it turns out, if you explicitly bind a route to a particular virtual interface, it will silently bind to the primary interface instead. Leaving the interface unspecified, however, allows all interfaces (even virtual interfaces) to use the route. The maddening part is that if you don't specify an interface, the routing table still shows the route bound to the primary interface. So viewing your routing table will give you identical output whether you have explicitly bound the route or not.
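In other words, the trap looks like this: two invocations of the same route, where only the second serves traffic arriving on the alias, yet `route -n` prints an identical line for both:

```shell
# Explicitly bound: silently attaches to bond0 and
# black-holes traffic arriving on the bond0:1 alias
route add -net 10.3.3.0/24 gw 10.2.2.1 dev bond0:1

# Unbound: usable by bond0 and its aliases alike
route add -net 10.3.3.0/24 gw 10.2.2.1
```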

The final solution here was to simply remove all of our routes and redefine them without explicit binds:

$ sudo route add -net 10.3.3.0/24 gw 10.2.2.1
$ sudo route add -net 10.4.4.0/24 gw 10.2.2.1
$ sudo route add -net 10.5.5.0/24 gw 10.2.2.1
$ sudo route add -net 10.6.6.0/24 gw 10.2.2.1
$ /sbin/route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref   Use Iface
10.2.2.0        *               255.255.255.0   U     0      0     0   bond0
10.3.3.0        10.2.2.1        255.255.255.0   UG    0      0     0   bond0
10.4.4.0        10.2.2.1        255.255.255.0   UG    0      0     0   bond0
10.5.5.0        10.2.2.1        255.255.255.0   UG    0      0     0   bond0
10.6.6.0        10.2.2.1        255.255.255.0   UG    0      0     0   bond0
default         10.2.2.1        0.0.0.0         UG    0      0     0   bond0

From this point, everything immediately started working. Looking at the routing tables before and after the fix, you wouldn't even notice the difference.

The moral of the story? The heyday of net-tools and VIPs has passed. It's time to start using iproute2 and multiple-address interfaces the way the developers intended.
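Done from scratch with iproute2, the whole setup collapses to something like this (a sketch, assuming the same addresses as the examples above):

```shell
# A secondary address instead of a virtual interface
ip addr add 10.2.2.59/24 dev bond0

# Routes with no interface binding; the kernel picks the
# outgoing device from the gateway address
ip route replace 10.3.3.0/24 via 10.2.2.1
ip route replace 10.4.4.0/24 via 10.2.2.1
ip route replace 10.5.5.0/24 via 10.2.2.1
ip route replace 10.6.6.0/24 via 10.2.2.1

# Inspect the result with the native tool
ip route show
```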

NOTE: Addresses and routes have been altered to protect the original infrastructure, but still accurately reflect the original behaviour for each example.