Redundant Multi-Site with NSX-T

September 10, 2020 ronaldpj

One of my customers is looking to create an environment with DR capabilities, from a networking point of view. We have looked at using Federation, but for several reasons that is currently not the best way forward. So I am looking for an alternative method to create redundancy across sites, without the need to renumber IP addresses on virtual machines.

In my home-lab I have an environment consisting of two (virtual) sites, connected to one (physical) network fabric, so this should be possible, with the use of BGP.

What I am trying to accomplish is the following:

To get there, we need the following steps:

Create four VLAN segments, connected to the correct Transport Zone for transit.
Create T0 gateways on both sites (on an existing edge node cluster, with sufficient resources to host this).
Connect the T0’s with the physical fabric, through dedicated transit VLAN’s
Create an IP Prefix list and Route Map to influence the advertisements of BGP routes or change the incoming metrics on the physical fabric
Configure BGP from both T0’s to the physical network, with a different AS per site and differing path attributes to determine the preferred path.
Create T1 gateways on both sites and connect them to the T0’s
Create segments on both sites, with the same subnet and see how the BGP advertisements will create the topology we want
Do some failover testing to see if it works as expected :).

Create VLAN segments, connected to the correct Transport Zone

We need transit networks, to connect to the physical network. I created VLAN’s in the physical network to use, but I also have to create the VLAN’s in NSX to be used in the T0’s.

I create four VLAN-segments, with the corresponding VLAN-id’s and connect them to the existing Transport Zone, so they are automatically available to the Edge Nodes, which are also connected to the transport zone.

Create T0 gateways on both sites

Next I want to create a T0 on both sites, connected to the physical fabric, where the BGP metrics I am using determine which of the sites will be active on the network and which is not.

For this, we set up two T0 gateways on an existing (expanded) Edge Node Cluster and configure them with BGP to interconnect with the physical network. I have created the four additional transit VLANs, two per site, to accomplish this.

Connect the T0’s with the physical fabric

Next, I create two interfaces per T0, to connect to the physical fabric. The configuration on the physical fabric has already been done.

Configure BGP from both T0’s

Next up, we configure BGP to peer with the physical fabric, to advertise the connected networks to the physical world:

and configure two BGP neighbors:

We configure the route re-distribution, to make sure our T1 connected segments are advertised to the physical world:

Create an IP Prefix list and Route Map

To control which of the routes advertised through BGP become active, we will be using an IP Prefix List and a Route Map.

The IP Prefix List will determine which routes will be advertised (in our case, all routes will be advertised, but we have to create one, because the default list is not available in a Route Map):

Within the All Routes prefix list, we have configured one prefix, stating “Any” and “Allow”, so all routes are allowed when using this list.
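To make the "Any / Allow" behaviour a bit more tangible, here is a minimal Python sketch of how such a prefix list could be evaluated. This is only a model, not NSX code: the names `PrefixEntry` and `evaluate`, the action strings, the exact-match rule for specific networks, the implicit deny at the end and the example subnet 172.25.1.0/24 are all my own assumptions for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrefixEntry:
    network: Optional[str]  # None models the "Any" choice
    action: str             # "PERMIT" or "DENY" (illustrative labels)


def evaluate(prefix_list: List[PrefixEntry], route: str) -> str:
    """Return the action of the first matching entry.

    Assumption: specific networks match exactly (no ge/le modifiers are
    modelled) and a route that matches nothing is denied.
    """
    route_net = ipaddress.ip_network(route)
    for entry in prefix_list:
        if entry.network is None:
            return entry.action  # "Any" matches every route
        if route_net == ipaddress.ip_network(entry.network):
            return entry.action
    return "DENY"


# The "All Routes" list from this post: one entry, Any + allow.
ALL_ROUTES = [PrefixEntry(network=None, action="PERMIT")]

# 172.25.1.0/24 is an assumed subnet, used only as an example route.
print(evaluate(ALL_ROUTES, "172.25.1.0/24"))
```

With a single "Any" entry every route gets through, which is exactly why this list only serves as the required member for the Route Map.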

Secondly, we create a Route Map:

The member in here is the IP Prefix List we just created. We select the Action as PERMIT and we set the AS Path Prepend value to “65200 65200 65200 65200 65200” (five times the AS number) on Site B.

This way, the AS path advertised from Site B will be longer than the one advertised from Site A, which will result in Site A being the preferred path for the routes that are advertised from both sites.
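The effect of the prepend can be sketched in a few lines of Python. This is a simplified model, assuming that everything BGP evaluates before AS path length (weight, local preference and so on) is equal, so that path length alone decides. AS 65200 comes from the prepend value above; AS 65100 for Site A is a hypothetical number, as are the function names.

```python
from typing import Dict, List


def prepend(as_path: List[int], prepend_value: str) -> List[int]:
    """Put the space-separated AS numbers in front of the existing path."""
    extra = [int(asn) for asn in prepend_value.split()]
    return extra + as_path


def preferred(paths: Dict[str, List[int]]) -> str:
    """Pick the site with the shortest AS path (all other attributes assumed equal)."""
    return min(paths, key=lambda site: len(paths[site]))


# AS paths as the physical fabric would see them.
site_a_path = [65100]  # hypothetical AS for Site A, no prepend
site_b_path = prepend([65200], "65200 65200 65200 65200 65200")

paths = {"Site A": site_a_path, "Site B": site_b_path}
for site, path in paths.items():
    print(site, len(path), path)
print("preferred:", preferred(paths))
```

In this model Site B ends up with six entries in its path (its own AS plus the five prepended ones) against one for Site A, so Site A wins as long as it is advertising the route.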

An alternative method (but that is outside of NSX) would be to manipulate the weight of the peer on the physical fabric.

Create T1 gateways on both sites

After the hard part is done, creating a T1 is easy:

Create segments on both sites

And the same for the segment:

On Site A:

and on Site B:

Do some failover testing

At the physical fabric, we now see four routes to this subnet. All available, but only one is active (no ECMP on my Mikrotik):

If we look into the routes, we can see the route from Site A:

And from Site B:

The AS Path is prepended by the 5 AS’s we added in the Route Map. So this route is considered to be less desirable.

Now, we deploy virtual machines to both sites, with the same IP-address:

Both with address 172.25.1.2, but with different pages.

When we browse to this address while the network is healthy, we see:

Then, when we shut down all the Edge Nodes from Site A (effectively simulating a site failure), we see the route to Site B becoming active on the physical fabric:

and when we use the same address as before, we see that the Site B web server is now active:

So, eventually we have what we wanted. A DR solution where one subnet can live on two sites and BGP will make sure that the right site is active for this subnet.
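The failover test can also be modelled as a small Python sketch, if only to make the logic explicit. Nothing here talks to NSX or the Mikrotik: the `Fabric` class, the peer labels and the AS path lengths (1 for Site A, 6 for Site B) are assumptions, and the tie-break on peer name is just an arbitrary stand-in for whatever the router uses to pick a single route without ECMP.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    site: str         # which site's T0 advertised the route
    peer: str         # hypothetical neighbor label
    as_path_len: int  # number of AS entries the fabric sees


class Fabric:
    """Very small stand-in for the routing table on the physical fabric."""

    def __init__(self) -> None:
        self.routes: List[Route] = []

    def advertise(self, route: Route) -> None:
        if route not in self.routes:
            self.routes.append(route)

    def withdraw_site(self, site: str) -> None:
        # Shutting down all Edge Nodes of a site removes its routes.
        self.routes = [r for r in self.routes if r.site != site]

    def active(self) -> Optional[Route]:
        # One active route only: shortest AS path, then peer name.
        if not self.routes:
            return None
        return min(self.routes, key=lambda r: (r.as_path_len, r.peer))


SITE_A = [Route("Site A", "a-1", 1), Route("Site A", "a-2", 1)]
SITE_B = [Route("Site B", "b-1", 6), Route("Site B", "b-2", 6)]

fabric = Fabric()
for route in SITE_A + SITE_B:
    fabric.advertise(route)

normal = fabric.active().site        # all four routes present
fabric.withdraw_site("Site A")       # simulate the site failure
failed_over = fabric.active().site
for route in SITE_A:                 # Edge Nodes of Site A come back
    fabric.advertise(route)
recovered = fabric.active().site

print(normal, failed_over, recovered)
```

The sketch walks through the same three states as the test: normal operation, Site A down, and Site A back again.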

Oh and of course, when we turn on all the Edge Nodes from Site A again: