Disaster Recovery with Prism Central

While studying for the NCP-MCI exam, I realized I needed to know more about Disaster Recovery. I did the training modules, but I don’t think that is enough to pass the exam. I took some of the practice exams available on Nutanix University and felt that some hands-on experience would be necessary.

I already have two clusters set up, so I am going to protect some VMs on each cluster by replicating them to the other, and try to get some scenarios tested.

So in essence, I am building this:

You can do this in both Prism Element and Prism Central. For this post, I am doing it from Prism Central, since I think that is the most common way of doing this.

One thing to note: to be able to replicate VMs between clusters, the name of the Storage Container on which the VMs are active needs to be identical on both clusters. So I created a new Storage Container on the “target” cluster, with a name identical to the one that already exists on the “source” cluster:
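
Before setting up replication, it can help to compare the container names on both clusters. The following is a minimal sketch of that check in plain Python. The container names are made up, and the exact, case-sensitive string comparison is my assumption about what “identical” means here. In practice you would copy the names from the storage view of each cluster.

```python
def missing_containers(source_names, target_names):
    """Return source container names that have no identically named
    container on the target (exact, case-sensitive match: an assumption)."""
    target = set(target_names)
    return sorted(name for name in source_names if name not in target)


# Hypothetical container names, for illustration only.
source = ["default-container", "SelfServiceContainer", "VMs"]
target = ["default-container", "vms"]

print(missing_containers(source, target))  # ['SelfServiceContainer', 'VMs']
```

An empty list would mean every source container has a same-named counterpart on the target.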

Next, I create a category that will be used to select the VMs I want to protect. I create the Category “ProtectVMs: yes”.

When that is done, I head to Data Protection | Protection Policies and create a Protection Policy:

The first message I see is the following:

So I click Enable to enable Disaster Recovery for the Local AZ:

This will add some resources to my Prism Central VM. After that is done, I can create a Protection Policy and give it a schedule. I chose to do synchronous replication, with an automatic failover:

Next up is adding some entities to it. This is done based on the Category I created earlier:

And I click “Create” to activate the policy.

When the policy is created, I add some VMs to the category:

We can see that the Category automatically puts the VMs in the Protection Policy:
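
The way I understand it, the policy does not hold a fixed list of VMs. It holds a category, and any VM that carries that category gets picked up. Here is a small sketch of that idea as a plain Python model. It does not talk to Prism Central at all; the `VM` class and the `Lin01` VM are hypothetical, while the category “ProtectVMs: yes” and the Win01/Win02 names come from my lab.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    """Toy model of a VM with key/value categories (not a Nutanix object)."""
    name: str
    categories: dict = field(default_factory=dict)


def protected_vms(vms, key, value):
    """Names of the VMs whose category `key` is set to `value`."""
    return [vm.name for vm in vms if vm.categories.get(key) == value]


inventory = [
    VM("Win01", {"ProtectVMs": "yes"}),
    VM("Win02", {"ProtectVMs": "yes"}),
    VM("Lin01"),  # hypothetical VM without the category
]

print(protected_vms(inventory, "ProtectVMs", "yes"))  # ['Win01', 'Win02']
```

Tagging `Lin01` with the same category would add it to the result without touching the policy itself, which is the behaviour I see in the UI.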

And when we now look at the Protection Summary, we can see the VMs being synced and protected:

We can also see this on the VM page:

Next step is to create a Recovery Plan:

We could set up an automatic recovery plan, with Prism as its Witness:

But since Prism Central is currently running on the same cluster, I had a hard time figuring out exactly how to trigger that. So I’m creating a manual failover plan:

We are adding both VMs to the plan, in two stages:

We have to map the networks. For both sites, we have to define a Production subnet and a Test Failback/Failover subnet. I am choosing not to enable “Static VM IP Mapping”, since all VMs are configured to use DHCP. My two VMs are in different subnets, so I configure two sets of subnet mappings:
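
To make the idea of two sets of subnet mappings a bit more concrete, this is how I picture it as a lookup table. It is a simplified sketch: the subnet names are hypothetical, and it only models one direction (source to target), whereas the real Recovery Plan defines production and test subnets on both sites.

```python
# Hypothetical subnet names; one entry per production subnet on the source site.
SUBNET_MAPPINGS = {
    "prod-vlan10": {"production": "dr-vlan10", "test": "test-vlan10"},
    "prod-vlan20": {"production": "dr-vlan20", "test": "test-vlan20"},
}


def target_subnet(source_subnet, test_failover=False):
    """Pick the subnet a recovered VM would land in on the target site."""
    mapping = SUBNET_MAPPINGS[source_subnet]
    return mapping["test"] if test_failover else mapping["production"]


print(target_subnet("prod-vlan10"))                      # dr-vlan10
print(target_subnet("prod-vlan20", test_failover=True))  # test-vlan20
```

A real failover uses the production side of the mapping, while a test failover uses the test side, so the test VMs never end up in the production network.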

Next step is to Validate this plan (under Actions):

Unfortunately I don’t have the licenses to move forward:

But, to give some additional screens, this would be the screen to do a planned Failover:

Or an unplanned failover:

I have to confirm that I indeed want to fail over:

(and see it fail, because of the license thingy):

If I click “Execute Anyway”, however, it does fail over, even to my own surprise :). I also notice that the Win01 VM was started before the Win02 VM, honoring the Stages I defined. It shows the successful failover (and a failed validation):
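
The stage behaviour can be sketched in a few lines of Python. This is only a model of the ordering, under the assumption that stages run strictly in ascending order; the `STAGES` dictionary and `boot_order` function are names I made up for illustration.

```python
# Stage number -> VMs in that stage. Win01 in stage 1 and Win02 in stage 2
# matches the start order I observed during the failover.
STAGES = {2: ["Win02"], 1: ["Win01"]}


def boot_order(stages):
    """Flatten the stages into a start order, lowest stage number first.
    The order of VMs inside a single stage carries no meaning in this sketch."""
    order = []
    for stage_number in sorted(stages):
        order.extend(stages[stage_number])
    return order


print(boot_order(STAGES))  # ['Win01', 'Win02']
```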

Time to bring the VM back to the other side. This is basically another failover, which brings the VMs back to the original location.

The next activity is a test failover. This will bring the VMs up on the secondary site, but in a different network:

Same warning:

Same result, when I click “Execute Anyway”. But this time, instead of seeing the VMs move from the Ronald-Cluster to Ronald-Cluster2, I see two new VMs being created and booted (again, in the right order):

They land in the test subnet, with a DHCP address in the right range, and I could test the application that was failed over if I wanted to. After all the testing is done, I can clean up the test VMs:

Everything can also be seen in the tasks:

The last activity to try out is a Live Migration. That is more or less the same as the first failover, but with the checkbox “Live Migrate Entities” marked. One of the prerequisites is that the same version of the hypervisor is running on both source and target. After I upgraded my second cluster, I was able to do the live migration.
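
The version prerequisite is easy to express as a check. This is a minimal sketch with placeholder host names and version strings (nothing here is read from a real cluster); it simply treats the prerequisite as “every host on both sides reports the same version string”, which is my reading of it.

```python
def same_hypervisor_version(source_hosts, target_hosts):
    """True if every host in both clusters reports the same version string.
    Both arguments map a host name to its hypervisor version."""
    versions = set(source_hosts.values()) | set(target_hosts.values())
    return len(versions) == 1


# Placeholder host names and version strings, for illustration only.
cluster1 = {"host-a": "version-2", "host-b": "version-2"}
cluster2_before_upgrade = {"host-c": "version-1"}
cluster2_after_upgrade = {"host-c": "version-2"}

print(same_hypervisor_version(cluster1, cluster2_before_upgrade))  # False
print(same_hypervisor_version(cluster1, cluster2_after_upgrade))   # True
```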

To test this out, I started a “ping” on the Win01 VM, to see if that would keep running (and how many pings it would miss):

Then, I start the Failover:

Get the warning about the license:

But Execute Anyway…

And, as with the earlier migration, Win02 is done first, and Win01 after that. The ping does get a bit less stable while migrating:

(marking these 4 pings for “future” reference).

But when the migration was done, the ping was still running, and it got through without missing a single ping (although that might be related to the pausing of the VM; I didn’t perform a ping from the “outside” this time):

After this, I failed the VMs back. This time I had a ping running from the outside:

So, lost 4 pings, in total.
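
I counted the lost pings by eye, but for a longer run you could save the output and count it with a few lines of Python. The sketch below assumes Windows-style `ping -t` output (“Reply from …” and “Request timed out.”). The sample lines and the IP address are made up to mirror my result; they are not a capture from my lab.

```python
def count_pings(lines):
    """Count successful replies and lost pings in Windows-style ping output.
    'Destination host unreachable' lines also start with 'Reply from',
    so they are checked first and counted as lost."""
    replies = lost = 0
    for line in lines:
        text = line.strip()
        if "Request timed out" in text or "unreachable" in text:
            lost += 1
        elif text.startswith("Reply from"):
            replies += 1
    return replies, lost


# Made-up sample output: 3 replies, 4 timeouts, 2 replies.
SAMPLE = (
    ["Reply from 10.0.0.5: bytes=32 time<1ms TTL=128"] * 3
    + ["Request timed out."] * 4
    + ["Reply from 10.0.0.5: bytes=32 time=1ms TTL=128"] * 2
)

print(count_pings(SAMPLE))  # (5, 4)
```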

That’s it for the Prism Central part. I also wrote a post on Data Protection with Prism Element: https://my-sddc.net/data-protection-with-prism-elements/
