Anti Affinity policy shows “Pending” on Prism Central

I received a question from a customer. They had created an anti-affinity policy for some of their VMs, but the status of that policy stayed on “Pending”. I tried this out in my lab environment as well and, at first, ran into the same issue.

This is my environment:

As you can see, I have two clusters connected to one Prism Central. I usually keep my lab shut down and only power it on when needed. When I do, I try not to use too many resources, so if I can run my tests on just one of the clusters, that is perfectly fine. And that turns out to be relevant here.

So what I did was the following (for two pairs of VMs):

  • Power on Nutanix Cluster – 1 and all required VMs, like Prism Central and pfSense
  • Power on the Linux-based guest VMs
  • Move the VMs of both pairs to the same host
  • Create a category
  • Assign the category to the VMs
  • Create an Anti-Affinity Policy with the created category (the category steps are also sketched in code below)
  • Wait (at least) half an hour, which covers two ADS cycles
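
For reference, the category part of those steps can also be scripted instead of clicked through. This is only a minimal sketch against the Prism Central v3 REST API: the PC address, credentials, category name/value and VM UUIDs are placeholders for my lab, and the anti-affinity policy itself still has to be created on top of the category afterwards.

```python
# Minimal sketch: create a category and tag two VMs with it through the
# Prism Central v3 REST API, so they can be used in an anti-affinity policy.
# PC address, credentials, category name/value and VM UUIDs are placeholders.
import requests

PC = "https://pc.lab.local:9440/api/nutanix/v3"
AUTH = ("admin", "secret")          # or an AD-based Super Admin account
CATEGORY, VALUE = "AntiAffinity", "VMPair1"
VM_UUIDS = ["<vm-1-uuid>", "<vm-2-uuid>"]

s = requests.Session()
s.auth = AUTH
s.verify = False                    # lab only: self-signed certificate

# 1. Create the category key and a value for it.
s.put(f"{PC}/categories/{CATEGORY}", json={"name": CATEGORY})
s.put(f"{PC}/categories/{CATEGORY}/{VALUE}", json={"value": VALUE})

# 2. Assign the category to each VM: read the spec, add the category, write it back.
for uuid in VM_UUIDS:
    vm = s.get(f"{PC}/vms/{uuid}").json()
    vm["metadata"].setdefault("categories", {})[CATEGORY] = VALUE
    vm.pop("status", None)          # the PUT body only takes spec + metadata
    s.put(f"{PC}/vms/{uuid}", json=vm)
```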

I did this twice: once as an AD-based user with Super Admin permissions and once with the admin account itself, using one pair of VMs per policy.

After all that, I found that the Anti-Affinity policy stayed on “Pending”, like so, even after waiting for a long time:

Then I tried to find some answers. In one of the internal Slack posts I found about a similar issue, I was referred to a KB article: https://portal.nutanix.com/page/documents/kbs/details?targetId=kA0VO000000BA450AG.

It more or less described my issue, although I am not using any firewalls. I did, however, see log entries like this:

E0113 08:39:04.494000Z 228621 vmm_error.go:1956] vm-anti-affinity-policies/v4_list_legacy_vm_anti_affinity_policy.go:248 Failed to get supported clusters for anti-affinity policy: Error occurred in trying to get Anduril capabilities: Error in sending RPC: 103
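
The error complains that Prism Central cannot determine the supported clusters for the policy, which suggests one of the registered clusters is unreachable. A quick way to see which clusters Prism Central knows about, and whether they answer at all, is a sketch like the one below. The clusters/list endpoint and port 9440 are the standard v3 API and Prism port; the address and credentials are placeholders, and the TCP connect is just a rough reachability check, not an official diagnostic.

```python
# Hedged sketch: list every cluster registered to Prism Central (v3 API) and
# test whether each one answers on port 9440.
import socket
import requests

PC = "https://pc.lab.local:9440/api/nutanix/v3"
AUTH = ("admin", "secret")

resp = requests.post(f"{PC}/clusters/list", json={"kind": "cluster"},
                     auth=AUTH, verify=False)
for entity in resp.json().get("entities", []):
    name = entity["status"]["name"]
    ip = entity["status"]["resources"].get("network", {}).get("external_ip")
    if not ip:
        print(f"{name}: no external IP reported")
        continue
    try:
        socket.create_connection((ip, 9440), timeout=3).close()
        print(f"{name} ({ip}): reachable")
    except OSError:
        print(f"{name} ({ip}): NOT reachable")
```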

This error more or less pointed me in the direction of the second cluster. So I powered on the second cluster and, once that was completed, the policy first moved to:

and after two of the VMs were placed on different hosts, to:

After resolving this, I shut down the second cluster again, removed the policies and recreated them, to see whether that would reproduce the problem. It did, although this time I got a different entry in the log file:

10:29:06.919091Z 228621 vmm_error.go:1956] vm-anti-affinity-policies/policy_engine_apply_task.go:198 7ddf0a50-4e93-52a8-b36a-baf7d5c42444: Failed to CreateOrUpdateVmGroupWithAntiAffinity through Narsil at a cluster with UUID 000640c9-e04b-4b7a-71c7-005056a52cdf: rpc error: code = Unavailable desc = last connection error: connection error: desc = "transport: Error while dialing: dial tcp 172.17.200.103:2121: i/o timeout".

This seems to indicate that this time Prism Central does know where to connect, but simply can’t, whereas at first it didn’t know where to connect at all. A small distinction, but with identical results.
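
In both cases, the quickest confirmation from the Prism Central VM itself is to check whether the address and port from the log entry answer at all. A minimal sketch, using the IP and port taken from the “i/o timeout” log line above; again just a raw TCP connect, not an official tool:

```python
# Hedged sketch: from the Prism Central VM, check whether the cluster address
# and port from the log entry accept a TCP connection at all.
import socket

TARGET = ("172.17.200.103", 2121)   # taken from the "i/o timeout" log line above

try:
    socket.create_connection(TARGET, timeout=5).close()
    print(f"{TARGET[0]}:{TARGET[1]} is reachable")
except OSError as exc:
    print(f"{TARGET[0]}:{TARGET[1]} is NOT reachable: {exc}")
```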
