Upgrade Failed NSX-T Edge Nodes (1G hugepage support required)

When upgrading my home lab to NSX-T 3.2.0.0 in December, I ran into an issue where the Edge Node upgrade failed with the error message “1G hugepage support required”.

At that moment (3.2 had just been released) I took a look at some support articles (the benefit of being a VMware employee) and saw that I was not the only one having this issue. Shortly afterwards, 3.2.0.0 was pulled as an upgrade target (it remained available only for greenfield installations), so I decided to wait for 3.2.0.1.

That release arrived recently, so I decided to start over. Unfortunately, the upgrade itself had already started, so I first had to cancel the stranded upgrade before I could upgrade to 3.2.0.1.

To do that, I followed https://kb.vmware.com/s/article/82042, which essentially lets you cancel the started upgrade by issuing the following request with a REST API client:

DELETE https://NSX_MGR/api/v1/upgrade-mgmt/plan
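If you prefer a script to a REST API client, the same request could be sketched in Python roughly as follows. This is a minimal sketch, not the KB article's procedure: the function name, the use of HTTP basic authentication, and the example host name are my assumptions, and the function only builds the request so you can inspect it before sending anything.

```python
import base64
import urllib.request


def build_cancel_upgrade_request(manager_host, username, password):
    """Build (but do not send) the DELETE request that cancels the upgrade plan.

    manager_host is the NSX Manager address (NSX_MGR in the text above).
    Basic authentication is an assumption; adapt it to your environment.
    """
    url = f"https://{manager_host}/api/v1/upgrade-mgmt/plan"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="DELETE")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical usage (not executed here):
#   request = build_cancel_upgrade_request("nsx.example.local", "admin", "secret")
#   urllib.request.urlopen(request)
```

In a lab with self-signed certificates, `urlopen` will likely need an `ssl` context that trusts the NSX Manager certificate.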

Next, I removed the Edge Nodes that had started the upgrade but failed to finish (one on each NSX Manager, so two in total). This seemed to be the easiest way to move forward. I chose to upgrade only the remaining Edge Nodes, and to add new Edge Nodes for the removed ones after the complete upgrade had finished.

After all this was done, I could start a new upgrade to version 3.2.0.1, only to run into the same error message I had encountered earlier (“1G hugepage support required”). Fortunately, while I was waiting for version 3.2.0.1, some vCommunity members had found a solution that helped me through this:

https://communities.vmware.com/t5/VMware-NSX-Discussions/NSX-T-Edge-Node-3-1-3-0-upgrade-to-3-2-0-Error-message/td-p/2884315

The problem was that the hardware I use is quite old (but still very useful for a lab environment), and this led to the VM not being able to tell that 1G hugepage support was available. The processor I use is an Intel E5-2697 v2, which is part of the Ivy Bridge family. It doesn’t show the 1G support by default, so I had to help it along.
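On Linux, 1G page support shows up as the `pdpe1gb` flag in `/proc/cpuinfo`, so one way to check what a guest actually sees is to look for that flag. The helper below is a small sketch of such a check; the function name is mine, and whether you have shell access to run it on the VM in question is an assumption.

```python
def has_1g_hugepage_flag(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists pdpe1gb."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Each CPU has its own 'flags' line; one match is enough here.
        if key.strip() == "flags" and "pdpe1gb" in value.split():
            return True
    return False


# Hypothetical usage on a Linux guest (not executed here):
#   with open("/proc/cpuinfo") as handle:
#       print(has_1g_hugepage_flag(handle.read()))
```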

The following is necessary for the upgrade to complete:

  • EVC mode needs to be disabled
  • And the following Advanced Feature needs to be added to the VM:
    • featMask.vm.cpuid.pdpe1gb: Val:1

Note that the value is “Val:1”, not just “1”.
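Because the value format is easy to get wrong, a small sanity check can help if you manage several Edge VMs. The sketch below assumes you have the VM's advanced settings available as a plain Python dictionary (how you export them is up to you); the function name and return strings are hypothetical, only the key and value come from the steps above.

```python
REQUIRED_KEY = "featMask.vm.cpuid.pdpe1gb"
REQUIRED_VALUE = "Val:1"


def check_pdpe1gb_setting(advanced_settings):
    """Classify the 1G hugepage feature mask in a dict of VM advanced settings."""
    value = advanced_settings.get(REQUIRED_KEY)
    if value is None:
        return "missing"
    if value.strip() == REQUIRED_VALUE:
        return "ok"
    # A bare "1" is the mistake the text warns about.
    return "wrong value"
```

For example, `check_pdpe1gb_setting({"featMask.vm.cpuid.pdpe1gb": "1"})` reports a wrong value, because the setting must be “Val:1”.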

After that, the upgrade can be retried, and (in my case) it was successful.

The final step is to deploy a new Edge Node for each one that was removed, which is not too difficult in my environment.
