The High Availability feature for HCX Network Extension appliances was introduced in version 4.3. This was a big deal: before that, anyone who needed HA for the appliances that extend L2 networks to the cloud (a basic resiliency requirement) and wanted to stay on VMware’s stack had to deploy a pair of NSX-T standalone Edges on-prem, use NSX-T on the cloud side, and set up an L2VPN.
VMware documentation mentions the following important prerequisites to consider before deploying HCX NE HA:
Network Extension HA requires the HCX Enterprise license.
Network Extension High Availability protects against one Network Extension appliance failure in a HA group.
Network Extension HA operates without preemption, with no automatic failback of an appliance pair to the Active role.
Network Extension HA Standby appliances are assigned IP addresses from the Network Profile IP pool.
The Network Extension appliances selected for HA activation must have no networks extended over them.
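The "without preemption" point is easy to misread, so here is a minimal Python sketch of what it implies. This is a toy model, not HCX code: the Pair class, the pair names, and the on_health_change function are all hypothetical and only illustrate that roles swap when the ACTIVE pair fails and do not swap back when it recovers.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical model of one on-prem/cloud NE appliance pair."""
    name: str
    role: str            # "ACTIVE" or "STANDBY"
    healthy: bool = True


def on_health_change(group: list[Pair]) -> None:
    """Non-preemptive policy: swap roles only if the ACTIVE pair is unhealthy."""
    active = next(p for p in group if p.role == "ACTIVE")
    standby = next(p for p in group if p.role == "STANDBY")
    if not active.healthy and standby.healthy:
        active.role, standby.role = "STANDBY", "ACTIVE"


group = [Pair("pair-1", "ACTIVE"), Pair("pair-2", "STANDBY")]

group[0].healthy = False      # pair-1 fails
on_health_change(group)       # pair-2 takes over the ACTIVE role

group[0].healthy = True       # pair-1 recovers
on_health_change(group)       # nothing changes: no automatic failback
```

In this model pair-2 stays ACTIVE after pair-1 recovers, which matches the "no automatic failback" wording in the documentation.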
Another interesting thing about NE in HA mode is the upgrade process:
In-Service upgrade is not available for Network Extension High Availability (HA) groups. HA groups use the failover process to complete the upgrade. In this case, the Standby pair is upgraded first. After the Standby upgrade finishes, a switchover occurs and the Standby pair takes on the Active role. At that point, the previously Active pair is upgraded and takes on the Standby role.
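The order of operations described above can be sketched in a few lines of Python. This is only an illustration of the sequence (Standby first, switchover, then the former Active); the function name, the pair labels, and the "old"/"new" version strings are placeholders, not anything HCX exposes.

```python
from __future__ import annotations


def upgrade_ha_group(roles: dict[str, str], versions: dict[str, str],
                     target: str) -> list[str]:
    """Walk through the HA group upgrade order and return a log of steps."""
    standby = next(name for name, role in roles.items() if role == "STANDBY")
    active = next(name for name, role in roles.items() if role == "ACTIVE")
    steps = []

    # 1. The Standby pair is upgraded first.
    versions[standby] = target
    steps.append(f"upgrade {standby}")

    # 2. Switchover: the upgraded Standby pair takes on the Active role.
    roles[standby], roles[active] = "ACTIVE", "STANDBY"
    steps.append(f"switchover to {standby}")

    # 3. The previously Active pair is upgraded and stays Standby.
    versions[active] = target
    steps.append(f"upgrade {active}")
    return steps


roles = {"pair-1": "ACTIVE", "pair-2": "STANDBY"}
versions = {"pair-1": "old", "pair-2": "old"}
steps = upgrade_ha_group(roles, versions, "new")
```

Note that after the upgrade the roles are swapped compared to the starting state, which is consistent with HA operating without preemption.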
Let’s take a look at this new feature. In the HCX UI 4.3+, in the Interconnect -> Service Mesh -> View Appliances view, there is a new option called ACTIVATE HIGH AVAILABILITY.
First, you need to have a pair of deployed NE appliances. The option won’t work when there is no eligible partner for HA.
The system also checks whether there are extended networks on the appliance that you select for HA.
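To make the two checks concrete, here is a rough Python sketch of the same logic. It is an assumption-laden model, not the HCX implementation: the NEAppliance class, the check_ha_activation function, and the appliance and network names are all made up for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class NEAppliance:
    """Hypothetical stand-in for a deployed Network Extension appliance."""
    name: str
    extended_networks: list[str] = field(default_factory=list)


def check_ha_activation(selected: NEAppliance,
                        deployed: list[NEAppliance]) -> tuple[bool, str]:
    """Return (ok, reason) for activating HA on the selected appliance."""
    # Check 1: the selected appliance must have no networks extended over it.
    if selected.extended_networks:
        return False, f"{selected.name} still has extended networks"
    # Check 2: there must be another appliance that also has no extensions.
    partners = [a for a in deployed
                if a is not selected and not a.extended_networks]
    if not partners:
        return False, "no eligible partner appliance"
    return True, f"eligible partner: {partners[0].name}"


ne1 = NEAppliance("NE-1")
ne2 = NEAppliance("NE-2", extended_networks=["vlan-100"])
blocked = check_ha_activation(ne1, [ne1, ne2])        # NE-2 is busy

ne3 = NEAppliance("NE-3")
allowed = check_ha_activation(ne1, [ne1, ne2, ne3])   # NE-3 can partner
```

The first call fails because the only other appliance has a network extended over it; the second succeeds once a free appliance is available.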
It can be a challenge to enable HA in an environment where networks are already extended. In most cases this requires downtime, because the existing networks have to be unextended from the NE appliances while HA is being configured.
The system also checks whether there are eligible NE appliances for activating HA, and we get a button that is a shortcut to edit the Service Mesh and add more appliances. This can also be a challenge if we don’t have enough free IPs in our Network Profile’s IP pool. Each additional NE appliance, on-prem or on the cloud side, needs one management IP and one uplink IP.
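A quick way to sanity-check the IP pools before adding appliances is simple arithmetic, sketched below in Python. The function name and the free-IP counts are hypothetical; the only rule taken from the text is one management IP and one uplink IP per new appliance, evaluated separately for each site.

```python
from __future__ import annotations


def missing_ips(new_appliances: int, free_mgmt: int,
                free_uplink: int) -> dict[str, int]:
    """How many IPs a site is short of for the given number of new appliances."""
    # Each new appliance consumes one management IP and one uplink IP.
    return {
        "management": max(0, new_appliances - free_mgmt),
        "uplink": max(0, new_appliances - free_uplink),
    }


# One extra appliance per site, with made-up free-IP counts for each pool.
on_prem = missing_ips(new_appliances=1, free_mgmt=0, free_uplink=3)
cloud = missing_ips(new_appliances=1, free_mgmt=2, free_uplink=2)
```

With these example numbers the on-prem side is short of one management IP, so the management pool there would have to be extended before the Service Mesh edit can succeed.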
Once the appliances are deployed, you only need to press one button: Activate HA. Everything is configured for us, as in vSphere HA. The new HA appliances form an HA group with a specific UUID.
When HA is enabled, we can monitor its health in the HA Management tab. In the example below, we have 2 pairs: us-east-NE-I2 (on-prem) with us-east-NE-R2 (the cloud side) and us-east-NE-I3 (on-prem) with us-east-NE-R3 (the cloud side). Right after creation, the first pair is ACTIVE and the second is STANDBY.
Also, in the Appliances view we can quickly check which NE appliance is ACTIVE and which one is STANDBY.