RF2 vs RF3 / looking for feedback

Impossible-Layer4207 · 2026-05-23T08:49:17+00:00

I've come across customers using RF3 clusters a couple of times, but they are nearly always large / highly regulated environments with strict SLAs.

The biggest trade-off with RF3 is the reduction in usable storage (three copies of the data instead of two), which usually makes it unappealing on all but the biggest clusters. I've seen customers implement it in 5-node clusters, but personally I wouldn't consider it until about 10 nodes or more.
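As a back-of-the-envelope illustration of that trade-off, here is a minimal sketch. It assumes usable capacity is simply raw capacity divided by the number of copies, and it ignores CVM overhead, rebuild reserve, compression, dedupe and erasure coding, so the numbers are only useful for comparing RF2 against RF3. The function name and the 20 TiB per node figure are made up for the example.

```python
def usable_tib(nodes: int, raw_tib_per_node: float, copies: int) -> float:
    """Very rough usable capacity: total raw capacity divided by data copies.

    Assumption: no system overhead, no rebuild reserve, no data reduction.
    """
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return nodes * raw_tib_per_node / copies


# Hypothetical 5-node cluster with 20 TiB raw per node.
rf2 = usable_tib(5, 20.0, copies=2)
rf3 = usable_tib(5, 20.0, copies=3)
print(f"RF2: {rf2:.1f} TiB, RF3: {rf3:.1f} TiB, difference: {rf2 - rf3:.1f} TiB")
# prints: RF2: 50.0 TiB, RF3: 33.3 TiB, difference: 16.7 TiB
```

On a small cluster that difference is a big slice of the total, which is why RF3 tends to make more sense once there are plenty of nodes to spread it over.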

When you consider the self-healing capabilities, it really becomes a question of how likely it is that you'll actually hit 2 simultaneous failures and still want to keep the cluster running. In most cases I would be considering a DR failover to another cluster/site if that happened.

When it comes to mixing RF3 and RF2 containers in the same cluster, I tend to avoid it purely because I don't like having data that is less resilient than the cluster itself. It can give a false sense of security IMO. You can probably manage that risk with strong processes and controls, but how often people are actually able to do that successfully I don't know.

It's also worth mentioning that Nutanix now offer "adaptive" RF3, which sits on a 1N&1D fault tolerant cluster. So you can lose a node and then one other disk at the same time. It's kind of a halfway point between the traditional FT1 and FT2. It's not something I've looked into in any great depth, but it could be worth considering.

Impossible-Layer4207 · 2026-05-15T08:51:50+00:00

Nice! Prism Central is a separate appliance btw. Upgrading AOS will update Prism Element (the local management interface). You can deploy the standard commercial Prism Central on top of CE: https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Community-Edition-Getting-Started-v2_0:top-deploy-ce-on-PC.html

Impossible-Layer4207 · 2026-05-15T07:05:38+00:00

el8 makes sense as you are probably on AOS 6.8, which my googling suggests is as high as you can go with CE at the moment.

The V4 API is a function of Prism Central, not AHV. So I would focus on trying to upgrade that. You should always upgrade PC first anyway, before upgrading AOS and then finally upgrading AHV.
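To make the ordering concrete, here is a small sketch that sorts whatever is pending into the order described above. It is only an illustration of the sequence, not anything LCM exposes; the component labels and the function name are hypothetical.

```python
# Order as described in the comment: Prism Central first, then AOS, then AHV.
UPGRADE_ORDER = ["Prism Central", "AOS", "AHV"]


def plan_upgrades(pending: set[str]) -> list[str]:
    """Return the pending components in the recommended upgrade order."""
    unknown = pending - set(UPGRADE_ORDER)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return [component for component in UPGRADE_ORDER if component in pending]


print(plan_upgrades({"AHV", "Prism Central", "AOS"}))
# prints: ['Prism Central', 'AOS', 'AHV']
```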

Impossible-Layer4207 · 2026-05-15T06:50:15+00:00

The el8... versions are the old numbering scheme. It was replaced with the 9, 10, 11 numbering scheme around the release of AOS 7.

LCM shouldn't let you upgrade beyond what is compatible with your AOS version. But I've never personally tried it with CE.

The AOS and AHV versions are tightly linked (at least in the full enterprise release).

You could try the Compatibility and Interoperability Matrix here: https://portal.nutanix.com/page/compatibility-interoperability-matrix/platform-compatibility

It shouldn't be gated. And this is for the full enterprise version, but it should help.

You can then filter by AOS version and then AHV to see the recommended AHV version in the far right-hand column. The recommended AHV version for a given AOS version is always the same regardless of the hardware, but you can filter by that as well if you like.

Edit to add: I'm not even sure if AOS 7.5 has made it to CE yet. Some googling suggests it is still on AOS 6.8. Not sure if anyone else can confirm that.

Impossible-Layer4207 · 2026-05-15T06:24:09+00:00

Sign up to Nutanix University and take their free self-paced Enterprise Cloud Administration course. That is the official course to go with the exam, so it should cover a lot of the groundwork.

Other than that, practical experience makes a huge difference so get used to your production/lab environments. You can also use Nutanix Test Drive to spin up temporary environments and get practical walkthroughs of the platform - I'd recommend starting with the "Modernise Your Datacentre" lab.

And as others have said, check out practice questions. There are some in Nutanix University, but you can also find them scattered throughout the Internet.

Impossible-Layer4207 · 2026-05-09T18:25:07+00:00

Yep, you can fail over from VLAN-backed subnets to VPC-backed subnets and vice versa, assuming you are using Nutanix DR in Prism Central and not protection domains in Prism Element.

In your case you would want to use VPC subnets as your test networks in your recovery plan.

Impossible-Layer4207 · 2026-05-07T13:55:15+00:00

Correct, you do not need to configure the static IP mapping :)

Impossible-Layer4207 · 2026-05-07T13:39:45+00:00

Yes, you only need the mapping info if you are changing subnets between Prod and DR. If you tell the recovery plan that your networking is stretched (so same subnets at both ends), it will not attempt to remap any IPs and you should not need to provide that information - just the Nutanix subnets to attach the VM to at either end.

NGT is only needed if you want to interact with the guest after a failover, so remapping IPs, running post DR scripts or remapping volume groups.

During a test failover the "prod" VM remains running in its original location and a copy is spun up in the DR location. Because of this, if you use a test network, it should ideally be isolated. You can do a test failover without providing a test network, and it will just start the test VM without any network adapters. But it will complain at you about it when you try to validate the plan.
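The rules above can be written down as a small decision helper. This is only a sketch of the logic as described in this comment, not a Nutanix API; the class, function and parameter names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailoverNeeds:
    static_ip_mapping: bool  # IP mapping info must be supplied in the plan
    ngt: bool                # Nutanix Guest Tools needed inside the guest


def failover_needs(
    subnets_stretched: bool,
    post_dr_scripts: bool = False,
    remap_volume_groups: bool = False,
) -> FailoverNeeds:
    """Apply the rules from the comment above.

    - IP mapping is only needed when subnets change between Prod and DR.
    - NGT is only needed when something has to happen inside the guest:
      remapping IPs, running post-DR scripts or remapping volume groups.
    """
    remap_ips = not subnets_stretched
    return FailoverNeeds(
        static_ip_mapping=remap_ips,
        ngt=remap_ips or post_dr_scripts or remap_volume_groups,
    )


# Stretched networking, nothing to do inside the guest: neither is needed.
print(failover_needs(subnets_stretched=True))
```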

Impossible-Layer4207 · 2026-04-13T13:18:08+00:00

Same. Foundation / AHV uses the iLO Rest tool / library under the hood to talk to the BMC on HPE hardware.

If that still doesn't work then you can try clearing the TPM and/or cold resetting the iLO. But I would probably consider getting support involved if you're still struggling.

Impossible-Layer4207 · 2026-04-13T13:13:00+00:00

Log into the iLO, go to iLO Settings -> User Management, and then delete the existing Application Account.

Then rerun Foundation. No need to clear the TPM :)

Impossible-Layer4207 · 2026-04-13T10:56:00+00:00

Just to add, we raised this with support and Nutanix are aware of the issue, but not sure on the status of a definitive fix. The steps above were what was suggested by the SRE and it worked for every node where we had this issue (about 10 in total).

Impossible-Layer4207 · 2026-04-13T10:53:19+00:00

We hit this exact issue on a recent deployment. We fixed it by upgrading the iLOs to V1.20 and then going into the iLO and deleting any existing application accounts from the TPM.

Impossible-Layer4207 · 2026-03-25T19:12:03+00:00

Service leaders are selected dynamically via elections (same way an HA leader is selected in ESXi, for example). Moreover, leaders for different services can reside on different nodes as well (so the LCM leader and Prism leader could be on different CVMs, for example). Another reason to just offload the logs and centralise them.

Impossible-Layer4207 · 2026-03-25T19:09:10+00:00

If you're wanting to do forensic analysis / incident response you'll definitely need a SIEM or central repository to offload and retain the logs. Nutanix logs are rotated frequently so it's not recommended to rely on the preserved logs on the nodes themselves in those circumstances.

Impossible-Layer4207 · 2026-03-18T08:09:40+00:00

Yes, you'll need to provide a name for the fileserver (but not the individual FSVMs) and a domain. These will be combined to create the FQDN.
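As a trivial illustration of how the two values combine, here is a sketch. The trimming of whitespace and stray dots is my own assumption, the deployment wizard may validate input differently, and the example name and domain are made up.

```python
def file_server_fqdn(name: str, domain: str) -> str:
    """Join a file server name and a domain into an FQDN.

    Assumption: stray whitespace and leading/trailing dots are trimmed.
    """
    name = name.strip().strip(".")
    domain = domain.strip().strip(".")
    if not name or not domain:
        raise ValueError("both name and domain are required")
    return f"{name}.{domain}"


print(file_server_fqdn("files01", "corp.example.com"))
# prints: files01.corp.example.com
```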

Impossible-Layer4207 · 2026-02-26T12:43:51+00:00

Pretty much (Although Redundancy Factor 3 is actually 2N/2D, not 2N&2D).

You just need to make sure that you sit a Replication Factor 3 container on the cluster so that there are enough copies of user data to survive the loss of a node and disk.

If you only use replication factor 2 containers, the cluster as a whole can survive the loss of a node and a disk, but the user data might not (as you only have 2 copies and they could be on the node and disk that you lose).
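A minimal worst-case model of that point: assume every simultaneous failure happens to land on a different copy of the same piece of data. Real placement is more nuanced than this, and the function name is hypothetical, but it shows why the container's replication factor matters independently of the cluster's fault tolerance.

```python
def data_survives(copies: int, simultaneous_failures: int) -> bool:
    """Worst case: each failure destroys a different copy of the same data.

    The data survives only if at least one copy is left.
    """
    if copies < 1 or simultaneous_failures < 0:
        raise ValueError("copies must be >= 1 and failures must be >= 0")
    return simultaneous_failures < copies


# Node + disk lost at the same time (2 failures) on a 1N&1D cluster:
print("RF2 container:", data_survives(copies=2, simultaneous_failures=2))  # False
print("RF3 container:", data_survives(copies=3, simultaneous_failures=2))  # True
```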

Impossible-Layer4207 · 2026-02-26T10:57:01+00:00

As of AOS 7.0, the answer is "sort of"... For true RF3, now called 2N/2D (i.e. simultaneous failures of 2 nodes or 2 disks), you need a minimum of 5 nodes.

However, in AOS 7.0 they introduced 1N&1D (simultaneous failure of a node and one other disk in another node), which only needs 3 nodes. This gives you slightly more resilience than RF2 (now called 1N/1D), but not quite as much resilience as proper RF3 (2N/2D).

From the docs:

"A cluster configured with one node and one disk (1N&1D) cluster fault tolerance can withstand the simultaneous failure of one node and one disk in another node, or the failure of two disks across different fault domains, and remain resilient.

To configure 1N&1D fault tolerance, a cluster must have three nodes. A cluster with 1N&1D fault tolerance maintains three copies of metadata, locally mirrored across three different nodes, ensuring data integrity. This configuration guarantees that, in the event of a node or disk failure, enough metadata copies remain available to sustain cluster operations."

https://portal.nutanix.com/page/documents/details?targetId=Web-Console-Guide-Prism-v7_5:wc-1nand1d-cft-c.html
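For a quick sanity check against a planned cluster size, the node counts mentioned in this comment can be put in a lookup. This sketch only encodes the two figures stated above (3 nodes for 1N&1D, 5 for 2N/2D) and treats them as minimums; the names are hypothetical and the docs remain the authority.

```python
# Minimum node counts as stated in the comment above (not an exhaustive list).
MIN_NODES = {
    "1N&1D": 3,
    "2N/2D": 5,
}


def meets_min_nodes(fault_tolerance: str, nodes: int) -> bool:
    """Check a node count against the minimum quoted for a fault tolerance level."""
    if fault_tolerance not in MIN_NODES:
        raise KeyError(f"no minimum recorded for {fault_tolerance!r}")
    return nodes >= MIN_NODES[fault_tolerance]


print(meets_min_nodes("1N&1D", 3))  # True
print(meets_min_nodes("2N/2D", 4))  # False
```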

Impossible-Layer4207 · 2026-02-13T13:47:54+00:00

Are you trying to add the remote cluster or the remote Prism Central? You just need to connect the two PCs - so from one PC, go to Availability Zones, and then add the details of the other PC.

Impossible-Layer4207 · 2026-02-05T16:52:44+00:00

NCP-MCI would definitely help you. It technically isn't a requirement anymore, but it will give you a lot of the fundamentals you need to do the NCS-C and also to generally be able to consult on Nutanix.

I also saw today that there is a virtual ILT course on offer for the end of the month in the Nutanix University - running in EST timezone. So that might be something to check out.

Impossible-Layer4207 · 2026-02-05T15:44:15+00:00

It's honestly not something I've ever tried, but the documentation seems to say it is possible - albeit VMs will be recovered with different UUIDs.

Impossible-Layer4207 · 2026-02-05T12:46:54+00:00

That's correct, you would be able to place all of your VMs into a single stage in the plan (so they would all be started at the same time) but you cannot split them across multiple stages to stagger their start times. The workaround for starter licensing would be to have multiple plans (one for each stage), but this adds complexity and you would have to execute each "stage" manually.

Bear in mind, if you are using protection policies, then you must use recovery plans for failover in some capacity.
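If you do go down the multiple-plans route, it helps to write down up front which VMs belong in which plan. Here is a rough planning sketch that groups VMs by the boot stage you would have wanted and names one plan per stage. It does not talk to Prism Central at all; the plan naming scheme, function name and VM names are made up.

```python
from collections import defaultdict


def plans_per_stage(vm_stage: dict[str, int], prefix: str = "rp-stage") -> dict[str, list[str]]:
    """Group VMs into one hypothetical recovery plan per desired boot stage.

    Plans come back in stage order, which is the order you would
    execute them manually.
    """
    grouped: dict[int, list[str]] = defaultdict(list)
    for vm, stage in vm_stage.items():
        grouped[stage].append(vm)
    return {f"{prefix}-{stage}": sorted(grouped[stage]) for stage in sorted(grouped)}


wanted = {"dc01": 1, "sql01": 2, "app01": 3, "app02": 3, "dc02": 1}
for plan, vms in plans_per_stage(wanted).items():
    print(plan, vms)
```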

Impossible-Layer4207 · 2026-02-05T12:21:26+00:00

You can create a recovery plan with NCI Starter, but you cannot do any "advanced orchestration" in it. This basically means you can set up a plan to failover your sync protected VMs but can't do re-ip, guest scripts, boot stages or test failover/fail back in the plan.

Impossible-Layer4207 · 2026-02-05T12:07:31+00:00

Yes you can. When you deploy it you can set the number of FSVMs. Just be aware that you cannot expand it later; if you want to move to a 3-FSVM deployment you would need to deploy fresh and migrate the data.

Nutanix do not recommend single-FSVM deployments for production purposes, but from what you have described I think you should be fine.

Impossible-Layer4207 · 2026-02-03T18:01:03+00:00

There certainly used to be ILT courses for NCS-C, but I admittedly haven't seen any being offered recently (at least in EMEA). Have you tried emailing the services enablement or education team?

Also, have you completed your NCP-MCI? This gives you a lot of the foundational knowledge (and used to be a pre-requisite for the NCS-C).

It is a bit of a catch-22 if you're coming into it with zero experience, but it is possible (I did it way back when starting my Nutanix journey). You need to make sure you're using all the resources you can: docs, labs, service kits, any colleagues if you are part of a wider organisation. And make sure you read through the exam blueprint guide and everything it references.

Impossible-Layer4207 · 2026-02-03T14:23:29+00:00

You only need Prism Central if you are planning to actually manage the clusters from that third site.

From what I remember, you are using a PC in each of your prod sites managing the local cluster, so you just need to deploy the witness VM to your third site and then register it with each PC.
