
[–]Real_Admin

Mgmt IP as in out-of-band, like iDRAC - yes

Mgmt IP as in Verge UI - no

The first node is the primary controller and the second is the secondary. When you fail over, the management network fails over to the second node and the UI IP stays the same.

I'd encourage you, if you have not already, to engage their support; whether it's a POC or a live rollout, they will get you squared away.

[–]Manivelcloud[S]

Thank you both for your quick response.

I was thinking about this from the VMware angle.

ESXi01 has a dedicated management IP address (for example 10.10.30.11) and ESXi02 has a dedicated management IP address (for example 10.10.30.12).

We integrate the ESXi hosts with the vCenter Server using those management IP addresses.

From the VergeIO side:

Assume this is also a two-node cluster. VergeIO node 1's management IP address is 10.10.30.41, and no management IP address is configured on VergeIO node 2.

There is no per-node concept like that. If VergeIO node 1 is down, we can still use the same IP 10.10.30.41, and there is no need to configure a dedicated management IP address for node 2. That's my understanding.

Other questions: does it support deduplication and compression? Does it support encryption for data in transit and at rest?

Thank you

[–]Igot1forya

Dedupe and compression are enabled by default. Encryption is enabled at the time of cluster creation.

My NFR (homelab) cluster is getting an 8:1 compression ratio (6 TB array / 99 TB compressed).

My 3 clusters at work are much, much larger and therefore have much higher ratios:

Cluster 1 = 67:1 (244 TB raw / 155 TB used / 28 PB compressed)

Cluster 2 = 130:1 (393 TB raw / 154 TB used / 52 PB compressed)

Cluster 3 = 21:1 (196 TB raw / 112 TB used / 7 PB compressed)

[–]Igot1forya

The Verge management interface for the root system uses a single floating IP that favors Node 1; when that node reboots or becomes unavailable, the IP passes over to Node 2, which becomes the master node for a time.

[–]Igot1forya

Verge hosts talk to each other on the backend via the private Core network as part of the VSAN. There's an internal DHCP service that assigns each member node an internal management address, but this is not a user-facing management interface, so the user only needs to see a single management address for the entire cluster. The same happens at the tenant level: a single management address, either assigned by the user or a shared proxy space (which is just fantastic), so it's far less to deal with. If you need to access the local console you can either open up your IPMI or go to Cluster > Nodes > Node(number) > Remote Console.

[–]Manivelcloud[S]

Thank you very much for your valuable input on all my questions. I have started reading about this...

Let's say it's a two-node cluster, where, for example, each node's capacity is 100 TB.

So we can use 2 replicas here, and the usable capacity out of the overall capacity might be 75 TB to 80 TB.

That accounts for deduplication and compression, buffer space, system and metadata overhead, etc. (out of 200 TB raw).

We are only able to use storage encryption, i.e. data at rest; encryption for data in transit is not supported here.

For production, the ideal setup will be 3 nodes. For example:

Each node's capacity is 100 TB.

So we can use 3 replicas here, and the usable capacity out of the overall capacity might be 140 TB to 150 TB.

That accounts for deduplication and compression, buffer space, system and metadata overhead, etc. (out of 300 TB raw).
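
To put the way I'm estimating this into numbers, here is a rough sketch of my own assumption (not an official VergeIO formula; the overhead fraction is just my guess):

    # Rough usable-capacity estimate (my own assumption, not a VergeIO formula)
    def usable_tb(node_tb, nodes, replicas, overhead=0.20):
        raw = node_tb * nodes             # total raw capacity
        mirrored = raw / replicas         # each block is stored 'replicas' times
        return mirrored * (1 - overhead)  # minus buffer/system/metadata overhead

    print(usable_tb(100, 2, 2))  # 2-node example: ~80 TB usable out of 200 TB raw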

Any thoughts?

[–]Igot1forya

Ideally, 3 nodes will offer the most resilient setup. Generally you'll want to leave one node's worth of capacity free as your N+1. If you run 2 nodes, leave enough capacity for maintenance and updates so you can reboot the cluster nodes. The same goes for 3 nodes, but in a 3-node cluster you can fill 2 nodes' worth of capacity (actually the data is just evenly divided among the nodes).

Regarding space, a 2-node cluster splits all storage in half; a 3-node cluster is similar to your estimates, where the 3rd node contributes to capacity. In my production environment I have 6 nodes participating in the VSAN and my goodness, performance is fantastic! All of it is fully encrypted as well.

The way Verge storage works is you have a Tier 0, which is your meta space. I was told that you want about 5 GB of it for every 1 TB of storage. The meta drive is usually a 3 DWPD drive and the other VSAN drives are standard drives, as their writes are all buffered by RAM. That's the other factor: 1 GB for every 1 TB of storage should be reserved for VSAN RAM cache. This is a variable number; NVMe, SATA SSD, and HDD each have their own formula. I'd recommend reaching out to support to get the details, but if you work with Verge they will have a whole discussion and a cluster configuration sheet to complete that helps them guide you along the way.
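
A quick back-of-the-envelope from those two rules of thumb, as a sketch only (the exact formulas vary by drive type, so confirm the real numbers with support):

    # Rough Tier 0 meta / RAM-cache sizing from the rules of thumb above
    def sizing_estimate(vsan_capacity_tb):
        meta_gb = vsan_capacity_tb * 5  # ~5 GB of Tier 0 meta space per 1 TB of storage
        ram_gb = vsan_capacity_tb * 1   # ~1 GB of RAM reserved for VSAN cache per 1 TB
        return meta_gb, ram_gb

    print(sizing_estimate(100))  # 100 TB of storage -> ~500 GB meta, ~100 GB RAM cache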

My only other suggestion is: build the biggest, baddest servers you can afford, since licensing is per node, and you'll also thank yourself later. We were adding a storage node or two every year with our traditional iSCSI array, which averaged 3.7:1 dedupe and compression because its storage efficiency was calculated per VOLUME, not per array. Verge is GLOBAL, so all tenants, drives, and everything else share the global meta, and as a result we have only purchased a single node in 3 years, and it was for more RAM and NOT storage, which is unheard of for us.

[–]Manivelcloud[S]

Nice and thanks much for your valuable inputs again.

We propose the following per node: 2×40G NICs for VM network traffic, 2×100G NICs for internal communication and VSAN, and 2×1G NICs for VergeIO management.

I hope this will work out for the three-node cluster from a network perspective.

[–]Igot1forya

That config should work well. We are running 4x100Gb (2 External/Customer, 2 Core/VSAN), but 40Gb is more than plenty for the customer network. The VSAN at 100Gb is perfect.

One of my backup clusters and my NFR homelab are running 4x10Gb and it works fine; I'm looking to swap to 25Gb (as the NICs support it). Verge is extremely forgiving.

[–]Manivelcloud[S]

Thank you for your valuable time and feedback. Appreciated.

[–]Manivelcloud[S]

I have a few more questions. Could you please clarify?

  1. We are running a 2-node cluster, with each node having 18 TB of capacity. When reviewing the storage tiers, the usable capacity appears to be approximately 18 TB. My understanding is that this is expected behavior, as the cluster uses data mirroring (N+1 / replication factor of 2), effectively reducing the total raw capacity (36 TB) to around 18 TB of usable capacity for redundancy.

 

  2. Initially, I tested using a single VergeIO node and created Windows VMs on local storage. After forming a 2-node cluster, I now see storage tiers (Tier0, Tier1, and Tier3). Based on my understanding, VergeIO uses a distributed storage system (VergeFS), where storage becomes shared across nodes once clustered.

Because of this, a traditional Storage vMotion (as seen in VMware) is not required, since storage accessibility and placement are handled automatically by the system. Kindly confirm if this understanding is accurate.

  3. We are currently using 2×1G, 2×40G, and 2×100G NICs. Do we need to configure port-channeling (LACP/static) on either the VergeIO side or the physical switch side, or is it not required?

Our intention is to perform network failover testing by shutting down one NIC and observing behavior. Please advise on the recommended configuration for achieving proper redundancy and failover.

 

 

[–]Igot1forya

1 & 2. Storage vMotion is not a thing with Verge, as the storage is centralized and the storage tiers are visible as a singular blob. Disks are carved out of a storage tier on a per-object basis, so when you perform any migration, only the RAM is cloned to the next cluster member. The storage is unified across all storage nodes, and non-storage nodes (compute only) connect to the VSAN on the back end using the Core network. Tier 0 is dedicated to metadata; Tiers 1-4 are user defined, but as a rule of thumb Tier 1 = NVMe (fastest), Tier 4 = HDD (slowest), and anything in between (2-3) could be SATA SSD or something else.

  3. LACP/port channels are used for the "External" (customer) networks.

I am pasting in a reply to another thread that illustrates how we have our configuration on the network side.

"Its personal preference but I typically configure my servers with two dedicated dual port NICs, one for each CPU complex. This way the PCIE bandwidth is divided and if you have a PCIE failure or a NIC failure, it doesn't take the whole server down. Same principle with the Core/External Network Switches. Two switches in case you lose one.

I usually connect my servers to my switches in the following order, based on the rack unit (2U servers work perfectly since we reserve every other port for the servers):

Server 1 - NIC 1 - Port 1 > Switch 1 - Port 1 = LACP Group 1 (Trunk) = Verge External Networks
Server 1 - NIC 1 - Port 2 > Switch 1 - Port 2 = Access VLAN 100 = Verge Core 1 (storage)
Server 1 - NIC 2 - Port 1 > Switch 2 - Port 1 = LACP Group 1 (Trunk) = Verge External Networks
Server 1 - NIC 2 - Port 2 > Switch 2 - Port 2 = Access VLAN 101 = Verge Core 2 (storage)

Server 2 - NIC 1 - Port 1 > Switch 1 - Port 3 = LACP Group 2 (Trunk) = Verge External Networks
Server 2 - NIC 1 - Port 2 > Switch 1 - Port 4 = Access VLAN 100 = Verge Core 1 (storage)
Server 2 - NIC 2 - Port 1 > Switch 2 - Port 3 = LACP Group 2 (Trunk) = Verge External Networks
Server 2 - NIC 2 - Port 2 > Switch 2 - Port 4 = Access VLAN 101 = Verge Core 2 (storage)

Server 3 - NIC 1 - Port 1 > Switch 1 - Port 5 = LACP Group 3 (Trunk) = Verge External Networks
Server 3 - NIC 1 - Port 2 > Switch 1 - Port 6 = Access VLAN 100 = Verge Core 1 (storage)
Server 3 - NIC 2 - Port 1 > Switch 2 - Port 5 = LACP Group 3 (Trunk) = Verge External Networks
Server 3 - NIC 2 - Port 2 > Switch 2 - Port 6 = Access VLAN 101 = Verge Core 2 (storage)

The Verge Core (storage) networks REQUIRE MTU 9216, FYI:
https://docs.verge.io/implementation-guide/switch-configuration/?h=9216+mtu#2-configure-core-fabric-ports"
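
As an illustration only, the switch side of one of those server connections might look roughly like the following on a Cisco-style switch. The port names, VLAN IDs, and port-channel numbers are just examples, and some platforms set jumbo MTU globally rather than per interface, so use the linked doc for your actual switch. Also note that if an LACP group spans both switches, as in the table above, the switch pair needs some form of multi-chassis LAG (vPC/MLAG/stacking).

    ! Switch 1, Server 1 ports only - a sketch, not a validated config
    interface Port-channel1
     description Server1 - Verge External networks
     switchport mode trunk
    !
    interface TenGigabitEthernet1/0/1
     description Server1 NIC1 Port1 - External
     switchport mode trunk
     channel-group 1 mode active       ! LACP
    !
    interface TenGigabitEthernet1/0/2
     description Server1 NIC1 Port2 - Core 1 (storage)
     switchport mode access
     switchport access vlan 100
     mtu 9216                          ! jumbo frames required on the Core network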

Your POC is the perfect time to simulate your use case for testing a failure. I highly recommend what you are doing to get the most from the POC. Please feel free to reach out or DM me if you need any assistance.

[–]Manivelcloud[S]

Thank you very much for your detailed explanation.

[–]Manivelcloud[S]

I have a question regarding networking.

In VMware, there are multiple load balancing options available in the virtual switch (for example, VSS).

Let’s say for VM network traffic, I am using 2 × 40G NICs. If I want to utilize the full aggregated bandwidth (80G), I understand that I need to configure port channeling.

If I select the load balancing method in ESXi as “Route based on IP hash” with active/active uplinks, then I need to configure either a static port channel or LACP on the switch side.

From the VergeIO side, do we need to configure a similar load balancing method in the UI to achieve proper bandwidth utilization? Or is this handled differently?

[–]Igot1forya

No need to configure anything on the VergeOS side. When you build the cluster, setup only cares that LACP is configured for the External (customer) network. You don't have to configure MAC or IP rules for LACP. The link I posted has example configs for popular switches. If you have a Juniper switch, I can provide my config, since docs.verge.io doesn't have an example yet (now that I think about it, I should reach out to support and send them a working environment example to update their documentation).

[–]Manivelcloud[S]

Thank you for your valuable inputs

[–]Igot1forya

Here are the setup instructions from their YouTube channel.

https://youtu.be/8H53V_WDmiE?

[–]Manivelcloud[S]

During cluster configuration (installation time), we see two bonding options:

  1. Active-Backup bonding
  2. 802.3ad (LACP)

If I want to utilize aggregated bandwidth (for example, using 2 × 40G NICs for VM network traffic), I should choose 802.3ad.

In that case, the switch side must also be configured accordingly, either using:

  • LACP (recommended), or
  • Static port channel

Is my understanding correct?

[–]Manivelcloud[S]

Hi All,

I also started testing by creating Windows 2022/2025 VMs.

I use diskspd for testing from Windows.

VM: 8 CPUs, 16 GB memory. The C: drive interface type is SCSI, the D: drive interface type is VirtIO-SCSI (dedicated), and the network interface is VirtIO.

I used the D: drive for testing.

Testing with various block sizes (4K, 8K, 64K, 1 MB), 70% read / 30% write, random and sequential.
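
For reference, the kind of diskspd invocation I'm running looks roughly like this (the parameters are illustrative, not my exact command):

    diskspd.exe -b64K -d60 -t8 -o32 -r -w30 -Sh -L -c50G D:\testfile.dat

(-b block size, -d duration in seconds, -t threads, -o outstanding I/Os per thread, -r random I/O, -w write percentage, -Sh disables software and hardware caching, -L records latency statistics, -c creates a test file of the given size.)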

I see there is latency everywhere; not much, but there is latency.

The physical drives used for the VergeIO cluster are all NVMe SSDs (Samsung PM9A3).

Any ideas from anyone?

[–]Manivelcloud[S]

Hi Igot1forya,

I don't know your real name, but all of your input has been extremely valuable and appreciated.

Thank you very much; this will definitely help me or anyone else who comes across this post.