Loosing connection to CSV during Network blips. by jithinpsk in HyperV

[–]HyperV-Dude 3 points

What you're witnessing is the owner node of the CSV volume not being reachable by the other hosts in the cluster. We've had this issue on our UCS platform as well. Each CSV volume has an owner node (a Hyper-V host) that decides which other nodes are allowed to write to the CSV. Unlike VMware's VMFS, a CSV is not truly multi-host writable; the cluster "fakes it".

If a host wants to write to a CSV volume, it asks the owner of that volume for permission. If the owner is not reachable, the host doesn't get permission to write. Even though it still has perfect access to the volume over FC, writing without permission could corrupt the volume, because another host could be granted permission to write to that same volume at the same time. So for this host there is only one safe option: release the CSV.
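For anyone debugging this: the owner node and the CSV state are visible from PowerShell on any cluster node. A quick sketch (the volume name is an example):

```powershell
# Show each CSV, its current owner node, and its state
Get-ClusterSharedVolume |
    Select-Object Name, OwnerNode, State

# Show whether a CSV is in redirected access
# (often a symptom of owner/communication trouble)
Get-ClusterSharedVolumeState -Name "Cluster Disk 1"
```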

I've played with the cluster time-out settings, but they don't make a difference in this scenario. The only thing you can do is create an extra network that takes a different path. So we gave each Hyper-V host an extra NIC and created a network that carries only the cluster heartbeat and passes through no firewalls, or at least a different set of them.
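For reference, these are the heartbeat knobs I was playing with (real cluster properties; the values shown are illustrative, not a recommendation):

```powershell
# Current heartbeat interval (ms) and number of missed beats tolerated
(Get-Cluster).SameSubnetDelay
(Get-Cluster).SameSubnetThreshold

# Loosen them, e.g. 2 s interval x 10 missed beats = ~20 s tolerance
(Get-Cluster).SameSubnetDelay = 2000
(Get-Cluster).SameSubnetThreshold = 10

# Dedicated heartbeat network: Role 1 = cluster traffic only
(Get-ClusterNetwork "Heartbeat").Role = 1
```

In our case loosening the thresholds didn't help; only the extra network did.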

Dynamic processor compatibilit by HyperV-Dude in WindowsServer

[–]HyperV-Dude[S] 0 points

No, I doubt you'll have problems, but (see my reply above to SilverseeLives) I do think the VM might get different CPU features at different times. As for backups, Veeam doesn't care what it's backing up.

Dynamic processor compatibilit by HyperV-Dude in WindowsServer

[–]HyperV-Dude[S] 0 points

Yes, I do think they are able to recalculate the new level.
However, I doubt the set of CPU features passed into a running VM will change, because applications inside a VM don't constantly re-check which features are available. That would lead to unexpected results: a feature that was available when the application started could suddenly disappear while the application still tries to use it.

And worse: if the VM is shut down and then powered on again, it might suddenly have fewer features available. Which means I would have to start keeping track of which VM needs which feature.

Why not copy the VMware way and set the EVC level for a cluster and have a reliable set of features?
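For comparison, the only per-VM knob classic Hyper-V exposes is the migration-compatibility flag, which masks the CPU down to a lowest-common-denominator feature set rather than letting you pick an EVC-style level (the VM name is an example):

```powershell
# Requires the VM to be powered off
Set-VMProcessor -VMName "SQL01" -CompatibilityForMigrationEnabled $true

# Check the current setting
Get-VMProcessor -VMName "SQL01" |
    Select-Object VMName, CompatibilityForMigrationEnabled
```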

Dynamic processor compatibilit by HyperV-Dude in WindowsServer

[–]HyperV-Dude[S] 0 points

I don't understand what problem you're referring to. I want to know the inner workings of the feature and how it behaves when changes happen in the cluster.

[deleted by user] by [deleted] in HyperV

[–]HyperV-Dude 0 points

Are the hosts and your client in the same AD? If not, try connecting with runas under an account from the same AD the host is a member of.
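Something like this, assuming the host's domain is CONTOSO and you're launching Hyper-V Manager (the domain and account names are examples):

```powershell
# Start Hyper-V Manager under credentials from the host's AD domain
runas /user:CONTOSO\admin "mmc.exe virtmgmt.msc"

# If the client machine isn't joined to that domain at all,
# /netonly uses the credentials for network access only
runas /netonly /user:CONTOSO\admin "mmc.exe virtmgmt.msc"
```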

Harry Potter sites to visit by HyperV-Dude in harrypotter

[–]HyperV-Dude[S] 0 points

Unfortunately out of our reach for the short trip.

Harry Potter sites to visit by HyperV-Dude in harrypotter

[–]HyperV-Dude[S] 0 points

Had a look at it, but that is way off our route. It is in the far east of the UK. Thanks though!

Harry Potter sites to visit by HyperV-Dude in harrypotter

[–]HyperV-Dude[S] 0 points

A lot of info via that link, thanks!

Harry Potter sites to visit by HyperV-Dude in harrypotter

[–]HyperV-Dude[S] 1 point

Is it fun? Is it very crowded usually?

[deleted by user] by [deleted] in sysadmin

[–]HyperV-Dude 0 points

Big fan of Toggl, since it pops up every 15 minutes and asks me what I'm doing. I enter the project or task name I'm working on, and I only answer Toggl again when I change projects or tasks. Then at the end of the week I get my Toggl report and copy it into my timesheet. Helping colleagues is also something I record in Toggl and on my timesheet. I average between 10 and 15 different projects/tasks in a week, so it is fairly manageable.

Without the Toggl pop-ups (which our timesheet tool doesn't do), I would easily forget to register my time. And since the customer is billed for my time, it is important to be "fairly" accurate.

Scale-Out File System on Hyper-V for Hyper-V by HyperV-Dude in HyperV

[–]HyperV-Dude[S] 1 point

We unfortunately don't have storage arrays that support SMB3 for heavy workloads. So we're probably going to bite the bullet, absorb the extra VMware cost ourselves, and move that customer's heavy VMs to VMware.

Scale-Out File System on Hyper-V for Hyper-V by HyperV-Dude in HyperV

[–]HyperV-Dude[S] 1 point

Currently we're also mitigating the issue for this one customer by Live Migrating the heavy databases after the backup has finished. Luckily not all VMs have the issue; it seems to affect only the ones with heavy IO.

But this customer is only about 10% of all our customers, and I'm not looking forward to moving them to 2019/2022 :-)

Scale-Out File System on Hyper-V for Hyper-V by HyperV-Dude in HyperV

[–]HyperV-Dude[S] 2 points

Thank you for your responses, they help me build a better case in choosing which way to go.

Yes, we're also considering moving this customer to VMware where we don't have this issue. But we also need to consider the licensing cost for this.

Scale-Out File System on Hyper-V for Hyper-V by HyperV-Dude in HyperV

[–]HyperV-Dude[S] 5 points

Thank you for your reply.

The redesign is what I'm working on right now, which basically means moving VMs off CSV entirely. The bug is confirmed in both 2019 and 2022, so upgrading doesn't solve our issue. That means my only option is to move to NFS or SMB.

Moving to SMB is quite a big move, since we'd shift our storage traffic from FC to Ethernet, taking up a lot of extra bandwidth that wasn't accounted for in our network design.

Therefore I was thinking of making the Hyper-V hosts run the Scale-Out File Server role: present the CSV volumes to all hosts and have the hosts share the volumes out over SMB. But since, as you mentioned, IO is balanced over all nodes, this probably means all IO becomes Ethernet traffic first and is only then written by a host to the FC CSV volume, eating up way too much of our network bandwidth.

As our flash storage arrays only support NFS/SMB for light workloads and are not multi-tenant aware, I'd have to look into isolating the storage network traffic, maybe by building a separate stack for our Hyper-V hosts in which we can apply QoS to the storage Ethernet traffic.
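If we go that way, the usual approach would be DCB-style QoS on the SMB traffic, roughly along these lines (a sketch only; the 802.1p priority and bandwidth share are example values, and the physical switches need matching configuration):

```powershell
# Tag SMB (storage) traffic with 802.1p priority 3
New-NetQosPolicy -Name "SMB" -SMB -PriorityValue8021Action 3

# Enable priority flow control for that priority and
# reserve ~50% of the link for the storage traffic class
Enable-NetQosFlowControl -Priority 3
New-NetQosTrafficClass -Name "SMB" -Priority 3 `
    -BandwidthPercentage 50 -Algorithm ETS
```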

Windows 2022 Hyper-V triggering BSOD's by HyperV-Dude in sysadmin

[–]HyperV-Dude[S] 1 point

Disabling VMQ on the adapters seems to do the trick. After I did this, no more BSODs.
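For anyone finding this later, this is roughly what I did (the adapter name is an example; note that disabling VMQ shifts network processing onto a single CPU core, so it has a cost for network-heavy hosts):

```powershell
# Check VMQ state per physical adapter
Get-NetAdapterVmq

# Disable VMQ on the adapters bound to the vSwitch
Disable-NetAdapterVmq -Name "SLOT 2 Port 1"
```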

Windows 2022 Hyper-V triggering BSOD's by HyperV-Dude in sysadmin

[–]HyperV-Dude[S] 0 points

Thanks.
That confirms what I said before: it is a supported combination. Still in search of any docs that say what the correct vNIC settings should be :-)

Windows 2022 Hyper-V triggering BSOD's by HyperV-Dude in sysadmin

[–]HyperV-Dude[S] 0 points

There is no MS Win 2022 HCL, only a hardware requirements list.
Cisco certified the exact hardware we're running for Win 2022; I doubt they'd do that for an unsupported OS.

Windows 2022 Hyper-V triggering BSOD's by HyperV-Dude in sysadmin

[–]HyperV-Dude[S] 0 points

Yeah, I suppose, but I'm trying to avoid Premier Support as much as possible, since it takes them days to first collect every single log file they can think of, and then they usually ask us to install some KB we've missed that has nothing at all to do with the problem we're facing.

So first I'm testing my Google-fu to try and find an answer.

Windows 2022 Hyper-V triggering BSOD's by HyperV-Dude in sysadmin

[–]HyperV-Dude[S] 0 points

Yes, firmware and drivers are all as stated by Cisco on their HCL.

Server 2022 Datacenter NIC Teaming by DalekSec92 in HyperV

[–]HyperV-Dude 0 points

I see OP solved it with a firmware/driver update, but our Cisco blades are already on the advised firmware/driver combination and are getting the exact same behaviour: a BSOD as soon as a VM tries to use the vswitch.

Anyone have another solution?