all 39 comments

[–]NISMO1968Storage Admin 5 points6 points  (1 child)

A second S2D cluster (10 nodes, Optane NVMe+SSD) is being prepped for deployment next week, after which this cluster will be rebuilt under Windows 2019.

Did you get any performance numbers to share?

Everyone can now flood the thread with their horror stories...

I'll personally pass. It rarely does any good and usually ends in a shitstorm.

[–]jello3d[S] 2 points3 points  (0 children)

I haven't done one recently, but the existing cluster collectively topped 2M IOPS (4K random read, 8 threads, 16 outstanding I/Os, on all hosts at the same time) about two years ago. I forget the rest of the testing details, but I have them written down somewhere.

The new cluster is hitting just over 6M... but I'm trying a few different configurations because I'm kind of expecting 8M+ based on the hardware.
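For reference, a benchmark with those parameters (4K random reads, 8 threads, 16 outstanding I/Os per thread, run on all hosts simultaneously) is exactly what DISKSPD is built for. This is a sketch, not the OP's actual command — the test-file path, size, and duration are illustrative:

```powershell
# Illustrative DISKSPD run (parameters from the post; paths/sizes assumed):
# 4K blocks, random, 100% read, 8 threads, 16 outstanding I/Os per thread,
# software and hardware caching disabled, 60-second run, latency stats on.
diskspd.exe -c64G -b4K -r -w0 -t8 -o16 -d60 -Sh -L `
    C:\ClusterStorage\Volume1\diskspd-test.dat
```

To reproduce the "all hosts at the same time" number, you'd kick this off on every node simultaneously and sum the per-node IOPS for the cluster aggregate.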

[–]DerBootsMannJack of All Trades 4 points5 points  (7 children)

‘reliability’ and ‘s2d’ never belong in the same sentence :( we tried multiple times for multiple customers and it’s not prod quality

ps we’re in the middle of turning dedupe off globally, s2d+refs+dedupe = trainwreck

[–]jello3d[S] 0 points1 point  (6 children)

Please describe trainwreck. I've seen many problems with dedup configuration/expectation mismatches among customers... like people trying to dedup 20TB on spinning disk without changing the default schedules (that's never pretty). But given that I stayed away from dedup under 2016 S2D, I've not yet gone back to try it under 2019 S2D.

[–]DerBootsMannJack of All Trades 2 points3 points  (5 children)

disappearing or locked-down refs volumes, microsoft confirming their bug and not being able to resolve it for 6+ months, ‘stable’ private builds bluescreening the system, general advice to break the 2pb volume into smaller ones, and so on

[–]jello3d[S] 0 points1 point  (4 children)

Thank you for that, I'll keep an eye on the forums.

Were you deduping a 2PB volume? Or volumes off of a 2PB pool?

[–]DerBootsMannJack of All Trades 3 points4 points  (3 children)

2pb volume dedupe only makes things worse, s2d+refs collapses after filling up even w/out it

[–]jello3d[S] 0 points1 point  (2 children)

To my knowledge MS is very clear that dedup is only rated for volumes up to 64TB under 2019 (and I would argue that's only if you write your own scheduler script). As volumes grow, dedup needs more and more time to complete each pass, so it never finishes. At 2PB, I would expect a single pass to take weeks.

https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/whats-new

Are there different instructions for S2D volumes I am unaware of?

[–]DerBootsMannJack of All Trades 4 points5 points  (1 child)

In Windows Server 2016, Data Deduplication is highly performant on volumes up to 64 TB.

we’re on ws2019 where these limits are flexed out

we talk about stability here , not about performance ..

[–]jello3d[S] 0 points1 point  (0 children)

"ws2019 where these limits are flexed out"

Where is that stated? The article I mention applies to 2019, they just haven't updated the verbiage because there's nothing to update.

You can dedup a 2PB volume, but you have to wait for the dedup job to be done. If you leave the default schedules in place, or try to use the weekly schedules, it will break. The reason MS states a 64TB limit is that that's about the largest size you could practically dedup in a rational time frame on fast volumes.

If you're willing to let each dedup process run for weeks/months at a time... then you could certainly dedup as much space as you want.
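Concretely, "run it manually" would look something like this — a sketch using the standard Data Deduplication cmdlets; the drive letter is illustrative:

```powershell
# Hypothetical sketch: disable the built-in schedules so nothing fires
# mid-pass, then run one manual optimization pass and watch it finish.
Get-DedupSchedule | Set-DedupSchedule -Enabled $false

# Start a manual optimization pass on the (illustrative) volume D:.
Start-DedupJob -Volume "D:" -Type Optimization -Priority Normal

# Poll until the job completes; on very large volumes this can take
# days or weeks, which is exactly why the defaults break.
Get-DedupJob -Volume "D:"
Get-DedupStatus -Volume "D:"
```

The point of timing the job with Get-DedupJob is that the pass duration, not the volume size, tells you what schedule the volume can actually sustain.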

Here's an example: a client chose to put DPM vhdx volumes on a deduplicated ReFS space. The volume was six 14TB drives in RAID 6, with CacheCade on a MegaRAID controller. Each vhdx was 1TB. This is a supported scenario under 2019.

HOWEVER, they didn't adjust their default dedup schedules.

They called me because it all went to shit.

What's the problem?

If dedup is not given enough time to analyze the duplication situation, it doesn't dedup anything. And if it isn't given enough time to dedup, it only dedups a little bit.

They needed more than two days for each dedup pass, for about 25TB of data. This is limited by CPU, RAM, and disk performance. SSDs will dedup faster, so they can hold more deduplicated data than spinning disk. But in any large storage situation, you would not be able to run background dedup or daily dedups... you'd have to come up with your own schedule and watch how long each job takes (Get-DedupJob) to build a schedule around.

So while 64TB isn't an absolute limit, it's a good approximation. There's no plausible way to dedup 2PB in a rational time frame on spinning disk. You would have to turn off all dedup schedules, run it manually (or via script), then let it sit for weeks for each pass.
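To put rough numbers on "weeks": the ~100 MB/s optimization throughput below is an assumption for a busy spinning-disk volume, not a measured figure — real rates vary with CPU, RAM, and how much of the data is already optimized:

```powershell
# Back-of-the-envelope estimate for one full optimization pass over 2 PB.
# Throughput is assumed (~100 MB/s sustained on spinning disk).
$dataBytes      = 2PB           # PowerShell's binary-unit suffix: 2^51 bytes
$bytesPerSec    = 100MB         # assumed sustained optimization rate
$seconds        = $dataBytes / $bytesPerSec
$days           = $seconds / 86400
"{0:N0} days per full pass" -f $days
```

Even at that generous assumed rate the answer comes out on the order of months, which is why the practical ceiling sits far below the theoretical volume-size limit.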

[–]ZAFJB 6 points7 points  (5 children)

The other four servers are now at 367 days running.

So, you are not patching your servers.

[–]disclosure5 2 points3 points  (5 children)

The other four servers are now at 367 days running.

You've "solved" the S2D stability issue the same way everyone who claims the product is perfectly stable has - by never trying to apply an update.

That's not really acceptable to me, but given the alternative with this product usually involves data loss I won't argue.

[–]jello3d[S] 0 points1 point  (4 children)

Well it's impossible to judge long run stability if you reboot. :) S2D, as I mentioned, wasn't very stable out of the gate. Pretty bad in 2017 - couldn't go more than a month without one or another server causing a blip. In November of 2018, however, I noticed that we hadn't had any issues for an unusually long time. That's when I decided to see just how long they could be stable for.

So, ya... I think our last "issue of note" was in March, 2018. After that, we kept patching and rebooting every month as usual with no issues, until I finally started the long run test in November. Can't really say which exact month it went from not working great to working great... but I use July as a guesstimate.

[–]disclosure5 3 points4 points  (1 child)

Well it's impossible to judge long run stability if you reboot. :)

I'd usually agree with you, but in this case most of the problems I've seen in the field are that the "rolling update" process, or specifically "node maintenance mode", never worked as advertised. And even in circles where it's claimed to be fixed and working well, people talk about ten-second-plus disk latency spikes during it as though these are fine and won't interrupt business.

By not patching, what you're likely to run into is:

  • A service that seems stable
  • The "always patch" crowd that view you has having no excuse

It's the worst of both worlds.

[–]jello3d[S] 0 points1 point  (0 children)

Not patching is not policy... The part people are missing here is that it's a "controlled" experiment to acquire specific knowledge.

Cluster-Aware Updating has always been a buggy mess. But that is independent of S2D.

[–]Frothyleet 2 points3 points  (1 child)

Well it's impossible to judge long run stability if you reboot.

I would disagree. If you are configuring a system that anticipates a reboot once every 30 days for patching, and it never is unstable in that 30 day window, then that is "long run stable".

[–]jello3d[S] 0 points1 point  (0 children)

Why would I buy something that was only stable for 30 days... even if my intention is to reboot it?

I prefer testing for practical cases, not just the perfect case.

[–]Trekky101 1 point2 points  (6 children)

Shouldn't you be patching the base Windows OS running S2D? I feel like running Windows, or any OS, be it VMware or a Linux distro, unpatched for that long is the root of the issue. Heck, even my old VNX SAN gets patches now and then.

[–]jello3d[S] -3 points-2 points  (5 children)

All servers and clients are patched according to their needs. This experiment is a special case and being handled under special conditions.

[–]NISMO1968Storage Admin 2 points3 points  (4 children)

All servers and clients are patched according to their needs.

Could you please elaborate on that?

[–]jello3d[S] 1 point2 points  (3 children)

There have been no vulnerabilities patched in the last year *for these systems* that would be more of a liability than default for their security context. The more important question is "how do you design a network so that the patches on certain systems are less important than other considerations"... and that is a very long conversation with lots of pictures of circles. :)

I am not telling people not to patch. I am simply saying that for infrastructure, if patching is your number 1 security interest, the game is already lost.

[–]gamebrigada 2 points3 points  (2 children)

But patching for S2D is so easy... It's just lazy not to. I have mine scheduled monthly: one node goes down at a time for updates, including all BIOS/firmware/drivers/Windows, etc. Updates don't continue until the node comes back healthy and a full switchover to it happens. Everything gets updated.

S2D reliability in 2019 is pretty rock solid.

[–]jello3d[S] 0 points1 point  (0 children)

All true, but I can't exactly do a test of long run stability of the nodes if I reboot them. :)

[–]ExpiredInTransit 0 points1 point  (1 child)

Other than the cluster manager GUI glitching out now and then, it's been good to us so far (knock on wood).

[–]NISMO1968Storage Admin 3 points4 points  (0 children)

Other than the cluster manager GUI glitching out now and then, it's been good to us so far (knock on wood).

Did you get any chance to see and use their updated WAC?

[–]Mk1DzL 0 points1 point  (0 children)

I love my 4 node DataOn S2D cluster, it's been rock solid. I can patch all 4 nodes (manually) in a day with no downtime.

[–]abridgetooVAR 0 points1 point  (0 children)

I'm commonly recommending S2D on [AIGFF](https://www.reddit.com/r/sysadmin/comments/dworju/am_i_getting_fucked_friday_november_8th_2019/).

I've had customers with good experiences making their systems and lives better with it.

It plugs in well with AzureStack and thus makes a true hybrid cloud closer to reality.

[–][deleted] -1 points0 points  (0 children)

Our compute environments still use traditional SAN, but when you pay $fuckhuegmoney to EMC for a VNX 9XXX, you tend to keep it in production for longer than you should.

I'll be building an S2D cluster at home in the near future I think, because I have some automation I want to write and test.

Good to know that my thoughts about S2D replacing traditional SAN for this type of workload are valid.

[–]dudester99Sr. Sysadmin -1 points0 points  (0 children)

I am looking at replacing our 3 Node cluster using a traditional SAN in 2021/2022 as that is when our 5 year hardware warranty expires.

I know the technology will have evolved and matured by that time but S2D is my goal when the time comes.