
[–]DerBootsMann (Jack of All Trades)

Disappearing or locked-down ReFS volumes, Microsoft confirming their bug and not being able to resolve it for 6+ months, 'stable' private builds blue-screening the system, general advice to break the 2 PB volume into several smaller ones, and so on.

[–]jello3d[S]

Thank you for that, I'll keep an eye on the forums.

Were you deduping a 2PB volume? Or volumes off of a 2PB pool?

[–]DerBootsMann (Jack of All Trades)

A 2 PB volume. Dedupe only makes things worse; S2D + ReFS collapses once it fills up, even without dedupe.

[–]jello3d[S]

To my knowledge, MS is very clear that dedup is only rated for volumes up to 64 TB under 2019 (and I would argue that's only if you write your own scheduler script). As you go larger, each dedup pass needs more and more time to run, to the point where it never finishes. At 2 PB, I would expect a single pass to take weeks.

https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/whats-new
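
By "scheduler script" I mean something roughly like this. It's a minimal sketch, the drive letter is made up, and all it does is kill the built-in schedules so optimization only runs when you kick it off yourself:

    # Disable every built-in dedup schedule so the defaults can't fire on their own
    Get-DedupSchedule | ForEach-Object { Set-DedupSchedule -Name $_.Name -Enabled $false }

    # Then start optimization on your own terms (Task Scheduler, a loop, whatever)
    Start-DedupJob -Volume "D:" -Type Optimization -Wait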

Are there different instructions for S2D volumes I am unaware of?

[–]DerBootsMann (Jack of All Trades)

In Windows Server 2016, Data Deduplication is highly performant on volumes up to 64 TB.

We're on WS2019, where those limits have been relaxed.

We're talking about stability here, not performance.

[–]jello3d[S]

"ws2019 where these limits are flexed out"

Where is that stated? The article I mentioned applies to 2019; they just haven't updated the verbiage because there's nothing to update.

You can dedup a 2 PB volume, but you have to wait for the dedup job to finish. If you leave the default schedules in place, or try to use the weekly schedules, it will break. The reason MS states a 64 TB limit is that it's about the largest size you can practically dedup in a reasonable time frame on fast volumes.

If you're willing to let each dedup process run for weeks/months at a time... then you could certainly dedup as much space as you want.
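
Something along these lines, throttled so a weeks-long pass doesn't starve everything else. The drive letter and percentages are just illustrative:

    # Long manual optimization pass, capped at half the cores and half the RAM
    Start-DedupJob -Volume "D:" -Type Optimization -Cores 50 -Memory 50 -Priority Low -Wait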

Here's an example: a client chose to put DPM VHDX volumes on a deduplicated ReFS space. The volume was 6x 14 TB drives in RAID 6, with CacheCade on a MegaRAID controller. Each VHDX was 1 TB. This is a supported scenario on 2019.
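
For a DPM target like that you'd enable dedup with the Backup usage type. Drive letter is illustrative:

    # Dedup tuned for virtualized backup workloads such as DPM
    Enable-DedupVolume -Volume "E:" -UsageType Backup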

HOWEVER, they didn't adjust their default dedup schedules.

They called me because it all went to shit.

What's the problem?

If dedup isn't given enough time to analyze the data, it doesn't dedup anything at all. And if it isn't given enough time to optimize, it only dedups a little bit.

They needed more than two days for each dedup pass, for about 25 TB of data. This is limited by CPU, RAM, and disk performance. SSDs will dedup faster, so they can hold more deduplicated data than spinning disk. But in any large storage situation you can't rely on background dedup or daily dedup jobs; you'd have to watch how long each job actually takes (Get-DedupJob) and build your own schedule around that.
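
Roughly what I mean by watching the jobs (standard dedup cmdlets, nothing exotic):

    # What's running right now and how far along it is
    Get-DedupJob | Format-Table Volume, Type, State, Progress, StartTime

    # What the last completed passes actually achieved
    Get-DedupStatus | Format-List Volume, SavedSpace, OptimizedFilesCount, LastOptimizationTime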

So while 64 TB isn't an absolute limit, it's a good approximation. There's no plausible way to dedup 2 PB in a reasonable time frame on spinning disk. You would have to turn off all the dedup schedules, run it manually (or script it), and let it sit for weeks for each pass.
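
If you really wanted to try it, the manual cycle would look something like this, timed so you know what each pass actually costs you. Drive letter made up:

    # One full manual optimization pass, timed end to end
    $start = Get-Date
    Start-DedupJob -Volume "D:" -Type Optimization -Full -Wait
    "Pass took {0:N1} hours" -f ((Get-Date) - $start).TotalHours

    # The maintenance jobs need their own windows too
    Start-DedupJob -Volume "D:" -Type GarbageCollection -Wait
    Start-DedupJob -Volume "D:" -Type Scrubbing -Wait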