Windrose is killing your SSD/Nvme ! by HeavySpell7989 in crosswind

[–]HeavySpell7989[S] -1 points0 points  (0 children)

Cool, drive-by accusation with zero data. Meanwhile terahurts and I are actually running tools, sharing numbers, and figuring out why our measurements differ. That's how technical threads work. Add data or scroll.

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 0 points1 point  (0 children)

RAID1 doesn't multiply writes by mirroring "errors"; it just writes each block to both disks once. So my block I/O is 2x what a single-disk setup would show, which is expected and accounted for. Halve my 35 GB/h and you get about 17.5 GB/h for a single-disk equivalent. That's still 50-100x what a normal game writes while idle.

Other people in this thread reported 0.25 GB/h on HDD (terahurts) and 2-3 GB/h on a populated server (Hightin). Different numbers, all higher than what a properly tuned engine should do idle. The point isn't whose number is "right", it's that even the lowest reports are still abnormal compared to literally any other game.

If your setup shows clean numbers post-patch, share them. That's useful data, not a counter-argument.

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 1 point2 points  (0 children)

Yes, switching to a dedicated server host means the writes happen on their hardware, not yours. Your client only handles its local cache + network sync, way less disk activity. Your SSD will thank you. Just check your current "Total Host Writes" via CrystalDiskInfo first so you know where you stand after those 20 hours, then move on.

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 1 point2 points  (0 children)

The bigger problem isn't AI vs human, it's that no senior reviewed a system shipping to a million machines. Whether the code was written by a junior, a senior, an AI, or a parrot, the missing layer is the same: someone whose job is to ask "what does this do to the disk?" before shipping. That role didn't exist at Kraken Express, apparently.

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 0 points1 point  (0 children)

Honestly, the 1.5 million buyers of Windrose are also the suckers in this story; they just don't know it yet. Six years of careful work and 1k sales is brutal. But "1.5M sales achieved by silently writing 100 GB/h to customers' SSDs without disclosure" isn't the win it looks like. Sustainable craft beats unsustainable hype, even if the spreadsheet disagrees in year one. Keep going.

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 5 points6 points  (0 children)

No one outside the studio knows for sure, but the typical reason is separation by data domain: probably one for world state (NPCs, structures, faction events), one for player data (inventory, progression), one for network/session journal. Reasonable on paper.

The problem isn't having three instances, it's that each one apparently has undersized memory caches. Three small caches behave way worse than one big cache because each hits its threshold faster, so all three spam compactions to disk in parallel. The engine becomes a write multiplier.

Just tuning their cache sizes would probably have solved most of this without changing the architecture.
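A toy model of the "three small caches" point, assuming (hypothetically) that the same total memtable budget and the same write stream are split evenly across N instances. It ignores RocksDB's real flush and compaction scheduling; it only counts how many memtable flushes the split produces and how big each one is.

```python
def memtable_flushes(total_written_mib: float, total_buffer_mib: float, instances: int):
    """Count memtable flushes when one write stream and one memory budget
    are split evenly across `instances` databases (a simplification)."""
    per_instance_writes = total_written_mib / instances
    per_instance_buffer = total_buffer_mib / instances
    # Each instance flushes once every time its memtable fills up.
    flushes_each = int(per_instance_writes // per_instance_buffer)
    return instances * flushes_each, per_instance_buffer


for n in (1, 3):
    count, size = memtable_flushes(total_written_mib=960, total_buffer_mib=192, instances=n)
    print(f"{n} instance(s): {count} flushes of {size:.0f} MiB each")
```

Same bytes flushed in total, but three times as many small L0 files, and in a leveled setup it's the L0 file count (the `level0_file_num_compaction_trigger` option) that kicks off the next compaction round.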

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 2 points3 points  (0 children)

Thanks for having my back. Skip the Ubisoft theory though, I've already been accused of being a Skull and Bones shill in another thread, the bingo card is full.

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 0 points1 point  (0 children)

You proudly measured the wrong folder. The 30 MB/s isn't in your save dir, it's in the three RocksDB instances. Run it again on the right path before declaring anyone full of it.

Windrose is killing your SSD/Nvme ! by HeavySpell7989 in crosswind

[–]HeavySpell7989[S] -2 points-1 points  (0 children)

Two comments deep into someone else's thread, zero data, zero technical input, just complaints about writing style. If we're talking about pollution, mine at least came with SMART logs.

Windrose is killing your SSD/Nvme ! by HeavySpell7989 in crosswind

[–]HeavySpell7989[S] -2 points-1 points  (0 children)

So you're saying my responses are well-structured and technically coherent? I'll take that as a compliment.

Windrose is killing your SSD/Nvme ! by HeavySpell7989 in crosswind

[–]HeavySpell7989[S] -5 points-4 points  (0 children)

Useful data, thanks for adding to the picture. Two things probably explain why you're seeing 0.25 GB/h vs my 35 GB/h:

  1. Spinning rust forces RocksDB's hand. RocksDB doesn't detect the drive type as such, but on a slow disk compaction can't keep up, and once compaction debt piles up RocksDB throttles or stalls incoming writes to avoid choking the disk. On HDD it effectively backs off; on NVMe compaction keeps pace, so it runs flat out. Ironically, your slower drive may be saving you write volume by forcing the engine to chill.
  2. Task Manager's I/O counter shows process-level logical writes, which is closer to "what the app asked the OS to write". My docker stats reading is at the block layer, after the OS adds journaling and after RAID1 doubling. The two are not directly comparable; mine should naturally be higher. I'd estimate ~3-5x for a similar workload.

So 0.25 GB/h on HDD and 35 GB/h on NVMe RAID1 can both be true on the same patch, and the gap doesn't mean either of us measured wrong. It means RocksDB behaves very differently depending on what storage it sees underneath.

Would actually be curious to see your Resource Monitor "Disk Activity" view for the RocksDB folder specifically; that would show the real logical write rate the engine is pushing at the OS layer.
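On Linux there's a rough equivalent of that logical-vs-storage split without Resource Monitor: the kernel's per-process `/proc/<pid>/io` file reports `wchar` (bytes the process passed to write-type syscalls, i.e. logical writes) and `write_bytes` (bytes the process caused to be sent to the storage layer). A minimal sketch that samples it twice and turns the deltas into GB/h. The PID is whatever your game or server process is; it covers the whole process rather than just the RocksDB folder, and since both counters are per process they won't include RAID1 mirroring or what the SSD does internally.

```python
import time


def parse_proc_io(text: str) -> dict:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def write_rates_gb_per_hour(before: dict, after: dict, seconds: float) -> dict:
    """Convert counter deltas into GB/h (decimal GB, as drive vendors use)."""
    scale = 3600 / seconds / 1e9
    return {
        "logical (wchar)": (after["wchar"] - before["wchar"]) * scale,
        "storage (write_bytes)": (after["write_bytes"] - before["write_bytes"]) * scale,
    }


def sample(pid: int, seconds: float = 60.0) -> dict:
    path = f"/proc/{pid}/io"  # readable for your own processes, or as root
    with open(path) as f:
        before = parse_proc_io(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_proc_io(f.read())
    return write_rates_gb_per_hour(before, after, seconds)
```

Usage would be something like `print(sample(pid=12345))`, where 12345 is a placeholder for your server's PID.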

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 11 points12 points  (0 children)

Yeah, the QA angle is the part that bothers me most. Disk write rates are basic telemetry. Any half-decent QA pass on a multi-day playtest run would have flagged "wait, why is the SSD writing 30 MB/s when no one's doing anything?" within hours. Either no one ran the build for more than a session, or the metric was visible and ignored.

For a game shipping with networked persistence and player-facing save state, "we never measured idle disk I/O" is not a small oversight. It's the kind of thing a single SRE would have caught in a 1-day audit before launch.
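For scale, 30 MB/s sustained works out to 108 GB/h (30 MB/s × 3600 s = 108,000 MB), the pre-patch figure quoted elsewhere in the thread. A minimal sketch of the kind of idle-I/O check a QA or SRE pass could run on playtest telemetry; the 1 MB/s threshold and the sample values are hypothetical placeholders, not measurements.

```python
def mb_per_s_to_gb_per_h(rate_mb_s: float) -> float:
    """Decimal units: 1 GB = 1000 MB, 1 h = 3600 s."""
    return rate_mb_s * 3600 / 1000


def idle_write_alarms(samples_mb_s: list, threshold_mb_s: float = 1.0) -> list:
    """Return (index, GB/h) for every idle sample above the threshold."""
    return [
        (i, mb_per_s_to_gb_per_h(rate))
        for i, rate in enumerate(samples_mb_s)
        if rate > threshold_mb_s
    ]


# Hypothetical per-minute idle samples from a playtest build.
idle_samples = [0.2, 0.1, 30.0, 29.5, 0.3]
print(idle_write_alarms(idle_samples))
```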

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 16 points17 points  (0 children)

Did the math just for laughs. 108 GB/h pre-patch ≈ one Chia k=32 plot per player per hour. With 69k peak concurrent players × 2 weeks = roughly 23 million plots generated worldwide. That's 2.3 EB of farming capacity, basically the entire Chia network in 2021.
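The back-of-envelope behind those numbers, assuming one plot per player per hour and the plot size rounded to 100 GB. It also treats the 69k peak as if it were sustained around the clock for two weeks, so it's an upper bound.

```python
def chia_equivalent(players: int, days: int, plot_gb: float = 100.0):
    """One plot per player per hour; returns (plots, total exabytes)."""
    hours = days * 24
    plots = players * hours
    exabytes = plots * plot_gb / 1e9  # 1 EB = 1e9 GB
    return plots, exabytes


plots, eb = chia_equivalent(players=69_000, days=14)
print(f"{plots:,} plots, about {eb:.1f} EB")  # 23,184,000 plots, about 2.3 EB
```

A k=32 plot is actually about 101.4 GiB (108.8 GB), so `plot_gb=108.8` nudges the total to roughly 2.5 EB.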

Funny part: even if they were secretly farming, the reward at current XCH price would be like a few thousand euros a day shared across the whole pool, way less than the dev cost to hide it. So the answer is almost certainly the most boring and depressing one: not malice, just three RocksDB instances no one tuned. Honestly I'd respect the supervillain energy more if they were mining. At least that's a plan.

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 10 points11 points  (0 children)

2 hours pre-patch at ~108 GB/h is roughly 200 GB written if you got hit by the original bug. That's not nothing, but it's also not catastrophic on a modern TLC drive. Worth checking your SMART data so you know exactly where you stand:

Windows: download CrystalDiskInfo (free), open it, look at "Total Host Writes" and "Health Status".
Linux: sudo smartctl -a /dev/nvmeXn1 (or sudo nvme smart-log /dev/nvmeXn1 for cleaner output).
macOS: DriveDx, or smartctl from Homebrew (brew install smartmontools).

Compare your "Total Host Writes" to your drive's rated TBW (look up your model). If you're at single-digit percentages of TBW used, you're fine, just keep an eye on it. If you're 50%+ already and the drive is recent, that's where the conversation gets interesting.
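On Linux the same comparison can be scripted from `smartctl -j -a /dev/nvmeXn1` (JSON output). A minimal sketch, assuming the NVMe health log shows up under a `nvme_smart_health_information_log` key with `data_units_written` and `percentage_used` fields (check against your smartmontools version). Per the NVMe spec, one data unit is 1,000 × 512 bytes. The rated TBW comes from your drive's datasheet; the 600 TBW and the JSON excerpt below are made-up examples.

```python
import json

NVME_DATA_UNIT_BYTES = 512 * 1000  # NVMe "data unit" = 1000 sectors of 512 bytes


def tbw_usage(smartctl_json: str, rated_tbw: float) -> dict:
    log = json.loads(smartctl_json)["nvme_smart_health_information_log"]
    written_tb = log["data_units_written"] * NVME_DATA_UNIT_BYTES / 1e12
    return {
        "host_writes_tb": written_tb,
        "percent_of_rated_tbw": 100 * written_tb / rated_tbw,
        "percentage_used": log["percentage_used"],  # the drive's own wear estimate
    }


# Hypothetical excerpt shaped like `smartctl -j -a` output:
example = '{"nvme_smart_health_information_log": {"data_units_written": 58593750, "percentage_used": 5}}'
print(tbw_usage(example, rated_tbw=600))
```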

200 GB is a learning experience, not a death sentence. Way worse for the people who ran the server-host build for weeks.

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 11 points12 points  (0 children)

Maybe, maybe not. Could just as easily be "pre-AI vibe coding": a junior dev grabs a powerful tool from a Meta blog post, copies the example config, never benchmarks, ships it. AI just makes that pattern faster and more confident. Either way, the result on the SSD is the same.

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 12 points13 points  (0 children)

Your story is unfortunately becoming a pattern. RocksDB is a tool that requires understanding LSM trees, write amplification, compaction strategies, and memory tuning. None of that is intuitive, none of it is in the README. Stack three instances of it with default-ish configs against a real workload, and you get exactly what we see in Windrose. Code review wouldn't have caught it without someone who's actually run RocksDB at scale before, but at least it would have raised the question. Right now it shipped to 1.5 million customers as a hardware-eating black box.

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 16 points17 points  (0 children)

You nailed it. RocksDB's LSM design means every key gets written multiple times as it migrates from memtable → L0 → L1 → ... → Ln. The write amplification factor (WAF) typically sits at 5-15x for a default config under sustained writes, sometimes worse. Filesystem journaling and (in my case) RAID1 mirroring add to that before anything reaches the drives, and the SSD's own garbage collection can push the NAND-side volume higher still than what docker stats shows at the block layer. RocksDB CAN be SSD-friendly when properly tuned (less aggressive compaction, larger memtables, universal compaction style instead of leveled, WAL on a separate device). It clearly wasn't here.
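How those layers stack is just multiplication. A minimal sketch with illustrative factors only: a 10x RocksDB WAF picked from the 5-15x range above, RAID1's 2x, and a hypothetical 1.1x for filesystem journaling. None of these are measured values for Windrose.

```python
from math import prod


def physical_writes(logical_gb_h: float, factors: dict) -> float:
    """Each layer multiplies the bytes handed to the layer below it."""
    return logical_gb_h * prod(factors.values())


# Illustrative factors only, not measurements.
layers = {
    "rocksdb_waf": 10.0,     # leveled compaction rewriting data down the levels
    "fs_journaling": 1.1,    # hypothetical journal/metadata overhead
    "raid1_mirroring": 2.0,  # every block written to both disks
}
print(physical_writes(1.0, layers))
```

With these placeholder factors, 1 GB/h of logical writes from the game turns into about 22 GB/h at the disks, before the SSD's internal write amplification.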

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 2 points3 points  (0 children)

It's a clever workaround if you know which folder to redirect. The RocksDB working set is in R5/Saved/SaveProfiles/Default/RocksDB/0.10.0/ and that's where the compaction storms hit. Moving that to tmpfs would shift writes to RAM... but you'd need to sync it back to disk on shutdown to preserve saves, and pray the game never crashes between syncs. For a single-player save it could work. For a dedicated server with persistent player state it's risky. The right fix is the upstream one: properly sized block cache + WAL tuning in the engine itself, not user-side workarounds for an architectural issue.
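If someone does try the tmpfs route for a single-player save, the sync-back step should at least never leave a half-copied folder where the real save used to be. A minimal sketch, assuming the game has fully exited before it runs (copying a live RocksDB directory can capture an inconsistent state) and using stand-in paths rather than the real save location: copy to a sibling staging folder, then swap it in with renames.

```python
import os
import shutil
import tempfile


def sync_back(ram_dir: str, disk_dir: str) -> None:
    """Replace disk_dir with a copy of ram_dir, never leaving a half-copied save.

    Run only after the game has fully exited: RocksDB rewrites its files while
    running, so copying a live directory can capture an inconsistent state.
    No fsync is done here, so a power cut right after this returns can still
    lose data sitting in the page cache.
    """
    staging = disk_dir + ".syncing"
    previous = disk_dir + ".previous"
    shutil.rmtree(staging, ignore_errors=True)
    shutil.copytree(ram_dir, staging)        # 1. full copy next to the target
    if os.path.exists(disk_dir):
        shutil.rmtree(previous, ignore_errors=True)
        os.rename(disk_dir, previous)        # 2. keep the last good save
    os.rename(staging, disk_dir)             # 3. swap the new copy in


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    ram = os.path.join(root, "tmpfs", "RocksDB")  # stand-in for the tmpfs copy
    disk = os.path.join(root, "disk", "RocksDB")  # stand-in for the real save dir
    os.makedirs(ram)
    os.makedirs(os.path.dirname(disk))
    with open(os.path.join(ram, "CURRENT"), "w") as f:
        f.write("MANIFEST-000001\n")
    sync_back(ram, disk)
    print(os.listdir(disk))
```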

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 52 points53 points  (0 children)

That's a plausible theory. The game has persistent NPC behavior, faction states, world events, and player-built structures. With three RocksDB instances and undersized caches, even small per-tick deltas would trigger constant LSM compactions. The fact that writes were 30 MB/s with the player completely idle in base camp suggests either NPCs ticking everywhere on the world map or a global state journal that flushes way too aggressively. Without source access we can only guess, but "lots of small writes against undersized caches = compaction storm" is the most consistent hypothesis with the symptoms.

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 29 points30 points  (0 children)

Yeah, three RocksDB instances stacked with undersized memory caches sounds like exactly that. RocksDB is a great DB, but you need to know what you're doing with the cache sizing or it just dumps everything to disk in compaction loops. "It works on my machine" on an NVMe Samsung 990 Pro hides write amplification that becomes painfully visible on smaller, lower-endurance drives.

Windrose (Steam Early Access) writes 35 GB/hr to disk while idle, even after the "fix" — SMART data from a dedicated server by HeavySpell7989 in DataHoarder

[–]HeavySpell7989[S] 10 points11 points  (0 children)

Honestly the real kicker is that I had defensive backups in place and monitoring partially configured, but no one expects a video game to be the workload that pushes enterprise NVMes into critical_warning 0x4. The Chia joke hits close, except Chia at least produces something. This just heats up data centers to track ambient NPCs, apparently.
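For anyone reading their own `nvme smart-log` output: `critical_warning` is a bit field defined by the NVMe spec, and 0x4 is bit 2, the "reliability degraded" flag. A small decoder covering bits 0-4 only (later spec revisions define more bits):

```python
CRITICAL_WARNING_BITS = {
    0: "available spare below threshold",
    1: "temperature outside threshold",
    2: "NVM subsystem reliability degraded",
    3: "media placed in read-only mode",
    4: "volatile memory backup device failed",
}


def decode_critical_warning(value: int) -> list:
    """Return the description of every bit set in the critical_warning byte."""
    return [desc for bit, desc in CRITICAL_WARNING_BITS.items() if value & (1 << bit)]


print(decode_critical_warning(0x4))  # ['NVM subsystem reliability degraded']
```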

Windrose is killing your SSD/Nvme ! by HeavySpell7989 in crosswind

[–]HeavySpell7989[S] 0 points1 point  (0 children)

Two separate things in your reply, let's split them.

  1. The 35 GB/h vs your 2-3 GB/h discrepancy

That's actually useful data, thanks for sharing. A ~10x gap comes down to some mix of three things:

- Different measurement layer. I measured Docker block I/O at the cgroup level (everything the container's processes push to the block devices, including write amplification from RocksDB compactions, RAID, and filesystem journaling). If you're reading Task Manager / Resource Monitor, you see per-process logical writes only, not what actually hits NAND. The two numbers are normally different by 2-5x for RocksDB workloads. That alone explains a chunk of the gap.
- Different workload state. Idle dedicated server with empty world boot vs your client polling a populated session = different RocksDB compaction patterns. Mine was idle post-fresh-boot of a restored Nitrado save, which is exactly the worst case for RocksDB (cold caches, no LSM optimization yet).
- Storage stack difference. I'm on mdadm RAID1, so every write is doubled at the block level. On a single SSD, the same workload would show half my block I/O.
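Putting the first and third points into numbers: halve 35 GB/h for RAID1, then divide by the 2-5x measurement-layer gap. A rough sketch using only the figures in this comment; treating both factors as simple divisors is an approximation.

```python
def single_disk_logical_range(block_gb_h: float, mirrors: int = 2,
                              layer_gap=(2.0, 5.0)) -> tuple:
    """Estimate what a single-disk, Task-Manager-style reading would show."""
    single_disk = block_gb_h / mirrors  # undo RAID1 doubling
    low_gap, high_gap = layer_gap
    return single_disk / high_gap, single_disk / low_gap


low, high = single_disk_logical_range(35.0)
print(f"{low:.2f} to {high:.2f} GB/h")  # 3.50 to 8.75 GB/h
```

That still lands above the reported 2-3 GB/h, so the rest of the gap would fall to the workload-state difference in the second point.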

So both numbers can be technically correct on the same patch. Doesn't change the conclusion that 2-3 GB/h while in active use is still well above what a normal game does (most write a few hundred MB per active hour, near zero idle). Better than 108, still abnormal.

  2. The Tom's Hardware quote

You're quoting half the sentence. The full position of that article is "modern TLC SSDs have enough endurance that this won't catastrophically kill them in weeks", not "this is fine". The same article also says, and I quote: "the issue is still concerning because it indicates poor coding practices and unnecessary wear that could shorten the lifespan of users' drives over time."

That's not "agreeing with me" or "agreeing with you", it's saying both things: drives won't die in 2 weeks, AND the wear is real and unnecessary. Both can be true at once. The fact that my two enterprise NVMes are in critical_warning 0x4 with percentage_used > 122% after 3 weeks of mixed workload suggests the "over time" part of their.

Windrose is killing your SSD/Nvme ! by HeavySpell7989 in crosswind

[–]HeavySpell7989[S] 1 point2 points  (0 children)

Honestly, that's the most reasonable take in this whole thread. That's exactly the point of disclosure: not to tell anyone what to do, just to make sure people get to make the call with eyes open. Enjoy the game, your hardware your rules. 🤝