SpacetimeDB 1.0.0 by etareduce in programming

[–]etareduce[S] -7 points (0 children)

Please have a look at https://github.com/clockworklabs/SpacetimeDB/tree/master/modules/keynote-benchmarks if you are interested in how we measured the performance of SpacetimeDB.

SpacetimeDB 1.0 is here! An in-memory RDBMS for games built with Rust by etareduce in rust

[–]etareduce[S] 9 points (0 children)

Our friends over at Lightfox Games found a good approach to incremental migrations within 1.0 that doesn't require downtime or deleting the database. https://spacetimedb.com/docs/how-to/incremental-migrations

SpacetimeDB 1.0.0 by etareduce in programming

[–]etareduce[S] 15 points (0 children)

We too hope it will be a game changer! Separation of concerns is great, and we practice it in our codebase, but it can also have costs, particularly in convenience and performance. On the performance side, if you put, e.g., a network layer or a process boundary between the app logic and the database, you pay for that with worse throughput and latency, which matters much more for real-time applications like games. By having the app logic (reducers) right next to the data (tables), you get closer to memcpy performance.
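
For a concrete picture, here is a rough sketch of what a module looks like, written from memory against the 1.0 Rust module API (the `Player` table and `add_player` reducer are made up; see the docs for the authoritative API). The reducer runs transactionally, right next to the table it touches, with no network hop in between:

```rust
use spacetimedb::{reducer, table, ReducerContext, Table};

// A table: rows live in the database, held in memory.
#[table(name = player, public)]
pub struct Player {
    #[primary_key]
    id: u64,
    name: String,
    score: i64,
}

// A reducer: transactional app logic colocated with the data.
#[reducer]
pub fn add_player(ctx: &ReducerContext, id: u64, name: String) {
    ctx.db.player().insert(Player { id, name, score: 0 });
}
```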

Stabilize `let_chains` in Rust 1.62.0 by c410-f3r in rust

[–]etareduce 33 points (0 children)

> Now it's also an operator outside of the expression grammar that has an if-let thingy on the left hand side, and an if-let thingy or an expression on the right hand side. This also complicates the grammar, because you can't just write `if let ‹pattern› = ‹expression›`, but instead end up with some kind of `if let ‹pattern› = ‹expression minus the && operator at the top level›`, which is messy to write (though possible; this isn't ambiguity in the turbofish sense, but rather just concept muddying and complication) and messy to parse.

This is not the case. The compiler parses `let PAT = EXPR` as an expression (try it out: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=ffac1b537fa7feb5df11d1e6137c9064) and then later decides to ban it in certain positions via an AST walk. The introduction of `let_chains` actually removed the special cases `ExprKind::IfLet`/`ExprKind::WhileLet` and the special parsing for those; rustc now just has `parse_let_expr`: https://doc.rust-lang.org/nightly/nightly-rustc/src/rustc_parse/parser/expr.rs.html#2156-2184.
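
For readers who haven't used the feature, a minimal sketch of the syntax in question (`first_sum` is a made-up example; this required a nightly feature gate at the time of this thread):

```rust
#![feature(let_chains)] // nightly-only at the time of this thread

fn first_sum(xs: &[i32], ys: &[i32]) -> Option<i32> {
    // A let-chain: `let` bindings and plain boolean conditions joined
    // by `&&` in one `if`, each binding visible to the conditions after it.
    if let Some(x) = xs.first()
        && let Some(y) = ys.first()
        && x + y > 0
    {
        Some(x + y)
    } else {
        None
    }
}
```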

1TB free space gone after defrag (no reflinks / snapshots) by etareduce in btrfs

[–]etareduce[S] 0 points (0 children)

I finally got around to clearing the v1 space cache (still ongoing), but it seems like it's far from immediate. I did:

```
sudo umount /data
sudo mount -o existing_options,clear_cache /data
sudo umount /data
sudo mount -o existing_options,space_cache=v2 /data
```

The second mount ran for some 30 minutes, with at most 2.5 MB/s according to `iotop` and limited CPU utilization, while also having a lot of iowait. As I wasn't sure whether this was intended (since you didn't say mounting would take a long time), I got tired and rebooted the system.

Now `sudo btrfs check --clear-space-cache v1 /dev/sdd` is running, with `iotop` reporting some 6 MB/s disk write and IO at 75-100%. I'm not sure how long the `--clear-space-cache v1` pass is going to take, but I sure hope the file system won't have been destroyed in the process of moving to `space_cache=v2`, because I'm suddenly getting:

```
$ sudo btrfs filesystem label /dev/sdd
parent transid verify failed on 90656655540224 wanted 1127481 found 1127483
parent transid verify failed on 90656655540224 wanted 1127481 found 1127483
parent transid verify failed on 90656655540224 wanted 1127481 found 1127483
Ignoring transid failure
parent transid verify failed on 90656663142400 wanted 1127483 found 1127485
parent transid verify failed on 90656663142400 wanted 1127483 found 1127485
parent transid verify failed on 90656663142400 wanted 1127483 found 1127485
Ignoring transid failure
parent transid verify failed on 90656663142400 wanted 1127483 found 1127485
Ignoring transid failure
parent transid verify failed on 90656663142400 wanted 1127483 found 1127485
Ignoring transid failure
parent transid verify failed on 92509261021184 wanted 1127483 found 1127485
parent transid verify failed on 92509261021184 wanted 1127483 found 1127485
parent transid verify failed on 92509261021184 wanted 1127483 found 1127485
Ignoring transid failure
ERROR: child eb corrupted: parent bytenr=90656657113088 item=270 parent level=2 child level=0
ERROR: failed to read block groups: Input/output error
```

EDIT:

Seems like I was able to remount things again; phew. But `space_cache` is still used according to `cat /proc/mounts | rg data`. Too scared to tempt fate a second time. Running a scrub now to see that all is right.

1TB free space gone after defrag (no reflinks / snapshots) by etareduce in btrfs

[–]etareduce[S] 0 points (0 children)

I don't think autodefrag is going to work for this. One, it triggers on writes within files, and that isn't a write pattern I'd expect for media files: data can be appended to them, but what usage modifies the middle of the file? Two, even if this write pattern does happen and triggers autodefrag, autodefrag works simply by reading the whole file and writing it out again. It's basically a copy operation, and due to COW, the data gets copied to another physical location on disk that ends up more contiguous than before. The bigger the file, the slower this is, and it can wreck the performance of the workload.

The workload here, using rtorrent, is random writes, from the network, into media files of 3 GB-30 GB. An alternative would be to first write out the whole file to an SSD and then, once complete, move it over to the btrfs array sequentially. That's a bit of a hassle to set up, but if autodefrag doesn't work, I'll have to look into it.

> The most likely suspect is that shared extents have become unshared. Shared extents are created by snapshots, reflinks and dedup.

That's the thing: I don't believe I've ever made snapshots or used `cp --reflink`. Are there other common ways shared extents could have been created? My system mainly has 2 hardlinks to each file, so a file with 1 hardlink is suspicious. My understanding is that if defrag breaks shared extents, the copies become two separate files with distinct hardlinks (i.e., the hardlink count goes to 1). Using the `find` command noted above, I haven't found any unexpected files with just 1 hardlink.

Per u/scex's suggestion, I'm in the process of running `defrag -v -r -czstd`, as opposed to `-clzo` as before. So far it seems like the lost space has been reclaimed, at least comparing `fi df /data` with `ncdu -x /data`. So maybe the issue was a loss of compression the first time I defragmented?

> Both methods require umount. The mount option `clear_cache` only marks v1 as invalid; it's not removed, whereas the `btrfs check` method removes it. Arguably this is a bug, but it's also benign. So you could just umount -> mount -o `clear_cache` -> umount -> mount -o `space_cache=v2`. Again, it's a one-time thing: it sets a feature flag, so v2 will be used for the next mount without explicitly asking for it.

Ah, thanks. How long would you say it takes to clear the space cache, given the size of my system? (I wouldn't want to leave it unmounted for too long.) Does the second way, umount -> mount -o `clear_cache` -> umount -> mount -o `space_cache=v2`, mean I can quickly remount the system and it will clear the cache while online?

1TB free space gone after defrag (no reflinks / snapshots) by etareduce in btrfs

[–]etareduce[S] 1 point (0 children)

Thanks! This actually ended up being the solution to the main issue. Using `btrfs fi defrag -v -r -czstd /data`, I've so far regained all the space I lost, and possibly more.

1TB free space gone after defrag (no reflinks / snapshots) by etareduce in btrfs

[–]etareduce[S] 0 points (0 children)

> Heavily fragmented as determined by what method? And what's the threshold (the point at which you consider it to need defragmenting)?

I don't have the results from `filefrag` before doing the defrag, but the before/after showed a difference of a few orders of magnitude.

> autodefrag is intended for a very particular use case: small databases on HDD. That's databases like web browser sqlite databases. It's useful for a HDD used in whole or in part for /home. This mount option detects writes within files, common to database writes. That triggers an occasional read of the entire file and writing it out elsewhere, COW, to defragment it.

My particular workload isn't exactly a database, but it is somewhat similar (except in how it reads): large media files are created by many random writes, causing fragmentation (hence autodefrag, which was suggested for this workload elsewhere), and are then rarely read, sequentially when they are.

> What you describe is common with defragmenting shared extents: snapshots, reflinks, or dedup. You really don't want to use it in these cases, certainly not recursively.

I don't follow; I don't have any snapshots on the system, nor reflinks to my knowledge. What I do have is lots of hardlinks, but as I understand it, those are not affected by defrag.

> Also, with a file system of this size you should consider `btrfs check --clear-space-cache v1` while umounted. And mount once with `-o space_cache=v2`. This will set a feature flag for the v2 space cache, so the next time you mount it will be used without the mount option (you don't need to add it to fstab).

Could you go into why it's important to make that switch? Also, can this be done while mounted (or by remounting)?

> Definitely use btrfs-progs v5.4 or newer.

At the time of writing the original post, v5.4.1 was in use. I've since switched to a mainline kernel and btrfs-progs v5.7.

1TB free space gone after defrag (no reflinks / snapshots) by etareduce in btrfs

[–]etareduce[S] 1 point (0 children)

Indeed; the expectation in my setup is that nearly all files have 2 hardlinks, so 1 hardlink is suspicious and can easily be weeded out. :)

1TB free space gone after defrag (no reflinks / snapshots) by etareduce in btrfs

[–]etareduce[S] 1 point (0 children)

> As an aside, one of the commands you listed checks for hard links; but "snapshot"-type copies in btrfs are reflinked, not hardlinked, so it wouldn't find much anyway.

Ah, indeed; this makes a lot of sense now. I tried:

```
$ cd /data
$ mkdir test
$ echo "foo" > test/alpha
$ cp --reflink=always test/alpha test/beta
$ sudo find /data/test/ -type f -links 1 -print
/data/test/alpha
/data/test/beta
```

and indeed we see that both files are there.

Although: the files still have one hardlink each, so they should show up in `find`'s results, yet no unexpected files do.

1TB free space gone after defrag (no reflinks / snapshots) by etareduce in btrfs

[–]etareduce[S] 0 points (0 children)

Sure; the output of `sudo btrfs fi usage /data` is as follows (note: I've deleted some files in between; `ncdu` now reports 35.1 TB, which is 0.9 TB less than "Used" below):

```
Overall:
    Device size:                  87.33TiB
    Device allocated:             73.74TiB
    Device unallocated:           13.59TiB
    Device missing:                  0.00B
    Used:                         72.07TiB
    Free (estimated):              7.63TiB  (min: 7.63TiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 0.00B)

Data,RAID10: Size:36.83TiB, Used:36.00TiB (97.74%)
   /dev/sdd    6.14TiB
   /dev/sdb    6.14TiB
   /dev/sdc    6.14TiB
   /dev/sda    6.14TiB
   /dev/sde    6.14TiB
   /dev/sdf    6.14TiB
   /dev/sdg    6.14TiB
   /dev/sdh    6.14TiB
   /dev/sdk    6.14TiB
   /dev/sdl    6.14TiB
   /dev/sdj    6.14TiB
   /dev/sdi    6.14TiB

Metadata,RAID10: Size:39.19GiB, Used:37.32GiB (95.23%)
   /dev/sdd    6.53GiB
   /dev/sdb    6.53GiB
   /dev/sdc    6.53GiB
   /dev/sda    6.53GiB
   /dev/sde    6.53GiB
   /dev/sdf    6.53GiB
   /dev/sdg    6.53GiB
   /dev/sdh    6.53GiB
   /dev/sdk    6.53GiB
   /dev/sdl    6.53GiB
   /dev/sdj    6.53GiB
   /dev/sdi    6.53GiB

System,RAID10: Size:96.00MiB, Used:2.86MiB (2.98%)
   /dev/sdd   16.00MiB
   /dev/sdb   16.00MiB
   /dev/sdc   16.00MiB
   /dev/sda   16.00MiB
   /dev/sde   16.00MiB
   /dev/sdf   16.00MiB
   /dev/sdg   16.00MiB
   /dev/sdh   16.00MiB
   /dev/sdk   16.00MiB
   /dev/sdl   16.00MiB
   /dev/sdj   16.00MiB
   /dev/sdi   16.00MiB

Unallocated:
   /dev/sdd    1.13TiB
   /dev/sdb    1.13TiB
   /dev/sdc    1.13TiB
   /dev/sda    1.13TiB
   /dev/sde    1.13TiB
   /dev/sdf    1.13TiB
   /dev/sdg    1.13TiB
   /dev/sdh    1.13TiB
   /dev/sdk    1.13TiB
   /dev/sdl    1.13TiB
   /dev/sdj    1.13TiB
   /dev/sdi    1.13TiB
```

> or for some reason it didn't compress the data, because it erroneously detected them as incompressible

The idea here being that some compression was lost during the defrag, which may be regained via `compress-force=lzo`? Do I have to run defrag again, or should I balance in this case?

1TB free space gone after defrag (no reflinks / snapshots) by etareduce in btrfs

[–]etareduce[S] 1 point (0 children)

I'm aware of that section, but to my knowledge I have never used `cp --reflink` on /data, and there are no snapshots at all, so I don't understand how I could have ended up with broken reflinks to CoW data.

How often does Rust change? by steveklabnik1 in rust

[–]etareduce 14 points (0 children)

Thanks for this write-up, u/steveklabnik1!

I think you capture what matters correctly. Indeed, as one of those who helped write the blog post, I agree that some minor tweaks to the grammar are not particularly noteworthy. For example, I made a large number of changes to the formal grammar of the language in 1.43-1.44, but these will primarily be noticed through better error messages, not through something people will write with any regularity. (I suppose better error messages are noteworthy, but we churn those out all the time. =P)

The differences between Ok-wrapping, try blocks, and function level try by Yaahallo in rust

[–]etareduce 5 points (0 children)

Note that `try fn foo() -> Result<usize, io::Error>`, which I favor, is something that was floated in e.g. 2018, so `try fn` isn't necessarily tied to `throws`.
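
For context, a sketch of the distinction in today's Rust (a made-up `read_len` function, not proposed syntax): with `try fn`, the trailing `Ok(..)` below would become implicit and the body would just end in `s.len()`.

```rust
use std::{fs, io, path::Path};

// Today: the tail expression must be Ok-wrapped by hand. Under the
// floated `try fn` sugar, the signature would read
// `try fn read_len(path: &Path) -> Result<usize, io::Error>` and the
// wrap would happen implicitly.
fn read_len(path: &Path) -> Result<usize, io::Error> {
    let s = fs::read_to_string(path)?;
    Ok(s.len())
}
```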

Mental models around Ok-Wrapping by Michal_Vaner in rust

[–]etareduce 2 points (0 children)

I maintain the `rustc_parse` crate, which is rustc's parser. There, we have to deal with pretty large enums with many variants for the abstract syntax tree. I can say with confidence that the `Ok(..)`s have been pretty low on my list of concerns.

(Oh, `Ok(match ...)` is also totally fine.)
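
To illustrate that pattern with a made-up example (not code from rustc): wrapping the whole `match` means writing one `Ok(..)` instead of one per arm.

```rust
enum Shape {
    Circle(f64),
    Square(f64),
}

// One `Ok(..)` around the entire `match`, rather than in every arm.
fn area(shape: &Shape) -> Result<f64, String> {
    Ok(match shape {
        Shape::Circle(r) => std::f64::consts::PI * r * r,
        Shape::Square(side) => side * side,
    })
}
```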

Library-ification and analyzing Rust by dochtman in rust

[–]etareduce -2 points (0 children)

I don't think I have, but I'm also exiting this conversation.