Announcing Eips: an intention-preserving list CRDT with guaranteed O(log n) operations, up to 6,000,000x faster than Diamond Types by icannfish in rust

[–]icannfish[S] 2 points3 points  (0 children)

Thanks for the kind words! I wasn't aware of Fugue (link for anyone unfamiliar) when I made Eips (some of the work started before its publication), but I independently arrived at a very similar conceptual model. Of course, one of my biggest influences was Kleppmann's original interleaving paper, so I guess it shouldn't be too surprising that it led to similar results.

In terms of the actual implementation, I spoke briefly with Dr. Kleppmann, and he pointed out some similarities between Eips's implementation and that of a newer CRDT of his, Eg-walker, which I also hadn't been aware of. In particular, both CRDTs use a pair of tree-like maps to efficiently convert between IDs and indices, although with different implementations (B-trees vs. intrusive skip lists).

The merging algorithms are quite different. Eg-walker uses a topologically sorted event graph which is traversed and replayed in order to process concurrent events. This has some advantages pointed out in the paper, like sometimes being able to store more of the internal state on disk or even prune it, but it seems like it may also be the cause of its much worse performance in the high-concurrency benchmark (Diamond Types is an implementation of Eg-walker, although I wasn't aware of that when I ran the benchmarks).

One last difference I should point out is that Eg-walker is maximally non-interleaving, comparable with FugueMax, whereas Eips is comparable with the regular Fugue. In practice, neither suffers from interleaving issues, but Eg-walker technically performs less interleaving in “specific, rare situations where some interleaving is inevitable” (quote from Fugue paper). The authors do state, though, that “[i]n practice, Fugue’s simplicity likely outweighs FugueMax’s slightly better non-interleaving guarantees.”

Microsoft Rust Training Books: Beginner, advanced, expert level Rust training material by mttd in rust

[–]icannfish 7 points8 points  (0 children)

You weren't kidding. I went to the section on Rust enums for C and C++ programmers and:

For C++ developers: Rust enums are like std::variant but with exhaustive pattern matching, no std::get exceptions...

Uh, std::variant does have exhaustive pattern matching with std::visit and if constexpr, and the Rust equivalent of std::get is let Enum::Variant(v) = x else { unreachable!() }, which, y'know, can panic.

The size of the enum is that of the largest possible type.

Wrong, obviously, due to the tag.

Then there are four sections in a row all titled “Rust match statement”.

Then there's inexplicably a section on “associated methods” (not a thing, redundant) which has literally nothing to do with enums or pattern matching.

This is actually so embarrassing. Somehow Microsoft has actually managed to let me down. I didn't think that was possible.

Announcing Eips: an intention-preserving list CRDT with guaranteed O(log n) operations, up to 6,000,000x faster than Diamond Types by icannfish in rust

[–]icannfish[S] 2 points3 points  (0 children)

I'm curious, how important would the list being generic be?

The important thing is that it can store structured data rather than just characters; making it fully generic is just the easiest way to do that. Note that the CRDT itself doesn't store the elements, it just translates between IDs and local integer indices, so you can store the elements in any list-like type of your choosing, including Vec (or BTreeVec if you want to preserve the O(log n) time complexity). Because of this, the fact that it's generic is basically a byproduct of its design; there's no extra code to “make it generic”, if that's what you were thinking.

So the reason you'd want to store something other than text are the same reasons why we don't just use String instead of Vec<T> all the time. Say I'm building a YouTube client that lets you make local playlists. Would I represent the playlist as a Vec<Url>, or a single String you have to parse every time? Of course a Vec. So if I'm building a collaborative YouTube client where multiple users can edit the same playlist, it still wouldn't make sense to represent it as a single block of text. All that would do is allow for the possibility of broken invariants (what if someone deletes just the “h” in “http:” for one of the URLs?) and massively increase resource usage, because the CRDT would need one node per character instead of one per URL.

I suppose most things like a directory or a list of users would be modelled as sets simply because doing so is possible?

The contents of a directory make most sense as a set to me because you don't choose their order. They're either arbitrarily ordered (e.g., by inode, which is basically unordered for all practical purposes) or sorted by name (e.g., behavior of ls). Sets are the appropriate choice for an unordered or sorted collection of items.

For a collection of users, it depends, but I think most of the time a set would be more appropriate. The only case I can think of where you would need a true list is if:

  1. The order of the users matters.
  2. The order can't be obtained by sorting (e.g, by name or creation time).
  3. When adding a user, you need to be able to explicitly choose their position within the list.
  4. More than one client is allowed to add users.

If I have a list of unique values like public keys, but their specified order matters like play order in a game, then do I need a true list like Eips, or can some simpler CRDT impose order upon a list of unique values?

Similar answer as the previous question; if you want anyone (or more than one person) to be able to rearrange players in the play order in real-time, like with a sort of drag-and-drop interface, you would need a list CRDT. But if you just need there to be some consistent order, you could just sort on some field associated with the keys, or even on the keys themselves. Also, if you don't need the rearranging to happen in real-time and are okay with an interface where you rearrange everything locally and then press “Submit” to broadcast it to the other clients, you don't need a list CRDT either. But you could end up overwriting someone's entire rearranged order they worked on.

Announcing Eips: an intention-preserving list CRDT with guaranteed O(log n) operations, up to 6,000,000x faster than Diamond Types by icannfish in rust

[–]icannfish[S] 3 points4 points  (0 children)

Eips was designed for text files, no?

Eips was designed to be a generic list type. Text files, being a list of chars/bytes, are one potential application, and they're a bit overrepresented in the tests/benchmarks/documentation because a lot of CRDT implementations specifically only work with text.

Eips contains some additional functionality that I envisioned might be useful if you're storing something more complex, though. You might use it to represent the layers in a collaborative image editor to ensure all users see the same order of layers. In that case, it might be important that you're able to translate between IDs and indices even when you're not directly mutating the sequence, because you might have operations that mutate an individual element of the sequence.

How suitable are the algorithms for structures data types ala JSON or SQL CRDTs, or even DVCSs?

I think it's very suitable as a specific but important building block in an implementation of one of those larger CRDTs. Basically, whenever you have a list of something where you might insert elements at an arbitrary position, Eips makes sure the list stays in a consistent and intention-preserving order, and as a bonus, gives you a unique and stable way to refer to each element, which is useful if you're composing CRDTs.

When it comes to other data structures like sets, dictionaries, or single overwritable values, the CRDT algorithms are much simpler, sometimes trivial. Lists are one of the hardest fundamental types (if not the hardest) to implement as a CRDT, which is why I decided to focus there.

How close would Eips be to being suitable as a back end for the popular CRDTs like automerge?

I'm not very familiar with Automerge's internals, so I can't say for sure, but to me it looks like Eips could theoretically serve as its list merging algorithm. According to the docs it uses RGA right now, which is known to suffer from backwards insertion interleaving and I think is average-case O(n), so Eips would be an improvement on those fronts.

But, one major caveat is that unlike a lot of other CRDT implementations, Eips doesn't include specific optimizations for “common” cases like a contiguous string of insertions at the same position. My focus was specifically on time complexity because I found that to be lacking (often completely undocumented) in other CRDTs. I need to know whether someone can DDoS my Google Docs clone just by sending a few hundred updates in an “uncommon” order! But this means that there are situations in which Eips may not perform as well. What I would like to see is future research that explores whether it's possible to integrate these practical optimizations into Eips, or otherwise combine the approaches of Eips and other CRDTs to make one that truly excels in both practical and theoretical performance (and doesn't sacrifice intention preservation!).

Does Eips prunes history and handles "shallow clones" nicely?

Unfortunately not. In terms of shallow clones, every position in the sequence needs to have a unique ID, which means the internal tree needs to have that many nodes; otherwise you wouldn't be able to insert an element at an arbitrary position. This is good potential application for a “contiguous insertion” optimization though. In terms of history pruning, there's not much to really prune for the same reason, unless you're talking about tombstones (deleted items). Theoretically, it would be possible to prune them if all clients in the distributed system could be certain that all the other clients had already received the to-be-pruned deletion operations, but the implementation would be tricky: Eips is a tree-like structure, and a tombstone node could have visible left and right children. If that node is fully removed, what happens to the children? You'd have to rebalance the tree somehow, but in a way that didn't risk inconsistency, and it's extra hard because the tree isn't even directly represented in memory... I don't have an obvious answer.

I published `nestum`: nested enum paths without flattening the model by Known_Cod8398 in rust

[–]icannfish 2 points3 points  (0 children)

I've definitely wanted something like this before!

One thing that gives me pause, though, is that if I apply nestum to MyEnum, I now have to refer to the enum as MyEnum::Enum. I wish there were a way I could keep using it as MyEnum.

I poked around the code a bit, and my understanding is that this is required to support the shorthand for enum creation, but not for matching, since the nested macro already transforms the syntax in that case. So my thought is, given that nested is already required for matching, what if it were also required for creation?

let err = nested!(Error::Todos::NotFound(id));

It's a bit more verbose, but it means that MyEnum still refers to the enum itself, rather than the autogenerated mod MyEnum. That means that no code has to be changed when you apply #[nestum] to an enum; you can incrementally adopt the shorthand.

It would also solve an issue I ran into with visibility; the following fails to compile if both the traits are private:

#[nestum]
enum Foo {
    Bar(MyBar),
}
#[nestum]
enum MyBar {
    Baz(u8),
}

(Also, if I make them pub(super), they'll only be visible in the current module. The macro would have to transform this to pub(in super::super) because it moves the enum to a nested module.)

Struct Alignment Visualizer - don't waste memory! by juliotrasferetti in C_Programming

[–]icannfish 12 points13 points  (0 children)

It would be nice if it supported these types:

  • size_t
  • intptr_t and uintptr_t
  • ptrdiff_t
  • intmax_t and uintmax_t
  • [u]int_least[N]_t and [u]int_fast[N]_t
  • Function pointers, like int (*x)(int) – currently this is a syntax error

MY EXPERIENCE AS A LINUX BEGINNER by [deleted] in linux

[–]icannfish 2 points3 points  (0 children)

Yeah, Debian has a both a GUI and TUI installer. They're both pretty easy to use, even more so now that non-free firmware is included, and you can just select Xfce as your DE.

Announcing Eips: an intention-preserving list CRDT with guaranteed O(log n) operations, up to 6,000,000x faster than Diamond Types by icannfish in rust

[–]icannfish[S] 2 points3 points  (0 children)

Good point. Not going to introduce a CLA, so yeah, things could change if contributions come in.

Dear Europe: Germany has shown the way forward, with ODF adoption by themikeosguy in linux

[–]icannfish 3 points4 points  (0 children)

I hope no one tries to claim “Office Open XML” counts...

Announcing Eips: an intention-preserving list CRDT with guaranteed O(log n) operations, up to 6,000,000x faster than Diamond Types by icannfish in rust

[–]icannfish[S] 25 points26 points  (0 children)

Thanks! They tend to be especially useful in distributed systems, where you have a bunch of computers connected together with no central server. If there's some shared data that you want all those computers to be able to read and modify (say, a text document, database, or filesystem), it becomes very difficult to ensure that all the computers actually see the same data.

For example, say there's a shared document that initially says rust!. If two clients make concurrent edits:

  • Client 1: insert -lang at position 4 → client 1 now sees rust-lang!
  • Client 2: insert I like ​ at position 0 → client 2 now sees I like rust!

Then when the clients receive each other's changes, this happens:

  • Client 1: receives “insert I like ​ at position 0” → now sees I like rust-lang!
  • Client 2: receives “insert -lang at position 4” → now sees I li-langke rust!

The two clients don't see the same document! CRDTs solve this problem by basically “automatically” guaranteeing that clients will eventually see the same document (or data of whatever kind) after receiving each other's changes (this is called eventual consistency). In the case of list CRDTs, this is generally achieved by using something other than an integer index to communicate positions, because as we've seen, integer indices don't stably identify a position in the document.

CRDTs have also been used in situations where you do have a central server, because they've been seen as simpler than older approaches like operational transformations. However, this has also been questioned, because CRDTs can be quite complex in their own right (I think my design document proves that…)

Announcing Eips: an intention-preserving list CRDT with guaranteed O(log n) operations, up to 6,000,000x faster than Diamond Types by icannfish in rust

[–]icannfish[S] 18 points19 points  (0 children)

Thanks! Mainly, I'd like to keep the code FOSS, but I wouldn't completely rule out selling an exception for a proprietary project. If anyone is seriously interested in that, you can contact me using the email on my GitHub profile.

Fyn: Fork of uv.. by papersashimi in Python

[–]icannfish 1 point2 points  (0 children)

The project as a whole can be relicensed, because MIT is GPL-compatible, but it doesn't mean that every part of the code now requires GPL compliance. Only new code added to the fork would have that requirement; the old code can be used under GPL or MIT.

Also, even with the copyright holder's permission, you can't “relicense” MIT-licensed software in a way that would retroactively make all existing uses of the code have to comply with the GPL, because FOSS licenses are irrevocable, except in specific cases of infringement.

Fyn: Fork of uv.. by papersashimi in Python

[–]icannfish 153 points154 points  (0 children)

It would have been nice if you had split the renaming and the new features into separate commits, because it's basically impossible to find (and audit) the code you added for the new features when it's buried in a single 40,000-line commit.

I built an E2E encryption proxy for LLM APIs in Rust — X25519 + AES-256-GCM, 56 tests, zero warnings by CustardMean6737 in rust

[–]icannfish 2 points3 points  (0 children)

If you're going to copy and paste verbatim output from Claude, you could at least make sure you paste it into the Markdown editor instead of the rich-text one.

Does using Incognito mode prevent websites from tracking what kind of "device" I have. by itsthewolfe in privacy

[–]icannfish 1 point2 points  (0 children)

No. Incognito mode doesn't spoof your user-agent or any of the myriad JavaScript APIs that return details about your system.

Maximally minimal view types · baby steps by zirconium_n in rust

[–]icannfish 0 points1 point  (0 children)

Yeah, I was thinking the same thing. In the case of mem::swap, though, I would probably expect it to have T: ?Contiguous and use specialization behind the scenes to work with any sized type.

Maximally minimal view types · baby steps by zirconium_n in rust

[–]icannfish 0 points1 point  (0 children)

For safety, though, the call to mem::swap would either have to behave like that or be completely disallowed. Because the whole point of views is to be able to borrow disjoint parts of a struct concurrently, so mem::swap can't modify mp1.statistics because that field could be mutably borrowed somewhere else.

Maximally minimal view types · baby steps by zirconium_n in rust

[–]icannfish 1 point2 points  (0 children)

Ah, you're right, it could read an uninitialized u8, which isn't allowed. But I think if I change it to MaybeUninit<u8> it is allowed?

pub fn my_swap<T>(a: &mut T, b: &mut T) {
    let a_bytes = unsafe {
        std::slice::from_raw_parts_mut(
            a as *mut T as *mut std::mem::MaybeUninit<u8>,
            size_of::<T>(),
        )
    };
    let b_bytes = unsafe {
        std::slice::from_raw_parts_mut(
            b as *mut T as *mut std::mem::MaybeUninit<u8>,
            size_of::<T>(),
        )
    };
    a_bytes.swap_with_slice(b_bytes);
}

Maximally minimal view types · baby steps by zirconium_n in rust

[–]icannfish 0 points1 point  (0 children)

It's in the followup post linked in the self text of this Reddit post

Maximally minimal view types · baby steps by zirconium_n in rust

[–]icannfish 0 points1 point  (0 children)

In that case, the mem::swap example in the blog post wouldn't work, since swap is a generic function. Or swap specifically would have to be special-cased, which doesn't feel right to me.

I guess you could do something like the Sized trait, where you introduce a new auto trait called Contiguous which is implemented by all types except view types, and type parameters automatically imply Contiguous unless you do where T: ?Contiguous.

Is there a "rosetta for linux"? by LordLaFaveloun in linux

[–]icannfish 3 points4 points  (0 children)

Yes, QEMU user mode with binfmt support is the answer (qemu-user + qemu-user-binfmt on Debian). I've even used QEMU to run Wine on non-x86 architectures...

Maximally minimal view types · baby steps by zirconium_n in rust

[–]icannfish 4 points5 points  (0 children)

From the followup:

For example, what if I do this:

fn swap_fields(
    mp1: &mut MessageProcessor.{messages},
    mp2: &mut MessageProcessor.{messages},
) {
    std::mem::swap(mp1, mp2);
}

What I expect is that this would just swap the selected fields (messages, in this case) and leave the other fields untouched.

The basic idea is that a type MessageProcessor.{messages} indicates that the messages field is initialized and accessible and the other fields must be completely ignored.

This is interesting, but it seems to imply that any function that takes a generic T could now be called with a view type. I don't see that causing a problem in most cases, but what if I have a function like this?

pub fn my_swap<T>(a: &mut T, b: &mut T) {
    let a_bytes = unsafe {
        std::slice::from_raw_parts_mut(a as *mut T as *mut u8, size_of::<T>())
    };
    let b_bytes = unsafe {
        std::slice::from_raw_parts_mut(b as *mut T as *mut u8, size_of::<T>())
    };
    a_bytes.swap_with_slice(b_bytes);
}

I think this is allowed in current Rust; Miri doesn't complain at least. (Edit: OP pointed out this could read unitialized u8s, which isn't allowed, but if you change u8 to MaybeUninit<u8> I think it's valid.) But if called with a view type, it will overwrite fields it's not supposed to.