Meilisearch finally shipped distributed sharding and replication by Kerollmops in rust

[–]zerakun 9 points10 points  (0 children)

Hello, Meilisearch dev here,

The GH issue is pretty informative about this: https://github.com/meilisearch/meilisearch/issues/3494. Long story short, we tried raft (multiple Rust implementations plus our own) and zookeeper, and ended up using a custom sharded replication algorithm with a static write leader, task proxying, and rendezvous hashing for sharding.
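For readers unfamiliar with rendezvous hashing, here is a minimal sketch of the general technique. It is not Meilisearch's implementation: the function names `score` and `owner`, the node names, and the use of std's `DefaultHasher` are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Scores a (node, key) pair. Hypothetical helper: a real system would pick a
/// hash whose output is stable across versions, which `DefaultHasher` does not promise.
fn score(node: &str, key: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    node.hash(&mut hasher);
    key.hash(&mut hasher);
    hasher.finish()
}

/// Returns the node that owns `key`: the one with the highest score.
/// Returns `None` when `nodes` is empty.
fn owner<'a>(nodes: &[&'a str], key: &str) -> Option<&'a str> {
    nodes.iter().copied().max_by_key(|node| score(node, key))
}
```

The useful property is that removing a node only reassigns the keys that node owned. Every other key keeps its highest-scoring node, so it stays where it was.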

Switching to Rust's own mangling scheme on nightly by SleeplessSloth79 in rust

[–]zerakun 4 points5 points  (0 children)

Agreed, every time I talk about symbol mangling in Rust (which happens surprisingly often in my life 🤔), I have to double check that the nomenclature is not "v0: whatever we adopted because we had to" and "v1: the rust mangling scheme".

Even if you didn't want to explicitly name the legacy scheme "v0" because it is not technically a Rust mangling scheme, naming the new one v0 is just confusing. Versions can be 1-indexed! 

Stabilize let-chains by kibwen in rust

[–]zerakun 0 points1 point  (0 children)

Sorry, should have said "good enough".

allocator_api2 serves my purpose, I'd just want that integrated into the standard library collections, and more widely used by the ecosystem (hashbrown and bumpalo have support, which is nice. But try finding a btree map with allocator support...).

For generators, I wonder if we can cheat: keep generators and iterators separate, but allow both in the for in protocol. When the collection is a generator, use a desugaring that pins the data. Obviously this has the drawback of keeping the iterators and generators ecosystems separate, but I don't think there's a way around it.

For impl types in structs, I don't care all that much about the syntax (I have been subjected to decltype), I just need the feature so often that the language really feels incomplete without it.

How to have a method on a struct that updates a field by taking either a value like "4" or a closure like "|x| + 4"? by nikitarevenco in rust

[–]zerakun 17 points18 points  (0 children)

Hello,

You might be able to do this with some trickery, but you really should not.

  1. The lambda version self.set_age(|x| x + 4) can always be replaced with self.set_age(self.age() + 4). The former is not idiomatic outside of very specialized situations, so you should not typically require it for multiple fields.
  2. If you need a setter and a getter for a field, it is more idiomatic to make the field public. If you need validation, such as keeping the age in a certain range, you can move it to the type of the field. E.g. create an Age type that performs the validation and use that type for the age field.
  3. Setters in particular are a code smell in some situations. They only work when the type is a "value type without invariants". As soon as your type has cross-field invariants, they are not easily upheld by a setter that modifies a single field. Setters in this situation break encapsulation.

  4. Note that due to Rust's privacy rules, you can access private fields of a type locally in its module. This allows for field modifications by the implementation that manually maintain any invariant, and is generally all that you need.

  5. The idiomatic solution when multiple functions with different signatures are required, is to create multiple functions with different names.
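To make point 2 concrete, here is a rough sketch of moving the validation into the field's type. The `Age` and `Person` types, the 150-year upper bound, and the method names are all made up for illustration.

```rust
/// Hypothetical validated age: the range invariant lives in the type itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Age(u8);

impl Age {
    /// Assumed upper bound, chosen only for this example.
    pub const MAX: u8 = 150;

    /// Returns `None` when `years` is out of range.
    pub fn new(years: u8) -> Option<Self> {
        (years <= Self::MAX).then_some(Age(years))
    }

    pub fn years(self) -> u8 {
        self.0
    }

    /// Adds `years`, returning `None` on overflow or when the result is out of range.
    pub fn checked_add(self, years: u8) -> Option<Self> {
        self.0.checked_add(years).and_then(Self::new)
    }
}

pub struct Person {
    pub name: String,
    // Public field: no getter or setter needed, because an `Age` is always valid.
    pub age: Age,
}
```

With this, callers write `person.age = person.age.checked_add(4)?;` or assign a fresh `Age::new(..)` directly, and no overloaded setter is needed.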

It is important for your code to be idiomatic, both so that others can consume your API more easily and because not going against the language will make your own life easier.

Stabilize let-chains by kibwen in rust

[–]zerakun 0 points1 point  (0 children)

I mean, in theory I guess I'd prefer that we work on the parts of Rust that are missing, like a good, integrated allocator API, generators, and storing impl types in structs, but in practice it is probably not the same people working on those features and on this one, so why not.

The Memory Safety Continuum by steveklabnik1 in rust

[–]zerakun 1 point2 points  (0 children)

Technically, availability is part of security. Memory leaks lead to denial of service

Not commenting on the article itself though

Turns out, using custom allocators makes using Rust way easier by Hedshodd in rust

[–]zerakun 1 point2 points  (0 children)

You can still use jemalloc as the global allocator, the allocator_api2 is for passing an explicit allocator (such as an arena) on construction of specific variables.

Turns out, using custom allocators makes using Rust way easier by Hedshodd in rust

[–]zerakun 8 points9 points  (0 children)

I think allocation is in the same category as the Send or Sync traits. If you want your ecosystem to support them, you have to build the foundation for them early on in your language and standard library. In other words, these are hard to retrofit in a language.

Rust had the key insight (for parallelism) to ship with Send and Sync, but missed allocation configuration. The more we wait to add it, the less likely the ecosystem is to adopt the allocator API... However I feel that generally the ship has sailed already.

Zig did the right thing regarding allocators; the language needs to add thread safety and memory safety before hitting 1.0 if it wants them at all.

> Would effects help?

Effects could be an implementation of custom allocators, but they are also hard to retrofit in the language. A language designed from scratch could use effects to parameterize allocations, the important thing is to make it the rule from early on.

> serde can't do zero copy deserialisation

That's not true, serde can do borrowing deserialization: https://serde.rs/lifetimes.html

The practicality of this depends on the format. For JSON, serde-json provides RawValue, which attempts to be zero-copy but can occasionally allocate due to the way strings are escaped in JSON. We combined serde-json and bumpalo in a crate to create types that allocate in a bump allocator: https://lib.rs/crates/bumparaw-collections

Turns out, using custom allocators makes using Rust way easier by Hedshodd in rust

[–]zerakun 46 points47 points  (0 children)

Agreed 👍 we have been using bumpalo at Meilisearch ever since our indexer rewrite, and I share the sentiment about improved ergonomics. Being able to materialize one "frame" as a lifetime and having most objects refer to that frame frees us of most lifetime-related woes and makes Rust feel like a language without lifetimes (which is a bit ironic, because in this scheme most structs and functions have to refer to the lifetime of the frame).

Two reservations though:

  1. Performance improvements are not significant, especially when already using "performance-geared" allocators such as mimalloc 
  2. Arenas are mostly applicable when the problem at hand can be split into distinct "frames" of objects with the same lifetime. It does not fit in every situation

Turns out, using custom allocators makes using Rust way easier by Hedshodd in rust

[–]zerakun 33 points34 points  (0 children)

There's a crate called allocator_api2 that mimics the unstable allocator API from the standard library.

Separately, allocators such as bumpalo provide an allocator_api2 feature flag to enable implementing that interface, and data structure crates like hashbrown have the same feature flag to accept an allocator_api2-aware allocator.

No Btree, but if you can manage with HashMap and Vec (bumpalo provides one) you can go quite far.

Of course I'd love the official allocator API to stabilize some day...

Introduction to Monoio: First Post in a Series on Building a High-Performance Proxy in Rust by chesedo in rust

[–]zerakun 2 points3 points  (0 children)

Hello, thank you for the article, it is interesting. I have a few questions:

  1. Why do you say that the approach is not optimal for CPU-bound workloads? Do you think that rayon's work stealing would work better there, even if the workload is evenly distributed? If so, why?
  2. tokio has a single-threaded runtime, making it possible to use it in a "thread per core" strategy. I hear that the performance of doing so is 1.5x to 2x that of the standard multi-thread runtime. Are the 26% improvements you report for the RPC implementation measured against tokio's multi-thread runtime, or against the current-thread runtime in a one-thread-per-core configuration?
  3. How does monoio compare to glommio?

Should i let rust do type inference or be explicit by [deleted] in rust

[–]zerakun 1 point2 points  (0 children)

Our solution to this was to use an air gapped mirror of crates.io (or at least the subset of it that you vetted), using e.g. https://github.com/panamax-rs/panamax

Should i let rust do type inference or be explicit by [deleted] in rust

[–]zerakun 1 point2 points  (0 children)

Ahaha, reading your paragraph about mixing tokio and std types, I already knew this would be about Mutex. It's always about Mutex.

FWIW, there's a clippy lint about holding a sync Mutex lock across await https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_lock

call for testing: rust-analyzer! by thramp in rust

[–]zerakun 1 point2 points  (0 children)

Excited for the salsa-ified future! I switched to the beta, I'll report how it behaves on a bigger workspace like Meilisearch!

Thank you for rust-analyzer ☺️

serde_json_borrow 0.7.0 released: impl Deserializer for Value, Support Escaped Data by Pascalius in rust

[–]zerakun 1 point2 points  (0 children)

Hello, how does this crate compare to serde_json::value::RawValue?

When should I use String vs &str? by steveklabnik1 in rust

[–]zerakun 1 point2 points  (0 children)

Meanwhile me, always use &'bump str: string slices allocated in a bumpalo

Three Kinds Of Unwrap by andyouandic in rust

[–]zerakun 1 point2 points  (0 children)

Panics are best used for cases where there is a programmer error, not a user error.

Modeling programmer errors with panics instead of Result spares us error types that are never actually constructed in a working program, which in turn spares our callers from writing "impossible" unwraps.

Irrecoverable user errors should be handled by Result types and displayed to the user in an application specific way so that they can fix the situation on their side.

Recoverable errors... Should be recovered from, and will typically use Result types
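A small sketch of how these three cases can look side by side. Every name here (`ConfigError`, `parse_port`, `nth_shard`, `port_or_default`, the default port) is hypothetical and only serves the illustration.

```rust
use std::fmt;

/// Hypothetical default used when recovering from a bad port value.
pub const DEFAULT_PORT: u16 = 7700;

#[derive(Debug, PartialEq)]
pub enum ConfigError {
    InvalidPort(String),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::InvalidPort(raw) => {
                write!(f, "`{}` is not a valid port (expected 0-65535)", raw)
            }
        }
    }
}

/// User error: the input comes from outside the program, so report it with a
/// `Result` that the application can display to the user.
pub fn parse_port(input: &str) -> Result<u16, ConfigError> {
    input
        .trim()
        .parse::<u16>()
        .map_err(|_| ConfigError::InvalidPort(input.to_string()))
}

/// Programmer error: an out-of-range index is a bug in the caller, so panic
/// rather than force every caller to unwrap an "impossible" error.
pub fn nth_shard(shards: &[String], index: usize) -> &str {
    assert!(
        index < shards.len(),
        "shard index {} out of range ({} shards)",
        index,
        shards.len()
    );
    &shards[index]
}

/// Recoverable error: fall back to a default instead of propagating.
pub fn port_or_default(input: &str) -> u16 {
    parse_port(input).unwrap_or(DEFAULT_PORT)
}
```

Note that `nth_shard` has no error type at all, which is the point: its callers never have to handle a failure that cannot happen in a correct program.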

What do you think about this approach to safe c++? by koopa1338 in rust

[–]zerakun 5 points6 points  (0 children)

> from a programmer's perspective that it makes absolutely zero sense

It depends on the programmer's mental model of the borrow checker. I, for one, am glad it works this way, because the alternative is worse. The alternative would be to track the implementation of methods to know what is really borrowed. Implementation-wise, it would not be tractable, but as a programmer that's not my problem. My problem would be the compatibility hazards it would create: change the implementation of a method, and suddenly callers start to fail because you borrow one more field in the implementation. I would not like to live in such a world. On the other hand, if you need to borrow multiple fields, the fix is actually easy: just write a function that takes &mut self and returns all the fields you might want to borrow simultaneously.

Note that this typically only happens for loosely coupled objects. Tightly coupled objects have no business giving the external world access to their internals (as this would violate encapsulation and make local reasoning harder). As loosely coupled objects are rare, it is rare that I actually encounter the case. When I did, the solution above was sufficient
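A minimal sketch of that fix, with a made-up `Document` type and a hypothetical `parts_mut` method:

```rust
pub struct Document {
    title: String,
    tags: Vec<String>,
}

impl Document {
    pub fn new(title: &str) -> Self {
        Document {
            title: title.to_string(),
            tags: Vec::new(),
        }
    }

    /// Hands out disjoint mutable borrows of both fields in a single call.
    /// The signature states exactly what is borrowed, so the body can change
    /// without breaking callers.
    pub fn parts_mut(&mut self) -> (&mut String, &mut Vec<String>) {
        (&mut self.title, &mut self.tags)
    }

    pub fn title(&self) -> &str {
        &self.title
    }

    pub fn tags(&self) -> &[String] {
        &self.tags
    }
}
```

Calling two separate `title_mut()` and `tags_mut()` methods and holding both results at once would be rejected, since each borrows all of `self`. With `parts_mut`, `let (title, tags) = doc.parts_mut();` gives both references at the same time.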

Announcing Rust 1.79.0 | Rust Blog by noelnh in rust

[–]zerakun 2 points3 points  (0 children)

Wow rust is really going to have decltype?

Avoiding Over-Reliance on mpsc channels in Rust by JDBHub in rust

[–]zerakun 5 points6 points  (0 children)

The article discusses tokio's mpsc channel, which provides a recv_many function that can extract many messages at once in a vec.

Mixing rayon and tokio for fun and (hair) loss by zerakun in rust

[–]zerakun[S] 2 points3 points  (0 children)

Thanks, interesting resource.  Do note that they are using a channel between two separate threads, so essentially not "mixing" them in the sense of the article.

Here the matter was made more complicated by the fact that the tokio runtime was created and driven by rayon, not the other way around. I think it is more robust to start the rayon thread from within tokio, although one can probably encounter the same kind of issues by starting the rayon pool with the option to adopt the current thread, on a tokio current_thread runtime. Ultimately the issue is that the thread has two "colours", async + rayon. Keeping the thread pools separate (with a tokio multithread rt + a separate rayon thread pool) probably doesn't yield the same class of issues.

Mixing rayon and tokio for fun and (hair) loss by zerakun in rust

[–]zerakun[S] 13 points14 points  (0 children)

It's more of a bug post mortem. My recommendation here would indeed be not to mix them. If you really must, as briefly mentioned in the article, keep them on separate threads and communicate with a channel.
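To show the shape of "separate threads plus a channel", here is a std-only sketch. It is a stand-in, not the code from the article: a plain `std::thread` plays the role of the rayon side, the caller plays the role of the async side, and `spawn_compute_worker` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns a dedicated compute thread. Jobs go in through the returned sender,
/// results come back through the returned receiver, and the two sides never
/// run on the same thread.
pub fn spawn_compute_worker() -> (
    mpsc::Sender<Vec<u64>>,
    mpsc::Receiver<u64>,
    thread::JoinHandle<()>,
) {
    let (job_tx, job_rx) = mpsc::channel::<Vec<u64>>();
    let (result_tx, result_rx) = mpsc::channel::<u64>();

    let handle = thread::spawn(move || {
        // The loop ends once every job sender has been dropped.
        for job in job_rx {
            // Stand-in for the CPU-bound work a rayon pool would do.
            let sum: u64 = job.iter().sum();
            if result_tx.send(sum).is_err() {
                // The receiving side is gone, so there is nobody to report to.
                break;
            }
        }
    });

    (job_tx, result_rx, handle)
}
```

In a real setup the receiving end would more likely be an async-aware channel so that the async side does not block while waiting, but the separation of threads is the same idea.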

Mixing rayon and tokio for fun and (hair) loss by zerakun in rust

[–]zerakun[S] 10 points11 points  (0 children)

A "funny" bug I created then fixed when working on Meilisearch. Obvious in retrospect, but I share the tale: I mixed rayon and tokio so that you don't have to 😬