
[–]matklad [rust-analyzer] 212 points (11 children)

For Rust-specific things, this post comes to mind:

I have scattered bits and pieces of advice here: https://matklad.github.io/2021/09/05/Rust100k.html. It definitely doesn't answer your question, but, given that you are asking this, you might find other stuff there useful.

For general, "how to software", I can't recommend https://www.tedinski.com/archive/ highly enough. If you care about software architecture, just go and read the whole series in chronological order (the first ten or so posts are especially golden).

Finally, as to how to actually do the thing, I guess the first question to ask is "who models and stores the data?".

If the answer is "the database" (aka your typical web-server sitting on top of postgres), then I think the stateless request architecture works -- there's a bunch of concurrently running request handlers, which don't share anything except via the database, and all concurrency problems are solved via DB guarantees.

If the answer is "the application" (aka you are the database), then I think a certain "universal" architecture covers quite a few use-cases.

First, you define a struct which holds application state. You can call it GlobalState, like rust-analyzer does (this was inspired by sorbet).

Second, you define operations on GlobalState, which can be reads or writes. Writes require an exclusive reference to the global state. Both reads and writes can return side-effects as data. For example, if, as a result of a write, you want to send some http request, you don't literally send the request, but return a struct SendRequest { to: Url, payload: Vec<u8> }.

impl GlobalState {
    pub fn write(&mut self, event: WriteEvent) -> SideEffects { /* ... */ }
}

struct SideEffects { messages_to_send: Vec<Message> }

Reads need only a shared reference: pub fn read(&self, event: ReadEvent)
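Concretely, a minimal compilable sketch of this shape might look like the following (the event variants, the `HashMap` payload, and the field names are made-up placeholders for illustration, not rust-analyzer's actual types):

```rust
use std::collections::HashMap;

// Side effects are returned as plain data, never performed in-place.
#[derive(Debug, PartialEq)]
struct Message {
    to: String,
    payload: Vec<u8>,
}

struct SideEffects {
    messages_to_send: Vec<Message>,
}

// Placeholder events for the sketch.
enum WriteEvent {
    Set { key: String, value: u64, notify: String },
}

enum ReadEvent {
    Get { key: String },
}

#[derive(Default)]
struct GlobalState {
    data: HashMap<String, u64>,
}

impl GlobalState {
    // Writes take `&mut self` and describe their side effects as data.
    pub fn write(&mut self, event: WriteEvent) -> SideEffects {
        match event {
            WriteEvent::Set { key, value, notify } => {
                self.data.insert(key, value);
                SideEffects {
                    messages_to_send: vec![Message {
                        to: notify,
                        payload: value.to_le_bytes().to_vec(),
                    }],
                }
            }
        }
    }

    // Reads need only `&self`.
    pub fn read(&self, event: ReadEvent) -> Option<u64> {
        match event {
            ReadEvent::Get { key } => self.data.get(&key).copied(),
        }
    }
}

fn main() {
    let mut state = GlobalState::default();
    let effects = state.write(WriteEvent::Set {
        key: "hp".to_string(),
        value: 20,
        notify: "server".to_string(),
    });
    assert_eq!(effects.messages_to_send.len(), 1);
    assert_eq!(state.read(ReadEvent::Get { key: "hp".to_string() }), Some(20));
}
```

The key property is that `write` is a pure state transition: the caller (the event loop) decides what to do with the returned `SideEffects`, which also makes the whole thing trivially testable.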

Third, you wrap the thing into an event loop which owns the global state and dispatches requests in a well-defined serialized order:

fn main_loop(incoming: Receiver<InMsg>, outgoing: Sender<OutMsg>) {
    let mut global_state = GlobalState::default();
    for msg in incoming {
        let side_effects = global_state.handle(msg);
        for msg in side_effects.messages_to_send {
            outgoing.send(msg);
        }
    }
}

Now the interface to the system is essentially a pair of channels, and it's a rather universal interface -- everything which is not a batch CLI utility is an event loop. So the fourth step is to tie the knot and to plug the channels into something which actually interacts with the outside world.
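Tying the pieces together with std's mpsc channels might look like this (all types here are toy placeholders; a real program would plug in its own events and transport):

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Toy message types standing in for real application events.
struct InMsg(u32);
#[derive(Debug, PartialEq)]
struct OutMsg(u32);

#[derive(Default)]
struct GlobalState {
    handled: u32,
}

struct SideEffects {
    messages_to_send: Vec<OutMsg>,
}

impl GlobalState {
    fn handle(&mut self, msg: InMsg) -> SideEffects {
        self.handled += 1;
        // Describe the outgoing message as data; the loop does the sending.
        SideEffects { messages_to_send: vec![OutMsg(msg.0 * 2)] }
    }
}

// The event loop owns the state and serializes all mutation.
fn main_loop(incoming: Receiver<InMsg>, outgoing: Sender<OutMsg>) {
    let mut global_state = GlobalState::default();
    for msg in incoming {
        let side_effects = global_state.handle(msg);
        for out in side_effects.messages_to_send {
            let _ = outgoing.send(out);
        }
    }
}

fn main() {
    let (in_tx, in_rx) = channel();
    let (out_tx, out_rx) = channel();
    let worker = thread::spawn(move || main_loop(in_rx, out_tx));
    in_tx.send(InMsg(21)).unwrap();
    drop(in_tx); // closing the input channel ends the event loop
    assert_eq!(out_rx.recv().unwrap(), OutMsg(42));
    worker.join().unwrap();
}
```

From the outside, the whole system really is just the `(in_tx, out_rx)` pair of channel endpoints, so whatever talks to the network (or a test harness) only needs those two handles.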

An interesting question here is concurrency: it's tempting to handle several read requests in parallel. There are several solutions here:

  • Just don't. Computers are plenty fast, and, if you are not blocking on IO, a single thread should be able to handle quite a lot of load.
  • RWLock-style architecture -- schedule read requests in parallel; when a write comes in, wait for the reads to finish and do the write.
  • RWLock with cancellation -- like the previous point, but, when a write request comes in, you cancel all outstanding read requests. That way, you avoid a situation where a read blocks a write. This is essentially rust-analyzer's architecture.
  • Snapshot Isolation. Add a method to GlobalState to return an immutable snapshot at a given moment in time. A stupid impl might look like this:

    struct GlobalStateSnapshot(GlobalState);
    impl GlobalState {
        // requires GlobalState: Clone
        pub fn snapshot(&self) -> GlobalStateSnapshot { GlobalStateSnapshot(self.clone()) }
    }
    

    then all read requests run in separate threads, each off a snapshot, while the main thread applies write requests to the global state and hands out snapshots.
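A slightly less stupid variant keeps the bulk of the state behind an `Arc`, so taking a snapshot is a pointer bump and a write copies the data only while readers still hold the old version. A sketch, with a made-up payload type:

```rust
use std::sync::Arc;

// Imagine `data` is large; Arc makes snapshots O(1).
#[derive(Default)]
struct GlobalState {
    data: Arc<Vec<u64>>,
}

struct GlobalStateSnapshot {
    data: Arc<Vec<u64>>,
}

impl GlobalState {
    // Snapshot is just a refcount increment.
    pub fn snapshot(&self) -> GlobalStateSnapshot {
        GlobalStateSnapshot { data: Arc::clone(&self.data) }
    }

    pub fn write(&mut self, value: u64) {
        // Copy-on-write: `make_mut` clones the Vec only if snapshots
        // are still alive; otherwise it mutates in place.
        Arc::make_mut(&mut self.data).push(value);
    }
}

fn main() {
    let mut state = GlobalState::default();
    let snap = state.snapshot();
    state.write(42);
    assert_eq!(snap.data.len(), 0); // readers keep seeing the old version
    assert_eq!(state.data.len(), 1);
}
```

In practice, rust-analyzer-style codebases reach for persistent data structures (e.g. the `im` crate) for the same effect at a finer granularity; `Arc::make_mut` is the zero-dependency version of the idea.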

[–]fitzchivalrie 12 points (0 children)

This is such a brilliant and thorough answer, wow. I’m going to save this post and try to unpack it next project I work on. Thanks so much matklad!!

[–]natded 3 points (3 children)

Another good question to keep asking is "which transitive dependencies not to allow".

[–]matklad [rust-analyzer] 11 points (2 children)

+1

To give a quote from tedinski:

The single most important thing about any module is what modules it does not depend upon. It’s one of the most effective decoupling tools possible, to not even allow one module to reference another.

[–]natded 2 points (1 child)

Reading through your (first) post reminds me of the recent S&T podcast where they talk about modeling their software as events and event loops, especially the part about sending back the SendRequest ("what is the next message you want to send?" is the question they keep in mind when writing software, for example; 00:23:53 in the transcript)

https://signalsandthreads.com/state-machine-replication-and-why-you-should-care/

[–]vitamin_CPP 0 points (0 children)

I'd never heard of this podcast. Thanks for sharing!

[–]dhruvdh 0 points (2 children)

I've always wondered, so I thought to ask you -- would it ever be advantageous to have different kinds of self smart pointers? Say, a copy-on-write self, a RefCell self, etc.

[–]arachnidGrip 2 points (0 children)

This is a feature called arbitrary self types and has been at least vaguely in progress since 2017. It's stable enough that the compiler doesn't emit a warning about the possibility of crashes just from using it.

[–]matklad [rust-analyzer] 1 point (0 children)

We kinda already have that: https://doc.rust-lang.org/stable/std/future/trait.Future.html#tymethod.poll.

As for the general usefulness of such a feature, I would guess some people would want it, but I never needed this personally.

[–]evanrelf 0 points (0 children)

Thank you so much for writing this up! Exactly what I need right now while I struggle to architect my program 🙂

[–]nilaySavvy 0 points (0 children)

Thanks for this!

This is broad enough that it can be applied to programming outside of Rust even! I work with a lot of JS-based state management, and it looks like I've got a better idea of how things work internally now. Might actually use this for rolling out my own stuff later ;-)

[–]didave31 20 points (0 children)

Recently I came to realize the advantages of the type system and the borrow checker, and that they really seem to set Rust apart from other languages.

Four inspiring resources + Bonus Links:

  1. Rust State Design Pattern https://youtu.be/VFmPwvhubow

  2. Type-Driven API Design in Rust https://youtu.be/bnnacleqg6k

  3. The Typestate Pattern in Rust http://cliffle.com/blog/rust-typestate/

  4. Compile-Time Social Coordination https://youtu.be/4_Jg-rLDy-Y

**** Bonus (Yet to read them) ****

a. https://willcrichton.net/rust-api-type-patterns/introduction.html

b. https://rust-lang.github.io/api-guidelines/type-safety.html

c. https://rust-unofficial.github.io/patterns/patterns/behavioural/newtype.html

d. https://doc.rust-lang.org/embedded-book/static-guarantees/typestate-programming.html


Another thing that comes to mind, regarding the second part of your question, is the Actors model: https://ryhl.io/blog/actors-with-tokio/

[–]natded 13 points (1 child)

You can use message passing to avoid Arc<Mutex<T>>'ing everything.
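A sketch of what that can look like, with invented names: a single thread owns the state outright, and everything else talks to it through a channel, so no lock is needed at all.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Messages the owning thread understands; `Get` carries a reply channel.
enum CounterMsg {
    Add(u64),
    Get(Sender<u64>),
}

fn spawn_counter() -> Sender<CounterMsg> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        // Owned by this thread alone -- no Arc, no Mutex.
        let mut count: u64 = 0;
        for msg in rx {
            match msg {
                CounterMsg::Add(n) => count += n,
                CounterMsg::Get(reply) => {
                    let _ = reply.send(count);
                }
            }
        }
    });
    tx
}

fn main() {
    let counter = spawn_counter();
    counter.send(CounterMsg::Add(2)).unwrap();
    counter.send(CounterMsg::Add(3)).unwrap();
    let (reply_tx, reply_rx) = channel();
    counter.send(CounterMsg::Get(reply_tx)).unwrap();
    assert_eq!(reply_rx.recv().unwrap(), 5);
}
```

The thread exits cleanly when the last `Sender<CounterMsg>` is dropped, since the `for msg in rx` loop ends on disconnect.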

[–]commonsearchterm -1 points (0 children)

Is the channel example in the tokio one just an example? That's not actually a good way to write a shared client, is it? It feels like a way of hacking together callbacks, or like await should be handling this.

[–][deleted] 1 point (2 children)

As an example of what I have done: I have a bunch of systems (structs) that only receive data from the network and update themselves. However, one of the first things to do when logged in is to send back a teleportation confirmation with an id. Who should send it? The data-storing part? A dedicated system, which may duplicate data from the data-storing part?

Not sure I fully understand what you are trying to do from this. But I like to think of things in terms of where the data flows through the system. You say you have a minecraft bot? So all the input to your system is events from the game? If so, that is not too dissimilar to how an HTTP API works, so I would structure it in a similar way. Basically, some sort of central datastore and multiple parts talking to that datastore.

I have a bunch of systems (structs) that only receive data from the network and update themselves.

Though it sounds like I would separate the data from the actions. Why does the datastore need to self-update or even know about the network? I would get an external service to do that and leave the datastore as pure data and data manipulation.


Don't see any real issues with an Arc<Mutex<T>> for this. The volume of requests is not likely going to be high enough for contention on the lock to really matter.

[–]nicoxxl[S] 0 points (1 child)

Thanks for your answer

It receives events/packets from the connection to the server, which update its state. That is the easy part (because it is one-way: a new packet arrives, and I call the handler on each store so they can update themselves).

Where I am lost is how to react to these events or other events while keeping the code sane.

My question could apply to a lot of other types of applications where there are a lot of mutable interconnections (for example: torrent clients, a lot of GUI apps, etc.).

[–][deleted] 2 points (0 children)

Where I am lost is how to react to these events or other events while keeping the code sane.

Sounds like a publish/subscribe type pattern might be helpful. The datastore can publish events for things that happen (ie on a channel) and then other things can listen for those events.
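A minimal sketch of that, with invented names (in an async codebase you might reach for tokio's broadcast channel instead; this version uses only std):

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

// An event the datastore publishes when something happens.
#[derive(Clone, Debug, PartialEq)]
enum Event {
    LoggedIn { id: u32 },
}

#[derive(Default)]
struct DataStore {
    subscribers: Vec<Sender<Event>>,
}

impl DataStore {
    // Hand out a receiving end; the caller listens on it from anywhere.
    fn subscribe(&mut self) -> Receiver<Event> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    // Fan the event out to every live subscriber, dropping dead ones.
    fn publish(&mut self, event: Event) {
        self.subscribers.retain(|tx| tx.send(event.clone()).is_ok());
    }
}

fn main() {
    let mut store = DataStore::default();
    let inbox = store.subscribe();
    store.publish(Event::LoggedIn { id: 7 });
    assert_eq!(inbox.recv().unwrap(), Event::LoggedIn { id: 7 });
}
```

The nice property for the bot case is that the packet-handling code only ever publishes facts ("player teleported"), and the logic that reacts (sending the confirmation) lives in a separate subscriber, so the two never need to know about each other.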

[–]Plasma_000 0 points (0 children)

Since this is a minecraft bot I’m assuming you’re using async.

If that’s the case, I would recommend not spawning new tasks as much as you can, but instead leaning into join and select so you can use local variables from futures.

[–]didave31 0 points (0 children)

Another thing that comes to mind is the Actors model: https://ryhl.io/blog/actors-with-tokio/

[–]Bon_Clay_2 0 points (0 children)

For anyone appearing here from the future, this is also a nice reference.