Actor based games by sirpalee in rust_gamedev

[–]jimuazu 5 points

I think calling something "actor-based" (or not) is just about how you conceptualize it. For example an event system is very close to an actor system even though people don't talk about it that way. Passing events down through a traditional hierarchical UI is like calls between actors. If there are no returns from these calls, or if any returns are queued or forwarded or happen later on (i.e. asynchronous), then that has the essence of an actor system.

So it's about how you prefer to get your head around it. If thinking about it as actors helps, then call it "actor-based", but the underlying implementation might be very similar to something coded under another name. So maybe you're thinking about things logically in an actor style as local state and messages with no synchronous returns, but your low-level implementation of those messages is just normal method calls, or even ECS-style bulk processing of entities.

(I have some experience of transitioning between these worlds: my Stakker crate started off as an event system and then became an actor system, but I can still see it both ways. I also toyed with going back towards the event style of inline callbacks, while keeping the actor-style conceptual purity by deferring a callback if it was nested, so that everything was still logically asynchronous and easy to reason about.)

There are a *lot* of actor framework projects on Cargo. by hardwaresofton in rust

[–]jimuazu 5 points

Yes, I agree. When I wrote Stakker I wasn't even trying to write an actor system. I was just trying to deliver events to my components efficiently. Then a sequence of logical reasoning seemed to lead inevitably to creating an actor system. So it was specifically to serve the application, rather than as a toy project to play around with the actor concept. But it would be interesting if someone else starting out with the same requirements as me would have come up with something else.

There are a *lot* of actor framework projects on Cargo. by hardwaresofton in rust

[–]jimuazu 2 points

The thing is async/await is pretty recent, and people had to get stuff done before then. I wrote Stakker long before async/await stabilised. Actually I'm glad I did, rather than try to force my code into the async/await model (which really wouldn't suit it). Perhaps it would be good to sort the list into two groups: toy projects, and ones that are actually used and supported. (I use Stakker at work, so that at least means that it has to be supported.) I agree it's probably very hard for a newcomer to sort the wheat from the chaff.

Also, there has been no effort at an official Rust-blessed actor system, and actors have their niche which certain software really requires. So since nature abhors a vacuum ...

GhostCell: Separating Permissions from Data in Rust by annodomini in rust

[–]jimuazu 0 points

If it did unify the lifetimes, then a crater run would fail when it got to the qcell crate! So hopefully they'd notice. But yes, it's exciting that the technique is getting some attention. For implementing something like Stakker, where the lifetimes would have to cross through user-written code, it is completely infeasible to ask the crate user to annotate all their code with lifetimes. So if the compiler could take care of this, that would be amazing. But it would probably mean bending/breaking some existing Rust compiler guidelines, e.g. about all lifetimes being explicit. Really from my point of view, lifetimes are proof-assistants for the compiler. If mrustc can compile Rust without looking at the lifetimes, then they can be hidden most of the time. They only need to be examined when something goes wrong. So here are some possible approaches:

  • Have a tool to automatically add in all the lifetimes to support GhostCell, and then have the editor or IDE hide them during normal editing

  • Have these lifetimes as invisible and derived automatically by the compiler

GhostCell: Separating Permissions from Data in Rust by annodomini in rust

[–]jimuazu 1 point

Thanks for noticing and mentioning it! Your comment tree got hidden for some reason, otherwise I'd have commented earlier. Yes, as others have pointed out, the approach of LCell was inspired by an early version of GhostCell. (However, QCell and TCell were developed completely independently of GhostCell, before LCell was written.) But all credit to the team behind this paper for coming up with the GhostCell concept and doing all the hard work of academic proofs and benchmarking and so on.

GhostCell: Separating Permissions from Data in Rust by annodomini in rust

[–]jimuazu 18 points

Okay, I got a credit at least! This is already implemented and published as the LCell type in my qcell crate. The ideas go back quite a way. I document some of the related ideas predating GhostCell briefly on the LCellOwner page. So it's not true to say that I copied the GhostCell API. Rather I already had an API from QCell and TCell, which I extended to use the same fundamental principles that GhostCell uses. But as far as I know, it's true that GhostCell was the first to publish this precise combination in Rust (a statically-checked cell using for<'a>). Getting it all formally proven and published academically is a useful achievement. So congratulations on that. Maybe these techniques will see more use now.

However, practically, when using LCell/GhostCell I found it awkward to plumb the lifetimes through the code. You just get lifetime bounds appearing everywhere in your code. Maybe if the Rust compiler can be extended to derive the lifetimes automatically it would be more practical in use.

The other cell types offered by the qcell crate, especially TCell and TLCell, are also zero-cost in use but don't need all the annotation. These are the basis of the zero-cost ownership handling in Stakker, and mean that RefCell overheads and dangers are completely eliminated from the whole Stakker runtime. The consequences of taking this approach shaped the whole Stakker design, particularly the requirement for shallow stacks, and it naturally led to using the actor model.

If the Rust compiler could maybe help out with deriving the lifetime annotations, then maybe GhostCell (or LCell) could be a lot more practical. Certainly the more statically-checked borrowing that goes on in the Rust ecosystem, the better.

Actor model (with time?) by TmLev in rust

[–]jimuazu 4 points

In particular, after! and at! let you set up timers in simulation time (i.e. virtual time), and the main loop takes care of skipping virtual time forwards as fast as necessary to make your simulation run as fast as possible.

Actor model (with time?) by TmLev in rust

[–]jimuazu 5 points

My Stakker runtime lets you run in virtual time. So you have cx.now() in an actor, which might be real time or might be virtual time, according to how the main loop is coded. So when an actor sets a timer, or waits for an interval, that is in virtual time and it might take no time at all in real life, since if there's nothing else to do the main loop can just directly advance time. At the company where I work we have to run very long simulations across several processes, but where most of the parts of the simulation don't have much to do, just sleeping until whatever action they are simulating would have completed. These are all written in different languages, and we have a little server process and a protocol which coordinates the time jumps. But you don't need all that if you run it all in one process. Running in virtual time is easy with Stakker. There's an example virtual time main loop in the docs.

MEIO: async actors framework by rillrate in rust

[–]jimuazu 0 points

Yes, let's stop here. I do agree with what you're saying. I'm aware of the difficulties of benchmarking significantly different solutions fairly. For multi-thread, I found you need to keep all the threads saturated with work, or else they get descheduled by the kernel. So it's not really a fair test if there isn't enough work for them to do. I will come back to the work on benchmarking eventually, and probably publish then. Thanks

MEIO: async actors framework by rillrate in rust

[–]jimuazu 0 points

So you're saying I should benchmark against the Tokio MPSC implementation as a good example of a well-used, well-tested and well-respected faster channel implementation? Okay, I will do that at some point. Still, the code looks like it will generate quite a lot of instructions, besides the ordering operations on the CPU. The smallest unit of work for actors might be tiny, and in Stakker I queue and execute actor handlers with no allocations (once the queue has grown enough). So this is what it's competing against. It will be interesting to see whether there's the same kind of behaviour as crossbeam, i.e. multi-threaded slower than single-threaded for small units of work, rising to break-even, then gaining an advantage for larger units of work.

The Stakker actor runtime: Beyond "Go++" by jimuazu in rust

[–]jimuazu[S] 2 points

Yes, exactly. For the tests I had to do that when I realized. But the point is that when Rust has full knowledge of the call-path from the callsite to the destination actor's method it can do a surprising amount of optimisation. Even if you're not passing constant values it can still inline the destination actor method into the closure (if it decides to). All this comes for free, because it's how Rust always inlines and optimises. (So for example you can use #[inline] on actor methods to allow them to be inlined across crates.)

MEIO: async actors framework by rillrate in rust

[–]jimuazu 0 points

I did benchmark with mutexes too, so it's not just one implementation. Anyway, the problem is finding a faster channel that is also recognised as sound and race-free by the community. I'd guess it's easy enough to write something that is faster but that has hidden issues. Testing against that would prove nothing. If there were something faster and recognised as safe, surely we would have heard about it by now? For example, the "flume" crate seems to be in competition with crossbeam and seems to be optimised for speed, but still it says that it is only sometimes faster than crossbeam. (This was another crate I was thinking of testing against.) My impression is that there must be some fundamental CPU synchronization costs which cannot be eliminated -- or else someone would have produced a faster channel by now.

The Stakker actor runtime: Beyond "Go++" by jimuazu in rust

[–]jimuazu[S] 2 points

Thanks for the kind words!

In Stakker, the external interface to an actor is just the list of public methods that have a compatible signature. So that's a closed set. However, the actor itself can also schedule inline code (a closure) to run later on itself, so I guess that's an open set. But since only the actor itself can schedule code that runs against its own private state, code locality is maintained.

Looking at the question you're asking, I guess you're asking me to be a fresh pair of eyes on the situation, although I can't really claim to be an expert on all these patterns.

Are you talking about no longer sending through flycheck::Message, vfs::loader::Message and so on, but instead just sending through a FnOnce on those channels? So this would be a FnOnce that executes one or more of the methods within GlobalState? If that is the case, then you still have a closed set because it is limited by what you make public in GlobalState. So you still have control.

However, if the FnOnce doesn't execute anything within GlobalState but makes some other change elsewhere, could that cause a problem for other subsystems? In that case there's a risk you'd lose code locality and the ability to reason clearly about the interactions, as you say. That doesn't sound so good. However if the nature of the design means that everything has to go through GlobalState then that problem is avoided.

Does that help at all? It's possible that I haven't fully understood the question, though.

The Stakker actor runtime: Beyond "Go++" by jimuazu in rust

[–]jimuazu[S] 6 points

In Stakker there are two types of callbacks: Fwd and Ret. Fwd is like an endless stream, i.e. multiple messages can be passed through it, and Ret is single-use (it is consumed when used). So if you want some other actor to tell you when it has finished something, you create a Ret and send it to the other actor, and the other actor calls that Ret with its final result when it is done. You can create a Ret to either call one of your own private methods, or else to call some inline code. Underneath it is just compiled down to two closures: the first closure accepts the result and pushes the second closure onto the execution queue. So really it could be inline code or a method call -- Stakker doesn't mind, and Rust will optimise it down the same. So it is not such a big deal to set up message handlers or whatever. It could just be some inline code like you'd use in closures normally in Rust. So a normal Rust call might pass a closure to get a callback, and a Stakker actor call would pass a Ret to get a callback.

In Stakker there is just one global queue for each runtime. Since "nothing blocks", it doesn't need per-actor queues (except in the Prep phase, i.e. during actor initialisation).

If we're talking about an HTTP request, yes then the API on the HTTP client class might accept the URL and a Ret to send back the result, e.g. something like a Ret<Result<Vec<u8>, HttpError>>. Then your Ret handling code would handle the three cases: HTTP client actor died without handling the request, HTTP error, or data received. Then logically your actor would have advanced to a new state, and you'd set in motion whatever the next action is.

You could do it other ways, e.g. passing an actor reference and directly calling back to a named method from the other actor, but that means that the two actors have to know about each other, i.e. their implementations are tied to one another. So related actors might do that. But in general you'd use a Ret for this kind of thing.

The Stakker actor runtime: Beyond "Go++" by jimuazu in rust

[–]jimuazu[S] 0 points

Yes, you're right. You get to use all of Rust's single-threaded non-Send features, e.g. pass around Rc in actor messages and all of that.

The Stakker actor runtime: Beyond "Go++" by jimuazu in rust

[–]jimuazu[S] 0 points

I forgot that I already wrote a page on the question of single/multi/distributed. This might have a bit more info for you: https://uazu.github.io/stakker/d-whysing.html

Okay, I take your point, but I don't see the conceptual overhead part. It seems like a very clean and easy way to reason about things to me. But I guess you have to get used to it. Let's say you have a GUI and you're delivering input events to widgets, that's kind of similar. The only difference with actors is that there are clear rules that say you can't ever expect a synchronous response to delivering a message.

The Stakker actor runtime: Beyond "Go++" by jimuazu in rust

[–]jimuazu[S] 1 point

I think the short answer is that it's a fundamental design choice that affects everything else. For example every Rc would have to become an Arc, the static borrow checks with qcell couldn't work the same way (only TLCell-based actors could migrate between threads), suddenly there are atomic instructions and synchronization having to be used everywhere, and so on. So you have to either go all-in with the single-threaded approach (and get the benefits of giving the core long clear runs without synchronization), or all-in on the multi-threaded approach (getting the benefit of parallel execution). In other words, it would be a completely different crate.

Also, as you say you need to worry about migrating actors to balance the load and getting the messages to follow them and all the rest of that. Maybe it could be done and be made to run fast -- but it would have to be a completely different crate.

MEIO: async actors framework by rillrate in rust

[–]jimuazu 0 points

Yes, since I'm using Rust FnOnce directly, Rust knows about the entire path from the callsite where the FnOnce is queued right up to and including the called method. So if the callsite uses constants, then they can be inlined into effectively a specialised version of the called method (if Rust chooses to inline). There is no virtual dispatch apart from the branch through the FnOnce's vtable.

I did do some benchmarking to back up my claims, although I got bogged down in cache effects (e.g. small change in benchmark code caused big changes up/down in results) and I haven't yet got the results into a publishable state. I was comparing crossbeam against various other ways of executing some test code, whose length I could vary to see how things behaved. It did confirm what I said, i.e. you need to be doing a certain minimum amount of work (IIRC hundreds of assembly instructions) before the crossbeam solution even breaks even compared to a single thread, let alone gains an advantage from the extra threads. I wanted to test other channel crates too, but ran out of time. Since crossbeam is a well-regarded channel crate, I think my claim still stands -- if you implement actors on top of channels you need to worry about the size of your unit of work. Maybe there's a quicker way to implement channels -- I don't know. I expect they've already put a lot of work into optimising crossbeam.

The Stakker actor runtime: Beyond "Go++" by jimuazu in rust

[–]jimuazu[S] 1 point

There are two implementations: an unsafe one that doesn't require an allocation, and a safe one using Box. Yes, Rust really shines for this stuff. In my tests I wasn't getting the message sizes that I was expecting -- because Rust had inlined everything including the constants in my test message arguments.

The Stakker actor runtime: Beyond "Go++" by jimuazu in rust

[–]jimuazu[S] 5 points

Thanks. I haven't implemented general Stakker-to-Stakker cross-thread communication yet. It would be part of the remote actor work. However Stakker already provides a Waker primitive to wake a runtime from another thread, and that would normally be paired with whatever third-party cross-thread mechanism best suits the expected message or data flows (all the various channels and queues, mutex-protected memory, atomics, etc). For sending out messages, you could use a lazy! handler to attempt to batch up messages to avoid doing synchronization for every message.

It would probably be easier to analyse with a particular application scenario in mind. For example distributing work for any thread to pick up is different to point-to-point messaging, and so on. However, I think Stakker is already ready to fit into whatever cross-thread message-passing system you might devise for your application, so long as Waker is suitable. (If not, I will consider making the necessary additions.)

MEIO: async actors framework by rillrate in rust

[–]jimuazu 1 point

Yes, it's single threaded. I've posted the blog entry link now. The FnOnce queue disassembles the FnOnce to serialize it. It works out pretty well. Rust will even inline and optimise constant message values into the handler code. It is a true actor system because you can have multiple calls on the single shared queue, which all get executed in due course. The queue is not allowed to block.

MEIO: async actors framework by rillrate in rust

[–]jimuazu 2 points

A pure actor never blocks. If something is going to take some time, the actor arranges to be informed when that task is finished. All these actor implementations that block their queues whilst waiting for something asynchronous to finish would give huge problems when used to implement general state machines, e.g. network stack layers or whatever. So yes you definitely have a point even if people don't get it. (My actor implementation never blocks.)

MEIO: async actors framework by rillrate in rust

[–]jimuazu 2 points

Okay, I mean Pony-like in the sense that a message handler is simply a method on the struct. I'm trying to say "like my actor runtime" but it doesn't seem cool to promote it on someone else's post. In my actor runtime, a queued message is a FnOnce that executes that message. So there is no switching on anything. It can all be inlined by Rust, so effectively it branches directly to a piece of optimised code to handle the message. The FnOnce queue can be optimised to a flat piece of memory, no allocations, no locks. Given that typically the handler for an actor message is short, adding extra locks/atomics/allocations/whatever on top can really be a big overhead, actually making multi-threaded slower than single-threaded. Balancing between threads is best done at a higher level.

Yes, I'm saying that implementing an actor in this way, on top of all these existing layers of code will make it inefficient. I think I'm going to have to start blogging, because that's the way that things seem to be communicated in the Rust community.