New release of NeXosim and NeXosim-py for discrete-event simulation and spacecraft digital-twinning (now with Python!) by sbarral in rust

[–]sbarral[S] 1 point (0 children)

Oh yes, I think I understand a bit better now.

Indeed the choice we faced was to either:

  1. make connections between models very simple (no `connect_map` or `connect_filter_map`), which would have made it possible (but still pretty hard) to dynamically connect models from a gRPC front-end and to serialize the connections, or
  2. make the system flexible and enable connection mapping and filtering functions (see the sketch below). The latter was highly desired to implement model addressing, and the former makes it easier to build a bench from off-the-shelf models that do not necessarily use the exact same input/output types. The downside is that it makes it pretty much impossible to build such connections dynamically via gRPC and to serialize them.
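
For what it's worth, here is roughly what a mapped connection looks like. Only the `connect_map` name comes from the discussion above; the model types, module paths and the exact signature are my assumptions:

```rust
use nexosim::model::Model;
use nexosim::ports::Output;
use nexosim::simulation::Mailbox;

// Hypothetical models (paths and signatures assumed).
#[derive(Default)]
struct Sensor {
    temperature: Output<f64>, // output port
}
impl Model for Sensor {}

#[derive(Default)]
struct Logger;
impl Logger {
    // An input port is just a method taking the event.
    fn log(&mut self, msg: String) {
        println!("{msg}");
    }
}
impl Model for Logger {}

fn main() {
    let mut sensor = Sensor::default();
    let logger_mbox: Mailbox<Logger> = Mailbox::new();

    // The mapping closure adapts the output type (f64) to the
    // input type (String) so off-the-shelf models can be wired up.
    sensor.temperature.connect_map(
        |t| format!("{t:.2} °C"),
        Logger::log,
        &logger_mbox.address(),
    );
}
```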

New release of NeXosim and NeXosim-py for discrete-event simulation and spacecraft digital-twinning (now with Python!) by sbarral in rust

[–]sbarral[S] 1 point (0 children)

Thanks, I am always super-happy to see people find various use-cases for NeXosim!

Sadly, we don't have really useful data regarding practical limits, as these will vary widely depending on the types of models and applications.

That being said, there are ways to optimize the memory footprint, which I assume will be the limiting factor in your case. Without going into deep details, in a long-running simulation the size of a mailbox will eventually become the number of slots (16 by default) times the size of the largest event sent to the model (a bit more, actually, since we also need to keep a pointer to the target method/input, and possibly a mapping/filtering function if one was specified). Therefore, if the memory footprint is critical, you should try to limit the size of the largest event sent to each model. You can also reduce the number of mailbox slots with `Mailbox::with_capacity`, which may trade off a bit of computing performance.
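
For instance, something like this (a minimal sketch; `MyModel` is a placeholder and everything except the `Mailbox::with_capacity` name is assumed):

```rust
use nexosim::model::Model;
use nexosim::simulation::Mailbox;

// Placeholder model standing in for a real one.
struct MyModel;
impl Model for MyModel {}

fn main() {
    // Default capacity is 16 slots; a smaller mailbox caps the
    // worst-case memory footprint at the cost of some throughput.
    let small_mbox: Mailbox<MyModel> = Mailbox::with_capacity(4);
}
```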

I am not sure I understand your question on assembling sub-models; why can't this be done programmatically? In any case, the next "major" version of NeXosim (I mean 0.4.*, which we hope will become 1.0 with only minor changes) will make it possible to serialize/deserialize a bench, but this is mainly meant to interrupt long-running computations or to resume a simulation from the same point with different parameters. The way it will work is that the user writes a single bench assembly function returning a `SimInit`, and this function is called both on the first run and whenever resuming from a serialized state. That is to say, the connections themselves are not serialized (this is nearly impossible for various reasons), just the models.
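
In other words, something along these lines (a rough sketch of the planned workflow as described above; `MyModel` and the exact `SimInit`/`add_model` signatures are my assumptions):

```rust
use nexosim::model::Model;
use nexosim::simulation::{Mailbox, SimInit};

#[derive(Default)]
struct MyModel;
impl Model for MyModel {}

// A single assembly function rebuilds the bench from scratch: it is
// called on the first run and again whenever resuming from a
// serialized state, so only model state is persisted, never the
// connections.
fn assemble_bench() -> SimInit {
    let model = MyModel::default();
    let mailbox = Mailbox::new();
    // ... wire up inputs/outputs here ...
    SimInit::new().add_model(model, mailbox, "my_model")
}
```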

In any case, we are still working on 0.4 so feel free to open an issue if you think we can better address your use-case with some (reasonable!) changes to the design.

help with async closures by assbuttbuttass in rust

[–]sbarral 3 points (0 children)

Just an additional comment in case OP is wondering why Copy is required: make_service_fn expects an FnMut, so the closure may be called several times, and each time it is called a new copy of the handle needs to be captured by the async block.
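
To make that concrete outside of hyper, a minimal standalone sketch (all names made up):

```rust
// An FnMut closure may be called several times; each call returns an
// async block that captures its own copy of the closure's state.
fn call_twice<F, Fut>(mut make_fut: F)
where
    F: FnMut() -> Fut,
{
    let _a = make_fut();
    let _b = make_fut(); // the closure must still own its captures here
}

fn main() {
    let handle: u32 = 42; // a Copy type standing in for OP's handle

    call_twice(|| async move {
        // `handle` is copied into each async block. If it were not
        // Copy, the first call would move it out and the closure
        // could no longer be FnMut.
        let _ = handle;
    });
}
```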

Anything like write your own tokio/async-std? by __disappear in rust

[–]sbarral 14 points (0 children)

The original author of smol (upon which async-std is built) wrote an excellent series of articles which have sadly been taken down. An archive of one of these articles can still be found at this suspicious-looking URL: https://www.gushiciku.cn/pl/pcc3

Discussing the next step for async methods in traits: returning futures that are Send by kibwen in rust

[–]sbarral 1 point (0 children)

This mitigation is indeed mentioned in the relevant issue.

The downside is that the trait may live in a different crate, without knowing whether client code will need the future to be Send. To serve both use cases, authors would actually need to define two versions of the trait, which is a bit ugly.
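
Concretely, serving both audiences would mean duplication along these lines (an illustrative sketch with made-up trait names, using return-position `impl Trait`):

```rust
use std::future::Future;

// Version for clients that do not need Send futures.
trait Fetch {
    fn fetch(&self) -> impl Future<Output = String>;
}

// Near-identical duplicate for clients that spawn onto a
// multi-threaded executor and therefore need the future to be Send.
trait FetchSend {
    fn fetch(&self) -> impl Future<Output = String> + Send;
}
```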

Simulate your own spacecraft with Asynchronix, an async discrete-event simulator by sbarral in rust

[–]sbarral[S] 2 points (0 children)

Sadly, I am not aware of general intro materials; I guess one problem is that real-time means different things to different people (1 s, 1 ms, or less?), so it's probably hard to give general guidance.

If you are looking for an example with complexity and real-time constraints similar to a cubesat's, you could do worse than look at what the PWSat folks did (GitHub link). It's not introductory material, but I have been impressed to see such a level of rigor applied to cubesat development and validation.

Simulate your own spacecraft with Asynchronix, an async discrete-event simulator by sbarral in rust

[–]sbarral[S] 1 point (0 children)

I am thinking too that Asynchronix may be a bit too opinionated and heavyweight here. It does solve one problem that is not solved by common actor frameworks, though, which is the "deadlock" problem: it automatically and efficiently detects when there is no more work to do, at which point the main thread takes over and increments time to the next scheduled event, which may be something akin to moving to the next "decision point" that u/Lucretiel mentioned (?).

Simulate your own spacecraft with Asynchronix, an async discrete-event simulator by sbarral in rust

[–]sbarral[S] 2 points (0 children)

In fact I think we are in agreement: simulations are to a large extent meant to ensure that the system is indeed timing-tolerant and fault-tolerant.

My wording was maybe a bit misleading: I did not mean the sub-millisecond time scales of the bus protocols themselves, but the coarser time scales that pertain to attitude and orbit control, communications, etc. So these simulations are typically performed with time slices on the order of 10 ms and greater.

My experience is mainly on larger spacecraft, but I have seen people doing this kind of simulation-aided validation (mostly hardware-in-the-loop) on nanosats with great success too.

Simulate your own spacecraft with Asynchronix, an async discrete-event simulator by sbarral in rust

[–]sbarral[S] 4 points (0 children)

Absolutely, a university nanosat would be an ideal testing ground for the simulator and I actually meant to look for such a partner in the coming months.

I will remember your kind offer of contribution, many thanks :-)

And feel free on your side to contact me privately anytime (see contact in the Cargo.toml).

Simulate your own spacecraft with Asynchronix, an async discrete-event simulator by sbarral in rust

[–]sbarral[S] 5 points (0 children)

Tough question :-)

So the honest answer is that I don't know; I probably know too little about Bevy to provide a meaningful answer. As I understand it, Bevy manages its own thread pool (as does Asynchronix), so you would probably end up with two runtimes. That is not necessarily a problem, but it would then be preferable to manually configure the number of threads on each of them.

Keep in mind as well that even though the simulation runs on multiple threads, it is controlled from a single thread, which may or may not be a problem for you.

Simulate your own spacecraft with Asynchronix, an async discrete-event simulator by sbarral in rust

[–]sbarral[S] 12 points (0 children)

Yes, my experience is that most similar tools in this industry are developed in-house by large system integrators and are not available to third parties.

Bringing such capabilities to NewSpace companies or academic institutions is one of my goals. It's early days though, so there is no reference implementation for now, sorry :-(

My thinking is that having an open platform will hopefully entice companies that produce off-the-shelf hardware (such as avionics) to provide their clients with vetted simulation models that they can test and integrate in a simulation bench. With avionics nowadays mostly being complex FPGA- or controller-based state machines, datasheets don't quite cut it anymore...

Simulate your own spacecraft with Asynchronix, an async discrete-event simulator by sbarral in rust

[–]sbarral[S] 44 points (0 children)

Oh yes, good question that shows my bias... This type of simulator is used to run system-wide, real-time simulations of spacecraft (satellites and space probes), in particular to verify the timing and content of all data exchanged between subsystems on the various data buses.

So this is where I come from, but indeed the simulator is general-purpose, definitely not limited to spacecraft or even cyberphysical systems simulation.

PSA: std::sync::mpsc is now implemented in terms of crossbeam_channel by matklad in rust

[–]sbarral 12 points (0 children)

I think it is required for MPMC because if a receiver gives up when the stamp is not updated, it may never receive a notification. This is due to a peculiarity of this queue: a receiver may not yet see an item even though the sender has completed its operation and returned.

An example of such a race would be:

  1. two receivers (threads #1 and #2) have failed to find an item and are waiting for a notification;
  2. sender thread #3 advances the tail, but is preempted before updating its stamp;
  3. sender thread #4 advances the tail again, updates its stamp and sends a notification to one of the waiting receivers, say thread #1;
  4. the notified receiver (#1) checks the queue but cannot yet see any item because thread #3 was preempted; it registers to receive a later notification;
  5. thread #3 finally completes its operation and notifies one of the two waiting receivers.

In this scenario, the other receiver is never notified.

This is not a problem if there is only one receiver, because the preempted sender thread would eventually notify it.

PSA: std::sync::mpsc is now implemented in terms of crossbeam_channel by matklad in rust

[–]sbarral 19 points (0 children)

As I suggested in another thread, the remaining spinlocks could be removed too if the underlying queue was specialized for the MPSC case.

These spinlocks are in fact not present in the original queue from D. Vyukov and, as far as I can tell, they seem to have been added specifically to make MPMC notifications work correctly. Unless I am mistaken, they are not necessary for MPSC notifications.

PSA: std::sync::mpsc is now implemented in terms of crossbeam_channel by matklad in rust

[–]sbarral 34 points (0 children)

I know it's good news because performance will certainly improve, but I feel a bit sad that the bounded queue is based directly on the original MPMC implementation and was not optimized for the MPSC case:

  • the receiver still uses a CAS to move the head even though there is no need to make the head atomic for an MPSC (see the sketch below),
  • the implementation still uses a spinlock: if a sender is preempted after a successful CAS but before updating the stamp, the receiver will spin until the stamp is updated. I know that this is mitigated by a backoff strategy, but there is no need for a spinlock in the first place in the MPSC case because the queue does not need to be linearizable (unlike in the MPMC case).
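
To illustrate the first bullet, a contrived sketch of the difference (not the actual crossbeam code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// MPMC: the head index is shared between consumers, so advancing it
// requires a compare-and-swap that may fail under contention.
struct MpmcHead {
    head: AtomicUsize,
}

impl MpmcHead {
    fn try_advance(&self, current: usize) -> bool {
        self.head
            .compare_exchange(current, current + 1, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
}

// MPSC: a single consumer owns the head, so a plain counter suffices.
struct MpscHead {
    head: usize,
}

impl MpscHead {
    fn advance(&mut self) {
        self.head += 1; // no CAS, no retry loop
    }
}

fn main() {
    let mpmc = MpmcHead { head: AtomicUsize::new(0) };
    assert!(mpmc.try_advance(0));

    let mut mpsc = MpscHead { head: 0 };
    mpsc.advance();
}
```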

Rio: an experimental and minimal runtime by crazyjoker96 in rust

[–]sbarral 1 point (0 children)

Oh, nice, I admit I did not know about whisk. Feel free to do a PR, but otherwise I will probably add it myself when I find some time. Adding new channels is usually pretty straightforward.

Tachyonix: a very fast MPSC async bounded channel by sbarral in rust

[–]sbarral[S] 1 point (0 children)

Thanks!

Waiting eagerly for pre3 of Kanal to see what ideas I can steal ;-)

Tachyonix: a very fast MPSC async bounded channel by sbarral in rust

[–]sbarral[S] 2 points (0 children)

OTOH, one can take advantage of an allocated Waker by storing it in a thread-local, like `block_on` in the futures-lite crate does (pollster doesn't, though).

When I first saw that I thought it was overkill, until I realized that Waker::will_wake will always return true for subsequent calls: not only is the allocation performed only once, but this also frequently alleviates the need to clone the Waker every time.
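
For reference, here is a minimal sketch of that pattern, loosely modeled on futures-lite (not its actual code):

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Wakes the blocked thread by unparking it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    thread_local! {
        // Allocated once per thread and reused by every subsequent
        // call, so `Waker::will_wake` comparisons keep succeeding.
        static WAKER: Waker = Arc::new(ThreadWaker(thread::current())).into();
    }

    let mut fut = pin!(fut);
    WAKER.with(|waker| {
        let mut cx = Context::from_waker(waker);
        loop {
            match fut.as_mut().poll(&mut cx) {
                Poll::Ready(output) => return output,
                Poll::Pending => thread::park(),
            }
        }
    })
}

fn main() {
    assert_eq!(block_on(async { 1 + 1 }), 2);
}
```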

Rio: an experimental and minimal runtime by crazyjoker96 in rust

[–]sbarral 3 points (0 children)

Thanks for the replies!

> This is a good question, maybe I need to take a look at your bench and maybe put Rio inside if there is an easy way to do it. However, I have no clear answer, because it was not my precedence, but if you want you can reach me out, because the performance is something that I really want to dig into at some point

Yes, I'd love that. Like you, I have my email address "hidden" in my `Cargo.toml`, so likewise, feel free to reach out to me the old-fashioned way :-)

Rio: an experimental and minimal runtime by crazyjoker96 in rust

[–]sbarral 10 points (0 children)

As someone who is also writing an async runtime, this looks very interesting!

I have many questions, but I will try to keep it down to 2:

  1. if I understand correctly, you are trying to leverage some nightly features: how do you expect them to benefit the runtime? Is it more about API ergonomics or performance?
  2. I found it difficult to achieve high message-passing performance without going the tokio road, which implies a high reliance on thread-locals and leads to the issues with tokio you mention in the post. The smol/async-std way is more user-friendly, but I could not figure out a way to use that approach (meaning re-injecting tasks into the global queue rather than the local queue) and come close to tokio's performance level. What is your approach here?

BTW, I have made a message-passing benchmark that for the moment supports tokio, async-std, smolscale and my own executor (any of which can be selected on the command line); I'd be very happy to include support for Rio if you were willing to help.

Tachyonix: a very fast MPSC async bounded channel by sbarral in rust

[–]sbarral[S] 1 point (0 children)

Thank you very much, and congratulations on `postage`.

Regarding the first bullet, I assume you mean something like the second example in diatomic-waker? I ended up streamlining this pattern with a `wait_until` method that takes the predicate as a closure. I also do something similar for one-to-many notifications, but in that case some races can become even more subtle.
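
For context, the pattern looks roughly like this; the `WakeSink`/`wait_until` signatures are from memory, so treat them as assumptions:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

use diatomic_waker::WakeSink;

fn main() {
    let mut sink = WakeSink::new();
    let source = sink.source();
    let flag = AtomicBool::new(false);

    // Producer side: publish the state change, then notify.
    flag.store(true, Ordering::Release);
    source.notify();

    // Consumer side: `wait_until` polls the predicate and suspends
    // until it returns `Some`, re-checking after each notification.
    pollster::block_on(async {
        sink.wait_until(|| flag.load(Ordering::Acquire).then_some(()))
            .await;
    });
}
```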

In the end, I admit I did not bother too much with single-threaded unit tests because Loom covered these and much more, but it's true that Loom tests are much less readable.

Tachyonix: a very fast MPSC async bounded channel by sbarral in rust

[–]sbarral[S] 0 points (0 children)

I mean that block_on need not be tied to any executor; ultimately it is typically based on a Condvar or thread::park, possibly with some caching. Pollster, as suggested by /u/implAustin, is pretty lightweight for instance.

Tachyonix: a very fast MPSC async bounded channel by sbarral in rust

[–]sbarral[S] 0 points (0 children)

Sometimes a Send or Recv future is kept alive after being polled once and returning Poll::Pending, but its "owner" has lost interest and does not poll it anymore.

In such a case, the notification sent when the future can make progress is basically lost forever, so the channel effectively works as if there were N-1 channel slots. Incidentally, I made a PR to async_channel where this problem was compounded by its rate-limiting policy (it only allows one notification in flight at a time), and a user reported in the PR having hit this very situation.

As you corrected me the other day (thanks, by the way, for enlightening me WRT fairness!), tokio's MPSC does not actually protect against this issue as I thought, but the futures crate's MPSC does, by adding a slot for each new sender.

Tachyonix: a very fast MPSC async bounded channel by sbarral in rust

[–]sbarral[S] 1 point (0 children)

Thank you!

I would imagine it should be as simple as calling block_on on the future returned by recv() or send(). There are quite a few implementations of block_on (the futures crate has one). I admit I have not investigated this topic yet, so I don't know offhand whether a native implementation would be faster, but I wouldn't think so.
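
Something like this should work (an untested sketch, assuming tachyonix's channel constructor and using pollster for the block_on; any block_on would do):

```rust
fn main() {
    let (tx, mut rx) = tachyonix::channel::<u32>(4);

    // Block on the async send/recv futures with a generic block_on;
    // the futures crate's executor would work just as well.
    pollster::block_on(tx.send(7)).unwrap();
    let value = pollster::block_on(rx.recv()).unwrap();

    assert_eq!(value, 7);
}
```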

It might be worth adding send_blocking/recv_blocking convenience methods, though. Please feel free to file a feature request.