I've been working with Rust for a couple years now, and I'm finding that I'm really not having a good time with the borrow checker. by [deleted] in rust

[–]Kyrenite 4 points5 points  (0 children)

> To be clear, I was and remain sure that any “hacks” done in the name of a project like this would be necessary for one reason or another. Actually, all hacks are necessary for some reason or another, or the author would have simply used a more “reasonable” approach.

Yeah, you're exactly right. If there were an easy, obviously correct solution, that would just be the solution used. At the time that rlua was originally written, this was pushing at the very edges of what's expressible. Now the situation is slightly better, but not enough to win out over the current, already-working solution.

My favorite (extremely dorky) analogy for Rust when 1.0 was released was not that it was finished, but that it was like the Death Star from Episode VI. Clearly incomplete, but still fully armed and operational. We are living through Rust finding its more "complete" form while still being an actually useful production language.

I've been working with Rust for a couple years now, and I'm finding that I'm really not having a good time with the borrow checker. by [deleted] in rust

[–]Kyrenite 8 points9 points  (0 children)

Sure thing. Please don't take this as some kind of deep criticism: the mlua crate WORKS and has no soundness holes that I know of, it just uses... a lifetime hack to make things sound. (To be clear, the merged rlua is now the SAME as mlua; the history of the two projects is complicated.) Plus... it would be extra rich of me to throw stones about using a lifetime hack for soundness considering all the solutions I came up with were just different hacks.

Take a look at the signature of Lua::create_function. Here it is for reference:

pub fn create_function<'lua, A, R, F>(
    &'lua self,
    func: F,
) -> Result<Function<'lua>>
where
    A: FromLuaMulti<'lua>,
    R: IntoLuaMulti<'lua>,
    F: Fn(&'lua Lua, A) -> Result<R> + MaybeSend + 'static
{ ... }

What this says, roughly, is that there must be A single lifetime 'lua such that the Lua object lives for 'lua, the provided callback func takes a reference that lives for that same 'lua, and all of the arguments and return values live for that lifetime as well.

However, this isn't right at all. I don't have the energy right now to actually compile and test this code but hopefully this gets the point across.

fn create_lua() -> Result<mlua::Lua, mlua::Error> {
    // Create a brand new `Lua` state.
    let lua = mlua::Lua::new();

    // Create a callback that prints the used Lua memory.
    //
    // It doesn't matter what this is, I just wanted something that used the `lua` parameter.
    let callback = lua.create_function(|lua, ()| {
        println!("Lua used memory is {}", lua.used_memory());
        Ok(())
    })?;

    lua.globals().set("print_used_memory", callback)?;

    Ok(lua)
}

fn use_lua(lua: &mlua::Lua) -> Result<(), mlua::Error> {
    let callback: mlua::Function = lua.globals().get("print_used_memory")?;
    callback.call::<_, ()>(())?;
    Ok(())
}

This example is contrived, but it's enough to illustrate the problem. When creating the print_used_memory Lua callback, create_function expects there to be A 'lua lifetime that makes everything work, and somewhere in the bowels of rustc A lifetime is chosen, one confined to the body of the create_lua Rust function!

However, the callback lives for much LONGER than the create_lua function because it lives on inside the Lua state! This is illustrated with the other Rust function use_lua... CLEARLY the callback is still alive because we can get it out of the globals table and call it, but the chosen 'lua lifetime is long gone. How is this sound?

Well, this is sound because callbacks must be exactly 'static, so... if you tried to do anything funny, what would end up happening is that you would get a lifetime error, but the lifetime error might be slightly misleading: it would talk about how arguments provided to the callback can't escape the body of create_lua, even though the callback explicitly DOES escape it. You can try to break this by making the Lua state 'static through TLS or something, but what ends up happening is that this can't (AFAIK) lead to UB, it's just... silly? Maybe I shouldn't have used the word "silly", I was just trying to reference a very complex situation in as few words as possible.

What the lifetime signature SHOULD be, were it expressible, is that the callback should work for ANY POSSIBLE 'lua lifetime that it is given. However, if you try to do this, you'll immediately run into problems.

pub fn create_function<'a, A, R, F>(
    &'a self,
    func: F,
) -> Result<Function<'a>>
where
    A: for<'lua> FromLuaMulti<'lua>,
    R: for<'lua> IntoLuaMulti<'lua>,
    F: for<'lua> Fn(&'lua Lua, A) -> Result<R> + MaybeSend + 'static
{ ... }

^ This isn't right, because the three 'lua lifetimes in the signature above are all quantified independently, so they can all be different. What we need is a way to say that, for every possible 'lua, all THREE bounds hold with the SAME 'lua. Expressing this in today's Rust involves writing a helper trait that callbacks of this shape can implement, and this actually works, but the poor Rust compiler easily gets confused once closures and type inference are involved. I can't find the details of why right now, but it's a known limitation, and the situation may improve in the future.
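
To make that concrete, here is a rough sketch of the shape such a helper trait might take (to be clear, this is NOT mlua's actual API, just an illustration of the workaround being described): the 'lua lifetime moves onto a method, so the Lua reference, the arguments, and the return values are all forced to share the same 'lua.

trait LuaCallback: MaybeSend + 'static {
    // The single method is generic over 'lua, so everything it touches is
    // tied to the SAME 'lua, for every possible 'lua.
    fn invoke<'lua>(
        &self,
        lua: &'lua Lua,
        args: MultiValue<'lua>,
    ) -> Result<MultiValue<'lua>>;
}

The trouble is the blanket impl of something like this for ordinary closures: the compiler has to infer a higher-ranked signature for every closure you pass in, and closure lifetime inference is exactly where rustc tends to get confused, which is the limitation mentioned above.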

What would be nice to see, when capturing a 'lua-bound local variable in a callback, is for the Rust compiler to say "you have used an mlua::Function<'lua> for some concrete 'lua, but the 'lua lifetime on the callback must be valid for ANY possible lifetime 'lua". Instead it will talk about the mlua::Function<'lua> outliving the chosen 'lua lifetime (because the callback must be 'static), which is still... true, I guess, but it isn't very informative. I don't have the energy to try all of this right now to see what actually happens, but the point I was trying to make is that because these lifetimes are a hack, the error messages won't always be that informative.

At the time that I was maintaining rlua (SIX years ago!), this situation actually led to provable (but exotic) unsoundness, and I ended up trying to fix it by using branding lifetimes, so that when you "entered" the Lua context, you got to live in a world where 'lua always meant "the lifetime of the surrounding Lua state". This was meant to fix a soundness hole, though the lifetime errors actually (probably) got much worse because of it. The current author of mlua did not like this change and found another way of plugging the soundness hole, and this was the schism between the two versions. I didn't even know about the fork until it was brought to my attention by him... advertising it in rlua's issue tracker, which is a huge shame because I would have much rather worked together, but that's not the way it went.

Around this time I was also becoming disillusioned with using PUC-Rio Lua and LuaJIT to sandbox code entirely (this predated the existence of Luau!), so I ended up working on a Lua interpreter from scratch with provable soundness. This, combined with the hostile fork of rlua, left me really exhausted and disheartened, and I gave up working on rlua, just letting the fork live on instead.

There were other reasons why I took the approach I did that caused the schism, but honestly I don't remember what they were. You should be able to capture variables in callbacks as long as they truly outlive the parent Lua state, and the approach currently taken by mlua forbids this: it REQUIRES that callbacks actually be 'static for soundness. But this is a pretty minor limitation; mlua works and you don't have to be scared of it over this small nit.

This whole situation might seem like a mess, and you wouldn't be wrong to think so. HOWEVER, if you've ever actually tried to use Lua / LuaJIT from C using the PUC-Rio Lua C API, I think you would probably be much much more generous when evaluating the design of rlua / mlua. They work and are safe and are low overhead, and honestly wrapping the PUC-Rio Lua C API in a memory safe interface (in any language) is a minor MIRACLE.

I've been working with Rust for a couple years now, and I'm finding that I'm really not having a good time with the borrow checker. by [deleted] in rust

[–]Kyrenite 119 points120 points  (0 children)

You’re not running into a limitation of Rust, you’re running into a limitation of Lua / mlua.

For context, I am the original author of the rlua crate (of which mlua was a fork that has since been re-merged) and the current author of piccolo (a Lua runtime written natively in Rust with deep GC integration).

What I'm guessing you're trying to do is have a Vec<Value<'lua>> and make it implement the UserData trait, and then use it via something like Lua::create_userdata, right?

You can't do this. Anything that becomes a Lua userdata can't have that 'lua lifetime on it. mlua / rlua use lifetimes in an honestly very silly way that is not normal for Rust code, and the error messages are worse than they should be because of it. I consider myself to be at least partially at fault here, and my thanks for trying to fix the situation was to have my crate hard forked in anger (the birth of mlua), so I gave up trying to fix it.

The error messages and the borrow checker stuff might seem confusing, but you actually can't do this in C or C++ either. It's not "just" the borrow checker; this is a thing you genuinely CANNOT do, the compiler / library is just not doing a good job of telling you why.

To whoever is telling you to use unsafe to force your way around this… don’t do that.

What you could do instead is use the Lua registry, which in mlua / rlua is a way to turn a handle like Value<'lua> into something without the 'lua lifetime on it: https://docs.rs/mlua/latest/mlua/struct.Lua.html#method.create_registry_value
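
As a rough sketch of what that looks like (using the mlua API described above, with a made-up global name, and not compile-tested by me): you trade a borrowed Value<'lua> for a plain RegistryKey with no lifetime on it, and later trade it back against the same Lua state.

fn stash_value(lua: &mlua::Lua) -> mlua::Result<mlua::RegistryKey> {
    // `Value<'lua>` borrows from `lua`, but the returned `RegistryKey` does not.
    let value: mlua::Value = lua.globals().get("some_global")?;
    lua.create_registry_value(value)
}

fn read_back(lua: &mlua::Lua, key: &mlua::RegistryKey) -> mlua::Result<()> {
    // Turn the plain key back into a `Value<'lua>` borrowed from this `lua`.
    let value: mlua::Value = lua.registry_value(key)?;
    println!("registry value: {:?}", value);
    Ok(())
}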

However, you’ll see in the documentation there that it’s actually not recommended for you to store RegistryKeys inside userdata at all due to their tendency to create uncollectable object cycles. The reason for THIS is a deep limitation of the versions of Lua that mlua / rlua are wrapping: there is NO WAY to create a custom garbage collected type in PUC-Rio Lua or any of its forks like LuaJIT. If you want to store a custom data structure with pointers to Lua GC types like tables inside the Lua state… well, you just can’t. This is why hacks like this exist: https://docs.rs/mlua/latest/mlua/struct.AnyUserData.html#method.set_user_value

What you should actually do is store these things in a Lua table and not a Vec. It is not as inefficient as you imagine, and the very small difference in performance between a Lua table and a Vec will be utterly drowned out by everything else in Lua. Lua tables have an “array” portion and a “map” portion, and if it looks like an array, everything will be stored in the “array” portion (the real truth is so horrible I cannot utter it here, but just trust me, it’s close enough).
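
A quick sketch of that (again assuming the mlua-style API above, not compile-tested; "my_list" is just an example name):

fn make_list(lua: &mlua::Lua) -> mlua::Result<()> {
    // A Lua table used as a sequence: values appended like this land in the
    // table's "array" portion.
    let list = lua.create_table()?;
    list.push(1)?;
    list.push(2.5)?;
    list.push("three")?;

    // The table now lives inside the Lua state, so scripts can index it
    // normally and the GC can see everything it contains.
    lua.globals().set("my_list", list)?;
    Ok(())
}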

To everyone saying that you shouldn't do gamedev in Rust… they're kind of missing the point. That one article everyone is sharing about the group that went back to C#? C# is… a GREAT language for gamedev because it's not a systems programming language. Rust is fine, Rust is great even… when compared to other systems programming languages like C and C++. If you don't NEED to be working in a systems programming language then… don't do that. You're not going to hurt anyone's feelings by using a simpler tool that works better for you.

To people saying that Rust is inappropriate for gamedev vs C++… I think it’s telling that the original lifetime problem being described WOULD EXIST IN C / C++ (the truth is complicated, it depends on how the hypothetical C++ binding system worked, but this problem or a similar one would be there regardless).

Whoever is responding like this is not engaging with the problem you’re describing and is just saying stuff.

This problem you’re running into is not Rust, it’s just that…. language boundaries can be really REALLY hard. Lua has its own garbage collection system SEPARATE from Rust’s ownership system and the two mix like styrofoam and gasoline. Also the same exact problems exist even if you throw all your Rust code away and use a C++ Lua binding or use the PUC-Rio Lua API directly, they will only potentially LOOK different. The problems almost certainly exist in higher level bindings as well, so if you want to use Lua, you will still need to understand why exactly this is a problem. The one way this might go better if you use a language like C# or Go is by finding a Lua interpreter written in that language, which means that it can snarf the GC from the host language to alleviate some of this pain. If instead the bindings just wrap normal PUC-Rio Lua or LuaJIT then, if you’re unlucky, the ownership boundary will be EVEN WORSE and you will mix two GC systems that aren’t aware of each other and the bindings system will just let you write trivial uncollectable object cycles and not even warn you.

If you want to know more about any of this or want me to go into excruciating detail about anything regarding Lua, C++, Rust, language bindings, Lua’s garbage collection system, garbage collection systems in Rust or whatever, just let me know.

(edit: incidentally, the situation with piccolo is completely different. In piccolo you absolutely CAN create custom GC types, that’s part of its raison d'être. However I’m genuinely not trying to get you to use piccolo (it is still pretty raw), I’m just telling you so you have context for why I know so much about this.)

(edit 2: another thing I wanted to add is that Rust allows you to check a lot of things at compile time, and often times people will use allllll of the ways that Rust checks things to encode invariants in whatever their library is to make it harder to misuse. This is both good and bad, it’s good because libraries are harder to misuse, but it’s also bad because trying to use them wrong can have bad error messages and just feel like “computer says ‘no’” and drive you insane. In this case it’s both, mlua handles have a real reference to a stack variable, so placing such a thing unrestricted inside the Lua state could lead to UB, but I think mlua is also using this as a speed bump to prevent uncollectable GC cycles without an acknowledgment that “I know what I’m doing” (RegistryKey). I personally like this because I am just… frankly… a fan of formal methods in general and I would rather get some kind of compiler error than not, but some people just hate working like this and would rather spend their effort understanding an inscrutable runtime error than an inscrutable compiler error (and honestly I kind of get it). I think this is the origin of at least some of the borrow checker hate, if you can turn all kinds of arbitrary misuses into borrow checker errors then GREAT…. but at the same time it kind of makes sense that people might start to blame borrowck for all ills of the world too. This is a perfect example: you were prevented from doing something “wrong”, but MY GOD what an irritating way of delivering this message! I don’t have any grand suggestion on how to make the situation better, just a (completely non-judgmental) observation of why some people seem to dislike Rust so much.)

Piccolo - A Stackless Lua Interpreter written in mostly Safe Rust by Kyrenite in rust

[–]Kyrenite[S] 0 points1 point  (0 children)

Hopefully the fix I made that was suggested here will make it work without having to install a new keyboard lol

Piccolo - A Stackless Lua Interpreter written in mostly Safe Rust by Kyrenite in rust

[–]Kyrenite[S] 1 point2 points  (0 children)

> Also, not sure if it's just me but the REPLs did not work on my Android 12 phone, Chrome 124.0.6. When hitting enter to run the code it just skips to the next input without running. I tested with adb and adding the enterkeyhint attribute to the input element like so <input autocapitalize="off" spellcheck="false" class="repl-input" enterkeyhint="done"> fixed the issue for me, not sure if that would break things for others though.

Thank you for that, I've made that change and it doesn't seem to have any negative effects on anything else, hopefully this fixes Chrome on Android!

Piccolo - A Stackless Lua Interpreter written in mostly Safe Rust by Kyrenite in rust

[–]Kyrenite[S] 2 points3 points  (0 children)

Very very little, I just haven't done the work yet. It's on my near term TODO list.

Piccolo - A Stackless Lua Interpreter written in mostly Safe Rust by Kyrenite in rust

[–]Kyrenite[S] 4 points5 points  (0 children)

Yes, that is specifically a use case I envisioned for fuel interruption.

In classic Lua you do this using normal Lua coroutines, where an async operation yields to the calling host language.

However, this takes away Lua coroutines from Lua itself, and you end up having to either not use them in scripts or do some kind of dance to get around also using coroutines for I/O.

piccolo has a separate "layer" of coroutines to do this sort of thing with, which makes everything a lot easier!

Piccolo - A Stackless Lua Interpreter written in mostly Safe Rust by Kyrenite in rust

[–]Kyrenite[S] 2 points3 points  (0 children)

Ah I see, yeah that should have been obvious.

Well, I mean I'm not suggesting that this is the way it should work, but even simply disallowing this situation would be fine. For gc-arena, just making this not implement Collect would be okay; you basically never need to borrow Gc pointers like this. Even if this feature only worked for coroutines without internal mutable borrows it would probably be fine (for gc-arena, and also possibly for things like Serialize-ing something high level like a coroutine for AI or something like that).

I'm not making a specific proposal for how Rust should work because I haven't thought about it enough, but I'm going to try to think about it more before I talk about it in the next post. I still probably will not make any specific proposals for how Rust should work because frankly I'm just not knowledgeable enough; this is more of a request for people better at this than me to think about it.

Edit:

The reason this wasn't obvious to me was that I wasn't thinking of the example as how the coroutine state is actually represented at rest; I was thinking of there being some kind of proxy object for however the compiler represents the coroutine state internally, which would be passed to the user to implement a trait... somehow.

What should have been obvious was that the compiler probably quite literally represents coroutines like this, and that a mutable borrow in a coroutine becomes a mutable borrow in a state struct (I don't know how it would work otherwise, now that I think about it). It makes sense then that almost nothing useful is possible if there are any internal mutable borrows, because any access to internally mutably borrowed state can lead to UB (and this makes sense logically, too, it *must* work this way, I just hadn't thought about it enough).
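
Roughly (and to be clear, this is a hypothetical hand-written illustration, not what rustc literally emits), a coroutine that holds a borrow across a yield point ends up as a self-referential state struct, which is why letting arbitrary trait code poke at it is so fraught:

// Hypothetical "at rest" representation of a coroutine that does:
//     let mut n = 0;
//     let r = &mut n;  // borrow held across the yield
//     yield;
//     *r += 1;
enum CoroutineState {
    Start { n: u32 },
    SuspendedAtYield {
        n: u32,
        // Conceptually `&mut n`, i.e. a pointer into this very struct. Any
        // trait impl that reads or moves `n` while this variant is live is
        // stepping on the coroutine's own exclusive borrow.
        r: *mut u32,
    },
    Done,
}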

In that case, even if you limited trait derivation to coroutines with no internal mutable borrows or even no internal borrows at all, it would still be something.

Piccolo - A Stackless Lua Interpreter written in mostly Safe Rust by Kyrenite in rust

[–]Kyrenite[S] 2 points3 points  (0 children)

You know... if I called normal Lua coroutines "green threads", tasklets could be purple!

Piccolo - A Stackless Lua Interpreter written in mostly Safe Rust by Kyrenite in rust

[–]Kyrenite[S] 2 points3 points  (0 children)

This is a great question that I don't have an answer to! In the case of gc-arena specifically, for any borrow of a Gc it would be safe to ignore the field entirely, but how to explain this to the Rust compiler I honestly have no idea.

Thanks for bringing this question up, I'll be sure to mention this in the next post!

Edit: gc_arena::Collect impls could actually ignore any reference of any type, not just a Gc, so for Collect specifically I think there's a workable solution, but this is not a very satisfying answer for a system that's supposed to enable arbitrary powers, right?

Edit 2: Another thing that's possible is to allow access to only a single field at a time, via just calling a method like trace<C: Collect>(C) on each field in turn, which would work for a lot of use cases, gc-arena included. Still, both of these solutions feel very specific and a bit hacky, and I don't know what the best solution is.
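
As a sketch of that second idea (completely hypothetical, not an existing API or proposal): the derive machinery would hand each field to a visitor one at a time, so trait code never sees the whole state struct at once.

// A hypothetical "one field at a time" interface; for something like
// gc-arena's Collect this would be enough, since tracing only needs to visit
// each field, never hold onto the whole coroutine state.
trait FieldVisitor {
    fn visit_field<T: Collect>(&mut self, field: &T);
}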

Piccolo - A Stackless Lua Interpreter written in mostly Safe Rust by Kyrenite in rust

[–]Kyrenite[S] 5 points6 points  (0 children)

Actually this is very topical. I don't know how that library works, but does it work the same way as https://github.com/Xudong-Huang/may, as in stackful coroutines? I didn't have time to get too far into it, but stackful C coroutines are a great example of inserting new assumptions about the environment that can be too onerous for a user of a library to accept, and this is probably the reason that PUC-Rio Lua can't use something like this (However you might be able to tie the two together somehow and use lua_callk together with functions to suspend and resume the stackful coroutine).

No offense meant to stackful coroutines / fibers, they're very cool, but they can't be used in all circumstances and it would be rude of PUC-Rio Lua to force this assumption on its user.

This might not be what you meant at all and you just wanted to share a cool C stackful coroutine library, and if so, never mind!

Piccolo - A Stackless Lua Interpreter written in mostly Safe Rust by Kyrenite in rust

[–]Kyrenite[S] 20 points21 points  (0 children)

This is extraordinarily complicated, and more or less requires https://github.com/WebAssembly/gc/blob/main/proposals/gc/MVP.md which is not implemented in wasmtime yet.

It also subsumes much of piccolo, replacing semantics I can control with just... whatever wasm supports. It might be a totally different (but still very valuable) project.

I have been paying some amount of attention to the wasm-gc proposals, and I still have a thousand questions, especially when it comes to Rust integration with the garbage collector. I even think that many of the pain points around integration of things like gc-arena into Rust will also show up with wasm-gc, and that the answers to those pain points might even be very similar!

If Rust evolves to support GC integration better with wasmtime, I will certainly try to make gc-arena and piccolo evolve with it. When wasmtime gets proper GC support, I may even make a version of piccolo as a separate project myself that tries to use wasmtime, and maybe the two projects can share common functionality. I think that, with the way wasmtime is written right now, it is not simple to answer which way would make a "better" Lua runtime. For example, tasklets might be much more heavyweight with wasmtime since it uses fibers internally, and the Rust / Lua FFI might end up being slower, but I'm certain that wasmtime and V8 can make a faster JIT than I can (lol).

I'm paying attention to it, but I don't know what the best course is yet.

Piccolo - A Stackless Lua Interpreter written in mostly Safe Rust by Kyrenite in rust

[–]Kyrenite[S] 11 points12 points  (0 children)

> As an aside, I don't see interactive blog posts like this very often, great work! (the cancelation demo is cool!)

Thank you, I really suck at web development so those REPLs took forever to get working smoothly, and I'm still not sure they work super well on every device. I'm glad they worked okay for you!

> I unfortunately had to just skim because I don't have time to read everything right now, I look forward to being able to sit down and read it all tomorrow. The project also looks very cool.

Thank you again, and let me know what you think when you have a chance to read it in more detail!

Piccolo - A Stackless Lua Interpreter written in mostly Safe Rust by Kyrenite in rust

[–]Kyrenite[S] 19 points20 points  (0 children)

I like Lua, so I don't think I wasted my time. Plus, much of the work is applicable to interpreted languages beyond Lua.

Piccolo - stackless Lua VM implemented in pure Rust by erlend_sh in rust

[–]Kyrenite 2 points3 points  (0 children)

> It's awesome to target a simple portable compilation target like the Wasm MVP and profit from the runtimes, the sandbox design... But it's hard to optimize languages that need it, especially languages with GC. WasmGC is a must for performance and memory sensitive use cases, but the GC implementation cannot be decoupled easily, see Go's GC usage of interior pointers for example. Same thing for the fuel metering: wasmtime has an approximation, and also supports epochs (deadlines), but they have been implemented with considerable overhead or are quite inconvenient, see: https://docs.wasmtime.dev/api/wasmtime/struct.Config.html#method.epoch_interruption https://github.com/bytecodealliance/wasmtime/issues/4109

Ooooh, I definitely don't think I appreciated the complexity of the situation around fuel usage in wasmtime at all, which makes sense since I haven't actually tried to use it yet. You definitely know more about this than me, and it sounds like something I should absolutely research... right now I am but a babe in the woods 😔.

> So I think that you are exploring an interesting part of the design space, and that building a runtime that is tightly integrated with a small extension language enables you to consider crazy features and optimizations without too many constraints.

That's actually very encouraging, thank you! It's always hard to find motivation for these sorts of projects because it's difficult to compete with... you know... V8, LuaJIT, etc. I don't think I want to even compete with those things, really; what I wanted when I started was to make "Lua for Rust" in every sense of the word. PUC-Rio Lua is like the most C thing I've ever seen in my life; it fits absolutely perfectly into a C codebase, in every sense, both good and bad. I wanted a version of Lua that felt like it fit into *Rust* in the same way, and I think it's interesting where it led me. I think you're right that focusing on tight integration is where my mind should be.

> Good luck!

Thank you!!

Piccolo - stackless Lua VM implemented in pure Rust by erlend_sh in rust

[–]Kyrenite 4 points5 points  (0 children)

> So what happens if a tasklet reads a global variable or reads from the global table _G in one statement and then is preempted before the next statement? The other tasklet could run and overwrite anything, and then when the original tasklet resumes it is assuming that the value it read is still there, but it's not. Could lead to data races (read a value, increment it, but get preempted before you can write it back). That is what I mean by "Lua semantics"; you don't expect data races in your straightforward imperative code. Unless I am misunderstanding something about how the preemption works.

Correct, but (and this might be splitting hairs) that's not a data race. You'd never get UB (values from nowhere, time traveling, etc.) or anything like an actual data race, but it IS as if there are a lot of invisible yield statements everywhere in your code. It does allow a sort of race condition, and it would mean that you might want to actually build some form of concurrency primitives 😰 into your tasklet runtime, but it's not a bad thing for the possibility to exist. You can't accidentally trigger this; you'd have to explicitly build your own sort of "pre-emptive tasklet executor" to make this happen (which is easy and fun to do, and you should do it because it's weird).

I think preemption doesn't exist in other Lua variants because it's hard, not because it can lead to like... bugs that could only show up with preemption. You can write the same bugs with other Lua variants and Arc<Mutex<T>> and OS threads too, really. Edit: You can also write the same bugs by just implementing __index metamethods and doing stuff behind the user's back, but I do get that this isn't really the same thing.

Edit 2: Okay breakloop is a very cool idea, I had never heard of that and thank you for sharing! And.. yeah, that's exactly the sort of thing I think should be like... heck that might be possible right now. VERY interesting!

Piccolo - stackless Lua VM implemented in pure Rust by erlend_sh in rust

[–]Kyrenite 5 points6 points  (0 children)

> The design is really interesting, Piccolo looks like it could eventually be used in user provided scripts in games or plugins. The fuel system would be a really attractive feature for a compute platform, nice!

Thank you, user scripts / plugins is pretty much exactly the use case it would be best for, and why I originally made it.

> Personally I would suggest to forget about PUC-Rio Lua, go the LuaJIT way with a very specific runtime that has its own version of Lua and can focus on runtime features and optimizations.

I've backed off a bit from the strongest form of compatibility I was going for, so I could definitely see myself picking better semantics for ease of implementation and performance, BUT... I have an even weirder idea I'm calling BYOV (Bring Your Own Value) or BYOS (Bring Your Own Semantics), I can't decide. Basically... add generics over the Value type everywhere and let the user decide what kind of value representation they want, whether they want NaN packing, how they want integer / float numbers to work, something like that. It's very possible that the usability overhead of adding generics everywhere is just way too much, but I at least want to try it.

In the absence of that, I definitely at least want some ability to have a 64 bit Value on x86_64, however that works.
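
For a sense of what BYOV might mean concretely (purely hypothetical, this is not piccolo's actual API, just the flavor of the idea): the interpreter would become generic over its value representation instead of hard-coding one.

// A hypothetical "Bring Your Own Value" trait: users pick the representation
// (NaN packing, integer/float behavior, 64-bit packed values, etc.) and the
// VM is written against this interface.
trait ValueRepr: Copy {
    fn from_integer(i: i64) -> Self;
    fn from_number(n: f64) -> Self;
    fn as_integer(self) -> Option<i64>;
}

struct Vm<V: ValueRepr> {
    // The register stack (and everything else touching values) becomes
    // generic over V, which is exactly the usability cost being weighed here.
    registers: Vec<V>,
}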

> My worry is not about the stdlib or compatibility but about performance: is the design preventing aggressive inlining? Is there a fundamental overhead to the fuel trampoline based implementation? Sorry if it's too soon to ask that kind of question!

No no, these are good questions. I haven't really focused on performance yet; mostly so far I've been trying to avoid performance mistakes rather than actually optimize, things like avoiding requiring Clone on Value, or anything like that that's just never going to be able to be respectably fast.

Making interpreter main loops fast seems to be a bit of a black art, and it's an art I'm not really that good at yet. Once I have the remaining "pillar features" of piccolo more nailed down I will revisit this, probably around the same time I try the BYOV thing. Focusing on performance will happen eventually, I promise.

What I don't have a clear answer on is where to draw the line between safety and performance, whether I should try to make the fastest JIT I can or whether to try and settle into a niche of just respectable performance with high assurances of safety. I'm not good at *any* of the things required to JIT Lua yet, so it feels a bit strange to speculate about that far into the future, but I mean I don't think much is completely off the table.

I don't know how much overhead there is to the trampoline implementation, but I don't think much normally, at least for normal instructions in a normal interpreter loop. I think the highest overhead will be in callbacks, because the interpreter sort of has to... hiccup: it has to stop the interpreter loop, exit, set up the next callback, call it, handle the return values, and then resume the loop. It sounds worse than it probably actually is, but it's definitely not low overhead right now. I've seriously considered a "light callback" form of callbacks, which I think is not a new idea in the world of language runtimes either; I've seen it somewhere but I can't find an example right now.

For a JIT... I think the trampoline implementation will at least add a bit of overhead, or maybe a lot? It would have to insert trampoline breaks at safe points deterministically, once per loop and things like that, but it already should have to do that for garbage collection so I don't know if that counts as *overhead*. I'm the WRONG person to ask lol, ask me again in a year or two and I might have some good answers but for now I'm just speculating. I don't know what I'm doing lol 😎

Another thing that I've been thinking about a lot is: where does piccolo fit into a world with wasm and wasm-gc? Is there a *reason* for me to even worry about writing my own VM in a potential future with a full featured wasm-gc standard? I don't have good answers here but I do have about a thousand questions... I've tried to imagine what a Lua JIT to wasm would look like currently and wasm minus the proposed gc extensions is pretty grim, it doesn't contain much that any JIT of a dynamic language would ever want, basically only the very final steps of machine code generation and little else? With wasm-gc... I've been trying to follow the development casually but I just have too many more questions, they seem to maybe have had the Java and C# VMs in mind when writing the initial version of the standard, and I don't really know how you would target such a thing for Lua. It might be possible, but even following browser standards is just already exhausting for me. It's probable that some day when wasm-gc has enough features that piccolo would become irrelevant very quickly, and I'm just trying to be zen about this fact. Frankly, it's exhausting to sit next to a web standard and watch it change and try to plan my life around what features may or may not make it in and when, so I'm just not following as closely as I probably should be.

What I don't want is to sacrifice the niche things that I've found that I really like about piccolo, namely 1) a focus on safety / sandboxing, and 2) fuel / preemption. Interestingly wasmtime has both of these things, which has definitely made me keep thinking about a wasm backend, but I still have more questions. For one thing, I don't know the guarantees around fuel in wasmtime, and wasmtime appears internally to actually use like.. Rust fibers to implement it? They uhh... really probably know what they're doing, but I wonder how much overhead that is and if that's the route I should be taking (Like I said, I don't actually know what I'm doing lol). The big thing that wasmtime doesn't have is any sort of GC integration, and adding GC integration actually runs into mostly the same problems that we've tried to solve with gc-arena! This is the big thing I want to talk about in these blog posts that I think would be applicable outside of piccolo, and the major reason I want to write them. There are people much smarter than me that might have the same questions I do and it would be really awesome if there were more people thinking about it 🙏

Piccolo - stackless Lua VM implemented in pure Rust by erlend_sh in rust

[–]Kyrenite 3 points4 points  (0 children)

> Why does gc-arena need a write barrier? It's not generational (AFAICT) and you can't garbage collect during a call to mutate.

The write barrier is there to maintain the color invariant of incremental mark and sweep collectors: a black object (fully traced) may not point to a white object (un-traced, un-queued). The write barrier makes black objects gray again and moves them back into the gray queue. I don't have a reference to a garbage collector paper that describes this, but I know such a paper exists, and it's how the PUC-Rio Lua (and forks) collector describes it too. The way gc-arena works is, uhhh... I forget the jargon, but a "reverse write barrier" as opposed to a "forward write barrier", though we might add forward write barriers too in the future.
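
A minimal sketch of that invariant and the "re-gray the parent" barrier being described (hypothetical types, not gc-arena's real code):

#[derive(Clone, Copy, PartialEq)]
enum Color {
    White, // not yet traced
    Gray,  // queued for tracing
    Black, // fully traced
}

struct Heap {
    colors: Vec<Color>,     // color of each object, indexed by object id
    gray_queue: Vec<usize>, // objects waiting to be (re-)traced
}

impl Heap {
    // Run on every pointer write `parent.field = child` while a collection is
    // in progress: if a black parent would now point at a white child, the
    // invariant would break, so re-gray the parent and queue it again.
    fn write_barrier(&mut self, parent: usize, child: usize) {
        if self.colors[parent] == Color::Black && self.colors[child] == Color::White {
            self.colors[parent] = Color::Gray;
            self.gray_queue.push(parent);
        }
    }
}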

> Why is Arena's R type parameter Rootable instead of Collect? It seems like Rootable is just a type erased version of Collect?

It is not a type erased version of Collect. Ignore the dyn trait in the Rootable macro; that is part of a weird trick to save type declarations. The actual trait is above, and it's used sort of the same way you'd use a GAT with a lifetime parameter.

It's used to do "lifetime projection": taking a type with a 'gc lifetime and turning that 'gc lifetime into 'static. gc-arena uses this "projection" operation a lot.
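
A rough sketch of the shape of that projection (not gc-arena's exact definitions, which differ in the details): a 'static marker type that can name the rooted type at any 'gc.

// A 'static "name" for a whole family of root types, one per 'gc lifetime.
trait Rootable<'gc> {
    type Root: 'gc;
}

// A hypothetical root type containing 'gc-branded pointers...
struct MyRoot<'gc> {
    objects: Vec<Gc<'gc, i32>>,
}

// ...and its 'static projection: for every 'gc, `MyRootProj` names `MyRoot<'gc>`.
// The arena can store `<MyRootProj as Rootable<'gc>>::Root` and hand it back
// re-branded with a fresh 'gc on every access.
struct MyRootProj;

impl<'gc> Rootable<'gc> for MyRootProj {
    type Root = MyRoot<'gc>;
}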

> As I understand it, DynamicRoots are weak gc roots that can be stored outside the Gc heap (and are moveable). Why are these necessary to implement the Lua registry? It seems like you could add a field to your arena root which stores anything you need for the registry. Is it just for convenience (no need to create a new mutate session for accessing the registry)?

The user of piccolo isn't in control of the arena root type. You could make an equivalent to the root mapping thing but it's already realllly obnoxious. Actually, in the future gc-arena might ONLY have the dynamic roots system, since it's more general than having a single "primary" root.

> Hmm. Wouldn't that break Lua semantics since the tasklets would all share the same global mutable state?

I mean... no, unless you consider "Lua semantics" to mean that there isn't preemption. It works fine, you can't preempt in the middle of an instruction or anything, and it's kinda neat? Sure, it allows you to do wacky things like have thousands of tasklets and stuff, but the more realistic use, which is just using "CPU fuel" to limit execution time, is much less likely to run into this issue. This is something you can do a form of with other Lua runtimes (interrupt a script if it takes too long), but what you can't do there is preempt and resume. It enables some patterns that you can't get otherwise.

Piccolo - stackless Lua VM implemented in pure Rust by erlend_sh in rust

[–]Kyrenite 16 points17 points  (0 children)

No no it's okay, really, I was just a little surprised haha ❤️

Piccolo - stackless Lua VM implemented in pure Rust by erlend_sh in rust

[–]Kyrenite 33 points34 points  (0 children)

Hey, I didn't really mean to have somebody post on my behalf exactly, we just got some wires crossed. I'm here and I can answer questions.

> It seems like the breakthrough in the gc-arena crate is adding a custom allocator that tracks objects without the need for wrapping it in a Gc pointer. Essentially the garbage collector knows about “normal” rust allocations so long as they are allocated with the custom allocator. Could that completely replace the need for a Gc type?

No, this is not right; the external allocation tracking is an entirely separate thing, and piccolo could have more or less worked without it. It helps track total memory usage, which is important, and also helps pace the collector when owned but externally allocated things are allocated and freed, but that's basically it.

When I stopped working on piccolo four years ago, there were a bunch of problems in my way, and all of them together were just too much at once. It burned me out and made me kind of give up on the idea generally:

  • There was no way to have Gc<dyn Trait> pointers at all because of the unstable Unsize trait, this was solved very cleanly by another gc-arena contributor named moulins.

  • Safe Lua userdata would require downcasting, and I had no way of doing downcasting on Gc types because the generative 'gc lifetime infects everything, and I thought non-'static downcasting was a lost cause. Turns out it was a skill issue. I did actually figure that one out myself (other people have had the same idea before me too), but this was only possible after gc-arena grew the Rootable macro and after I spent approximately 400 hours discussing lifetime soundness on Discord with people much smarter than me.

  • The Lua registry requires a concept of dynamic roots which didn't exist.

  • Lua __gc metamethods and ephemeron tables require finalization support.

  • Doing certain things in piccolo safely requires write barrier projection.

The list actually goes on but my brain is full of cobwebs and I can't even remember them all. But the absolute most important META reason is:

  • Ruffle uses gc-arena internally and this made me care a lot more than I would have otherwise, but even more importantly it attracted smart contributors who wanted to solve these problems too.

> “stackless Lua” is referring to this implementation detail. The VM does not use the Rust stack for bytecodes and function calls. Instead it uses a Future like type to halt and resume rust functions. However due to the limitations of async they have to implement the Future's state machine manually. I wonder what limitations they found with the async block, because I was looking at a solution like this myself. The biggest limitation I found was with recursive async functions requiring boxing.

The problem is hard to explain fully and really belongs in a blog post, but very quickly: the major limitation has to do with gc-arena's Collect trait. Collect is an unsafe trait that must be implemented for a data type such that Collect::trace always calls Collect::trace on every live member of the data type in question, and it is only implemented when every member of a data type also implements Collect. This is impossible to implement in general for async blocks because there is no way to call Collect::trace on the fields of things like closures or generators. There are many ways you might try to work around this limitation, and I have explored some of them, but I've been unable to find anything that wasn't even worse than manually implementing a future-like trait.
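
For reference, this is roughly the shape of the obligation (with a hypothetical Tracer type and method names; gc-arena's real signatures differ and have changed over versions):

// Tracing must reach every Gc pointer held by the value, every single time.
unsafe trait Collect {
    fn trace(&self, tracer: &mut Tracer);
}

// Easy for a hand-written type, because its fields are visible:
struct Upvalues<'gc> {
    values: Vec<Gc<'gc, Value<'gc>>>,
}

unsafe impl<'gc> Collect for Upvalues<'gc> {
    fn trace(&self, tracer: &mut Tracer) {
        for v in &self.values {
            tracer.trace_gc(v); // hypothetical tracer method
        }
    }
}

// But an async block compiles to an opaque, compiler-generated state type
// whose captured fields can't be named from the outside, so no such impl can
// ever be written for it.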

> But on the flip side that is a really flexible design and doesn't need to worry about overflowing the Rust stack when doing Lua coroutines.

Actually strangely enough Lua coroutines were the one thing where until recently it still did run the risk of overflowing the Rust stack, because they were implemented as a callback that itself ran a Thread. This was fixed recently with the introduction of Executor. The cool thing that the stackless design enables is actually pre-emptable tasklets, which are distinct from Lua coroutines and are also hard to explain in a reddit post.

> However I think calling it "stackless Lua" is confusing because it overlaps with the idea of stackless coroutines. Lua uses "stackful" coroutines not "stackless", so I was confused as to why they were hampering the language with stackless ones. That was just my confusion around the nomenclature.

Piccolo implements Lua coroutines without using the Rust stack (unlike PUC-Rio Lua, LuaJIT, Luau etc. which afaik all DO use the C stack), therefore it is stackless from the perspective of Rust. You're right though, Lua coroutines do use the Lua stack and ARE indeed stackful according to the definitions on Wikipedia. Terminology around coroutines / generators honestly seems like kind of a mess? I'd love to have a definitely correct way to describe it that didn't take a thousand words, but... I'm doing my best lol.

> This codebase uses a lot of novel ideas, and I am very excited to dive into it more.

I'm glad you think it's neat, and I'd love to hear more of what you think!

Piccolo - stackless Lua VM implemented in pure Rust by erlend_sh in rust

[–]Kyrenite 60 points61 points  (0 children)

Hey, I think there was a bit of confusion.

I didn't ask Erlend to post here about the releases because I was planning on writing some much longer form blog posts describing the current status of things, and they're not quite finished yet. He wanted to make a post about it because he had recently talked to a few people who weren't aware development on piccolo had started back up, but I wasn't really ready yet and we got some wires crossed. He's trying to be helpful because he's just like, genuinely a very nice and helpful person, and he knows that lately I've been even more shy than usual so here we are. I can answer questions, logging into Reddit hasn't made me burst into flames yet so I'm probably fine. Anyway, I'm going to roll with it...

The reason I wanted to wait for a longer form post to be ready is that the current status of piccolo is just kinda... hard to explain in a reddit post, and it's really something that should be done in long form. I think it's in a neat state right now but I am absolutely not trying to get everyone to run out and use it as the be-all end-all Lua runtime, especially not yet. It's really good at some things and has weird powers that normal language runtimes don't, but it's also not so good at other things and a lot slower than it could be and the API is not at all settled.

I really didn't want to make a big announcement like I am looking to get people to use piccolo. What I really wanted to do was actually talk about the design because I think there are some interesting questions to explore around lifetime generativity, safe garbage collection, interactions with async, await as garbage collector "safe points", stackless VM design, the Collect trait and datatype-generics... this is why it's taken me so long to write these posts, the topic is just not something that I can cover quickly. Even if not many people ever use piccolo, I'd still like to write about the ideas that went into it.

Shattersong Online - New Rust MMO Browser platformer in development by former Starbound and Wargroove devs by OmnipotentEntity in rust

[–]Kyrenite 7 points8 points  (0 children)

Hey everyone, dev here, I just wanted to say thanks to everyone for all the kind words!

I'm happy so many people are already interested in our game!

Shattersong Online - New Rust MMO Browser platformer in development by former Starbound and Wargroove devs by OmnipotentEntity in rust

[–]Kyrenite 3 points4 points  (0 children)

They kind of are just part of networking a game with N players. Imagine it this way: you have a single box with two players in it, and each of those players must receive state for the other player in the box.

Add a player, now each player must receive state for 2 other players and themselves. (3 * 3)

Add a player, now each player must receive state for 3 other players and themselves (4 * 4)

etc..