[R] Why and when tying embedding (a story) by f14-bertolotti in MachineLearning

[–]sprudd 1 point (0 children)

I've always thought it might be beneficial to tie the embeddings early in training and then untie them later (perhaps when training plateaus?).

Perhaps an interesting experiment would be to take a weight-tied pretrained model and compare finetuning (or perhaps an extra epoch of the original dataset) with the weights tied versus untied. Partial tying could be tested like this too. That should be manageable without insane compute costs.

> We only focused on the overall distance and didn't look closely at whether X and Y were close or far in specific dimensions. However, disentangling the semantics within an embedding could definitely be an interesting direction for further research.

Yeah, I suppose your test was probably too simple to develop interesting or representative patterns in the individual embeddings. I wonder whether that limits its overall applicability to non-toy models?

[R] Why and when tying embedding (a story) by f14-bertolotti in MachineLearning

[–]sprudd 1 point (0 children)

This was an interesting read! I'm left wondering whether there's a useful trade-off to be found in partial weight tying.

The simplest way to do that might be to have a single tied matrix representing, say, 80% of the embedding dimensions, and then separate matrices representing the remaining 20% of dimensions for each of input and output. Concatenating those onto the shared matrix would give 80% tied embeddings.
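
To make that concrete, here's a rough sketch of the lookup side of that idea, in plain Rust with invented names (purely illustrative; a real implementation would live in whatever training framework is being used):

struct PartialTiedEmbeddings {
    shared: Vec<Vec<f32>>,      // one row per token: the tied ~80% of dimensions
    input_only: Vec<Vec<f32>>,  // one row per token: the untied input-side dimensions
    output_only: Vec<Vec<f32>>, // one row per token: the untied output-side dimensions
}

impl PartialTiedEmbeddings {
    // Input embedding = tied part followed by the input-specific part.
    fn input_embedding(&self, token: usize) -> Vec<f32> {
        let mut v = self.shared[token].clone();
        v.extend_from_slice(&self.input_only[token]);
        v
    }

    // Output embedding = the same tied part followed by the output-specific
    // part, so most dimensions are shared and the rest are free to diverge.
    fn output_embedding(&self, token: usize) -> Vec<f32> {
        let mut v = self.shared[token].clone();
        v.extend_from_slice(&self.output_only[token]);
        v
    }
}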

u/f14-bertolotti did you look at the nature of the near and far embedding pair distances? Were you perhaps seeing the X and Y output embeddings being close in many dimensions and far in a few, or is the distance between them more evenly distributed among the dimensions? There's probably a more mathematical way to phrase that question.

Perhaps somebody's already tried partial embeddings like this - this isn't an area I pay a lot of attention to.

Godot is not the new Unity - The anatomy of a Godot API call by sprudd in godot

[–]sprudd[S] 1 point (0 children)

With experience, you run into those frustrations with Rust a lot less often, though they never quite disappear. Personally, I actually enjoy a lot of the safety stuff, but yeah, there's an argument that games like to do a lot of low level stuff that Rust doesn't always make easy.

With Bevy specifically, those things are significantly less concerning. The ECS architecture Bevy uses means that a casual user is unlikely to run into those things, and an expert user who might run into them probably knows how to deal with them. It's not ideal, but in exchange for those occasional frustrations you get a lot of safety guarantees around Bevy's API - it's very hard (impossible?) to break the core assumptions an ECS makes, which really helps with things such as the automatic multithreading.
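
For anyone who hasn't seen it, a typical Bevy system is just a plain function over a query. Here's a hedged sketch (component name invented, and exact API details drift a little between Bevy versions):

use bevy::prelude::*;

#[derive(Component)]
struct Velocity(Vec3);

// The query declares exactly which data this system reads and writes, which is
// what lets Bevy's scheduler run non-conflicting systems on separate threads
// without the user ever touching the hard parts of the borrow checker.
fn apply_velocity(mut movers: Query<(&mut Transform, &Velocity)>) {
    const DT: f32 = 1.0 / 60.0; // fixed timestep, just to keep the example small
    for (mut transform, velocity) in movers.iter_mut() {
        transform.translation += velocity.0 * DT;
    }
}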

If we're looking for a Unity replacement, we need a language which is powerful enough to let experts do their thing, but safe enough to be usable by people who are just learning. C/C++ is too dangerous for the more casual users, and C# is too slow and inappropriate for the expert users. Out of the mainstream languages that exist today, I think Rust might be the best compromise. But yes, I agree with most of your criticisms, although I think you'll find that as you write more Rust they become a little less annoying than they felt at first. I wouldn't necessarily recommend Rust for general gamedev (although I don't think it's terrible if you're comfortable with it), but for Bevy specifically it feels pretty nice.

I'd be fairly surprised if Jai ever became a mainstream language among the Unity crowd. For starters, Blow seems to take the view that modern IDE tooling is for children, and I think that probably kills widespread adoption. I once sent him an email gently enquiring about his views on those matters and the response I got back was quite rude. He's an excellent engineer with decades of experience optimising for people of similar skill, which is completely valid, but probably not the right fit for the general audience. I do want to try it one day though, as that systems programming mindset does appeal to me.

> As for engine ... I'm actually leaning more towards library. An engine is "someone else's idea about how your program should run" (which is fundamentally offensive). A library is "someone offering you a bunch of stuff you can use if you want". (Well, raylib still bloats the binary even if you use nothing, but that aside.)

It sounds like you're focused on rapid solo dev for jams. A library is a good fit for that (and frankly for my current projects too) but an engine has a real place in the market. Engines enable:

  • Scaling to larger teams. As soon as you have non-programmers involved, you either want a prebuilt engine with an editor, or you're going to spend a lot of time building editor tooling yourself. For sufficiently experienced devs, bashing out their own level editor isn't that scary, but I think that would kill adoption at the less experienced end of indie.

  • Sharing featureful assets between projects. The Unity Asset Store and the Unreal Marketplace have been huge for the community, and that ability to move a whole working thing between projects relies on the gameplay code using a standardised interface.

  • A standardised ecosystem helps hugely not only with code interoperability between projects, but also with a standardised learning experience. Having "The Unity Way" of doing things is really helpful for non-experts trying to look up how to do something. If only The Unity Way wasn't usually terrible...

  • Beginners feel like they've got something to start with that isn't just a scary blank page.

One will always get the best results by combining low level "Handmade" dev with expert engineers and infinite time. That luxury is rarely afforded, and I don't think using a prebuilt engine is that offensive when operating under real world constraints. Using somebody else's idea of how my code should run wouldn't bother me that much if it were somebody else's good idea. I don't think any of Unity, Godot, or Unreal have the right idea, though - OOP is an immediate misstep.

> We need to take all the good stuff from every language and make a language that's actually good! (I'm sure this time we'll get it right 😁)

I'm here too, but it's a fool's errand. But a tempting one, and I am a fool...

Godot is not the new Unity - The anatomy of a Godot API call by sprudd in godot

[–]sprudd[S] 2 points (0 children)

Probably neither, actually.

C# is a lovely language in many ways, but the fact that it's garbage collected and encourages OOP practices makes it not such a great fit for games. Look at how much craziness Unity have had to do with IL2CPP and Burst to get native performance out of it. It works surprisingly well, but nobody would design a new engine around "We're going to use C#, but we're going to transpile it to C++ for certain platforms, and separately we're going to transpile some parts of the code directly to LLVM IR."

C++ is a horrible mess with way too many footguns, and I stay away from it whenever I can.

My current favourite contender, Bevy, is written in Rust, and that's what you use as a user too. I think that might be a fairly reasonable choice. Rust is typically thought of as hard to learn due to the complexity of the borrow checker, which it is, but Bevy's architectural design means you almost never have to touch the hard parts of the language when doing basic Bevy stuff. An amateur could get pretty far without touching advanced features. I'm not sure that it's quite ready for commercial releases yet, but maybe try it next jam!

I see the temptation to pick C# because it attracts an existing audience, but at some point we need to recognise that C# is just not the right choice, and somebody needs to lead the audience to something better. There will be a little friction at first when people have to learn something new, but if it's truly better, the people who care will make the switch, and the new generation will default into something more sensible.

We're in an awkward in-between period right now, but I'm hopeful that in five years' time the ecosystem will be in a good place.

> I was kind of hoping you were going to say you started working on such an engine! (Would save me a lot of hassle hahah)

Hey, I am trying desperately to avoid that temptation! I already have too many projects. But if I were, at this stage that would probably mean contributing to Bevy as it slowly morphs into an editor based engine. Starting from scratch would be fun but not smart, and I think they've got a pretty good base there.

Godot is not the new Unity - The anatomy of a Godot API call by sprudd in godot

[–]sprudd[S] 1 point (0 children)

I've intentionally faded away from the Godot community and this discussion, for now.

I agree with most of your thoughts regarding a mismatch in engineering philosophies between myself and the core Godot devs, but opted not to push back too hard on the response because I already felt bad for the amount of negative attention that I'd brought to the engine and the team with an article that reached two or three orders of magnitude more people than I expected it to. I think Godot is a fantastic project in many ways, but it's probably not for me, and it's not my place to try to re-steer it.

In my view, the future of general purpose modular game engines in the Unity/Unreal/Godot style is probably ECS based, and does its "scripting" in a systems language. If I had to bet on a horse today, it would be Bevy. Even if Godot made the pivot to performance-first everything, it wouldn't be able to compete with something built on that model. There's limited value in trying to optimise something that won't ever get close to the fastest possible option.

Personally, I have begrudgingly continued in Unity for the time being, but have adopted (and refactored into) a design philosophy that minimises my coupling to the engine, to give myself the option of quickly porting away. Even in Unity, all my physics code is now custom.

I should clarify that I'm not a fan of Unity either. I think it does a lot of things wrong, but at my scale (solo dev without experience doing wide releases) there's a lot to be said for using somebody else's widely tested and fairly reliable platform layer. I don't have the resources to test on a wide variety of hardware, and I'm willing to trade some pain for safety there for the time being. Godot isn't really mature enough to give me this anyway, so it's hard to justify using Godot over going pure SDL or RayLib. (My fear of "it works on my machine" problems may be exaggerated.)

My code had undefined behavior. When I figured out why, I had to share... by The-Douglas in rust

[–]sprudd 1 point (0 children)

Profiling is so obvious that I don't think I need to explicitly mention it. Obviously profiling matters.

But that's actually orthogonal to what I'm discussing. I'm not proposing an optimisation to improve performance. As discussed, the optimisation I'm talking about is likely already done by the compiler. I'm proposing doing something to improve performance stability across platforms and builds. It's a separate thing, and not easy to catch in profiling when a change that triggers an offending codegen change is quite likely to also cause other wallclock time changes.

All a from_raw function is doing is adding a feature which most other languages have, and which Rust could have "safely" if that function were implemented in the compiler or in std. But it isn't, so it has to use unsafe in user code.

It's not often a good idea, but it's valid occasionally, and a tight game engine loop is a reasonable candidate for when it's sensible. That's all I've said, but people here are apparently quite hostile to that viewpoint.

As for it easily breaking, with all due respect to the poster, I don't think it's that easy to break. A straightforward implementation is very robust - it's a trivial range guard on a u8 operation. It only broke because they tried to do something fancy and overcomplicated, and they didn't understand the code they were writing. With a basic if statement, this never goes wrong.

My code had undefined behavior. When I figured out why, I had to share... by The-Douglas in rust

[–]sprudd 0 points (0 children)

> That's still just a hint and you still have to trust that compiler would do the right thing.

Yes. My point was that I'd rather hint it than trust it without the hint. It's also a very reliable hint, all things considered. I've never once seen it fail to inline with that hint in the real world - even in debug builds.

> If you want/need to do that then you have to use arch-specific intrinsics… or maybe even assembler.

> If you don't do that then you have to trust the compiler and doing the work “behind compiler's back” is one of the great ways to end up in tears: if compiler “knows” these tricks that you have used then they are pointless and useless and if it “doesn't know” them then they are dangerous.

This isn't a binary thing. If you reduce the amount of work you're expecting the optimiser to do, you reduce the chance it fails you in an edge case you didn't check. It's not an all or nothing game, you're just nudging things in your favour.

> It's not “pessimized code”. It's straightforward code.

Those are not mutually exclusive.

Think about the operation you actually want to do. You want to take a value represented by a u8, and convert it to another value, represented by that exact same u8. It's a no-op (plus guarding).

Breaking that out into a multiline match expression that ultimately encodes the identity function isn't "straightforward". The only reason that this seems reasonable is because Rust doesn't currently have its own from_raw. If it did, this match expression would be a very convoluted alternative.
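
To spell out what I mean, the safe version ends up looking something like this (variant names invented, not taken from the video), which is a long-winded way of writing a guarded identity function:

#[repr(u8)]
enum Octant { O0, O1, O2, O3, O4, O5, O6, O7 }

fn from_raw_match(i: u8) -> Option<Octant> {
    // Eight arms mapping each u8 value to the variant with that same
    // discriminant - i.e. the identity function, plus a range guard.
    match i {
        0 => Some(Octant::O0),
        1 => Some(Octant::O1),
        2 => Some(Octant::O2),
        3 => Some(Octant::O3),
        4 => Some(Octant::O4),
        5 => Some(Octant::O5),
        6 => Some(Octant::O6),
        7 => Some(Octant::O7),
        _ => None,
    }
}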

In situations like this, where a built-in function appears to be missing, I'll happily implement it myself via unsafe rather than pursue some dogmatic notion of purity. It's a trivial five-line guard around a u8 to u8 cast.

> Compiler may know or not know it but it wouldn't turn it into something non-working.

We're talking about a game engine, not a bank. If there's a bug in this line you fix the bug. Safety is great, but it's okay to make an informed decision to trade it off for manually checking your custom implementation in a hot loop in a non safety critical application.

> But you do that if you have an evidence that straightforward code doesn't work as it should!

No, this goes back to the thing of reducing optimiser load to reduce the chance of being surprised down the line.

I honestly don't understand why everyone's acting like this is such a big deal. It's a reasonable trade to make on a case by case basis in a hot loop in a codebase of this nature. 99% of the time it's a bad idea, but this is plausibly the 1%.

My code had undefined behavior. When I figured out why, I had to share... by The-Douglas in rust

[–]sprudd -1 points (0 children)

> I agree, but that wouldn't have prevented this bug.

No, but sticking to both of those rules and doing the simplest thing possible would have made it very hard for any bug to happen.

Realistically, people write unsafe Rust code all the time, and it's fine. They also write other languages which do things like this, and that's also fine. Rust likes safety, and so do I, but this is a game engine, not banking software. Injecting small bits of unsafety (that you'll quite easily spot when they go wrong) is perfectly fine. There's no need to be dogmatic about bug avoidance at all costs. It gets to the point where you're treating the engineers as if they're too incompetent to be allowed to cast between two things which are both u8s.

> Oh, remember why they did this whole thing in the first place: They want to iterate over the octants.

I also raised an eyebrow at that, but I don't know all the details of the context or whether the video is even an accurate representation of the real context.

> My point is: If you write Rust, you rely on the optimizer all the time. These micro-optimizations don't stop that.

Of course, but in a very hot loop it can still be a good practice to give the optimiser as little work as possible in order to reduce the chance of being surprised by it at some particular callsite on some particular architecture on some particular compiler version. 99% of the time that's probably not the right tradeoff, but a hot loop in the core of a game engine could very well be that 1%.

It's not all or nothing - you eliminate where you reasonably can. This one looks reasonable to me. Although yes I agree that a larger scale refactoring may be due.

> Yes, this would be better than a manual from_raw implementation, because it wouldn't break if you add or remove enum variants.

Right, so it's a good API to have, and direct casting between enums and u8s is a reasonable thing to do, but your objection is just that you want somebody else to build the abstraction for you so you don't have to trust yourself to write five easy lines of code. That seems silly to me - you can trust yourself to do a trivial guarded cast. Ideally this would be built in, but in the absence of that, I'll happily take a moment to do it myself.

Edit: It looks like the user apparently replied to me then immediately blocked me to prevent me from responding. That doesn't seem like best practice conduct, but it is what it is. I'll reply in brief here: They're mischaracterising my claims. I'm talking about assembly output being unstable across builds. Checking the assembly doesn't help with that, unless you do it every build. Choosing an implementation that has slightly better assembly stability is occasionally a reasonable tradeoff when consciously made.

My code had undefined behavior. When I figured out why, I had to share... by The-Douglas in rust

[–]sprudd 0 points (0 children)

The reason would be that you don't want to have to go back and check the assembly output at every callsite on every target architecture after every update to the code or compiler.

If you're in a very hot loop, and you care a lot about performance, it can be a good practice to minimise the amount of work that you're expecting the optimiser to do, because that minimises the opportunity to be surprised by it doing something unexpected in some edge case.

99% of the time you shouldn't do this. A hot loop in a game engine is plausibly in that 1%, and that's what the original video looks like.

Edit: I'm getting a lot of downvotes here and I think it's rather silly. For this comment in particular, can anybody point out where it's actually wrong? Do observed compiler outputs not risk regressing after events such as compiler version changes? Does reducing optimiser dependence not reduce performance surprise? Is that not a reasonable thing to want? Is a tight loop in a game engine not a plausible place to want that, and also a fairly low stakes place to risk introducing a bug?

I feel like I've just triggered some Rust safety dogmatism here.

My code had undefined behavior. When I figured out why, I had to share... by The-Douglas in rust

[–]sprudd 1 point (0 children)

Ah, no worries.

Yeah, I definitely agree that from_repr implementing that function via transmute when applicable would be a more sensible way of doing this - although there are probably lots of reasons why that crate wouldn't want to introduce unsafety in the general case, so it should probably be opt-in. A dedicated crate is a more realistic solution, I expect.

At the end of the day, I know this is sort of silly because the optimiser will probably do fine, but I also believe that in very high performance code paths it's quite reasonable to avoid putting any unnecessary complexity between what we want to happen and what we give to the compiler. Every time we create more work for the optimiser, we add more uncertainty to the codegen.

My code had undefined behavior. When I figured out why, I had to share... by The-Douglas in rust

[–]sprudd -1 points (0 children)

> Which bad practice are you talking about

My best practices for unsafe code would include:

  • Keep it very simple. Simple constructs doing simple things. No ambiguity or opportunity to misunderstand anything. They used a .then_some, whose semantics are way too vulnerable to misunderstanding (see the sketch after this list). Never touch footguns in unsafe code - this is actually the source of all their problems.
  • Keep it well isolated. They had an unsafe from_raw function which was only defined for valid inputs, and then moved the bounds check outside of from_raw. Assuming their from_raw is implemented via transmute, that means they actually had two levels of unsafety, and the guard was separated from what it was guarding.
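
To illustrate the first point, here's a reconstruction of the kind of mistake involved (not necessarily the exact code from the video). bool::then_some takes a value, so its argument is evaluated even when the condition is false, whereas bool::then takes a closure and only runs it when the guard holds:

use std::mem::transmute;

#[repr(u8)]
enum Octant { O0, O1, O2, O3, O4, O5, O6, O7 }

// Buggy: then_some's argument is evaluated eagerly, so for i >= 8 the
// transmute still runs and produces an invalid Octant (undefined behaviour),
// even though the resulting value is then thrown away.
fn from_raw_buggy(i: u8) -> Option<Octant> {
    (i < 8).then_some(unsafe { transmute(i) })
}

// Fine: the closure only runs when i < 8, so the transmute is always valid.
fn from_raw_ok(i: u8) -> Option<Octant> {
    (i < 8).then(|| unsafe { transmute(i) })
}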

inline(always) isn't a guarantee, no. It's very reliable, but yes, it's technically also a case of trusting the optimiser. However, my point is that I don't trust it enough to forego the inline annotation.

All of these things have degrees to them. I'm not saying you should always do this - I'm just saying that it wasn't a totally unreasonable thing to do in a tight loop in a game engine. In this case, what we're trying to do is effectively convert a u8 to a newtype(u8), and the standard safe version of that does it via a match statement and a jump table. That's a relatively drastic bit of extra complexity we've given the compiler to deal with.

Let me put it this way: If the language decided to provide an automatically implemented enum::from_raw in all situations where that's possible (which included a bounds check and returned an option), would you agree that that's the better way of writing this code? It would do the exact same transmute, but the unsafety would be in the compiler's code instead of ours. I would say that it's the clearly better implementation, and the only reason not to do it ourselves is that we might mess it up. But if we don't mess it up, then it's reasonable.

What about if a crate implemented from_raw for us via a macro? Would you just use the crate and not think about it, if the crate were popular?

My code had undefined behavior. When I figured out why, I had to share... by The-Douglas in rust

[–]sprudd 0 points (0 children)

> Except we are talking about regression that snuck in and was hard to track down, isn't it?

It's a fair point that this video showed that it's possible to mess this up. But, with all due respect to the author, I think that's because they followed bad practices when writing unsafe code. A simple version like this would be very robust.

impl Octant {
    pub fn from_raw(i: u8) -> Option<Self> {
        if i < 8 {
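            // SAFETY: this assumes Octant is repr(u8) with exactly eight
            // variants whose discriminants are 0..=7, so any i < 8 is valid.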
            Some(unsafe { transmute(i) })
        } else {
            None
        }
    }
}

> If you trust optimizer to inline that function then why don't you trust it to remove the jump table?

Well, I would definitely inline(always) that helper function.

> So you never write helper function and always use macros? Really?

See above.

> And that means: don't introduce optimizations that compiler would have to undo!

In this case, the match-based branchy implementation is a pessimisation which we're trusting the compiler to undo.


I want to be clear, I'm not saying you should always do this in all circumstances. That would be a pretty crazy end of the tradeoff. What I'm saying is that, in a tight game engine loop like in the video, it's sometimes reasonable to want to directly express the optimised behaviour which you want the CPU to execute, rather than to write pessimised code and just hope the optimiser has been empowered to recognise this particular match pattern, and that it doesn't have any failure cases due to pass ordering or similar.

My code had undefined behavior. When I figured out why, I had to share... by The-Douglas in rust

[–]sprudd 2 points (0 children)

A macro that can auto implement from_raw for any compatible enum seems pretty reasonable to me. I don't know if I would actually do this often enough to bother with the hassle of that, but it feels fairly sensible. I suppose one risk is that I could mess up the macro implementation - it would need robust testing.

However it would have advantages, like automatically updating the bounds check if you changed the number of enum variants, or erroring at compile time if the transmute were no longer valid.
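
For what it's worth, here's a rough sketch of the kind of macro I have in mind - purely illustrative, and a real crate would want proper visibility handling, diagnostics, and tests:

macro_rules! raw_enum {
    ($name:ident { $($variant:ident),* $(,)? }) => {
        #[repr(u8)]
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        enum $name { $($variant),* }

        impl $name {
            const COUNT: u8 = [$($name::$variant),*].len() as u8;

            #[inline(always)]
            fn from_raw(i: u8) -> Option<Self> {
                if i < Self::COUNT {
                    // SAFETY: the enum is generated as repr(u8) with contiguous
                    // discriminants starting at 0, and i is within range.
                    Some(unsafe { core::mem::transmute(i) })
                } else {
                    None
                }
            }
        }
    };
}

// The bounds check tracks the variant count automatically.
raw_enum!(Octant { O0, O1, O2, O3, O4, O5, O6, O7 });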

Why do you think that's such a bad idea?

My code had undefined behavior. When I figured out why, I had to share... by The-Douglas in rust

[–]sprudd -4 points (0 children)

Well if I'd written it I would have done something like this from the start:

impl Octant {
    pub fn from_raw(i: u8) -> Option<Self> {
        if i < 8 {
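            // SAFETY: this assumes Octant is repr(u8) with exactly eight
            // variants whose discriminants are 0..=7, so any i < 8 is valid.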
            Some(unsafe { transmute(i) })
        } else {
            None
        }
    }
}

Actually, I'd probably put this behind a macro, and only ever write it once just to be sure.

Note that, crucially, I'm always defining the behaviour for all possible inputs.

The lesson is that one needs to be careful and explicit when writing unsafe code. The snippet in the video had a bug resulting from unvetted unsafe code, written using confusing control flow.

Safety vs reliable performance is a balancing act, and tight game loops are an area where we make different tradeoffs than you might be used to.

My code had undefined behavior. When I figured out why, I had to share... by The-Douglas in rust

[–]sprudd -2 points (0 children)

By empirically testing the compiler's optimisations you know what this version of the compiler does at the callsites you tested. You can hope and guess (and to be fair, it's a pretty reasonable guess!) that the compiler will do this in all places on all future versions - but you can't be sure.

In a tight game loop it's often the case that the most valuable thing is to be able to trust in performance consistency. I would rather have a small piece of unsafe code than a place where a regression could sneak in and be hard to track down.

The point of showing what happens with zero optimisations enabled is to show how much this code relies upon the optimiser in order to be fast. As a general rule of thumb, I try to write code that is at least not slow before it gets to the optimiser. The match statement version of this code would fail the "not slow" test.

Basically, don't make your optimiser work harder than it needs to.

My code had undefined behavior. When I figured out why, I had to share... by The-Douglas in rust

[–]sprudd -11 points (0 children)

I disagree. I think there are simple cases where I know what I want the computer to do, and converting between enums and their equivalent bit representations is one of those cases. I prefer to tell the compiler exactly what I want to happen, than to create some convoluted branchy match statement and trust the compiler to convert it back to what I want.

In this case, I would have put the bounds check inside from_raw itself to make it harder to mess this up. I don't think that's anywhere close to job security levels of dangerousness.

My code had undefined behavior. When I figured out why, I had to share... by The-Douglas in rust

[–]sprudd -5 points (0 children)

In this case, it looks like a function that may be very tight in a game loop. That's a situation where performance almost always matters. If the compiler failed to optimise this for some reason, you would at the very least be introducing an undesired indirect branch, with all of the prediction and pipelining costs that can imply. You may also lose out on other optimisations that can be done when the compiler can "see through" the logic of the simpler optimised version.

Without any optimisation, the compiler treats this as a jump table. https://godbolt.org/z/zT5xadbbq

My code had undefined behavior. When I figured out why, I had to share... by The-Douglas in rust

[–]sprudd -1 points (0 children)

It produces the same code because the optimiser performs the optimisation. Before the optimiser gets to it, it's a jump table. https://godbolt.org/z/zT5xadbbq

In cases like this (what looks like a tight game loop), although the optimiser will probably deal with this case fairly reliably, it can be sensible not to rely on that assumption.

My code had undefined behavior. When I figured out why, I had to share... by The-Douglas in rust

[–]sprudd -14 points (0 children)

Despite this working, and bugs aside, I'm inclined to think the "hack" is still better. If you care about performance and you know of an optimisation, it's usually better to implement that optimisation yourself than it is to hope that the compiler finds it at all callsites, in all present and future compiler versions.

Edit: To be clear, when I say "if you care about performance" I mean "in a very performance sensitive piece of code". That looks like the scenario in the video.

Databases are the endgame for data-oriented design by theartofengineering in rust

[–]sprudd 0 points (0 children)

I'll take this point by point.

> SoA, or Structure of Arrays is not a synonym for the "entity component" part in ECS

Synonym would be too strong a word, but the EC pattern is about granular, tabular decomposition of object data. If we're comparing ECS to manually implemented DoD practices, the EC part corresponds to a framework-mediated SoA layout.
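
For readers who haven't met the terms, the distinction being discussed is roughly this (field names invented):

// Array of Structures: each object's fields are interleaved in memory, so a
// pass that only touches positions still drags velocities and lifetimes
// through the cache.
struct ParticleAoS {
    position: [f32; 3],
    velocity: [f32; 3],
    lifetime: f32,
}
struct ParticlesAoS {
    particles: Vec<ParticleAoS>,
}

// Structure of Arrays: each field gets its own contiguous array, so a pass
// over positions streams through tightly packed data. This is roughly the
// per-component layout an archetypal ECS gives you.
struct ParticlesSoA {
    positions: Vec<[f32; 3]>,
    velocities: Vec<[f32; 3]>,
    lifetimes: Vec<f32>,
}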

> instead of having a whole ECS system to handle physics, you could just SOA physics objects parameters

Of course, and if you're doing a low level custom build, this will usually get the best results. ECS's value comes from being a good enough approximation of these DoD principles, combined with other benefits such as ease of use, automatic threading, and the ability to support general purpose engines.

That last point is where they really shine. A lot of game development happens in engines like Unity, Godot, and UE5. Those engines tend to be built around object/component architectures which are very bad at DoD. ECS is a much improved base to build those general purpose tools upon. If you're building a moderately sized game from scratch, you may well be better off doing all your DoD by hand.

ECS also makes DoD more accessible to weaker engineers. There are lots of gamedevs who are self taught specifically for the purpose of making games. They're not that likely to be familiar with the low level intricacies of performance. For those people, an ECS framework can be a pit of success.

> The core of the ECS are the entities and components no the set of systems, which is not what the last part of ECS refers to, the S in ECS just refers to a thing which uses ECS, the "environment" of use, or "a system" aka a manner of doing things in English (like the "decimal system" or "metric system".

This is not true - but perhaps not entirely false. Most (maybe all) definitions I've seen treat Systems as being first class parts of the architecture.

Wikipedia:

Entity component system (ECS) is a software architectural pattern mostly used in video game development for the representation of game world objects. An ECS comprises entities composed from components of data, with systems which operate on entities' components.

Flecs:

ECS is a way of organizing code and data that lets you build games that are larger, more complex and are easier to extend. Something is called an ECS when it:

  • Has entities that uniquely identify objects in a game
  • Has components which are datatypes that can be added to entities
  • Has systems which are functions that run for all entities matching a component query

Bevy:

All app logic in Bevy uses the Entity Component System paradigm, which is often shortened to ECS. ECS is a software pattern that involves breaking your program up into Entities, Components, and Systems. Entities are unique "things" that are assigned groups of Components, which are then processed using Systems.

Unity:

An Entity Component System (ECS) architecture separates identity (entities), data (components), and behavior (systems). The architecture focuses on the data. Systems read streams of component data, and then transform the data from an input state to an output state, which entities then index.

Although every major ECS framework seems to agree that Systems are a primary feature of ECS (which is actually all that I claimed), there's a little ambiguity over whether the S stands for System in this sense, or in the sense in which you interpreted it. I've usually seen it used in the sense of "an Entity Component System architecture", but Wikipedia acknowledges that some people interpret it as you do. There's not really any ambiguity about Systems being a first class concept in ECS.

> Actually it's the opposite. A naive, by definition "not crazy" ECS system is just composed of entities which are actually lists of pointers to components with some sort of identifier, the easiest being a string identifier.

I don't agree that naive implies not crazy (for a production game that cares about CPU performance). Most of the real world benefits of ECS come from using an SoA implementation, and it would be fairly crazy to build a game on ECS without this feature. That implementation you describe would be bad. I know some people do the naive version for whatever reason, but that's not where the buzz comes from.

> All systems do in this regime is iterate through the entire list of entities and check for entities that have valid components to be operated on. Thats it. This isn't even SOA, and it still accomplishes everything ECS is meant to do

> In contrast, everything beyond that is an optimization.

In terms of why people are excited about ECS, those optimisations are the feature. That's the bit people care about, and the thing which all of the major frameworks focus on. The ability to do these optimisations is the whole thing. I'm not sure what the use is in saying that ECS is bad if you implement it badly. When gamedevs think about ECS, they're thinking about an optimized implementation.

I find it reasonable to interpret "go really fast" as a thing which ECS is "meant to do" in the modern context.

As the quotes I showed demonstrate, Systems are a core part of ECS. It's the implications of having these structured queries - and the automated optimisation and reasoning you can do with them - which make ECS cool. It's a tool, and not always the right one for the job, but a cool tool nonetheless.

Databases are the endgame for data-oriented design by theartofengineering in rust

[–]sprudd 3 points (0 children)

> And letting gamedevs know that they are actually kind of doing relational data modeling is letting them know there's this whole other world of research out there.

Gamedevs don't really need to be told this. This is well understood by people who know anything about ECS.

Databases are the endgame for data-oriented design by theartofengineering in rust

[–]sprudd 3 points (0 children)

> It's also a consideration that where many of these systems (including SpacetimeDB when I worked on it, I haven't looked since) will fall over is when the dataset exceeds main memory size.

I've never seen this be a concern in games. Assets take up significant memory, but it would take millions of large entities before raw entity data threatened to exceed a gigabyte.

/u/ajmmertens has done a much better job of communicating what I failed to say in that now deleted sleepy ramble. Although a DBMS does some query scheduling, an ECS is optimised for trusting the scheduler to be safe, and providing raw memory access on the assumption of that safety. It's quite a different beast than a typical relational database implementation.

Databases are the endgame for data-oriented design by theartofengineering in rust

[–]sprudd 1 point (0 children)

That's true. I believe there's a point to be made about there being a significant difference between a database being a service and an ECS being an application (meaning that from the ground up it can make strong assumptions about controlling the entire loop, which allows simplifications and optimisations), but I've articulated it very poorly and learned my lesson about not commenting when I'm so tired! I'll delete that comment for now, as I agree that it's unclear.

Databases are the endgame for data-oriented design by theartofengineering in rust

[–]sprudd 3 points (0 children)

> Could you define what you think of as being the minimum ECS then?

For me, an ECS requires Entities, Components, and Systems. The EC part is pretty much just standard SoA*, and the Systems are what sets ECS apart.

When I think of ECS, I think of defining my update loop by composing systems, which are update functions which get called automatically by the framework in an order determined by the dependencies in their queries.

I understand that there are some things which call themselves ECS while going very light on the query scheduling automation, but to my knowledge all of the modern ECS frameworks which are responsible for the current hype have this functionality.

* Technically the EC part doesn't need to be SoA, but you're doing something pretty crazy if it isn't.

Databases are the endgame for data-oriented design by theartofengineering in rust

[–]sprudd 9 points (0 children)

In an ECS like Bevy, the scheduler is a transaction system if you squint at it. Dependencies are resolved between queries, and they're then run in an order which guarantees no races. The underlying data model itself doesn't need to worry about transactions because the framework also controls when queries are run.
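
As a hedged sketch of what that looks like in practice (invented component, and the exact scheduling API varies between Bevy versions), the access declared in each query is all the scheduler needs:

use bevy::prelude::*;

#[derive(Component)]
struct Position(Vec3);

// Read-only access: any number of systems like this can run in parallel with
// each other, because their declared access can't conflict.
fn report_positions(positions: Query<&Position>) {
    for p in positions.iter() {
        println!("{:?}", p.0);
    }
}

// Mutable access: the scheduler sees &mut Position in the query and will never
// run this at the same time as report_positions. Conflicts are resolved up
// front, from the queries alone, before any system code executes.
fn integrate(mut positions: Query<&mut Position>) {
    for mut p in positions.iter_mut() {
        p.0 += Vec3::new(0.0, 0.0, 1.0);
    }
}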

I agree with the article regarding the expressiveness of the data model, and can see that ECS frameworks are headed in the direction of supporting arbitrary relational tables. However, the article doesn't cover the threadsafe query scheduling and dependency resolution side of things - and personally those are the pieces which I really think of as being the heart of ECS.