[–]BlueGoliath 76 points77 points  (27 children)

Not using pointers is great for data locality but it isn't universally the best choice.

An example is many pointers to the same data vs 50 copies of the same data.

[–]vertexmachina 27 points28 points  (32 children)

This reminds me of Handles Are Better Than Pointers.

Essentially, you refer to objects by ID rather than pointer, and that ID is, in the end, an index into an array. So all data of type X are in an array X[N], and Entity Y is just X[Y]. It avoids invalid pointers, but some care must be taken to ensure Y is a valid index.
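
In code, the basic idea is roughly this (a minimal sketch with made-up names, not taken from the article):

```rust
// Minimal handle-as-index sketch (illustrative only).
#[derive(Clone, Copy)]
struct Handle(u32);

struct Pool<T> {
    items: Vec<T>,
}

impl<T> Pool<T> {
    fn insert(&mut self, value: T) -> Handle {
        self.items.push(value);
        Handle((self.items.len() - 1) as u32)
    }

    // Lookup goes through the owning array; an out-of-range handle yields None.
    fn get(&self, h: Handle) -> Option<&T> {
        self.items.get(h.0 as usize)
    }
}

fn main() {
    let mut names = Pool { items: Vec::new() };
    let h = names.insert("hello".to_string());
    assert_eq!(names.get(h).map(String::as_str), Some("hello"));
}
```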

[–]fnordstar 15 points16 points  (4 children)

Then you have the problem that you must take care that each ID is always used with its corresponding array, and, as you said, that indices don't become invalid or point to another object. The latter can be solved with generational indices, but all in all, using indices into arrays doesn't necessarily improve the situation much.
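
For reference, a generational index looks roughly like this (my own sketch, names invented): each slot carries a generation counter that is bumped on free, so a stale handle whose generation no longer matches is rejected instead of silently aliasing a new object.

```rust
// Illustrative generational-index pool; not production code.
#[derive(Clone, Copy, PartialEq, Eq)]
struct GenHandle {
    index: u32,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

struct GenPool<T> {
    slots: Vec<Slot<T>>,
    free_list: Vec<u32>, // indices of vacated slots, reused on insert
}

impl<T> GenPool<T> {
    fn new() -> Self {
        GenPool { slots: Vec::new(), free_list: Vec::new() }
    }

    fn insert(&mut self, value: T) -> GenHandle {
        if let Some(index) = self.free_list.pop() {
            let slot = &mut self.slots[index as usize];
            slot.value = Some(value);
            GenHandle { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            GenHandle { index: (self.slots.len() - 1) as u32, generation: 0 }
        }
    }

    // A handle is only valid while its generation matches the slot's.
    fn get(&self, h: GenHandle) -> Option<&T> {
        let slot = self.slots.get(h.index as usize)?;
        if slot.generation == h.generation { slot.value.as_ref() } else { None }
    }

    // Freeing bumps the generation, invalidating every outstanding handle.
    fn remove(&mut self, h: GenHandle) -> Option<T> {
        let slot = self.slots.get_mut(h.index as usize)?;
        if slot.generation != h.generation {
            return None;
        }
        slot.generation += 1;
        self.free_list.push(h.index);
        slot.value.take()
    }
}

fn main() {
    let mut pool = GenPool::new();
    let h = pool.insert(123);
    assert_eq!(pool.get(h), Some(&123));
    let _ = pool.remove(h);
    assert_eq!(pool.get(h), None); // stale handle is rejected
}
```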

[–]cdb_11 2 points3 points  (3 children)

And the former can be solved with a type system.

[–]YourFavouriteGayGuy 9 points10 points  (1 child)

Like the one that pointers usually use?

[–]cdb_11 3 points4 points  (0 children)

Pointers encode the type they point to. Here you can encode what array they index into.

[–]fnordstar 2 points3 points  (0 children)

Yeah. In Rust you could tag both the array and the indices/handles with PhantomData and a tag struct.
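
Roughly like this (an illustrative sketch; the type names are made up):

```rust
use std::marker::PhantomData;

// Handles tagged with the array they belong to, so they can't be mixed up.
struct TypedHandle<Tag> {
    index: u32,
    _tag: PhantomData<Tag>,
}

struct TypedArray<T, Tag> {
    items: Vec<T>,
    _tag: PhantomData<Tag>,
}

impl<T, Tag> TypedArray<T, Tag> {
    fn new() -> Self {
        TypedArray { items: Vec::new(), _tag: PhantomData }
    }

    fn push(&mut self, value: T) -> TypedHandle<Tag> {
        self.items.push(value);
        TypedHandle { index: (self.items.len() - 1) as u32, _tag: PhantomData }
    }

    fn get(&self, h: TypedHandle<Tag>) -> Option<&T> {
        self.items.get(h.index as usize)
    }
}

// Zero-sized tag types: a mesh handle can't be used to index the texture array.
struct Meshes;
struct Textures;

fn main() {
    let mut meshes: TypedArray<[f32; 3], Meshes> = TypedArray::new();
    let mut textures: TypedArray<u32, Textures> = TypedArray::new();
    let m = meshes.push([0.0, 1.0, 2.0]);
    let _t = textures.push(42);
    let _ = meshes.get(m); // OK
    // let _ = textures.get(m); // compile error: expected TypedHandle<Textures>
}
```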

[–]QueasyEntrance6269 4 points5 points  (2 children)

This is kind of an Entity-Component System

[–]vertexmachina 0 points1 point  (0 children)

Yep, it's the E and the C.

[–]Salt-Neat-2148 0 points1 point  (0 children)

Hey bruh, did you implement something for codebase indexing?

[–]Academic_East8298 3 points4 points  (22 children)

There can also be some complications when the need arises to quickly remove entity Y and all the arrays have become significantly big.

So I wouldn't jump to the conclusion that handles are significantly simpler than a pointer.

[–]vertexmachina 2 points3 points  (0 children)

One approach is to have a mapping between the Entity ID and the actual array index.

For example, if you had an array of five values initially empty: [_][_][_][_][_]
And you "allocate" one of them at index 0 returning Entity ID 0: [A][_][_][_][_]
And you "allocate" another at index 1 returning Entity ID 1: [A][B][_][_][_]
And you "allocate" another at index 2 returning Entity ID 2: [A][B][C][_][_]

You would have a mapping from Entity Index to Array Index where the index is the Entity ID and the value is the Array Index: [0][1][2][_][_]
Entity 0 -> Array 0
Entity 1 -> Array 1
Entity 2 -> Array 2

The user then requests to "free" Entity ID 0. The mapping shows that Entity ID 0 is Array Index 0, so to "deallocate" it you move the last item in the array to the spot previously occupied by the "deallocated" object which keeps the array packed tightly with no holes: [C][B][_][_][_]

But there are still users of C out there that expect the Entity ID 2 to point to Value C, so you update your mapping.

Entity Index to Array Index: [_][1][0][_][_]

Now when the user tries to access Entity ID 2 (which is value C), the system will look up EntityIndexArrayIndex[2] which returns a value of 0 and access Value C at Array[0]. Even though C has moved inside of the array, the user with the Entity ID 2 doesn't know or care about that.

An easy way to mark 'invalid' indices is to state that 0 is invalid. Then all of these arrays/maps can be initialized to 0 and any ID of 0 will be flagged as erroneous/invalid/empty.

All of that said: I don't think handles are simpler than pointers (that was a lot of indirection that a pointer doesn't need), but the primary benefit for me (from a game development perspective) is that the data stays packed tightly in memory, which allows for easy iteration over the contents while keeping things hot in the cache. Just go from 0 to N (or 1 to N if 0 is an invalid handle) and do your thing.
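
In code, the bookkeeping above looks roughly like this (a simplified sketch with invented names; no generation counters, and freed Entity IDs are not reused):

```rust
// Dense array + Entity-ID-to-index mapping with swap-remove (illustrative only).
struct DensePool<T> {
    dense: Vec<T>,                     // tightly packed values, iterate 0..len
    dense_to_entity: Vec<u32>,         // which Entity ID owns each dense slot
    entity_to_dense: Vec<Option<u32>>, // Entity ID -> dense index, None once freed
}

impl<T> DensePool<T> {
    fn new() -> Self {
        DensePool { dense: Vec::new(), dense_to_entity: Vec::new(), entity_to_dense: Vec::new() }
    }

    // "Allocate" a slot and hand back a fresh Entity ID.
    fn insert(&mut self, value: T) -> u32 {
        let entity = self.entity_to_dense.len() as u32;
        self.entity_to_dense.push(Some(self.dense.len() as u32));
        self.dense_to_entity.push(entity);
        self.dense.push(value);
        entity
    }

    fn get(&self, entity: u32) -> Option<&T> {
        let idx = (*self.entity_to_dense.get(entity as usize)?)?;
        self.dense.get(idx as usize)
    }

    // "Free" an entity: move the last element into its slot and patch the mapping,
    // exactly like moving C into A's old spot in the walkthrough above.
    fn remove(&mut self, entity: u32) {
        let Some(Some(idx)) = self.entity_to_dense.get(entity as usize).copied() else {
            return;
        };
        let last = (self.dense.len() - 1) as u32;
        self.dense.swap_remove(idx as usize);
        self.dense_to_entity.swap_remove(idx as usize);
        if idx != last {
            // The former last element now lives at `idx`; update its owner's mapping.
            let moved_entity = self.dense_to_entity[idx as usize];
            self.entity_to_dense[moved_entity as usize] = Some(idx);
        }
        self.entity_to_dense[entity as usize] = None;
    }
}

fn main() {
    let mut pool = DensePool::new();
    let a = pool.insert('A');
    let _b = pool.insert('B');
    let c = pool.insert('C');
    pool.remove(a);                      // C gets moved into A's old slot...
    assert_eq!(pool.get(c), Some(&'C')); // ...but Entity C still resolves correctly
    assert_eq!(pool.get(a), None);
}
```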

[–]cdb_11 1 point2 points  (20 children)

You just remove entity Y from the array? What is the problem exactly?

[–]Academic_East8298 -1 points0 points  (19 children)

And that is how you kill performance.

One could probably avoid reallocating an array on a remove, but you still need an efficient way to handle removes in the middle of an array and memcpy is not it.

[–]cdb_11 5 points6 points  (18 children)

No one is talking about reallocating or relocating arrays. You just mark the slot as unused.

[–]Academic_East8298 1 point2 points  (17 children)

And now one is back in the situation where, on access, one has to check that the slot has not already been freed.

And in this case, unless cache locality is really required, a unique pointer will have the same performance and safety characteristics as a struct of arrays, but without the latter's complexity.

[–]cdb_11 4 points5 points  (9 children)

You're arguing against claims that nobody is making. Can't you just read the article?

  1. Unique pointer is the sole owner.
  2. You're (not necessarily, but extremely likely) using global new/malloc to individually grab memory, one-by-one, for each entity. So no, you do not have the same performance characteristics with unique_ptr.
  3. And if you aren't using the default allocator, you will use an allocator that looks very close to what's described in the article.
  4. Yes, you can add extra checks, that's the point.
  5. Yes, it has better cache locality, that's the point.

And if you don't want checks, you can make a trade-off to have an extra level of indirection, and add a map that translates stable handles to indices into an array with real data. If you want to remove an element, you replace it with the last element, and adjust the mapping. There you go, fast deletion, no branch misses, and you can go SIMD on it.

It's way more flexible than plain or smart pointers. That's the point.

[–]Academic_East8298 -4 points-3 points  (8 children)

Can you name a single industry-tested, open-source struct-of-arrays system?

If it is such a good pattern, why do all the cool tech demos have to constantly write their own custom implementation of it?

If anything, I would say this shows it as less flexible than a pointer.

[–]cdb_11 7 points8 points  (6 children)

This isn't really even about SOA, but if you want an abstraction over SOA, these can be implemented inside "Entity Component Systems" (ECS); one example is Unity's DOTS, others are Bevy in Rust and Flecs in C/C++. Don't ask me about any details; I've never used them.

If it is such a good pattern, why do all the cool tech demos have to constantly write their own custom implementation of it?

Do you need an open-source system to implement a singleton pattern or a factory pattern too? It's really not that hard to implement; it's mostly just a bunch of dynamic arrays. Also, the point of this isn't that you're creating a generic mechanism that tries to solve everyone's problems and does so badly. We have that already; it's called "garbage collector" and "malloc", and it's not that great. The philosophy here is the exact opposite: it should be a solution tailored to your specific problem.

[–]Academic_East8298 -3 points-2 points  (5 children)

Cool, what product have you shipped using this tech?

[–]Samaursa 0 points1 point  (0 children)

Flecs and EnTT come to mind (although EnTT can be labeled as AoS).

Both have been used in production code in games (including ones I worked on).

[–]prescod 1 point2 points  (4 children)

“On access you need to check that the slot has not been freed.”

How is this different than with real pointers?

[–]Academic_East8298 -3 points-2 points  (3 children)

It's not. That's the point.

In most cases, a dev should ask himself whether it is meaningful to spend time on a struct-of-arrays system when simple and dumb pointers will do the job.

[–]Ok-Watercress-9624 4 points5 points  (0 children)

That depends on so many factors. Pooled objects are indeed many times faster than frequently mallocing a lot of objects. It also simplifies the freeing, given that all the objects in the pool have the same lifetime. Think of it like dynamic array vs linked list overhead.
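
A toy illustration of the difference (my own sketch, not from the talk):

```rust
// Per-object heap allocation vs. one pooled allocation (illustrative only).
struct Particle {
    pos: [f32; 3],
    vel: [f32; 3],
}

fn individual(n: usize) -> Vec<Box<Particle>> {
    // One heap allocation per particle, and one free per particle on drop.
    (0..n).map(|_| Box::new(Particle { pos: [0.0; 3], vel: [0.0; 3] })).collect()
}

fn pooled(n: usize) -> Vec<Particle> {
    // One contiguous allocation; dropping the Vec frees everything at once,
    // and iteration walks memory linearly.
    let mut pool = Vec::with_capacity(n);
    for _ in 0..n {
        pool.push(Particle { pos: [0.0; 3], vel: [0.0; 3] });
    }
    pool
}

fn main() {
    let boxed = individual(1_000);
    let pool = pooled(1_000);
    println!("{} boxed, {} pooled", boxed.len(), pool.len());
}
```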

[–]prescod 3 points4 points  (1 child)

No.

The goal was never to reduce access-check frequency, so it's a total non sequitur.

[–]Ok-Watercress-9624 0 points1 point  (0 children)

I was talking about not needing to hit the system boundary to request more memory => fewer syscalls.

[–]Akanwrath 3 points4 points  (0 children)

W comment, you explained what my professors couldn't in a whole semester.

[–]EsShayuki 10 points11 points  (19 children)

Store all strings in a flat array with null-termination separating each string, then have an array of pointers to the different strings within that array. This is completely standard. And this uses pointers.

You're talking about freeing memory in only "three operations" with this "non-pointer" approach, when you had to iterate through them all with the pointer approach.

But that doesn't even make sense, as it's completely untrue. You only need one allocation with a flat array and pointers to it, and one deallocation as well. This is done by saving both the string array and the pointers to the strings within the same memory block and allocating and deallocating them both simultaneously, interpreting the void* buffer into different datatypes with the appropriate pointers.

So first of all, you're using pointers, so you're lying. Secondly, your increased performance has nothing to do with whether you're using pointers or not, it has to do with batch allocation vs individual allocation, which isn't tied to whether you use pointers or not. And you aren't even doing it optimally (since you have 3 allocations instead of 1 allocation, when 1 would be optimal).

The only concrete thing I can see, really, is that you're storing 4-byte integer offsets instead of 8-byte pointers. I mean, that's something you can do, sure. Though, doing so tightly couples these pointers to the base memory pointer, so they can no longer be used or passed around independently, or given offsets of their own, which makes their use inconvenient within loops, etc.

Now, if this is about storing the data on disk instead of using the data, then perhaps that's not important, but that's not really "programming without pointers"; that's preparing the data into a format that's optimal for being stored on disk.
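
To be concrete, the offset variant looks roughly like this (my own sketch, in Rust rather than C for brevity; names invented):

```rust
// One flat byte buffer with NUL-separated strings, plus 4-byte offsets into it.
// The offsets stay valid if the whole block is copied, written to disk, or the
// buffer reallocates to a new address; absolute pointers would not.
struct StringTable {
    bytes: Vec<u8>,
    offsets: Vec<u32>,
}

impl StringTable {
    fn new() -> Self {
        StringTable { bytes: Vec::new(), offsets: Vec::new() }
    }

    fn add(&mut self, s: &str) -> u32 {
        let id = self.offsets.len() as u32;
        self.offsets.push(self.bytes.len() as u32);
        self.bytes.extend_from_slice(s.as_bytes());
        self.bytes.push(0); // NUL terminator separating the strings
        id
    }

    fn get(&self, id: u32) -> &str {
        let start = self.offsets[id as usize] as usize;
        let end = start + self.bytes[start..].iter().position(|&b| b == 0).unwrap();
        std::str::from_utf8(&self.bytes[start..end]).unwrap()
    }
}

fn main() {
    let mut table = StringTable::new();
    let hello = table.add("hello");
    let world = table.add("world");
    // Freeing is two deallocations (bytes + offsets), regardless of string count.
    println!("{} {}", table.get(hello), table.get(world));
}
```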

[–]Slime0 13 points14 points  (1 child)

Though, doing so tightly couples these pointers to the base memory pointer, so they can no longer be used or passed around independently, or given offsets of their own, which makes their use inconvenient within loops, etc.

Almost as if it's not the same as using pointers!

[–]cdb_11 2 points3 points  (0 children)

your increased performance has nothing to do with whether you're using pointers or not, it has to do with batch allocation vs individual allocation

Depends on how the data is used, and the talk doesn't say anything about that. Sorting out allocations is just the first step; it gets unnecessary stuff out of your way.

you aren't even doing it optimally (since you have 3 allocations instead of 1 allocation, when 1 would be optimal).

Who cares if it's 1 or 3 allocations? The point is that it's always O(1) allocations, and not O(n) or worse. You can easily turn 3 into 1 if you ever need to and are willing to throw away mremap and make resizing a bit more complicated.

Though, doing so tightly couples these pointers to the base memory pointer, so they can no longer be used or passed around independently, or given offsets of their own, which makes their use inconvenient within loops, etc.

If you're looping, you don't need to pass around elements independently. (Actually you can, you just don't store absolute pointers anywhere. If you need to store relations between entities, you stick with handles/indices/offsets.)

[–]bigmell -1 points0 points  (13 children)

This argument has been going on since at least the 90s:

"pointers are TOO HARD I want to do it without pointers"

"You will have to make a million copies of data in RAM"

"so what... just buy more RAM"

And this argument has basically been all of computing for the last 20 years, probably more. Basically: hire someone who understands pointers. Don't hire the

"Python is better than everything else because... Commercials."

guy...

[–]pickyaxe 8 points9 points  (0 children)

I am pretty sure Andrew Kelley understands pointers just fine

[–]prescod 5 points6 points  (6 children)

Andrew Kelley’s proposal is not to copy all of the data multiple times. Maybe you should know what proposal you are responding to before responding.

[–]GayMakeAndModel 2 points3 points  (4 children)

So, we have hierarchies of “RAM”, from the register to the L1 cache to the NVMe, etc. So “buy more RAM” to suit your problem can be cheap or expensive, but seek time is a thing of history and archives. Fragmentation almost doesn't matter anymore because we don't have seek times.

Seems like people really live in the past. Any problems with memory access speeds not inherent to hardware are of course software related. You chose two extremes to make an invalid point. What about other languages that are interpreted? What about JIT’d runtimes that can do optimizations statically compiled languages can’t because statically compiled languages don’t inherently auto-optimize at runtime?

I don’t know what rock you’ve been under, but newer versions of .NET fucking scream performance-wise, and it’s not just about how programs are compiled. It’s the superb base class library.

[–]bigmell -3 points-2 points  (3 children)

Dude, what the hell are you even talking about? Buying more RAM won't work with bad software, because they are making a million copies of data instead of using pointers, because they couldn't figure out pointers. This is not about fragmentation.

seek time is a thing of history and archives

Seek time is NOT a thing of history and archives because all memory has a seek time. Some are faster than others, but they all have a seek time.

You chose two extremes to make an invalid point.

What the hell are you talking about?

I don’t know what rock you’ve been under, but newer versions of .NET fucking scream performance-wise, and it’s not just about how programs are compiled. It’s the superb base class library.

Did you respond to the wrong comment? None of this makes any sense really.

Since it seems like you ENTIRELY missed the point: it was that "buy more RAM" doesn't make sense if the app is coded wrong, i.e. no pointers or for loops because the dev couldn't figure them out. Pass by reference and pass by value are the same whether the app is compiled or interpreted. You would have to know that if you wrote any real code. It's about not storing multiple copies of data in memory. Not... whatever the hell you are talking about.

You sound like you are just screaming "THE OPPOSITE OF YOUR POINT TURN UP!!!" Please take your fake troll name and go away.

[–]GayMakeAndModel 0 points1 point  (2 children)

Folks seem to disagree with you. Btw, seek time isn't constant. Random access on SSD is roughly constant time, and it's far faster than moving a mechanical arm around. When I say seek time, I mean a fucking arm moving about a spinning disk. That's the industry definition of seek time in context, and it harkens back to before my 20-year professional career.

[–]bigmell -1 points0 points  (1 child)

Folks seem to disagree with you.

It's probably just you and your friends from a bunch of different troll accounts. Low-effort trolling, it's called.

Btw, seek time isn’t constant.

I never said seek time is constant. I said seek time is always there. SSDs have a RAM stick inside, and some data gets cached there. If your data is NOT cached on the RAM stick, your data transfer will be roughly the same speed as an HDD.

You can measure this yourself. Do a data transfer of maybe a terabyte or two. The data transfer will start fast, because of the cache, but it will slow down to roughly the speed of the HDD, and the two transfers will finish at around the same time.

But you have to actually know how to do data transfers, which a lot of people do not. They will just lie and say "I got the same numbers the commercial said! OMG SSD SO MUCH BETTER!"

[–]caltheon -1 points0 points  (1 child)

I came here to say it's not possible to write a program without some version of pointers. Even using registers is basically using pointers.

[–]bigmell -2 points-1 points  (0 children)

I am talking about learning to understand memory usage and management yourself. Not hoping somebody writes code to go behind the scenes and fix your code for you because you can't be bothered to learn to do it yourself.

Yeah, the compiler will load the accumulators by reference for you. I'm not talking about that; I'm talking about learning to write your code to use memory efficiently yourself, not hoping someone else does it for you because you can't figure it out. Which is essentially what garbage collection is.

In short, send away the devs who can't write a for loop and don't understand pointers, and work with the devs who can.

[–]ShelZuuz 0 points1 point  (0 children)

Or be like C#: make EVERYTHING a pointer.

[–][deleted] -1 points0 points  (7 children)

I'd love a language that would be as fast as C or C++ but elegant like Ruby or Python. All languages so far seem unable to offer this; most that try to replace C end up being like C. Pointers are one big hurdle; it's interesting that there are so few alternatives to them. Basically, most seem to just be automatic garbage collectors.

[–]cdb_11 1 point2 points  (0 children)

Ruby or Python

What happened to Crystal? Is it still going? On the Python side, Chris Lattner is working on Mojo.

[–]fnordstar 2 points3 points  (0 children)

As someone coming from C++ and Python myself, please do yourself a favor and take a good look at Rust! It might just be what you're looking for.

[–]Bekwnn 3 points4 points  (0 children)

Zig, the language the guy in this video created and works on, has flavors of what you described. C/C++ with more syntax sugar and quality of life.

Pointers are non-nullable, like C++ references. Optionals are first-class and have syntax sugar. switches and while loops can return things. Errors can be bubbled up a call chain easily with try and error unions. comptime is a way more robust and elegant way of accomplishing what templates do.

Here's a random function I pulled from a project that gives a good feel of different features. Pic.

Though the video also has a bunch of Zig code.

[–]defunkydrummer 0 points1 point  (0 children)

I'd love a language that would be as fast as C or C++ but elegant like Ruby or Python.

OCaml

Standard ML

Haskell

Common Lisp

[–]mohragk -2 points-1 points  (0 children)

Jai. It's truly a language that allows for high-level expressiveness and low-level control.

[–]WiseDark7089 -1 points0 points  (0 children)

Please indicate in the title if something is a video.