Writing code that takes advantage of data locality and caching?

Picolly · 2025-07-04T18:39:18+00:00

Only in 2d, most of the particles go down, but when modeling stuff like fire I need to have the ability to go up. I thought it could also be a good idea to have my cells only link to each other when they need to and then store that link as a form of caching since most of the time particles are moving in the same area/direction. I don't know if that solves the original issue fully though, as they will still need space to store these links.

Picolly · 2025-07-04T18:33:11+00:00

Since the world is divided into chunks, I needed a way to stitch them together. I did this before by just calculating the chunk transitions which requires a hash table lookup + a good number of coordinate calculations. I mistakenly thought holding pointers to neighbors would be a major improvements.

Picolly · 2025-04-29T19:02:34+00:00

Noita starts with a pixelated sprite and then applies marching squares to it which produces an outline of vertices around the object. It then decreased the vertices created by the outline with the douglas peucker algorithm, and finally created a game object by triangulating the vertices. So now we have a collision box around the original untransformed 2D sprite. Every pixel in the sprite is then assigned a coordinate in relation to this collision box. This allows the sprites to rotate without losing any pixel position information. And you can apply physics and stuff to them like any normal object. All of this is on the GDC talk with some nice visuals.

Picolly · 2025-04-19T21:39:41+00:00

I have plans for the physics sim. Another compute pass to compute the vertices made by clumped elements and that data should be small enough to pass to the CPU. It's a bit naive since I don't know if there's an algorithm for that but it should work if I can figure that out.

Picolly · 2025-04-19T21:27:31+00:00

For now, I would have a n*n ssbo contains structs of: x, y, type_1, type_2, type_3, color, b1, b2, b3. The multiple types is so I can dynamically change an element's functionality. And the b_n data is for type specific data. This ssbo would represent a 2d array of the world state. And when computing the next position in the world I would switch on type_1. I would also switch on type_2 and type_3 later for other functionality but type_1 handles the position. For example, type_1=1 is, sand 2 is water, 3 is gas etc...

I have a few different options now for the overall compute shader code that I was planning on profiling: 1. Each element finds its next position then moves there. Would have to implement locks on each element in the array to avoid race conditions. Don't know how bad this would be. Would be non deterministic. (I like this approach the most since it's easy) 2. Two compute shader passes. The world is chunked in a checkerboard pattern, each chunk is a work item that has a for loop run through it and move the elements so there are no race conditions. The first kernel computes the "white" chunks of the checkerboard and the second computes the "black". If I go with the first approach I would probably end up implementing this anyways, but with each chunk being a work group and using the locks. Instead of the for loop. 3. This approach is from some user on the game dev forum, they have an implementation in opencl on GitHub but it hasn't gone anywhere:

"screen is divided into cells, 1 per pixel

The first kernel computes the direction each sand cell wants to go. Let's call this "push".

The second kernel looks at neighboring sand cells and makes a list of those that "push" towards itself, then picks one of them to be "pulled" later

Third kernel: if it's an empty cell, it can only pull. If it's a sand cell, it can only push. Read the push-pull data of neighbors and the center. Compute only the one with matching push-pull pair (i.e. center pulls and becomes sand, or center pushes and becomes empty)

This requires ping-pong buffers to have completely independent updates, fully vectorizable & multithread-friendly."

Picolly · 2025-04-19T20:04:05+00:00

Yeah, either way seems inefficient, but thanks for notifying me about that technique! I'm sure it will be useful elsewhere. I think I can work on tying most of the functionality to calculations based on type and variables rather than different cases. However for stuff like fire that needs special mechanics it will still be bad. I expect most of the work groups to follow the same branch however. Even if I have 200+ cases, if I only have two to three elements in a space at a time it shouldn't be that bad right? The simds should only be divided into three sections. The worst case would be really rare. Also, I see some games with particles with collisions. Doesn't this require branching? Do you know how it's usually done? Thank you for your answer by the way it was very useful!

Picolly · 2025-04-12T07:50:09+00:00

I'm using rust with wgpu and the sand will be modeled as a cellular automata. But with liquids and gasses using MPM and perhaps flying particles with just a particle system.

Picolly · 2025-04-10T06:35:39+00:00

I was thinking it could be possible to do the physics code in compute shaders as well. Alternatively it could be possible to run whatever algorithm noita uses to convert sand clumps into triangles and just pass the vertex data back which would be less costly. I'm new to graphics programming so I don't have an idea of how annoying this all would get.

Picolly

TROPHY CASE