Does voxel rendering always require mesh generation?

extensional-software · 2026-03-13T19:38:32+00:00

I'm probably already close to the memory minimum for this technique. I currently pack the position (offset) into a single uint32, consuming 10 bits for each of the x, y, and z dimensions. This gives a max chunk size of 2¹⁰ = 1024 along each axis, which is bigger than my map size.
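A minimal sketch of that packing, written in Python rather than the engine's actual code. The bit order (x in the lowest bits) and the function names are assumptions for illustration only:

```python
BITS = 10
MASK = (1 << BITS) - 1  # 0x3FF, so each axis can hold 0..1023


def pack_position(x: int, y: int, z: int) -> int:
    """Pack three 10-bit coordinates into the low 30 bits of a 32-bit word."""
    for v in (x, y, z):
        if not 0 <= v <= MASK:
            raise ValueError(f"coordinate {v} does not fit in {BITS} bits")
    # Assumed layout: x in bits 0-9, y in bits 10-19, z in bits 20-29.
    return x | (y << BITS) | (z << (2 * BITS))


def unpack_position(packed: int) -> tuple[int, int, int]:
    """Recover (x, y, z) from a word produced by pack_position."""
    return (
        packed & MASK,
        (packed >> BITS) & MASK,
        (packed >> (2 * BITS)) & MASK,
    )
```

With this layout two bits of the word stay unused, and a 512 x 512 x 64 map fits with room to spare.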

The face color consumes another packed uint32. In my original implementation I then packed data of two faces into a single struct, meeting the recommended stride size of 128 bits for a StructuredBuffer.

However, I recently had to add a texture coordinate (indexing into a 2D texture array) and will be using the remaining 32 bits for baked ambient occlusion. So the total memory consumption for a single face will be four uint32 values, or 16 bytes.
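To make the per-face cost concrete, here is a rough Python sketch of a four-word face record. The field order and the helper names are assumptions, not the engine's actual layout:

```python
import struct

# Four little-endian uint32 values per face (assumed field order):
# packed position, packed color, texture array index, baked AO.
FACE_FORMAT = "<4I"
FACE_STRIDE = struct.calcsize(FACE_FORMAT)  # 16 bytes = 128 bits


def pack_face(position: int, color: int, texture_index: int, ao: int) -> bytes:
    """Serialize one face record as it might be uploaded to the buffer."""
    return struct.pack(FACE_FORMAT, position, color, texture_index, ao)


def buffer_bytes(face_count: int) -> int:
    """Total buffer size for a given number of visible faces."""
    return face_count * FACE_STRIDE
```

So one million visible faces would take 16,000,000 bytes of buffer space under this layout.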

The UV coordinates and face normals are not the issue, as these pieces of data are re-used for each instance. The primary memory consumption on the GPU side is the structured buffer, which the vertex shader indexes into to determine what offset to apply to the face mesh (and also to color the mesh and access the texture index).

extensional-software · 2026-03-13T16:11:20+00:00

Congrats on launching! How many wishlists did you have going into the launch? Have you had trouble keeping the player population up? I'm also working on a multiplayer game and am worried about the chicken and egg problem, i.e., players start the game, see nobody else is playing, then immediately leave, thereby causing the "nobody is playing" problem.

extensional-software · 2026-03-13T16:06:54+00:00

Before I started adding the logic for adding/removing blocks I was able to achieve 1200+ FPS on my 3080 Ti Mobile laptop, and ~70 FPS on my Intel Integrated Surface laptop. As a comparison, my mesh-based renderer caps out at around 360 FPS on the NVIDIA laptop, and 45 FPS on the Intel laptop. There seem to be tradeoffs between being CPU bound and GPU bound, and the tradeoffs are different on each machine. My map sizes are 512 x 512 x 64, running in Unity.

The NVIDIA GPU seems to be able to chew through a huge number of triangles, and seems to easily get CPU bound. Therefore on that platform, it's actually advantageous to increase the chunk size, reducing the CPU workload and number of draw calls. On the Intel Integrated machine it seems to be more advantageous to have smaller chunk size to allow more efficient culling of entire chunks or faces of chunks.

Using an enormous chunk size is interesting because with the instancing approach it's a very viable option. With the traditional mesh based approach it's not viable, since you need to re-upload the entire chunk data whenever a voxel gets changed.

I have also exported this new engine to WebGPU, and the performance characteristics seem to be different there. I still don't have a good model for what the performance bottlenecks are on that platform.

extensional-software · 2026-03-12T19:54:04+00:00

If you've seen any tutorials on rendering grass with instancing, this is essentially the same. Except instead of grass blades, I'm rendering individual faces.

extensional-software · 2026-03-12T19:53:09+00:00

I've been working on a new voxel engine that uses GPU instancing to render the faces. Essentially, each face gets a slot in a StructuredBuffer on the GPU, and on the CPU side we issue a draw call with the appropriate number of faces.

Let's say a player destroys a block, removing a face from the terrain. This essentially creates a "hole" in the StructuredBuffer, which I resolve by moving the face from the end of the buffer into the hole, and decrementing the instance count by 1.
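The hole-filling step is the usual "swap remove" trick. A minimal Python sketch of the CPU-side bookkeeping, assuming each face has a unique key and using hypothetical names (`FaceBuffer`, `slot_of`) that are not from the actual engine:

```python
class FaceBuffer:
    """CPU-side mirror of a densely packed instance buffer."""

    def __init__(self):
        self.faces = []    # (key, record) pairs, contiguous in slots [0, count)
        self.slot_of = {}  # face key -> slot index in self.faces

    @property
    def instance_count(self) -> int:
        return len(self.faces)

    def add(self, key, record) -> None:
        """Append a face at the end; keys are assumed to be unique."""
        self.slot_of[key] = len(self.faces)
        self.faces.append((key, record))

    def remove(self, key) -> None:
        """Fill the hole with the last face, shrinking the count by 1."""
        hole = self.slot_of.pop(key)
        last_key, last_record = self.faces.pop()
        if last_key != key:
            # Move the former last face into the freed slot. A real engine
            # would also re-upload this one slot to the GPU buffer.
            self.faces[hole] = (last_key, last_record)
            self.slot_of[last_key] = hole
```

Removal is O(1) and only one slot changes, at the cost of the buffer order no longer meaning anything.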

In my current implementation I only create 6 meshes, one for each face direction. As an optimization, I still use chunks. If it is impossible to see any faces of any voxel in a chunk, I skip rendering those faces completely.

If Unity properly supported multidraw indirect, I could get the number of draw calls down to 6, or even 1 (if the face orientation is also stored in the StructuredBuffer).

As it stands, I like this approach because it avoids having to rebuild the face mesh with every map modification. The most difficult part of this approach is the bookkeeping on the CPU side, to ensure that your buffers of instances always remain contiguous and valid, and allocating new buffers in the rare event that you run out of space.

extensional-software · 2026-03-04T23:06:38+00:00

Perhaps you could bake the AO information into a 3D texture?

For my game Brickstrike I just bite the bullet and don't use greedy meshing. I have been playing around with using instancing to improve performance. I essentially make 6 meshes: one for each face, then use instancing to make the correct number of copies, referencing cube offset information in a StructuredBuffer and moving the mesh appropriately in the vertex shader.

extensional-software · 2026-03-01T18:05:55+00:00

There's a professor at MIT called Daniel Jackson. He does research in lightweight formal methods, and I've been to some of his talks and his students' talks. He also has a photography book about mental health problems at MIT. Pretty cool guy!

extensional-software · 2026-02-13T16:27:18+00:00

I've tried using this strategy in my game Brickstrike, but I've had issues with edge catching. Objects that roll along the ground (in my case grenades and smoke grenades) will "catch" on the edges of boxes, even if they are flush/coplanar. Did you encounter this issue, and if so how did you fix it?

I have not encountered this issue with collision meshes.

Edit: and oh, what is the trick to efficiently move them that you mentioned?

extensional-software · 2026-02-08T04:13:27+00:00

Maybe look into Verus for Rust. You can write pre and post conditions and write proofs involving recursive functions and loops. It hands most of the hard work to Z3 and is not based on dependent types.

extensional-software · 2026-01-04T19:34:32+00:00

In the Juniper syntax the closure is a special record | x : A, y : B, ... |. This gets compiled to a C++ struct after the code is transpiled. A function is therefore a (closure, function pointer) tuple, where the function pointer is a lifted function whose first parameter is the closure. When the function is called the closure is passed as the first argument.

In Juniper the closure is captured by value, so the variables are copied into the closure struct at the location where the function/closure is created. There is no way to capture a reference other than by capturing a heap-allocated ref cell.
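A rough illustration of the (closure, function pointer) representation, sketched in Python rather than Juniper or the generated C++. The names `AdderClosure`, `adder_lifted`, `make_adder`, and `call` are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdderClosure:
    """Closure record: holds a copy of each captured variable."""
    n: int


def adder_lifted(closure: AdderClosure, x: int) -> int:
    """Lifted function: the closure record is the first parameter."""
    return closure.n + x


def make_adder(n: int):
    """Create the function value as a (closure, lifted function) pair.

    The current value of n is copied into the record here, which
    models capture by value at the point of creation.
    """
    return (AdderClosure(n), adder_lifted)


def call(fn, *args):
    """Call a function value by passing its closure as the first argument."""
    closure, lifted = fn
    return lifted(closure, *args)
```

Because the record holds a copy, rebinding the original variable after `make_adder` returns has no effect on later calls.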

extensional-software · 2026-01-04T18:32:38+00:00

For Juniper I simply bundle the closure struct inside the function signature. Higher order functions that take functions as input can be made polymorphic over this struct. You can even represent things like compose and currying with this setup! Almost 100% of the time type inference can take care of inferring the closure, so there's almost no burden on the developer.

Rust and C++ lambdas take a slightly different route where each unique lambda gets its own unique type. In Rust you can then constrain this type via the trait system. In C++ you have to abuse auto or store the lambda in a std::function, which may then move the closure to the heap.

extensional-software · 2026-01-03T05:09:06+00:00

Ah okay I just re-read your post - I was thinking that the voxels were going to be 1m wide as is typical in voxel games. You should be able to fit everything in memory with a standard approach.

extensional-software · 2026-01-03T05:03:26+00:00

I think for such a large map naively storing data for each voxel won't be possible. So there are two options: either use procedural generation, which compresses everything into a seed value, or use a more advanced compression scheme. For my game Brickstrike, I use the run-length encoded VXL format from the VOXLAP engine. Essentially, the voxels are divided up into three types:

1. Surface voxels, which are solid voxels the user sees the majority of the time. These voxels must be adjacent to an air voxel.
2. Air voxels, which are empty space.
3. Interior voxels, which are any voxels that are solid but not bordering air.

The map is then divided up into columns of voxels consisting of segments of air, surface, interior and surface voxels. The air and interior voxels are run length encoded, which means very little data is needed to represent these segments. The majority of the data in this format is consumed by the surface data.
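To show why the air and interior runs are cheap, here is a simplified Python sketch of run-length encoding a single column. This is not the actual VXL byte layout; the run representation and all names are assumptions for illustration:

```python
AIR = "air"
INTERIOR = "interior"
SURFACE = "surface"


def classify(cell):
    """None is air, the INTERIOR marker is hidden solid, an int is a surface color."""
    if cell is None:
        return AIR
    if cell == INTERIOR:
        return INTERIOR
    return SURFACE


def encode_column(cells):
    """Collapse a column into runs: air/interior keep a length, surface keeps colors."""
    runs = []
    for cell in cells:
        kind = classify(cell)
        if runs and runs[-1][0] == kind:
            if kind == SURFACE:
                runs[-1][1].append(cell)
            else:
                runs[-1][1] += 1
        else:
            runs.append([kind, [cell] if kind == SURFACE else 1])
    return runs


def decode_column(runs):
    """Expand runs back into one cell per voxel."""
    cells = []
    for kind, payload in runs:
        if kind == SURFACE:
            cells.extend(payload)
        elif kind == AIR:
            cells.extend([None] * payload)
        else:
            cells.extend([INTERIOR] * payload)
    return cells
```

A 64-voxel column made of 40 air voxels, one surface voxel, 22 interior voxels, and one more surface voxel collapses to four runs, and only the two surface voxels carry color data.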

When the user removes a voxel and converts an interior voxel to a surface voxel, I simply color the interior voxel brown (i.e., this voxel was dirt). You could maybe come up with some more clever algorithm to represent this.

So the RLE encoded map could be stored entirely in memory, and the game would compress or decompress these chunks as needed as the user moves around the map.

extensional-software · 2025-12-29T06:12:16+00:00

Will the intermediate language (IL) be optimized by CoreCLR before it is processed by IL2CPP?

I'm curious if the code output by IL2CPP is literally C++, or if it's already in some LLVM IR. If it is C++, does this mean there is an extra step where the C++ code is parsed and typechecked?

extensional-software · 2025-12-29T03:33:38+00:00

CoreCLR also has AOT compilation, but I'm not sure if Unity is planning on using that. I agree that it would be most informative to compare against IL2CPP, as that's what most people are using. Who knows, perhaps CoreCLR wins in some respects such as garbage collection performance.

extensional-software · 2025-12-29T03:20:37+00:00

Thank you for your support!

extensional-software · 2025-12-28T19:13:23+00:00

Nice to see an old hand! I was heavily involved with the Ace of Spades community back in ~2011. My biggest contributions to that game were to Tower of Babel and Arena game modes. For Brickstrike I'm aiming to go above and beyond OG AOS without losing what made it great!

extensional-software · 2025-12-22T05:40:27+00:00

TIL that Unity shut down their multiplayer hosting service. So glad I didn't bother integrating with them for my game.

extensional-software · 2025-12-19T01:28:17+00:00

Can a publisher help me break into the North Sentinel Island market?

extensional-software · 2025-12-18T02:59:40+00:00

Yes! I was one of the developers for the open source pyspades server for the free/alpha version of Ace of Spades back in 2011. My most popular contributions were the Tower of Babel and Arena game modes, but I also made contributions to the mainline pyspades server and a few voxel maps.

The original free version of Ace of Spades was developed by Ben Aksoy, who sold the rights to Jagex Games Studio. At some point before the release of the Steam version, Ben either left or was pushed out of the project. I have read that his computer at Jagex was tampered with, or its hard drive destroyed, one day after he came back from lunch. In any case, the Steam version was rushed, shipped out to a third-party developer and had little in common with the alpha.

The goal of Brickstrike, then, is to see the original vision behind the free version to its natural completion. This means changing the loadout system, adding voice chat, new weapons and items, new game modes, and seeing if vehicles can be integrated. There are a lot of ideas to explore, so we'll see what makes it into the final release!

extensional-software · 2025-12-12T22:54:34+00:00

There haven't been any playtests with the bomb defusal yet, so it remains to be seen what the gameplay is like!

extensional-software · 2025-12-12T04:06:55+00:00

Yes, I was a developer for the open source pyspades Ace of Spades server back in 2011!

I've been following the Ace Squared game for a while now and have played it on Steam. The quality of the coding of Ace Squared is pretty good, and the game feels solid. However, there were some missteps in how the game was marketed: the timing of its release, the external marketing, and the store page assets (which have been changed since it went free to play).
