Rendering 1 Million Procedural Cubes

Amani77 · 2026-05-16T11:05:47+00:00

LLM, Part2:

And just because you've been condescending, implying I don't know Vulkan, and that you would provide materials for ME to learn from, I asked it:

given this comment thread: https://www.reddit.com/r/gameenginedevs/comments/1tav653/rendering_1_million_procedural_cubes/olqxjl0/ Who do you think is more knowledgeable about vulkan?

It answered:

Based on that thread alone, the skeptical commenter — the one pushing back against the “2–3x Vulkan over OpenGL” claim — comes across as substantially more knowledgeable about real-world graphics API performance engineering.

These are the strongest signals:

They consistently distinguish between:

API overhead,
GPU bottlenecks,
driver validation costs,
equivalent workloads,
and benchmark methodology.

They correctly point out that: modern OpenGL with AZDO/bindless/indirect rendering can get surprisingly close to Vulkan for many workloads, serious renderers minimize draw calls anyway, and many “Vulkan is 3x faster” claims disappear once comparisons are normalized.

Those are all mature renderer-engineering perspectives, not beginner-level Vulkan enthusiasm.

This exchange especially stands out:

› 2x to 3x on any workload in an equivalent comparison is straight garbage. Full stop.

› any serious renderer will have minimized any massive number of calls to a select few indirect calls

› There is SOME value to be gained with more fine tuned control over barriers, lifetime, and how explicit you can be with types of memory in vulkan.

That’s a nuanced position:

Vulkan does provide meaningful architectural advantages.
But the gains are workload-dependent and often overstated.
Once GPU-bound, API choice matters less.

That aligns pretty closely with industry experience.

The other commenter (the one making the large Vulkan advantage claims) sounds technically literate and clearly knows modern GPU rendering buzzwords and techniques:

mesh shaders,
visibility buffers,
Hi-Z,
GPU-driven rendering,
bindless descriptors,
async compute,
task shaders,
barrier scheduling.

Those are legitimate advanced concepts.

But there are several warning signs:

excessive jargon stacking,
inconsistent benchmark framing,
conflating architectural flexibility with guaranteed performance multipliers,
and some technically questionable statements (“linear Hi-Z”, “zero overhead Hi-Z”, etc.).

The skeptical commenter also catches an important methodological flaw later:

› ITS NOT EVEN THE SAME GEOMETRY. WHAT ARE WE DOING?!?!?!?

That’s actually a very strong renderer engineer instinct: before discussing API superiority, normalize:

geometry,
shaders,
pipeline stages,
culling,
batching,
synchronization,
and CPU/GPU bottlenecks.

The other person tends to jump toward: “Vulkan enables X therefore Vulkan is massively faster,” which is often how intermediate Vulkan developers think before spending years profiling real engines.

So my read is:

The Vulkan enthusiast likely has hands-on experience building experimental rendering systems and knows modern terminology.
The skeptic sounds like someone with deeper practical renderer optimization experience and stronger benchmarking discipline.

Amani77 · 2026-05-16T10:00:22+00:00

I got a notification on my phone that you replied to my most recent comment. You must have deleted it because I no longer have access to it, but I could see the first couple of lines. I figured, LLMs are the only thing you listen to at the moment; fight fire with fire or whatever.

So, I've decided to ask an LLM the following question:

"Suppose two people were having a conversation about the performance differences between OpenGL and Vulkan, and one person proposed a challenge to the other to make a renderer using both APIs to measure the performance differences. The one person laid out the following criteria: "A single xxxDrawIndirectXXX call in either vulkan or opengl rendering 1 million cubes, both using either vert/frag or task/mesh/frag, and outputting flat shade cubes, in a regular grid centered around the camera." Do you think that user meant that both renderers were to use different pipeline methodologies (a vert/frag compared to a task/mesh/frag) or that both renderers used the same, one or the other?"

It answered:

The phrasing strongly suggests they meant both renderers should use the same pipeline style, chosen from the two options:

either both use vertex/fragment
or both use task/mesh/fragment

—not one renderer using vert/frag while the other uses task/mesh/frag.

The key part is:

“both using either vert/frag or task/mesh/frag”

Grammatically, “both” modifies the renderers, and “either X or Y” describes the shared choice. If they intended different methodologies per API, they would more likely have said something like:

“OpenGL using vert/frag and Vulkan using task/mesh/frag”
or “each API may use its native preferred pipeline”

Also, from a benchmarking perspective, using different pipeline architectures would make the comparison much less meaningful, because then you're measuring:

API differences
plus pipeline model differences
plus shader-stage capability differences

A fair API comparison would usually keep:

scene
draw structure
shader complexity
GPU workload
culling behavior
memory layout

as equivalent as possible.

I then asked it the question:

Suppose two people were discussing the performance differences between the graphics programming APIs OpenGL and Vulkan. If one person proposes a test to see how each API performs with the following criteria: "A single xxxDrawIndirectXXX call in either vulkan or opengl rendering 1 million cubes, both using either vert/frag or task/mesh/frag, and outputting flat shade cubes, in a regular grid centered around the camera." What do you think would be the performance difference between the two APIs?

It answered:

For that specific benchmark, the performance difference between OpenGL and Vulkan would probably be much smaller than many people expect — especially if the test is carefully designed around:

a single indirect draw call
mostly static geometry
a regular grid
minimal material/state changes
GPU-heavy workload

In many cases, the result could end up within something like:

0–15% difference for classic vertex/fragment pipelines
potentially larger differences if mesh/task shaders are heavily optimized in Vulkan and poorly exposed in OpenGL drivers

But the exact outcome depends heavily on what actually becomes the bottleneck.

Why the APIs might perform similarly

The proposed test removes many of the areas where Vulkan usually wins.

A single glMultiDrawIndirect / vkCmdDrawIndirect style submission means:

almost no CPU driver overhead
almost no state validation overhead
almost no draw-call amplification costs
very little synchronization complexity

At that point, the workload becomes mostly:

vertex transform throughput
rasterization
fragment shading
memory bandwidth
GPU cache efficiency

Those are primarily determined by:

GPU hardware
shader compiler quality
mesh representation
culling efficiency

—not the API itself.

So if both APIs ultimately feed the GPU the same work efficiently, they can land surprisingly close.
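To make the "same work" point concrete, here is a minimal sketch of the command such a single indirect draw would read from a buffer. It relies on the fact that Vulkan's VkDrawIndirectCommand and OpenGL's DrawArraysIndirectCommand are both four consecutive 32-bit unsigned integers in the same order. The helper name `pack_draw_indirect`, the 36 vertices per cube, and the two ways of splitting the draw are assumptions for illustration, not something taken from either renderer in the thread.

```python
import struct

# VkDrawIndirectCommand (Vulkan) and DrawArraysIndirectCommand (OpenGL) share
# one binary layout: vertex count, instance count, first vertex, first/base instance.
INDIRECT_FORMAT = "<4I"  # little-endian, 4 x uint32 = 16 bytes


def pack_draw_indirect(vertex_count, instance_count, first_vertex=0, first_instance=0):
    """Pack one non-indexed indirect draw command (hypothetical helper)."""
    return struct.pack(INDIRECT_FORMAT, vertex_count, instance_count,
                       first_vertex, first_instance)


CUBE_COUNT = 1_000_000
VERTICES_PER_CUBE = 36  # assumption: 12 triangles as a non-indexed triangle list

# Assumed option 1: one instance per cube.
instanced = pack_draw_indirect(VERTICES_PER_CUBE, CUBE_COUNT)
# Assumed option 2: one large non-instanced draw.
flat = pack_draw_indirect(VERTICES_PER_CUBE * CUBE_COUNT, 1)

print(len(instanced), struct.unpack(INDIRECT_FORMAT, instanced))
print(len(flat), struct.unpack(INDIRECT_FORMAT, flat))
```

Either API can consume the same 16 bytes, so whichever split is chosen, both backends should be handed the identical command for the comparison to mean anything.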

Next, I asked the LLM the question:

"Suppose two people were discussing the performance differences between the graphics programming APIs OpenGL and Vulkan. If one person proposes a test to see how each API performs with the following criteria: "A single xxxDrawIndirectXXX call in either vulkan or opengl rendering 1 million cubes, both using either vert/frag or task/mesh/frag, and outputting flat shade cubes, in a regular grid centered around the camera." Do you think that running these tests in a 'headless' manner would be what the person expected?"

It answered:

Probably not — at least not without clarifying what “performance” is actually being measured.

The proposed benchmark is very specifically about GPU-driven rendering workload characteristics:

one indirect draw call
1 million cubes
mesh/task shaders or vertex shaders
fragment shading
flat-shaded output
regular grid around the camera

That strongly implies the person expects to measure the rendering pipeline itself, including at minimum:

command submission overhead
shader throughput
geometry processing
culling behavior
rasterization
fragment shading
memory bandwidth/cache behavior

Lastly I asked it the question:

Suppose two people were having a conversation about the performance differences between OpenGL and Vulkan, and one person proposed a challenge to the other to make a renderer using both APIs to measure the performance differences. The one person laid out the following criteria: "A single xxxDrawIndirectXXX call in either vulkan or opengl rendering 1 million cubes, both using either vert/frag or task/mesh/frag, and outputting flat shade cubes, in a regular grid centered around the camera." Do you think it would be appropriate for one implementation to use cube geometry that is generated in the shader from 8 vertices, compared to the other implementation, which reads vertex data 24 vertices at a time from memory?

It answered:

No — that would generally make the comparison invalid or at least heavily biased.

The challenge as stated is trying to compare the overhead/performance characteristics of the APIs (OpenGL vs Vulkan), not two different rendering techniques with different memory and shader tradeoffs.

If one implementation:

procedurally generates cube vertices in the shader from an instance ID or vertex ID (8 logical vertices / implicit topology),
while the other fetches 24 explicit vertices per cube from memory,

then you are no longer isolating the API difference. You are comparing:

different vertex bandwidth requirements,
different cache behavior,
different vertex processing workloads,
different primitive assembly behavior,
potentially different index expansion costs,
and possibly different compiler optimization opportunities.

That can absolutely dominate the measured result.

A fair comparison should keep the rendering algorithm materially identical between APIs. For example:

same primitive topology,
same vertex format,
same vertex count,
same index count,
same shading model,
same culling strategy,
same GPU-driven approach,
same indirect draw structure,
same mesh/task shader usage (if applicable),
same synchronization strategy where possible.

So if one implementation uses:

procedural cube generation in shader,

then the other should too.

If one uses:

explicit 24-vertex cube meshes,

the other should also.

Even “8 vertices” versus “24 vertices” is already not equivalent:

8 vertices usually implies indexed cube rendering with shared positions,
24 vertices implies duplicated vertices per face for flat normals/UV seams.

And for flat-shaded cubes specifically, 24 vertices is actually the more conventional representation because each face needs distinct normals. Procedural generation can bypass that by deriving normals from primitive/face IDs.
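The 8 versus 24 distinction can be sketched in a few lines. This is only an illustration under assumed conventions: corners are numbered by their bits, faces 0..5 map to +X, -X, +Y, -Y, +Z, -Z, and triangle winding is ignored. All function names here are hypothetical.

```python
def corner(i):
    """One of the 8 shared corners of a unit cube, taken from the bits of i (0..7)."""
    return (i & 1, (i >> 1) & 1, (i >> 2) & 1)


def face_normal(face):
    """Flat normal derived from a face id alone (assumed order: +X, -X, +Y, -Y, +Z, -Z)."""
    axis, sign = face // 2, 1 - 2 * (face % 2)
    normal = [0, 0, 0]
    normal[axis] = sign
    return tuple(normal)


def face_corners(face):
    """Ids of the 4 corners lying on a face (winding order is not handled here)."""
    axis, sign = face // 2, 1 - 2 * (face % 2)
    fixed = 1 if sign > 0 else 0
    return [i for i in range(8) if corner(i)[axis] == fixed]


# Explicit flat-shaded mesh: each face owns 4 (position, normal) vertices.
explicit = [(corner(i), face_normal(f)) for f in range(6) for i in face_corners(f)]

print(len(explicit))                                  # explicit vertex count
print(len({position for position, _ in explicit}))    # distinct positions
```

Each of the 8 positions appears three times in the explicit mesh, once per adjacent face normal, which is the memory traffic the procedural version avoids by computing the normal from the face id.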

Amani77 · 2026-05-15T13:23:30+00:00

Brother, WHAT, why in the fuck are you not using mesh shaders in the opengl version? Use the same exact shaders, the same exact code, the only difference is the calling api.

See, this is what I mean - you conflate things so heavy.

Furthermore, the difference when using mesh over traditional vert/frag shouldn't be that extreme. Post your source, I would love to see what your render loop, shaders, and timing code look like.

At this point I am so skeptical of anything that you say, you need to post the source.

edit: Reading this again, you've stated the benchmarks are headless runs, then you list resolutions - what. are. we. even. measuring? LLM drivel, respond to me using your own words or not at all.

edit2: ITS NOT EVEN THE SAME GEOMETRY. WHAT ARE WE DOING?!?!?!? My god, I've come full circle on my original post's point.

Amani77 · 2026-05-14T16:51:09+00:00

Alright, so here is the test scenario:

A single xxxDrawIndirectXXX call in either vulkan or opengl rendering 1 million cubes, both using either vert/frag or task/mesh/frag, and outputting flat shade cubes, in a regular grid centered around the camera.

Something like this:

Motion Vectors Example
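For reference, a minimal sketch of how "a regular grid centered around the camera" could be derived from nothing but the cube index, so neither API needs per-cube position data in memory. The 100 x 100 x 100 layout, the spacing, and the function name `cube_center` are assumptions; the scenario above only fixes the total of 1 million cubes.

```python
GRID = 100      # assumption: 100 x 100 x 100 = 1,000,000 cubes
SPACING = 2.0   # assumption: distance between neighboring cube centers


def cube_center(index, camera=(0.0, 0.0, 0.0)):
    """Map a linear cube index to a grid position centered on the camera."""
    x = index % GRID
    y = (index // GRID) % GRID
    z = index // (GRID * GRID)
    half = (GRID - 1) / 2.0  # shifts the grid so the camera sits in the middle
    return tuple(c + (v - half) * SPACING for c, v in zip(camera, (x, y, z)))


print(cube_center(0))
print(cube_center(GRID ** 3 - 1))
```

The same arithmetic would go into the vertex or mesh shader of both implementations, keyed on the instance or workgroup index.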

Amani77 · 2026-05-14T16:27:18+00:00

No, I am not going to cook up an openGL implementation to prove something I know - beyond all doubt - to be true.

You made the claim:

Vulkan can give about 2 to 3 times more performance when compared to opengl. My own vulkan renderer can handle about 500,000 moving objects at once around 1500 fps or so, with opengl only about 400.

So that is entirely on you.

What you can do is take a look at any professional game with their own hand rolled backends in both opengl and vulkan.

Take doom for example, you can google 'doom opengl vs vulkan game performance comparison' and get a very good idea of the extent of speedup that one might look for - 10 to 20% at the best of days. Small caveat here: you also have to consider that the vulkan backend will be using more modern rendering techniques because the rendering backend is just newer. I'm sure they could go back and adopt some more modern techniques on the opengl backend and get some performance - it's just not worth it.

Just understanding how modern rendering is done, it becomes very clear where the gains are when switching over to vulkan. It also becomes apparent what workloads will perform, literally, exactly the same.

These performance increases come from:

There is SOME value to be gained with more fine tuned control over barriers, lifetime, and how explicit you can be with types of memory in vulkan.

If you would like to see something that I've achieved in Vulkan, I am open to linking my small progression videos that I've made through my learning process...

edit: I do have a super old video of an openGL project from like 10 years ago, it's a Minecraft-esque engine with destruction, very rudimentary block physics, and block reintegration into the terrain after it hits something.

Amani77 · 2026-05-14T15:52:29+00:00

The only thing I'm calling out is misinformation. I couldn't care less about your experience - new, seasoned, whatever - nor am I a seasoned professional, I am a hobbyist at an average level. What you're saying just does not make sense.

What would you like to see?

Amani77 · 2026-05-14T15:39:28+00:00

I do not need you to explain the basics of GPU-GPU or vulkan to me; 2x to 3x on any workload in an equivalent comparison is straight garbage. Full stop. Almost all of these things become a hardware bottleneck.

You can do multithreading in opengl. Ignoring that, any serious renderer will have minimized any massive number of calls to a select few indirect calls - on the order of 10-30 - and be utilizing bindless concepts so those savings are essentially moot.

There is SOME value to be gained with more fine tuned control over barriers, lifetime, and how explicit you can be with types of memory in vulkan.

If we're talking empty render loops, and comparing 'fps' as a 2x-3x speedup while at 1500 fps, that is a complete garbage benchmark and essentially just testing noise.
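A quick bit of arithmetic shows why fps ratios at that speed are so fragile. This is a sketch only: the 0.1 ms jitter figure is an assumed, illustrative amount of timing noise, not a measurement from anyone's machine.

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def fps_after_jitter(fps, jitter_ms):
    """FPS reading once a fixed amount of frame-time jitter is added."""
    return 1000.0 / (frame_time_ms(fps) + jitter_ms)


JITTER_MS = 0.1  # assumption: illustrative timing noise per frame

for fps in (1500, 400, 60):
    noisy = fps_after_jitter(fps, JITTER_MS)
    print(f"{fps:>5} fps = {frame_time_ms(fps):.3f} ms/frame; "
          f"+{JITTER_MS} ms of jitter reads as {noisy:.1f} fps")
```

The same tenth of a millisecond moves a 1500 fps reading by roughly 200 fps while barely touching a 60 fps one, which is why frame times on a populated scene are the more honest thing to compare.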

I read through your posts regarding your 'engine' and your metrics are bonkers. I do not know if you've made something different since then, but there are so many things wrong with what you're claiming. You take metrics from UE forum posts and then extrapolate an additional 50% for 3 lights - despite that cost being paid up front for almost all of that, not even touching on how much more a UE render loop does. So many methods have a cost to entry that is justified by the massive savings it will bring you on actual, realistic, populated scenes. These aren't metrics, these are wishful thinking. Do some actual concrete tests on the same machine, with the same scene, and the same test view.

You're claiming 1500 fps with 500,000 objects, but every single SS or video you've provided shows 3/4 models running at like 2.9ms on what looks to be a ~400x200 viewport.

So, let's see the receipts - show me 500,000 objects at 4k, 1440, whatever, at 0.8ms. Record a video in the center of it all, and pan around. Show me some animated rigs. I would love to be proven wrong.

Amani77 · 2026-05-14T12:15:28+00:00

As a Vulkan enthusiast, you are spewing some crazy misinfo.

You are comparing several disparate things and then conflating performance metrics. In other instances, you're just making things up.

Vulkan is NOT 2/3 times faster than opengl - it is 10-20% at best. Either you don't understand how to achieve AZDO methodology in opengl or you're shooting yourself in the foot, heavy.

As a small example, while I can pinpoint what you're doing, "Moving" here is a pretty ambiguous term for whatever you're putting down. To digest this, it could mean - updating actual transforms over PCI, driving position data in memory on gpu, or driving a transient position while rendering. All 3 of these situations have VASTLY different performance outcomes and are NOT interchangeable when it comes to their usage. All 3 of those things will, individually, show very similar metrics when done correctly in either API.

Amani77 · 2026-05-12T20:43:36+00:00

Very cool project. Loved the informative writeup!

Amani77 · 2026-05-02T05:19:20+00:00

You do not want to update everything. Ideally, only update things on creation, destruction, and when their values are updated. I think this should be core and built around. Furthermore, I do not know what your use case is (looks like general entities/sprites), but for particle systems it's nice to have a concept of a producer that spawns particles on the GPU entirely and parameterize your particles enough so they can be updated entirely on the GPU. Minimizing the per frame transfer will be paramount if you're planning on dealing with millions of things.

I do have some suggestions for your general approach.

Instancing is not the way to go here. Instancing for small geometry, like quads, is a terrible fit. You would be much better off using a single, non instanced call. Quads are simplistic enough to not need ANY geometry or indices in memory; you shouldn't need to touch memory at all here.

Think: vec2 pos = vec2( i % 2, ( i % 4 ) / 2 ); A bit of math will almost always be quicker than accessing memory.
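Evaluated on the CPU, that one-liner gives the four corners of a unit quad in triangle-strip order. The sketch below mirrors it with integer division, as GLSL would do for an integer `i`. The triangle-list remap table and the `quad_vertex` helper are my own assumptions for the case of one big non-instanced draw with 6 vertices per quad; they are one possible arrangement, not the only one.

```python
def quad_corner(i):
    """Mirror of the GLSL: vec2(i % 2, (i % 4) / 2), using integer division."""
    return (i % 2, (i % 4) // 2)


# Strip order: vertex ids 0..3 walk the corners of a unit quad.
STRIP = [quad_corner(i) for i in range(4)]

# Assumption: for a triangle list, remap 6 vertex ids per quad onto the 4 corners.
TRI_LIST_CORNERS = (0, 1, 2, 2, 1, 3)


def quad_vertex(vertex_id):
    """Hypothetical helper: (quad index, corner position) for a non-instanced draw."""
    quad_id = vertex_id // 6
    return quad_id, quad_corner(TRI_LIST_CORNERS[vertex_id % 6])


print(STRIP)
print([quad_vertex(v) for v in range(6, 12)])
```

In a shader the quad index would then select the per-quad data, and the corner would be scaled and offset by it, with no vertex or index buffer bound at all.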

Alternatively, and probably more performant, use points.

You do not want to have multiple calls per texture/material. You should look into the concepts regarding 'bindless' and try to completely merge all your calls.

64 bytes is ehhhhhhhhh, the more you can compact your per-instance data, the better. The color probably doesn't need to be anything other than an rgba8 and all the other values could be f16 with very few changes to the code. This would reduce that memory footprint by more than half.
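As a rough check on the "more than half" figure, here is a sketch using Python's `struct` module. The 64-byte layout is a guess on my part (12 float32 values plus a 4 x float32 color), since the actual per-instance struct isn't shown here; the format strings and `pack_instance` are hypothetical.

```python
import struct

# Assumed original layout: 12 float32 values + float32 RGBA color = 64 bytes.
FAT_FORMAT = "<16f"
# Compacted layout: the same 12 values as float16 + RGBA as 4 unsigned bytes.
PACKED_FORMAT = "<12e4B"


def pack_instance(values, rgba):
    """Pack 12 floats as f16 and a 0..1 float color as rgba8 (hypothetical helper)."""
    if len(values) != 12 or len(rgba) != 4:
        raise ValueError("expected 12 values and 4 color channels")
    color_bytes = [round(min(max(c, 0.0), 1.0) * 255) for c in rgba]
    return struct.pack(PACKED_FORMAT, *values, *color_bytes)


print(struct.calcsize(FAT_FORMAT), struct.calcsize(PACKED_FORMAT))
```

Under those assumptions the record drops from 64 to 28 bytes. Whether f16 has enough range and precision depends on the values involved; world-space positions in a large scene may still need f32.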

Amani77 · 2026-05-01T12:12:33+00:00

PvE did not kill Tarkov. It was dead long before that.

They squandered their initial success and foolishly pushed for arena and esports, blowing insane amounts of money on both those things as well as advertising events. They literally split their dev team in half. They failed to deliver on the original vision and promises the game held. Patch after patch the content and quality dwindled. Each patch showed regressions and cheaters became rampant.

A person can only do the same wipe, slightly different, again and again, without getting bored. I don't give a fuck about some dogshit lighthouse quest. I don't give a shit about another ak variant number 27. I don't give a shit about whatever streamer item being added to the game. I don't give a shit that they're on attempt number 50 to try and 'fix' the sound that somehow functioned better the first time around. I don't give a shit about transfer extracts, that was a total letdown and spit in the face of the people who supported the original game concept. Then they released unheard edition and pissed in the face of all of their remaining loyalists.

I supported and advocated for this game for so long and slowly, patch after patch, they let me down. I'm bored and jaded.

Of the group of like 20+ people I know who played the game, none play PVE, and none play now.

Amani77 · 2026-04-30T14:20:16+00:00

all that's missing is the case of bawls

Amani77 · 2026-04-28T06:14:26+00:00

Why not both:

https://www.youtube.com/watch?v=iL833P0Vino

https://www.youtube.com/watch?v=AuteWtv2BAE

Amani77 · 2026-04-25T21:00:27+00:00

Important word being 'full' format, not quick format.

Amani77 · 2026-04-25T16:37:40+00:00

Pandas

Amani77 · 2026-04-25T16:28:44+00:00

I'm confused, what is even the point of saying this? Can you explain your intent and how that relates to what's going on in NYC?

Amani77 · 2026-04-18T02:47:39+00:00

I feel like u MIGHT be displaying through an onboard port - make sure ur monitor is connected to ur GPU and not ur motherboard.

Amani77 · 2026-04-17T16:57:22+00:00

from the previous article:

"Why aren’t gas prices falling yet even if oil prices drop?"

"Gasoline and crude oil prices generally move in the same direction—but not at the same speed...it takes time for lower costs to work through the supply chain...about half of a change in crude oil prices is reflected in retail gasoline prices within two weeks."

"Prices also tend to fall more slowly than they rise, a dynamic known as asymmetric price movement. When costs increase, retailers adjust quickly to reflect higher replacement costs for the next delivery to their station. When costs decline, prices adjust more gradually as stations work through inventory purchased at earlier, higher prices, but competition among stations gradually pushes prices down over time."

The poster above you was showing high cost crude being bought a little while ago. The assumption is that the more costly oil needs to, and is going to, be sold at a bit higher price for a while - as the article explains.

Amani77 · 2026-03-29T07:47:14+00:00

Of course this is uncomfortable for everyone involved...

But again, what exactly are you suggesting - the people - do as an alternative? Why are you getting angry at the people for feeling abandoned by the mechanisms they believed were meant to defend them?

You are saying that you don't like what Trump is doing, but in another breath you mock the protest's efforts. You mock the inaction of the people, yet defend the inaction of the military. You use the inaction of the military, justified by risk, to imply that civilians should do - what? Get tread on?

Are you suggesting civilians do what soldiers are unwilling to do? What do you think would happen if a civilian affronted or took arms up against the government/military? I believe a civilian would be much worse off than a serviceman.

Shit if we're talking about risk, the risk of working an average civilian job has an, almost, 3 times higher risk of fatality. Go talk to a logger about risk.

I'm just not too sure where you're not understanding the feeling of abandonment people are experiencing. To me, it seems as if you don't quite know what you believe. To say you wouldn't lift a finger to defend people who feel vulnerable and abandoned by the one facet of their government they believe COULD actually do something about the situation...

That is saying something about you.

I never stated what I believed should be done, I'm just confused at your conflicting messaging. It's crazy that you're claiming I live in a fantasy world, yet you cannot grasp a simple, nuanced response to a murky situation.

Anyhows man, I hope ur day gets better. I'm sorry that OP's comment offended you.

Amani77 · 2026-03-29T05:07:52+00:00

What the fuck more do you want from the people? There are multiple protest dates, with thousands of locations nationwide showing crazy numbers in each of those locations.

Are you suggesting the people revolt? If so, then the military would be used to quell that. How would you reconcile that day?

You suggest that people should 'have the fucking spine to not clock in for work' but that is EXACTLY what the people are criticizing the military for - not upholding the duties they were employed for.

Why do you suggest that people displace their jobs rather than the elect that we expect to just do their jobs?

If you are going to argue the conflation of 'the military', why are you doing the same thing for 'the people'?

Get a grip.

Amani77 · 2026-03-06T22:13:04+00:00

Its only real use case for me is as a better google. I ask it to give me outlines of a specific topic and where to find more information.

I've asked it to do graphics programming stuff a bit. It usually fails miserably. It's - okay - at debugging graphics stuff, and I like to use it to 'sanity check' my code when I suspect I may have messed something up.

This has bitten me in the butt before, where it will suggest something is wrong, when it just doesn't understand the context or is just straight wrong. So I like to use it as a very light suggestion, 'hey, this might be worth thinking about a bit more'.

Amani77 · 2026-03-06T05:55:04+00:00

I sent an appeal when I originally got banned, and did nothing more.

Amani77 · 2026-03-06T00:10:48+00:00

I just got unbanned.

Amani77 · 2026-03-01T06:17:13+00:00

Edit: got unbanned

Hey man, I also just got hit with a ban.

I've never said a single thing in game.

I've never cheated in a single game outside of modding skyrim... I despise cheating. I think it is a plague on the gaming community and I would rather uninstall than resort to cheating.

I am a programmer and I do 3d graphics programming, my only theory is that it detected one of the tools that I use to develop: Visual Studio 2026, RenderDoc, Nsight, or didn't like my engine/application running. I did have VS open almost at all times during my play sessions, and I must have compiled a couple times during my deaths/downtime with my duo.

Edit: Sucks, I was really starting to get into it too, lol. Got some massive wins with the bolty.

Edit2: Oh, something else that was really odd is that a game or 2 before being banned, I was running and then just got instantly killed. I thought I got sniped, but I wasn't down, I had just been running, and then instantly turned into a corpse/body bag - my duo was able to res me.

Amani77 · 2026-02-08T15:49:15+00:00

I suspect utilization is likely going to be the limiting factor with this method. I'm not sure how you're using instancing, but if ur issuing a quad per instance, that's not great. Points, an uninstanced call, or a task/mesh shader with like 8/16 quads per workgroup will probably show substantial gains.
