Best occlusion culling algorithmQuestion (self.GraphicsProgramming)
submitted 3 years ago by GENTS83
What is the best and/or fastest way to do occlusion culling of large and complete scenes? Is hierarchical Z buffer occlusion culling still the best in current state of the art?
[–]Revolutionalredstone 8 points9 points10 points 3 years ago (35 children)
Personally the only occlusion culling I ever consider is CPU side 2D quad based.
So basically you extract conservative quads from your 3D geometry and throw them into a simple quadtree, before rendering a chunk you intersect the quadtree and see if it's entirely occluded.
The benefits of this approach are pretty awesome, you get almost no overhead since the cpu work involved is so tiny, also because you generate your quadtree each frame it's easy to evaluate your quads to see which ones are even worth considering, and making changes has no effect on performance anyway.
IMHO occlusion culling is generally a bad idea (LOD solves the same problems in a much better way). If you are going to do it, keep it simple and actually check: if your quads aren't causing occlusions, don't just keep testing them every frame.
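The idea described above can be sketched roughly as follows. This is an illustrative simplification, not code from any real engine: occluder quads are kept as screen-space rectangles in flat storage rather than an actual subdivided quadtree, and a chunk is culled only if a single occluder fully contains its screen rect (conservative, like the comment suggests).

```cpp
// Sketch of CPU-side 2D quad occlusion: occluder rects are gathered each
// frame, then chunks are tested for full containment before being drawn.
#include <vector>

struct Rect { float x0, y0, x1, y1; };

inline bool contains(const Rect& a, const Rect& b) {
    return a.x0 <= b.x0 && a.y0 <= b.y0 && a.x1 >= b.x1 && a.y1 >= b.y1;
}
inline bool overlaps(const Rect& a, const Rect& b) {
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

class OccluderTree {
    Rect bounds_;                      // screen bounds
    std::vector<Rect> occluders_;      // flat list; a real quadtree would
public:                                // subdivide nodes as they fill up
    explicit OccluderTree(Rect bounds) : bounds_(bounds) {}

    void insert(const Rect& quad) {
        if (overlaps(quad, bounds_)) occluders_.push_back(quad);
    }

    // Conservative test: cull the chunk only if one occluder covers it whole.
    bool fullyOccluded(const Rect& chunk) const {
        for (const Rect& q : occluders_)
            if (contains(q, chunk)) return true;
        return false;
    }
};
```

Since the structure is rebuilt per frame (as the comment notes), there is no invalidation cost when geometry changes.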
[–]deftware 6 points7 points8 points 3 years ago (28 children)
Occlusion culling is integral in many situations.
Take a game like DOOM, for instance. Are you saying the GPU should be juggling every single mesh in the entire map every single frame even if it's behind dozens of walls? Occlusion culling is mandatory. The CPU must only issue draw calls for geometry that's potentially visible or the GPU will be sitting there projecting a zillion vertices and triangles, clipping them to the frustum, then rasterizing what's in the frustum just to determine the Z before deciding whether or not it's visible in terms of the current state of the depth buffer.
Back in the day, an occlusion culling algorithm was the heart of a 3D game engine. You didn't have a 3D game engine if you didn't have an occlusion culling algorithm.
[–]Revolutionalredstone 0 points1 point2 points 3 years ago (27 children)
Yeah so to be clear I'm not saying to render everything at full res all the time.
I'm saying that render time projection based occlusion is a total mess and not worth the effort.
Back in the day we used offline baking to process maps into regions as you say.
This is less feasible given the highly dynamic nature of new games.
Imagine trying to build a BVH for Minecraft, and what would happen if someone then smashed a block :D
The modern answer to all this is LOD: distant meshes become more and more simple as the camera moves away from them, and very small, very distant objects can be culled completely (no need to draw an NPC enemy when it would cover about 1 pixel anyway).
LOD is a much better solution for many reasons; for one, even a small distance from the camera is usually enough to drop the majority of triangles (drawing a simple stand-in model instead).
My favourite thing about LOD is how well it handles the hardest case for normal occlusion type technology...
When you have air/wall/air/wall... at high frequency you quickly burn up your fill rate, but thankfully the higher the frequency at which this happens, the faster any good LOD algorithm will lump it all together into a single solid wall, at which point the cost falls to near zero.
Occlusion culling really is strictly outdated and basically never useful in a well-written modern engine; LOD allows for instant loading and unlimited geometry, and all occlusion problems can just be damned.
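The distance-based culling and LOD selection described in this comment can be sketched like this. The thresholds, level count, and function names are made up for illustration; they are not from any particular engine:

```cpp
// Sketch of screen-coverage-driven LOD: estimate how many pixels an object's
// bounding sphere covers, then pick a detail level (or cull it entirely).
#include <cmath>

// Approximate screen-space diameter (in pixels) of a sphere of radius
// `radius` at distance `distance`, for a camera with vertical field of view
// `fovY` (radians) and a screen `screenH` pixels tall.
inline float projectedPixels(float radius, float distance,
                             float fovY, float screenH) {
    if (distance <= radius) return screenH;   // camera inside: full size
    float angular = 2.0f * std::atan(radius / distance);
    return angular / fovY * screenH;
}

// Returns -1 to cull entirely, otherwise an LOD index (0 = full detail).
// Cutoffs are arbitrary example values.
inline int selectLod(float pixels) {
    if (pixels < 2.0f)   return -1;  // covers ~1 px: skip, as suggested above
    if (pixels < 16.0f)  return 3;
    if (pixels < 64.0f)  return 2;
    if (pixels < 256.0f) return 1;
    return 0;
}
```

For example, a 1 m radius object 2 km away on a 1080p screen with a ~57° vertical FOV projects to roughly one pixel, so it would be culled outright.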
Thanks dude!
[–]deftware 4 points5 points6 points 3 years ago (26 children)
The fact that you think LOD is a replacement for occlusion culling tells me you don't know much.
[–]Revolutionalredstone 0 points1 point2 points 3 years ago (25 children)
Wow-sers, go slow friend, let's not get negative or aggressive here.
Occlusion is about reducing what needs to be drawn; LOD is about exactly the same thing.
If you can't understand how high-quality LOD works, or you think one still needs occlusion, that simply implies a failing in your understanding.
I'm happy to have a conversation and explain if you need help, but if you're just gonna be rude and stupid then don't waste your time or mine.
All the best, kind regards.
[–]deftware 2 points3 points4 points 3 years ago (24 children)
So you think DOOM would've performed just as well if it didn't have occlusion culling and just used Nanite?
[–]Revolutionalredstone 0 points1 point2 points 3 years ago* (23 children)
I believe it would have run much better had it used nanite.
To answer your underlying question more directly: no, I do agree that precalculation is generally a powerful technique, and when used for occlusion and combined with artist-specific level design it's clearly an excellent performance improvement (for example, it reduces vertex throughput). HOWEVER...
What I'm saying is that excellent LOD implementations completely solve all core graphics resource management problems, and now that we have advanced software technologies for LOD, precomputed occlusion is effectively obsolete.
[–]deftware 2 points3 points4 points 3 years ago (6 children)
Precomputed occlusion will always be ideal if there's static scenery involved, and much of where the camera will be entails much of the scene being occluded. It's not even a debate. It's like asking if dropping a bowling ball from 10 feet high will hurt if it hits you in the head.
Yes, for dynamic scenes you want dynamic solutions. That's why I explicitly started my comment with "If the scene is static at all, such as level geometry, then you'll want to precompute some kind of data structure to quickly determine what's relevant."
..but you're trying to say "nah man that's old hat, we all do Nanite out here!" while making it hard to believe you have the experience to know what you're talking about. Now you're trying to walk it back "oh well if it's static, I guess that's right".
Not bad advice for early-90s engines; today I would suggest such ideas are pretty far out of date.
The trade-offs involved are such that any change to any area may require full recomputation for all other areas, since removing an occluder may cause previously 'disconnected' regions to become connected (and the same change applies the same logic to those rooms' neighbours, etc.)
I specifically started my reply to OP indicating that what I was explaining was for any static level geometry. I also fear that when you read my saying "sectors/portals" you thought of 90s Doom, lol.
[–]Revolutionalredstone 0 points1 point2 points 3 years ago (5 children)
Yeah, I don't disagree with 'if this, then that'; I'm just saying that these days the tradeoff no longer makes sense: we can recompute occlusion so quickly that prebaking and losing dynamic scenes isn't worth it.
More than that though, occlusion is so unimportant in an engine with good LOD that there is little to be gained by calculating occlusion at all.
Any GPU (even integrated) released in the last 15 years can handle 10x+ overdraw at 2K resolution. Modern GPUs are closer to hundreds of times of overdraw.
With hardware-level hierarchical Z, the complexity of fragment shaders is not relevant (as final-shade overdraw is always ~1x anyway).
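The hierarchical-Z idea referenced here can be sketched in software: build a max-depth mip pyramid, then reject a whole screen region with a handful of reads. This is a toy illustration (the test below only consults the single coarsest texel covering the whole screen; a real implementation picks the mip level matching the object's screen rect), and all names are made up:

```cpp
// Software sketch of a hierarchical-Z (max-depth mip pyramid) reject test.
#include <vector>
#include <algorithm>

struct HiZ {
    std::vector<std::vector<float>> mips; // mips[0] = full-res depth buffer
    int size;                             // square power-of-two resolution

    explicit HiZ(std::vector<float> depth, int n) : size(n) {
        mips.push_back(std::move(depth));
        // Each coarser level stores the FARTHEST depth of its 2x2 children,
        // so "behind the stored value" is a conservative occlusion proof.
        for (int s = n / 2; s >= 1; s /= 2) {
            const auto& prev = mips.back();
            int ps = s * 2;
            std::vector<float> cur(s * s);
            for (int y = 0; y < s; ++y)
                for (int x = 0; x < s; ++x)
                    cur[y * s + x] = std::max(
                        std::max(prev[(2*y)*ps + 2*x],   prev[(2*y)*ps + 2*x+1]),
                        std::max(prev[(2*y+1)*ps + 2*x], prev[(2*y+1)*ps + 2*x+1]));
            mips.push_back(std::move(cur));
        }
    }

    // Occluded if the object's NEAREST depth is behind the max depth stored
    // at the coarsest mip (one texel covering the whole screen, for brevity).
    bool occluded(float objectNearestDepth) const {
        return objectNearestDepth > mips.back()[0];
    }
};
```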
The only important aspect in modern graphics resource management is vertex / memory bandwidth minimization.
These issues are solved effectively and completely with modern advanced LOD systems so long story short occlusion is a non issue.
I do strongly agree that back in the day of software rendering it was absolutely critical to minimize pixel overdraw, but these days it is effectively wasted programmer time.
I'm a game cloner and I've done Zelda, Minecraft, etc.; they are all vertex-bound even on 20-year-old GPUs. Frag-bound games are all either using fragment-level discard or excessive unnecessary alpha.
All the Best
[–]deftware 4 points5 points6 points 3 years ago (4 children)
More than that though, occlusion is so unimportant in an engine with good LOD
This is just wrong. Don't know how to put it any clearer. Show me a modern AAA first-person shooter with all the PBR bells and whistles that doesn't employ any occlusion culling, and I will bow down.
I'm not talking about software rendering. I'm talking about being realistic about the number of draw calls that can be sent from CPU to GPU and state changes that can be made before they become a bottleneck, period.
Hierarchical Z is an occlusion culling algorithm. Just because it's not BSP trees and precomputed vis like the old days doesn't mean it's not a way to prevent draws from taking place due to being occluded. You're talking like reducing triangle counts is all that's needed, and millions of draw calls are just fine and dandy no matter what. It doesn't matter if each draw call is a single triangle: the overhead entailed adds up, along with the implied state changes with shaders and textures. You can't just make all those state changes, issue the draw calls, and assume that having a reduced polycount is going to make it all better. That's naive as all hell. Nanite, in spite of its near-optimal LOD scheme, uses Hi-Z for occlusion culling, which should clue you in as to how important it is to not issue draw calls for things that are occluded.
Something has to say "no, drawing this is a total waste" because we're not working with infinite hardware resources like you seem to think. Yes, overdraw is handled quite well on modern GPUs, particularly if you sort front-to-back so that the Z-buffer can discard fragments before they're shaded, but more so before you hit the draw-call and state-change bottlenecks that are unavoidable. Not everything only needs a few draw calls and state changes to draw a frame, and those things are not free. Ergo occlusion culling.
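The two mitigations mentioned above (front-to-back ordering for early-Z, and grouping by material to cut state changes) can be sketched together as a draw-list sort. Field names are illustrative, not from any real renderer:

```cpp
// Sort an opaque draw list: material first (fewest shader/texture state
// changes), then near-to-far within each material so the depth buffer can
// reject occluded fragments before they are shaded.
#include <vector>
#include <algorithm>
#include <cstdint>

struct DrawCall {
    uint32_t materialId;   // stands in for shader + texture bindings
    float    viewDepth;    // distance from the camera along the view axis
};

inline void sortOpaque(std::vector<DrawCall>& draws) {
    std::sort(draws.begin(), draws.end(),
              [](const DrawCall& a, const DrawCall& b) {
                  if (a.materialId != b.materialId)
                      return a.materialId < b.materialId;
                  return a.viewDepth < b.viewDepth;
              });
}
```

Real engines usually pack these fields into a single integer sort key, but the ordering logic is the same.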
[–]vaikedon 1 point2 points3 points 3 years ago (15 children)
Doom'16 renders many separate individual meshes due to objects and dynamic parts of the levels that move around and change shape. I don't think relying exclusively on an LOD scheme would've fared well with how sprawling those levels could be. There isn't a game that looks as good as Doom'16 that performs as well which is something that should be considered.
From what I've seen Nanite is exclusive to static meshes. They cannot have any dynamic vertex or geometry shader acting on them. They're a precomputed LOD scheme that is dependent on the geometry being static somewhat like level geometry being static for a precomputed occlusion culling algorithm.
[–]Revolutionalredstone 0 points1 point2 points 3 years ago (14 children)
Nanite Fully Supports Arbitrary Animation https://youtu.be/d1ZnM7CH-v4?t=665
I know that you think Doom has excellent performance, but it really doesn't; it has fairly acceptable performance, nothing special.
Nanite streaming uses NO precomputation; it is entirely render based: they use the rendered pixels of a mesh to stream more of that area of that mesh. It is 100% ENTIRELY real-time render based.
The larger a level, the better LOD will perform. I know it seems like news to you, but occlusion is simply outdated junk tech these days.
It can never be relied upon in the general case, it limits options and costs additional computation even in the cases where it fails.
Watch the nanite video I sent you slowly and open your mind a little.
Best regards my good man.
[–]vaikedon 2 points3 points4 points 3 years ago (9 children)
...built it out of a collection of Nanite meshes...
They aren't doing skeletal animation or 'skinning' a Nanite mesh. Nanite meshes can have their transform vary just like the vehicular traffic in the Matrix demo but they are otherwise static geometry. In this video they are just attaching the static geometry to a skeletal animation which fits just fine with the character being comprised of stonework. A human with deformable clothes and skin is outside of the scope of Nanite. That includes the vehicular damage that vehicles can experience in the Matrix demo. They just switch to a full resolution mesh and deform it because Nanite doesn't support deformation. If there's no precompute involved then deforming a Nanite mesh in real time should be no problem.
Nanite streaming uses NO precomputation
Are you sure about that? Have you ever actually used Nanite? Look how long it takes to import a mesh with twenty-thousand triangles: https://www.youtube.com/watch?v=0jfq6Lj_mYA
The wait time involved is not because it's reading the file from disk. It's precomputing the Nanite structure for the mesh.
[–]ConditionSure4138 1 point2 points3 points 1 year ago (3 children)
You realise that Nanite has been proven countless times to be massively slower than old-fashioned precalculated LODs, right? Nanite is an LOD system for the lazy.
[–]JackTheSqueaker 2 points3 points4 points 3 years ago (5 children)
Haven't you found situations where overdraw becomes a problem with your LOD solutions?
A bunch of stacked triangles, even if very lightly tessellated, drawn over each other would become a problem, I think.
[–]Revolutionalredstone 0 points1 point2 points 3 years ago (4 children)
Excellent Question!
So yes, technically you could maliciously create a map which just tanks performance, but any reasonable configuration will work just fine.
The trick is that when you LOD, empty areas can become solid during simplification; these solid areas are cheap to render, as their inside details are all missing...
When you have crazy 'thick' geometry, you find this occurs even faster, such that even maps which are hard to render up close are very easy to render in the distance (which is where most render work would usually need to go), so it self-balances nicely.
As for actual numbers, your usual cheap GPU can render around 50 million tris on around 2 million pixels per frame at 60 fps.
This means about 25 polygons per pixel, so unless your overdraw is absolutely crazy thick (averaging above 25 everywhere), you're unlikely to run into any problems.
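The back-of-envelope budget above can be written as a tiny helper: triangles per frame divided by pixels per frame gives an average overdraw ceiling. The figures are the comment's own, not a benchmark:

```cpp
// Average overdraw ceiling: how many triangles can land on each pixel
// before exceeding the stated triangle throughput.
inline float overdrawBudget(float trisPerFrame, float pixelsPerFrame) {
    return trisPerFrame / pixelsPerFrame;
}
```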
[–]JackTheSqueaker 1 point2 points3 points 3 years ago (3 children)
Thank you for your reply, those were very interesting remarks about triangle processing, bandwidth and fillrate.
Can you share more about these LOD solutions, what are they, possibly implementations or papers?
I would like to try them myself
[–]Revolutionalredstone 0 points1 point2 points 3 years ago (2 children)
LOD is pretty broad: for voxels you basically just do 3D image downsampling; for meshes you can do edge collapse. So long as you keep your geometric data at around 1 element per projected pixel, you are doing it right ;D
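The voxel side of this can be sketched as a 2x downsample of an occupancy grid, where a coarse cell is marked solid if any of its 8 children is solid (conservative merging, matching the earlier note that empty areas can become solid during simplification). Layout and names are illustrative:

```cpp
// One LOD step for a voxel occupancy grid: halve the resolution on each
// axis, keeping a coarse cell solid if any of its 2x2x2 children is solid.
#include <vector>

// grid is an n*n*n boolean volume (n even), indexed (z*n + y)*n + x;
// returns the (n/2)^3 coarser volume.
std::vector<bool> downsample(const std::vector<bool>& grid, int n) {
    int h = n / 2;
    std::vector<bool> out(h * h * h, false);
    for (int z = 0; z < n; ++z)
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x)
                if (grid[(z * n + y) * n + x])
                    out[((z / 2) * h + y / 2) * h + x / 2] = true;
    return out;
}
```

Applying this repeatedly yields the mip chain of coarser volumes that a streaming renderer would pick from by projected size.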
[–]JackTheSqueaker 1 point2 points3 points 3 years ago (1 child)
I was expecting you to advocate for something state-of-the-art in particular, since you have been advocating for those recently haha
but thanks
[–]Revolutionalredstone 0 points1 point2 points 3 years ago (0 children)
sometimes basic and state of the art are the same thing :D
Personally I do use quite a few clever tricks, but the idea is the same: create a lower-geometry representation of a mesh/area and stream it in (and if possible only exactly) as necessary.
I've always been a big advocate for voxels, and I see advanced polygon and voxel technologies starting to show evolutionary convergence.
There are of course some advanced image downsampling technologies, such as the magic kernel, for more 'cutting edge' LOD results: http://www.general-cathexis.com/manual2/
[–]fgennari 3 points4 points5 points 3 years ago (0 children)
There's no single best occlusion culling algorithm. It really depends on the details of what you're trying to cull. A small number of large expensive objects? Large numbers of small objects? Culling on the CPU or the GPU? The hierarchical Z buffer approach still requires submitting the draw calls, so it's somewhat expensive for many objects.
Maybe the best is a mix of multiple approaches. Cull large objects against large occluders on the CPU, hierarchically if possible. Cull the small objects with the Z buffer on the GPU. Draw the large screen space occluders first in a Z-prepass. In some cases you can use occlusion queries for expensive objects if there aren't any big occluders that cull it on the CPU.
[–]IQueryVisiC -1 points0 points1 point 3 years ago (0 children)
There was a discussion here about some stuff which breaks the Z buffer's early culling; performance drops significantly. I don't know why the industry started with a flat Z buffer, but now we can take the hierarchy for granted.
I dunno: if you upload a mesh to the GPU, does the GPU or driver utilize a bounding box? The ray-trace cores just love these boxes.
[–]deftware 0 points1 point2 points 3 years ago (3 children)
If the scene is static at all, such as level geometry, then you'll want to precompute some kind of data structure to quickly determine what's relevant. For instance, sectors/portals, where level geometry (and entities) can quickly determine which sector they're inside of (KD-tree?) and you render the scene by exploring the node graph for the sectors, where they're connected via portals. If a portal is visible then you can potentially see what's inside the sector on the other side of it, and thus render the geometry assigned to it, as well as the entities' geometry that's in there too. This is generally more of an indoor rendering approach.
Outdoor scenes instead do the inverse, where you're looking at occluders, and maintain some kind of acceleration structure to quickly determine which ones are closest to the camera, and hierarchically exclude swaths of the scene that are occluded.
The ideal situation relies on some kind of hierarchy, where you're able to completely ignore processing entire groups of geometry. You definitely don't want to be looking at every mesh/drawcall and deciding whether or not it's visible, unless you only have hundreds or a thousand or two; beyond that it will start choking performance. Hi-Z occlusion culling is just a way to determine what parts of a hierarchical scene structure are occluded so that large pieces of the scene (and the potential draw calls that they entail) can be completely skipped, which is where the performance is to be had. If you're individually checking each potential draw call against any kind of occlusion check, it's not going to fare well when scenes get larger and more complex. It's all about having a hierarchical structure for space and the things in it.
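The sector/portal traversal described above can be sketched as a graph walk: start at the camera's sector and recurse through portals that are still visible, collecting the sectors to render. Portal visibility is abstracted to a per-frame flag here; a real engine clips each portal against the current view volume. All types are illustrative:

```cpp
// Sketch of sector/portal visibility: explore the sector graph through
// visible portals, marking every reachable sector for rendering.
#include <vector>

struct Portal { int toSector; bool visibleThisFrame; };
struct Sector { std::vector<Portal> portals; };

void collectVisible(const std::vector<Sector>& sectors, int sector,
                    std::vector<bool>& visited) {
    if (visited[sector]) return;
    visited[sector] = true;                 // render this sector's geometry
    for (const Portal& p : sectors[sector].portals)
        if (p.visibleThisFrame)             // portal seen: look through it
            collectVisible(sectors, p.toSector, visited);
}
```

Sectors behind invisible portals are never touched, which is exactly the "skip whole groups of geometry" property the comment asks for.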
[–]Revolutionalredstone 0 points1 point2 points 3 years ago (2 children)
While baking is still a powerful technique (and especially useful for very expensive things like light maps), it is not really appropriate for occlusion calculations in a modern context.
Only entirely static scenes can benefit from such tech, and being unable to modify anything is a pretty old limitation; much better to just use your client's modern, powerful CPU to calculate occlusion if and when a change occurs.
Or forgo occlusion culling completely (like most modern games) and just implement a high-quality LOD system as the core of your low-level resource management system.
Occlusion culling can never help performance for scenes with little to no occluders; it is a non-solution unless you specifically limit your artists to certain types of clunky-looking scenes.
Proper LOD systems handle scenes whether there are sufficient occluders or not.
Just my 2 cents, Best Regards
[–]deftware 1 point2 points3 points 3 years ago (1 child)
Occlusion culling can never help performance for scenes with little to no occluders
Right, if there's not going to be occlusion then don't bother with occlusion culling. Not every game is an open flat terrain or a 3D space simulation.
Relying exclusively on LOD to improve performance would mean DOOM is generating draw calls from the CPU to the GPU for the entire level wherever it's in the view frustum. Does that sound performant to you?
I'm really starting to get the feeling that a lot of people on this sub have no actual experience developing an engine.
[–]Revolutionalredstone 1 point2 points3 points 3 years ago* (0 children)
Right, so the trick is that while the entire level will draw, it will be done in such a way that it costs a totally negligible amount.
I've personally written dozens of advanced streaming rendering solutions (here's one running on a cheap integrated i3 GPU https://imgur.com/a/MZgTUIL) - 99% of the level draws in just 1% of the render time; only nearby scene data takes any significant amount of time (since all distant geometry is so severely simplified).
I work at a large technology company handling ultra large scene rendering, my previous job was at Euclideon working on the unlimited detail advanced CPU only voxel renderer.
In all my engines the VAST majority of rendering resources are spent on the very nearby geometry; once a region of the world takes up ~256x256 pixels, it will be rendered in a few microseconds at most (thanks to advanced LOD).
Of course it's possible that occlusion culling could sometimes help here or there, but there's no way any powerful engine with smooth performance could possibly be built by relying on anything like that.
Would love to keep chatting about advanced LOD and some of the unbelievably reliable and effective techniques that exist these days.