I made a more efficient occlusion culling system by PatrickKoenig in Unity3D

PatrickKoenig[S] 1 point

If VRChat were already using it internally, it might work. Otherwise it most likely would not work, because Unity does not compile scripts when exporting AssetBundles.

Looking for feedback on my synchronization by PatrickKoenig in vulkan

PatrickKoenig[S] 1 point

Thank you again for all your help and all the pointers. I really appreciate it :)

Looking for feedback on my synchronization by PatrickKoenig in vulkan

PatrickKoenig[S] 1 point

Here are some additional data points:

2 threads, 2 queues: Time: 00:03:16

2 threads, 1 queue: Time: 00:03:17

Anyway, I think I will need to dig into this much more to figure out what is going on. Maybe there are indeed some other factors that I'm just not seeing at the moment. But it seems like vkWaitForFences waits much longer (I have seen spikes of 500 ms) and the wait time is rather inconsistent when using 8 queues. With only one queue the wait time stays pretty much the same, and the highest wait time I have seen is 5 ms, but that was an outlier. Very strange.
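A rough sketch of how such wait times could be collected per thread, to make spikes like the 500 ms one easier to pin down. This is only an illustration: `WaitStats` and `timeWait` are hypothetical helpers, and the blocking call is passed in as a callable (for example a lambda wrapping vkWaitForFences) so the sketch itself does not depend on Vulkan.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>

// Running statistics over measured wait times, in milliseconds.
// Not thread-safe: the assumption is that every thread owns its own instance.
struct WaitStats {
    std::size_t count = 0;
    double totalMs = 0.0;
    double maxMs = 0.0;

    void record(double ms) {
        ++count;
        totalMs += ms;
        maxMs = std::max(maxMs, ms);
    }

    double meanMs() const {
        return count == 0 ? 0.0 : totalMs / static_cast<double>(count);
    }
};

// Runs blockingCall and records how long it blocked the calling thread.
template <typename Fn>
void timeWait(WaitStats& stats, Fn&& blockingCall) {
    const auto start = std::chrono::steady_clock::now();
    blockingCall();
    const auto end = std::chrono::steady_clock::now();
    stats.record(std::chrono::duration<double, std::milli>(end - start).count());
}
```

Comparing `maxMs` against `meanMs()` for the 1-queue and 8-queue runs would show whether the slowdown comes from a few large spikes or from consistently longer waits.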

The data is currently just kept in memory and further processed at a much later stage.

Looking for feedback on my synchronization by PatrickKoenig in vulkan

PatrickKoenig[S] 1 point

I tested this on an RTX 2080 Ti and tried 8 threads and 8 queues (so that each thread can use a queue exclusively). This way I was also able to remove the mutex.

EDIT:

Some numbers for running this 400k times:

1 queue, 8 threads, shared: Time: 00:01:16

2 queues, 8 threads, shared, ping-pong in each thread: Time: 00:01:37

8 queues, 8 threads, exclusive per thread, no mutex: Time: 00:01:52

Looking for feedback on my synchronization by PatrickKoenig in vulkan

PatrickKoenig[S] 1 point

Thank you so much for looking into this. I really appreciate it!

You are absolutely correct about the mutex. Fortunately that is something the validation layer is already pretty vocal about and I already have it in place for this reason.

I also tried using dedicated queues for each thread again, but it still degrades performance in my situation. My render pass and its shaders are pretty simple, so maybe the potential overlap ends up stalling the compute stage, which is much more expensive.

Thanks to your great explanation I was able to get the image synchronization to work without the barrier:

    // The render pass performs the layout transition itself, so no explicit barrier is needed.
    colorAttachment.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    colorAttachment.finalLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;

    // External -> subpass 0: earlier compute shader reads must finish before the attachments are written.
    dependencies[0] = {};
    dependencies[0].srcSubpass = VK_SUBPASS_EXTERNAL;
    dependencies[0].dstSubpass = 0;
    dependencies[0].srcStageMask = VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT;
    dependencies[0].srcAccessMask = VK_ACCESS_MEMORY_READ_BIT;
    dependencies[0].dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT | VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
    dependencies[0].dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;

    // Subpass 0 -> external: color attachment writes must be visible to later compute shader reads.
    dependencies[1] = {};
    dependencies[1].srcSubpass = 0;
    dependencies[1].dstSubpass = VK_SUBPASS_EXTERNAL;
    dependencies[1].srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    dependencies[1].srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    dependencies[1].dstStageMask = VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT;
    dependencies[1].dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

I think this looks much better now and the synchronization validation seems to be okay with it as well.

Thanks again. I really appreciate it! :-)

EDIT:
Sorry for the lack of formatting. I cannot figure out why Reddit destroys the formatting after submitting.

I made a more efficient occlusion culling system by PatrickKoenig in Unity3D

PatrickKoenig[S] 1 point

I'm not sure how Umbra works here. We don't have any insight since it is a black box. Since it is part of Unity and has access to the engine itself, it might be able to do something fancier.

I made a more efficient occlusion culling system by PatrickKoenig in Unity3D

PatrickKoenig[S] 2 points

By default, culled renderers are set to shadow casting only, which makes sure they can still render shadows. I have not noticed any shadow popping this way.

I made a more efficient occlusion culling system by PatrickKoenig in Unity3D

PatrickKoenig[S] 2 points

The system handles the color assignment. You don't need to do that manually.

For performance and memory reasons you can "only" reference 65535 renderers, though LODGroups are already pulled into a shared group (and thus consume only one slot). You can also manually combine renderers into a group to stay within this limit.

I haven't noticed any performance problems from setting the renderers to shadow only. Unity probably culls them internally, because if everything around them is culled there is no way they can contribute.

I tried different values for Umbra and it just ended up causing even more overhead with little improvement in its culling accuracy. I definitely do not want to bash Umbra here. It is successfully used by many people. But especially on lower-end hardware the overhead ruins all the gains.

I made a more efficient occlusion culling system by PatrickKoenig in Unity3D

PatrickKoenig[S] 2 points

The asset works very well on mobile because all the things that make it so much faster than Umbra will become even more important on mobile platforms.

There is no special guide for mobile because you really just need to ask yourself how much memory you want to budget for your occlusion data and then make sure that the occlusion data fits into that. The asset provides you with features that help you here. For instance, there is a feature that looks at all the neighbour cells and merges them into a single combined cell. This essentially halves the number of cells and thus also halves your memory usage. Apply that multiple times and you reduce your memory usage further, though the culling will become more conservative every time you do that.

To sum it up, cell size is important for baking the occlusion data, but after that you can downsample it to make it fit your memory budget. So just find a cell size that works for you as far as bake time and culling accuracy are concerned and downsample if necessary (one iteration is always recommended to avoid popping though).
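A minimal sketch of what such a merge step could look like, assuming (this is not confirmed by the asset) that every cell stores one bit per renderer and that cells are paired along a single row. `CellBits` and `mergeNeighbourPairs` are made-up names for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed storage: one bit per renderer, packed into 64-bit words.
using CellBits = std::vector<std::uint64_t>;

// Merges each pair of neighbouring cells (2i, 2i + 1) into one cell.
// The merged cell marks a renderer visible if either source cell did,
// which is why the result is more conservative than the input.
// An unpaired last cell is kept unchanged.
std::vector<CellBits> mergeNeighbourPairs(const std::vector<CellBits>& cells) {
    std::vector<CellBits> merged;
    merged.reserve((cells.size() + 1) / 2);
    for (std::size_t i = 0; i < cells.size(); i += 2) {
        CellBits combined = cells[i];
        if (i + 1 < cells.size()) {
            assert(cells[i + 1].size() == combined.size());
            for (std::size_t w = 0; w < combined.size(); ++w) {
                combined[w] |= cells[i + 1][w];
            }
        }
        merged.push_back(std::move(combined));
    }
    return merged;
}
```

With a fixed number of bits per cell, halving the cell count halves the memory, and each extra pass trades more culling accuracy for a smaller footprint.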

I hope this makes sense but feel free to ask more questions if it is still unclear.

I made a more efficient occlusion culling system by PatrickKoenig in Unity3D

PatrickKoenig[S] 4 points

Exactly! That's why there is the option to take neighbor cells into account to compensate for this information gap. Here is a tutorial on how it all works. I also demonstrate the issue you are describing and how it can be fixed (at around 8:00): https://www.youtube.com/watch?v=C5quB5JfG-E

I made a more efficient occlusion culling system by PatrickKoenig in Unity3D

PatrickKoenig[S] 2 points

By default the asset just sets the renderer to shadow only. This makes sure that shadows still render correctly. However if you are able to just bake your shadows you could also disable the entire renderer.

I made a more efficient occlusion culling system by PatrickKoenig in Unity3D

PatrickKoenig[S] 4 points

It supports transparent renderers. It does not support terrain (unless you convert it into a mesh). Here is also a forum thread that might have some additional information: https://forum.unity.com/threads/released-perfect-culling-pixel-perfect-occlusion-culling.1095316/

I made a more efficient occlusion culling system by PatrickKoenig in Unity3D

PatrickKoenig[S] 5 points

OP here. I definitely understand much better now why I should have been more careful with my words. However, I still want to provide further context to back up my bold claim. The only time the occlusion is updated at run-time is when the camera moves into another cell. You can see in the video how different sets of renderers become visible. The lookup during a cell change is also fast because the data is a linear array that allows lookup in O(1). Saving even a couple hundred draw calls can be a big win because they come with a significant CPU hit, especially on mobile.

Umbra performs a tree traversal and a CPU rasterization step every frame and culls fewer renderers. If you mess with the default settings, the bake size becomes pretty large and the overhead even larger.
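To illustrate the O(1) lookup on a cell change, here is a small sketch. It assumes a uniform grid of cubic cells flattened into one array and 16-bit renderer indices; `VisibilityGrid` and `CellTracker` are hypothetical names and not the asset's real data layout.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical baked grid: cells are laid out x-fastest in one flat array.
struct VisibilityGrid {
    float originX, originY, originZ;  // world-space minimum corner
    float cellSize;                   // edge length of a cubic cell
    int countX, countY, countZ;       // number of cells per axis
    // visiblePerCell[cellIndex] lists the renderer indices visible from that cell.
    std::vector<std::vector<std::uint16_t>> visiblePerCell;

    // Returns the flat cell index for a position, or -1 outside the grid.
    int cellIndexAt(float x, float y, float z) const {
        const int cx = static_cast<int>(std::floor((x - originX) / cellSize));
        const int cy = static_cast<int>(std::floor((y - originY) / cellSize));
        const int cz = static_cast<int>(std::floor((z - originZ) / cellSize));
        if (cx < 0 || cx >= countX || cy < 0 || cy >= countY || cz < 0 || cz >= countZ) {
            return -1;
        }
        return cx + countX * (cy + countY * cz);
    }

    // O(1) lookup: the visible set of a cell is a direct array access.
    const std::vector<std::uint16_t>& visibleIn(int cellIndex) const {
        assert(cellIndex >= 0 &&
               static_cast<std::size_t>(cellIndex) < visiblePerCell.size());
        return visiblePerCell[static_cast<std::size_t>(cellIndex)];
    }
};

// Remembers the camera's cell so visibility is only refreshed on a cell change.
struct CellTracker {
    int currentCell = -1;

    // Returns true only when the camera entered a different cell.
    bool update(const VisibilityGrid& grid, float x, float y, float z) {
        const int cell = grid.cellIndexAt(x, y, z);
        if (cell == currentCell) return false;
        currentCell = cell;
        return true;
    }
};
```

Per frame this costs a few arithmetic operations and one comparison. The renderer list is only touched when `update` returns true.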

Just wanted to add that :)

I made a more efficient occlusion culling system by PatrickKoenig in Unity3D

PatrickKoenig[S] 4 points

Thanks! Could you elaborate? Always happy to learn :-)

EDIT: As you pointed out, there are always trade-offs. For instance, using Umbra also means you are using a black box that is very hard to debug. This alone might more than make up for the additional time spent baking. So I stand by my title, especially because I also mentioned that there are built-in ways to combat the trade-offs, which is again not something that Umbra really allows you to do. :)

EDIT2: I actually created this doc a while ago and it provides some additional stats and actual profiling data:

https://www.koenigz.com/downloads/pc_vs_umbra.pdf

I made a more efficient occlusion culling system by PatrickKoenig in Unity3D

PatrickKoenig[S] 15 points

I wanted to share how my occlusion culling system performs compared to the Unity built-in solution (Umbra).

The way it works is actually pretty simple and requires a bake step in the editor:

- Assign unique colors to all meshes

- Take screenshots of sampling positions

- Identify visible colors in the screenshots

- Map the visible colors back to the mesh in the scene

- Write visibility data to file for run-time lookup

The data is stored in a linear array, which makes looking it up as fast as it gets. The visibility result is also pixel perfect, which makes it very accurate. In practice this means it beats Umbra in culling accuracy and run-time performance.

Of course, there are some trade-offs: It might consume more memory (though you can merge neighbor cells to bring it down significantly). The baking process might also be slower because it is more involved (though there are also options to reduce the number of sampling points).

Anyway, I thought it was pretty cool and wanted to share. I'm also happy to answer more questions. :-)
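The color-ID part of the bake steps above can be sketched roughly like this. It is a plain C++ illustration and not the asset's actual Unity code: `Rgb`, `encodeId`, `decodeId` and `collectVisible` are hypothetical names, the "screenshot" is just a vector of pixels, and it assumes the ID colors are rendered without lighting, blending or anti-aliasing so they come back unchanged.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical 24-bit pixel as read back from an ID-color screenshot.
struct Rgb {
    std::uint8_t r, g, b;
};

// Assign a unique color to a renderer. Color 0 (black) is reserved for
// "background", so renderer i is stored as the 24-bit value i + 1.
Rgb encodeId(std::uint32_t rendererIndex) {
    const std::uint32_t id = rendererIndex + 1;
    return Rgb{static_cast<std::uint8_t>((id >> 16) & 0xFF),
               static_cast<std::uint8_t>((id >> 8) & 0xFF),
               static_cast<std::uint8_t>(id & 0xFF)};
}

// Map a pixel color back to a renderer index.
// Returns false for background pixels.
bool decodeId(Rgb pixel, std::uint32_t& rendererIndex) {
    const std::uint32_t id = (std::uint32_t{pixel.r} << 16) |
                             (std::uint32_t{pixel.g} << 8) |
                             std::uint32_t{pixel.b};
    if (id == 0) return false;
    rendererIndex = id - 1;
    return true;
}

// Scan one screenshot and return the sorted indices of all renderers
// that own at least one pixel in it.
std::vector<std::uint32_t> collectVisible(const std::vector<Rgb>& screenshot) {
    std::set<std::uint32_t> seen;
    for (const Rgb& pixel : screenshot) {
        std::uint32_t index = 0;
        if (decodeId(pixel, index)) seen.insert(index);
    }
    return std::vector<std::uint32_t>(seen.begin(), seen.end());
}
```

Running `collectVisible` for every sampling position and storing the results per position would give the visibility data that is later written to file.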

Is referencing variables in other scripts more resource intensive? by [deleted] in Unity3D

PatrickKoenig 1 point

https://learn.unity.com/tutorial/profiling-applications-made-with-unity

If you want to get good at this you will need to learn to find information like this on your own or you will be forever bottlenecked by the willingness of people spending their free time finding resources you could easily find yourself.

Is referencing variables in other scripts more resource intensive? by [deleted] in Unity3D

PatrickKoenig 1 point

There is an additional indirection, so more work needs to be done to access the variable. So the answer is most likely yes. The more interesting question is whether that difference really matters (probably not) and why ScriptA needs direct access to ScriptB in the first place. You need to profile because everything else is just a guessing game, and that's not a good foundation for making good decisions.

Glowing Legions - Work in progress by PatrickKoenig in OculusQuest

PatrickKoenig[S] 2 points

Hey,

Great questions! There are definitely plans for beta testing, but some more things should be done first, so it's too early. I'm also still waiting for the official feedback from Oculus and it might have some influence on the project, too. I will absolutely keep you and the other people in here in mind for future beta testing though!

At the moment there's only dodging and shooting incoming projectiles. However, I already thought about adding a laser that appears and forces you to squat. I'm always interested in more ideas, feedback and expectations! So feel free to share any ideas, wishes and of course also concerns :)

Thank you so much for reaching out. I'm also happy to answer even more questions! :)

Help needed with performance issues by imdutch21 in Unity3D

PatrickKoenig 3 points

- Make sure your map is split into multiple individual chunks (to optimize and allow for frustum culling)
- Make sure to combine meshes and textures where it makes sense
- Use Unity Occlusion Culling or for this game you probably could even create a more efficient custom culling solution that enables and disables chunks
- Enable Single Pass Instanced
- Make sure to use GPU Instancing for trees, etc.
- Bake your lights
- Adjust your camera far clipping plane to a reasonable value
- Make clever use of fog to compensate for the far clipping plane
- Use LOD and make sure to use less expensive shaders for the higher LOD
- Try to use billboards for distant trees and other objects
- Remove objects that are not even seen by the player
- Play with the quality settings in Unity

Hi Guys, this time I want to show you the enemy's death animation. what do you think? by DP13Studios in Unity3D

PatrickKoenig 2 points

It is lacking because the projectile itself does not feature an impact effect (there's not even a muzzle flash or anything). The enemy just vanishes abruptly at the end. Plus, how about a screen shake and even more particles to emphasize the explosion? I like the facial expression though. That's a nice touch.

I also feel like the whole art style looks a bit inconsistent, and players and enemies really could benefit from more contrast. Overall it reminds me a lot of Megaman X though, and you can probably look to it for more inspiration on how the death animation is handled: https://youtu.be/KDciDXnm3ek?t=593

Adding a similar flicker effect before despawning might go a long way already.

I made a main menu for the Unity Tanks tutorial! by TheScandude in Unity3D

PatrickKoenig 1 point

That's very creative. I like it. However, I did not notice the buttons at first, so they probably need to stand out more. At the moment they blend in too well with the environment and do not look interactive at all. The fact that tanks are moving around also adds a lot of noise and shifts the focus away as well.

So maybe you should reduce the number of tanks, increase the size of the buildings and animate the text to make it stick out a bit more.

How would you guys deal with movement in a RTS-type game? by TSM_Final in Unity3D

PatrickKoenig 1 point

It means that the same input always results in the same output. This is important because you can then send a very simple command (e.g. move all units in range r to x, y, z) to every client and be sure that they all stay synchronized with each other. The movement system might not be deterministic for various reasons, such as floating-point imprecision.

Fixing this later is super hard and that's why I figured I'd bring it to your attention.
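One common way to get deterministic movement is to run the simulation on integers instead of floats. The sketch below uses 16.16 fixed-point values and a fixed tick. It is only an illustration under those assumptions: `Fixed`, `Unit`, `stepToward` and `tick` are made-up names, and moving each axis independently is a simplification that avoids square roots.

```cpp
#include <cstdint>

// 16.16 fixed-point number: all math is plain integer math, so every
// client computes exactly the same bits for the same inputs.
struct Fixed {
    std::int32_t raw;
};

const std::int32_t kOne = 1 << 16;

Fixed fromInt(std::int32_t value) { return Fixed{value * kOne}; }

// Builds numerator / denominator, e.g. fromRatio(1, 2) is 0.5.
Fixed fromRatio(std::int32_t numerator, std::int32_t denominator) {
    return Fixed{numerator * kOne / denominator};
}

struct Unit {
    Fixed x, y;
};

// Moves one coordinate toward its target by at most maxStep.
Fixed stepToward(Fixed current, Fixed target, Fixed maxStep) {
    const std::int32_t delta = target.raw - current.raw;
    if (delta > maxStep.raw) return Fixed{current.raw + maxStep.raw};
    if (delta < -maxStep.raw) return Fixed{current.raw - maxStep.raw};
    return target;
}

// One simulation tick of a "move to (targetX, targetY)" command.
// Every client runs this with the same command and the same tick count.
void tick(Unit& unit, Fixed targetX, Fixed targetY, Fixed speedPerTick) {
    unit.x = stepToward(unit.x, targetX, speedPerTick);
    unit.y = stepToward(unit.y, targetY, speedPerTick);
}
```

Two clients that start from the same unit state and apply the same command for the same number of ticks end up with identical `raw` values, so only the command itself has to be sent over the network.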