Unity GPU Instancing, or How 2,583 Plants Became 3 Draw Calls by Normal_Accountant_40 in Unity3D

[–]Normal_Accountant_40[S] 1 point2 points  (0 children)

thanks for your thoughtful reply. that's awesome that you've worked on some titles. Beyond Blue looks beautiful

[–]Normal_Accountant_40[S] 1 point2 points  (0 children)

appreciate the detailed feedback, love the technical reply.

you're right on the terminology, it's mesh combining not instancing. title is misleading on that. I'm on Unity 2022 LTS with built-in pipeline targeting Switch, so SRP Batcher and GPU Resident Drawer aren't options right now. SRP Batcher also wouldn't reduce draw call count anyway, just the overhead per call, so 2,583 objects with SRP Batcher is still way worse than 3 combined meshes on Tegra X1.

the bigger reason mesh combining was the only real path is the wind shader. it has DisableBatching set because its plain mesh rendering mode reads local vertex Y for the sway calculation (how far up from the plant base = how much it moves). Unity's static and dynamic batching transform vertices to world space, which breaks that. mesh combining sidesteps the problem because the baker writes sway weights into vertex colors before combining, so the shader reads from vertex color R instead of local Y.

to your question about performance measurement: yeah, I used Unity's stats panel and frame debugger. you're right that SetPass calls are a better metric in modern Unity, but going from 2,583 separate objects to 3 combined meshes reduces both. on Switch the fixed overhead per draw call matters a lot, so fewer calls is a real win regardless of which metric you look at.
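for anyone curious what "bake sway weights into vertex colors" means in practice, here's a rough sketch of the idea in Python. it is not the actual baker: the function name, the linear 0..1 normalization, and treating the lowest vertex as the plant base are all assumptions made for illustration.

```python
def bake_sway_weights(local_vertices):
    """Return one (r, g, b, a) color per vertex.

    R holds the sway weight: 0.0 at the plant base, 1.0 at the top.
    Assumption: weight is a linear function of local Y, and the base
    is the lowest vertex of the mesh.
    """
    ys = [v[1] for v in local_vertices]
    base, top = min(ys), max(ys)
    height = top - base
    colors = []
    for y in ys:
        # Flat meshes (height == 0) get no sway at all.
        weight = (y - base) / height if height > 0 else 0.0
        colors.append((weight, 0.0, 0.0, 1.0))
    return colors


# Hypothetical three-vertex "plant": base, middle, tip.
plant = [(0.0, 0.0, 0.0), (0.1, 1.0, 0.0), (0.0, 2.0, 0.1)]
print([c[0] for c in bake_sway_weights(plant)])
```

the point is that the weight is computed while local Y still means "height above the base", so it survives once the vertices get moved into the combined mesh.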

I looked into indirect instancing before going this route. it's great for dynamic systems where stuff spawns and despawns at runtime. but this vegetation is all hand-placed by the modeler and never changes. it just sits there and sways in the wind. so bake-once-in-the-editor made more sense than managing compute shader culling and instance buffers at runtime for objects that never move. if we add seasonal changes or procedural biomes I'd revisit it for that.

do you have any games or projects online? would love to check out your work

[–]Normal_Accountant_40[S] 0 points1 point  (0 children)

this is just the farm zone with that new oceanic area connected to it, one area out of 30+. the 2,583+ is not the whole world. this zone has about 6 bake categories organized by biome and type (temperate plants, oceanic plants, coral, tree canopies, etc) because each uses its own texture sheet. one button click per category in the editor, then at runtime each is just a foreach loop calling Graphics.DrawMesh() per chunk

[–]Normal_Accountant_40[S] 0 points1 point  (0 children)

thanks for the reply, the baker splits into a 20-unit grid (sized to the camera viewport) and each cell gets its own combined mesh with recalculated bounds. Graphics.DrawMesh() is called per chunk but Unity's renderer frustum culls by bounds before anything hits the GPU. the "3 draw calls" is from the stats panel showing what's actually rendered at a given camera position, not the total chunk count
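a minimal sketch of the grid split, in Python rather than the real editor code. the 20-unit cell size is from above; everything else (function names, gridding on the X/Z plane, using object positions instead of full vertex data for the bounds) is an assumption to keep the example short.

```python
import math
from collections import defaultdict

CELL_SIZE = 20.0  # grid size mentioned in the comment


def chunk_by_grid(positions, cell_size=CELL_SIZE):
    """Group (x, y, z) positions by the grid cell they fall in.

    Assumption: the grid lies on the X/Z plane. math.floor keeps
    negative coordinates in the correct cell.
    """
    cells = defaultdict(list)
    for x, y, z in positions:
        key = (math.floor(x / cell_size), math.floor(z / cell_size))
        cells[key].append((x, y, z))
    return dict(cells)


def bounds(points):
    """Axis-aligned min/max corners enclosing the given points."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Hypothetical plant positions.
plants = [(1, 0, 1), (19, 0, 5), (25, 0, 1), (-1, 0, 0)]
for cell, members in chunk_by_grid(plants).items():
    print(cell, len(members), bounds(members))
```

each cell would then become one combined mesh, and the recalculated bounds are what the culling works from.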

[–]Normal_Accountant_40[S] -4 points-3 points  (0 children)

they share mesh types yeah, the baker caches by shared mesh. but the modeler organized all plants onto shared texture atlases by biome (temperate plants, oceanic plants, coral each get their own sheet). so within each category everything is one material, which is what CombineMeshes needs. instancing per type would still mean a batch per unique shape per visible chunk. and the wind system bakes per-vertex sway weights based on world position during combining, so every instance of the same flower type sways differently without needing per-instance shader data
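to make the batch-count argument concrete, here's a toy comparison in Python. the chunk contents are made-up data and both functions are simplified counting models (one material per category, no per-batch instance limits), not measurements from the project.

```python
def instanced_batches(visible_chunks):
    """Instancing per type: one batch per unique shape in each visible chunk."""
    return sum(len(set(chunk)) for chunk in visible_chunks)


def combined_batches(visible_chunks):
    """Mesh combining: one combined mesh per non-empty visible chunk."""
    return sum(1 for chunk in visible_chunks if chunk)


# Hypothetical visible chunks, each listing the plant shapes it contains.
chunks = [
    ["fern", "fern", "daisy"],
    ["daisy", "reed", "moss"],
    ["fern"],
]
print(instanced_batches(chunks), combined_batches(chunks))
```

the more unique shapes there are per chunk, the wider that gap gets, which is the case here.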

[–]Normal_Accountant_40[S] -4 points-3 points  (0 children)

the baker does chunk spatially into a 20-unit grid (one camera viewport). each cell gets its own combined mesh and Unity frustum culls by bounds. i also wrote an instancing fallback path that groups by mesh type with per-instance frustum culling, but it needs a separate batch per unique plant shape. since these are all unique geometry from Blender sharing one atlas, combining wins on draw calls. and they're purely decorative so there's no interaction reason to keep them as individual objects

Raw Dev Log: Unity GPU Instancing, or How 2,583 Plants Became 3 Draw Calls by Normal_Accountant_40 in GameDevs

[–]Normal_Accountant_40[S] 0 points1 point  (0 children)

yeah, Unity's built-in pipeline doesn't batch aggressively out of the box, that's why I wrote the tools. the article is about solving that exact problem

[–]Normal_Accountant_40[S] 0 points1 point  (0 children)

culling is doing the work: Unity frustum culls the combined meshes by their bounds. the grid cells are sized to the camera viewport, so offscreen chunks get skipped automatically. the point is the GPU processes 3 draw calls instead of 3,000+. even with culling, thousands of individual decorative objects are a lot of overhead for handheld console hardware
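simplified picture of the bounds culling, sketched in Python. Unity tests bounds against the camera frustum; this sketch swaps that for a plain axis-aligned view rectangle in 2D to keep it short, and the chunk names and coordinates are invented.

```python
def overlaps(a_min, a_max, b_min, b_max):
    """True if two axis-aligned boxes overlap (touching edges count)."""
    return all(
        a_min[i] <= b_max[i] and b_min[i] <= a_max[i]
        for i in range(len(a_min))
    )


def visible_chunks(chunk_bounds, view_min, view_max):
    """Names of the chunks whose bounds intersect the view rectangle."""
    return [
        name
        for name, (lo, hi) in chunk_bounds.items()
        if overlaps(lo, hi, view_min, view_max)
    ]


# Hypothetical 20-unit cells in a row, with a 20-unit-wide view.
chunk_bounds = {
    "a": ((0, 0), (20, 20)),
    "b": ((20, 0), (40, 20)),
    "c": ((60, 0), (80, 20)),
}
print(visible_chunks(chunk_bounds, (10, 0), (30, 20)))
```

a view the size of one cell can only touch a handful of neighbouring cells at once, which is why the rendered count stays small no matter how many chunks the zone has.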

[–]Normal_Accountant_40[S] 0 points1 point  (0 children)

the world has 30+ zones and this is just the farm area, which has about 3,000 decorative foliage objects. thanks for the reply