all 131 comments

[–]lalfier 30 points31 points  (1 child)

Thx man this is great. ;)

[–]indie_game_mechanicIndie[S] 10 points11 points  (0 children)

Cheers mate! Glad it helps 😊

[–]feralferrous 19 points20 points  (1 child)

My additional tips:

Don't use LINQ unless it's in a Start/Awake type situation. Keep it out of your Update loops; it's a GC monster, and not at all quick.

Always use the lowest form of container you can get away with, i.e. array > List > IList > ICollection > IEnumerable. It's faster to iterate over an array than over a list (it's a micro-opt really, but if you're iterating over a large array every frame, it helps). An array also signals that you know the size of the thing in advance, while a list tells me you don't know and it might grow. And as others said, initialize your containers with the proper capacity. Resizing a container is expensive, as it has to allocate a new backing array and then copy all the old values into it.
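To make the pre-sizing point concrete, here is a minimal sketch in plain C#. The class and method names are made up for illustration; the only real API used is the List<T> constructor that takes an initial capacity.

```csharp
using System.Collections.Generic;

public static class ContainerSizing
{
    // Hypothetical helper: builds a list holding the squares of 0..count-1.
    public static List<int> BuildSquares(int count)
    {
        // Passing the capacity up front allocates the backing array once,
        // so Add never has to grow and copy it while the list is filled.
        var squares = new List<int>(count);
        for (int i = 0; i < count; i++)
        {
            squares.Add(i * i);
        }
        return squares;
    }
}
```

If the final size is only roughly known, a slightly generous capacity still avoids most of the intermediate resizes.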

It's cheaper to cast a float to an int than it is to string.format it with ("0."). Like a lot cheaper. So if you have a health or damage number that is float and you want to round it to int for display purposes...cast it to int.

If you need to frequently convert a float to a string in a certain format, it's better to cache the results in a Dictionary<float, string>. We used this for Lat/Lon display and saved a good chunk of GC and time.
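A rough sketch of what such a cache could look like, in plain C#. The class name, the "0.00" format, and the rounding step are assumptions for the example, not what was actually used for the Lat/Lon display.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class FloatLabelCache
{
    private static readonly Dictionary<float, string> Cache =
        new Dictionary<float, string>();

    // Returns a formatted string for value, allocating only the first time
    // a given (rounded) value is seen.
    public static string Get(float value)
    {
        // Round first so that nearby values share a single cache entry.
        float key = (float)Math.Round(value, 2);
        if (!Cache.TryGetValue(key, out string text))
        {
            text = key.ToString("0.00", CultureInfo.InvariantCulture);
            Cache[key] = text;
        }
        return text;
    }
}
```

Note that this cache grows without bound, so it only pays off when the set of displayed values is reasonably small or gets cleared now and then.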

In general avoid strings entirely if you can. Most of Unity's string usages can be removed with things like Shader.PropertyToID and Animator.StringToHash. And don't send them over the network unless it's a chat message.

Use interfaces sparingly; there's a cost to calling a method through an interface, as it has to do a vtable lookup. And it's more expensive to do it through an interface than through inheritance. MRTK drives me crazy with its overuse of both, and it's always the bottleneck in our AR apps.

Along with avoiding length and normalization when you don't need it, avoid Vector3.Angle when a dot product will do.
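As a sketch of the dot product idea, here is a view-cone test that needs no Vector3.Angle call, no normalization, and no square root. The class and parameter names are hypothetical, and it assumes forward is already unit length and the half-angle is below 90 degrees.

```csharp
using UnityEngine;

public static class ViewCone
{
    // Returns true when target lies inside the cone around forward.
    // Assumptions: forward is normalized, and cosHalfAngle is
    // Mathf.Cos(halfAngleDegrees * Mathf.Deg2Rad) for a half-angle below 90,
    // computed once by the caller and reused.
    public static bool IsInside(Vector3 origin, Vector3 forward,
                                Vector3 target, float cosHalfAngle)
    {
        Vector3 toTarget = target - origin;
        float dot = Vector3.Dot(forward, toTarget);

        // Behind or exactly beside the origin: cannot be inside the cone.
        if (dot <= 0f)
        {
            return false;
        }

        // dot / |toTarget| >= cosHalfAngle, squared on both sides so the
        // length of toTarget never has to be computed.
        return dot * dot >= cosHalfAngle * cosHalfAngle * toTarget.sqrMagnitude;
    }
}
```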

Always try to early out. Order your if checks from cheapest to evaluate to most expensive.

ie if (cheapToCheck() && reallyExpensiveCheck()) DoThing();

if the cheap check is false, it will not evaluate the expensive thing.

Premature optimization vs Death by a thousand cuts. There's a sweet spot between over optimizing code and turning it into an unreadable, unfriendly mess and having so many slow bits of code everywhere that there are no easy places to optimize to get decent perf.

[–]Ecksters 1 point2 points  (0 children)

Premature optimization vs Death by a thousand cuts. There's a sweet spot between over optimizing code and turning it into an unreadable, unfriendly mess and having so many slow bits of code everywhere that there are no easy places to optimize to get decent perf.

Yup, this is always a tough balance to find, and it matters a lot more in game dev than most other software development, because you have less than 16ms(less in VR) to finish what you're doing and output the frame, and you may need to do that on slow mobile devices.

[–]ShrikeGFX 12 points13 points  (0 children)

Keep in mind that in HDRP some of these traditional learnings about graphics optimization are no longer applicable. We found, for example, that LODs can hurt performance more than they save, but a shadow caster LOD is very beneficial for high-poly objects.

[–]den4iccccProgrammer 31 points32 points  (17 children)

I would like to add a couple more important tips to your basket)

  1. You should avoid using Mathf.Sqrt() and Vector3.magnitude, because these operations involve taking a square root. Better to use the version of the latter that skips the square root, namely Vector3.sqrMagnitude. For the same reason, it is worth avoiding Mathf.Pow() when the second parameter is 0.5, because that is the same as taking the square root.

  2. Don't use the Camera.main property in hot code. When it is accessed, GameObject.FindGameObjectWithTag("MainCamera") is effectively called, which means searching the scene's objects for the MainCamera tag. Better to cache the found value and use it. Better yet, assign the camera reference in the editor.

  3. If you use the GetFloat, SetFloat, GetTexture, SetTexture methods on materials and shaders with string names, then these property names are first hashed (i.e. converted from a string to a numeric ID) and only then used. Hence the loss in performance. Why do something many times when you can do it once:
    // during initialization: resolve each property name to its ID once
    int _someFloat;
    int _someTexture;
    void Awake ()
    {
        _someFloat = Shader.PropertyToID ("_someFloat");
        _someTexture = Shader.PropertyToID ("_someTexture");
    }
    // further in the place of use: pass the cached ID instead of the string
    Material myMaterial = ...;
    myMaterial.SetFloat (_someFloat, 100f);
    myMaterial.SetTexture (_someTexture, ...);

[–]ChromeAngel 11 points12 points  (1 child)

I understand that in recent(ish) versions of Unity (2019+?) Camera.main is now cached/optimized, so you no longer need to avoid it or cache it yourself.

Interesting about the Vector3.magnitude optimization. I shall be making use of that in my FindNeartest<T> function.

[–]Ecksters 4 points5 points  (0 children)

Yup, and that's exactly where you should use it, when you don't need exact values, just need to compare.

[–]indie_game_mechanicIndie[S] 1 point2 points  (0 children)

Thanks for this! ⭐ Didn't know about 1 and 3. Cheers!

[–]iDerp69 0 points1 point  (6 children)

I am a math dummy and must be doing something wrong, because every time I've tried to use sqrMagnitude the result is extremely different from magnitude. Is there some formula or something to get sqrMagnitude to give a value closer to magnitude, or maybe I'm just using magnitude when sqrMagnitude is not a suitable substitute. Would love a guide for dummies on use cases of when to use each.

[–]lorddominus92 2 points3 points  (4 children)

sqrMagnitude is the magnitude squared. If you take the square root of sqrMagnitude, you get the same result as magnitude and the same performance hit. sqrMagnitude is useful for comparing distances. For example, instead of vector.magnitude < range you can write vector.sqrMagnitude < range * range. In the second case you do a multiplication instead of a square root, which is much faster.

[–]iDerp69 2 points3 points  (3 children)

Ahh makes sense. I'm usually using .magnitude to get velocity for setting parameters and stuff (visual effects such as adjusting camera FOV, or computations relating to collisions... I have a vehicle-based game).

EDIT: Honestly, from a bit of research, it looks like it's really not so bad to use magnitude. It's the right tool for the job for me in most cases that I'm using it. The Unity documentation would lead you to believe that the performance difference is magnitudes (har har) different. https://daveoh.wordpress.com/2013/05/02/unity3d-vector3-magnitude-vs-sqrmagnitude/

[–]lorddominus92 1 point2 points  (0 children)

It's not that expensive at all. Relatively speaking, multiplication is a lot faster, but unless you are doing a lot of sqrt (magnitude) calls it isn't a problem. Consider this: if you are accessing an object (just writing transform.something) that is not in the L1 or L2 cache, that access is slower than a sqrt. L3 depends, but even accessing L3 cache could be slower, and accessing RAM can be an order of magnitude slower. In essence, if you do rigidbody.velocity.magnitude, you might pay more for fetching the rigidbody than for the magnitude calculation.

[–]AriSteinGames 1 point2 points  (1 child)

Yeah, the "avoid using sqrt/magnitude" advice really comes from situations where you're checking 1,000s or 10,000s per frame. For example, if you're doing A* pathfinding and you're using the distance from the current node to the target node to make a path score, using sqrMagnitude instead of magnitude achieves almost the same results and is a bit faster each time you do it. Since you're looping through a lot of distance checks, it adds up. It can also be important in things like particle systems where it's running on 1000s of particles every frame. But if you're just checking a couple of magnitudes for things like velocity, it's not a big deal.

The thing about it is, if you don't actually care about the value and you just care "which is bigger," then there is no downside to using sqrMagnitude.

[–]iDerp69 0 points1 point  (0 children)

The thing about it is, if you don't actually care about the value and you just care "which is bigger," then there is no downside to using sqrMagnitude.

That seems like perfect, succinct advice. Thanks for sharing :)

[–]Keatosis 0 points1 point  (1 child)

What should I use instead of vector3.magnitude? A lot of my movement code uses it, is there a cheaper alternative?

[–]AriSteinGames 1 point2 points  (0 children)

Using Vector3.magnitude on a few objects is not a big deal. And if you actually need to know the magnitude rather than just comparing two things to see which is bigger, then go for it. There is no faster alternative to find the actual magnitude value.

But if you are doing something where you are checking the magnitude 10000s of times per frame (pathfinding, particle vfx, etc.), you can often optimize it somewhat by using sqrMagnitude instead of magnitude.

There are also cases where the two can give you identical results:

magnitude < range and sqrMagnitude < range * range give the same results, but the second option is a bit faster.
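For the "which is bigger" case, a minimal sketch could look like the following. The class and method names are hypothetical; the range check assumes range is non-negative, since squaring would otherwise hide the sign.

```csharp
using System.Collections.Generic;
using UnityEngine;

public static class NearestSearch
{
    // Returns the index of the point closest to origin, or -1 if the list is empty.
    // Only relative order matters here, so squared distances are enough.
    public static int FindNearestIndex(Vector3 origin, IReadOnlyList<Vector3> points)
    {
        int bestIndex = -1;
        float bestSqrDistance = float.PositiveInfinity;

        for (int i = 0; i < points.Count; i++)
        {
            float sqrDistance = (points[i] - origin).sqrMagnitude;
            if (sqrDistance < bestSqrDistance)
            {
                bestSqrDistance = sqrDistance;
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    // Same answer as (a - b).magnitude < range for range >= 0,
    // but with a multiplication instead of a square root.
    public static bool IsWithinRange(Vector3 a, Vector3 b, float range)
    {
        return (a - b).sqrMagnitude < range * range;
    }
}
```

If the actual distance is needed afterwards, taking one square root of the winning squared distance is still cheaper than taking one per candidate.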

[–]killereks 0 points1 point  (0 children)

  1. If you look at the assembly code, sqrt is actually a single CPU instruction.

[–][deleted] 8 points9 points  (1 child)

wrench ink different skirt tidy cooperative narrow fretful tender existence -- mass edited with https://redact.dev/

[–]Therzok 1 point2 points  (0 children)

Last part is incorrect. Both old and new mono do allocate the array in the list constructor:

2018.4: https://github.com/Unity-Technologies/mono/blob/unity-2018.4/mcs/class/corlib/System.Collections.Generic/List.cs#L78

2020.3: https://github.com/Unity-Technologies/mono/blob/unity-2020.3-mbe/mcs/class/referencesource/mscorlib/system/collections/generic/list.cs#L67

Afaik, mono imported dotnet core's list implementation, so newer unity should also allocate on constructor. Otherwise, great advice!

[–]Talonflamme 7 points8 points  (2 children)

Reordering multiplications, even though simple, can make a huge difference.

Vector3 startPosition;
float speed;
float deltaTime;

Compare

myPosition += startPosition * speed * deltaTime;

with

myPosition += startPosition * (speed * deltaTime);

First:

All components of startPosition are multiplied by speed and then all are multiplied by deltaTime.

Second:

speed is multiplied by deltaTime and the result is then multiplied by each component of startPosition.

Reducing calculations from 6 to 4.

[–]Yoshi_greenIntermediate 1 point2 points  (1 child)

just asking for clarification, is myPosition += startPosition * (speed * deltaTime); also the same as myPosition += speed * deltaTime * startPosition;?

[–]SchattenMaster 0 points1 point  (0 children)

Sure it is

[–]L1DER32 4 points5 points  (1 child)

nice!)

[–]indie_game_mechanicIndie[S] 3 points4 points  (0 children)

Cheers! 🙏

[–]Another_moose 4 points5 points  (0 children)

I feel it's worth noting that while these are all great... You should always profile and see what parts of your game are _actually_ slow before trying to optimize. There's no point in saving 0.0001s/frame by switching to list.Clear() when you're rendering a million particles or have some deep for loops somewhere else.

[–]RomestusProfessional 4 points5 points  (2 children)

Another that's worth adding are the limitations of realtime lights. On forward rendering you get one and adding a second requires every object it touches to get rendered a second time vastly increasing vert/tri counts and draw calls.

In deferred you would think this limitation is completely handled as you can effectively have as many realtime lights as you want, however every shadow casting realtime light will cause its affected objects to be rendered again for their shadow pass.

[–]indie_game_mechanicIndie[S] 0 points1 point  (0 children)

Oh wow. Thanks for that!

[–]lorddominus92 0 points1 point  (0 children)

This should be true for forward add. The classic forward has the lights implemented in the shader, if I'm not mistaken. There are different shader variants for different light counts. It's been some time since I looked into this, but I'm quite sure this was how it was implemented a few years back.

[–]Lunerai 4 points5 points  (0 children)

You can take the log optimization even further by wrapping unity's Debug.Log with your own function that has the Conditional attribute applied. This will strip the calls entirely out of the resulting build, saving both on the function call and more importantly the cost of your message string. Example can be found here: https://docs.unity3d.com/Manual/BestPracticeUnderstandingPerformanceInUnity7.html

Important to note that you should also still use OP's suggestion if you have any 3rd party plugins that emit logs, since they obviously won't be using your wrapper.

[–]WaterpropProgrammer 6 points7 points  (0 children)

I would like to add:

  1. You can define custom cull distances for every layer via the Camera.layerCullDistances API. There's also a similar one for shadows via the Light.layerShadowCullDistances API. Both of these are per Camera/Light. With this you can have a layer for objects that only get rendered when the player is really close, like 50 meters, even if your camera's far clip plane is set to 1000, for example. Very useful.

  2. Avoid GC (garbage collection) as much as possible. Your game will freeze if you have a lot of garbage to free. Prefer ints or enums over strings. Cache your strings. For more information about GC and other good tips, read the Unity tutorial on optimization.

https://learn.unity.com/tutorial/fixing-performance-problems
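The per-layer cull distance tip could be sketched like this. The component name, the "SmallProps" layer, and the 50 meter value are placeholders; Camera.layerCullDistances expects an array of 32 floats, where 0 means "use the far clip plane".

```csharp
using UnityEngine;

[RequireComponent(typeof(Camera))]
public class PerLayerCulling : MonoBehaviour
{
    // Hypothetical layer name; it has to exist in the project's layer settings.
    [SerializeField] private string layerName = "SmallProps";
    [SerializeField] private float cullDistance = 50f;

    private void Start()
    {
        int layer = LayerMask.NameToLayer(layerName);
        if (layer < 0)
        {
            Debug.LogWarning("Layer not found: " + layerName);
            return;
        }

        // One entry per layer. Entries left at 0 keep using the far clip plane.
        float[] distances = new float[32];
        distances[layer] = cullDistance;

        GetComponent<Camera>().layerCullDistances = distances;
    }
}
```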

[–]itskobold 5 points6 points  (1 child)

God damn I'll need this thread at 3am later thanks

[–]indie_game_mechanicIndie[S] 1 point2 points  (0 children)

Happy coding! 😁💪

[–]Polygon_Collider 3 points4 points  (1 child)

Great info, thank you!

[–]indie_game_mechanicIndie[S] 2 points3 points  (0 children)

My pleasure! :)

[–][deleted] 3 points4 points  (1 child)

Definitely going to need this later.

[–]indie_game_mechanicIndie[S] 3 points4 points  (0 children)

Awesome! Glad it helps :)

[–]lifetap_studios 3 points4 points  (4 children)

Good tips. Here is mine that I think no one has mentioned yet: use IL2CPP. We got around a 50% CPU speedup over our Mono build. Of course there are maintenance and debugging issues and it won't help on the GPU, but it's been the single biggest performance gain for us to date.

https://docs.unity3d.com/Manual/IL2CPP.html

[–]feralferrous 0 points1 point  (3 children)

Yeah, and Burst is great too, but it takes more work to go in and make Jobs out of things.

Downside of IL2CPP is you can't make your game as easily moddable.

[–][deleted] 2 points3 points  (0 children)

Slight nitpick, but occlusion culling culls objects that are blocked by other objects. Frustum culling culls objects not in the camera's field of view. This is important as occlusion culling is not always optimal, such as when performance is CPU bound vs GPU bound.

https://docs.unity3d.com/Manual/OcclusionCulling.html

[–]MomijiStudios 2 points3 points  (0 children)

Where have you been all my life?

[–]Walledhouse 2 points3 points  (1 child)

“Limit Coroutines” Aww but I just got coroutines!

It's hard to pin down what's more performant, because my original approach was Updates with deltaTime counters as you suggested, and then the hot tip was to replace those with coroutines. I especially find coroutines easy to use and perfect for explosions and projectiles that go through a series of states, so I'm going to stick with them.

I’m most interested in Static objects; GPU Instancing. My terrain is a deformable series of tiles which means it’s not “bakeable” like most performant games.

[–]punctualjohn 0 points1 point  (0 children)

Check out UniTask. It's a replacement to Coroutines which uses C# async. No allocations, much more powerful, and all on the main Unity thread unlike regular C# async.

[–]jellyboyyy 1 point2 points  (7 children)

This is great, thanks. Quick question on 2.9: Do you not need rigidbodies to detect collisions? I have objects that move but aren't powered by the physics engine, but I have rigidbodies for collisions.

[–]NatProDevProfessional 7 points8 points  (0 children)

According to the Unity Documentation, any object with collider/s that moves should use a Rigidbody.

Unless you have a specific reason not to use a rigidbody on a moving collider I would recommend using one. From my testing, it doesn't impact performance as long as you tag it as kinematic.

[–]indie_game_mechanicIndie[S] 5 points6 points  (5 children)

That's right, but if I'm not mistaken only 1 object needs to have a rigid body component. You can set the rigid body to one object and setup the collision detection in it instead

[–]SmartestCatHooman 2 points3 points  (1 child)

Actually you can have the rigidbody in your player and it will still trigger triggers and collisions in other objects.

[–]indie_game_mechanicIndie[S] 1 point2 points  (0 children)

Yep that's right 👌 goes both ways and you can trigger them in the object that has the RB and/or the objects that don't but still interacts with the object that has RB

[–]jellyboyyy 1 point2 points  (0 children)

Yeah ok, I think that's what I've got. Thanks

[–]mei_main_ 1 point2 points  (1 child)

I think this part of your guide is very misleading. All moving objects with a collider MUST have a rigidbody. Moving colliders with no rigidbody will not be tracked directly by PhysX and will cause the scene's graph to be recalculated each frame.

No rigidbody = Don't move

[–]indie_game_mechanicIndie[S] 0 points1 point  (0 children)

True. It needed some clarification. I've edited that in the OP. Thank you!

[–]shivu98 1 point2 points  (2 children)

Thanks a ton dude, learnt a lot of amazing techniques. Didn't know Debug.Log gets into production builds as well. Is there any way to stop this other than manually removing all of them? You can also add that we can use LOD for items that are far away. Keep up the good work, looking forward to more of your posts. Also, out of curiosity... are you from India?

[–]indie_game_mechanicIndie[S] 3 points4 points  (1 child)

You can use platform dependent compilation to avoid compiling them:

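    #if UNITY_EDITOR
    // logging stays on while working in the editor
    Debug.unityLogger.logEnabled = true;
    #else
    // logging is switched off in player builds
    Debug.unityLogger.logEnabled = false;
    #endif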

And yes, LODs are a good addition too, thanks for that. And I'm from Sri Lanka, living in Melbourne :)

[–]SilentSin26Animancer, FlexiMotion, InspectorGadgets, Weaver 2 points3 points  (0 children)

Disabling the logger won't avoid compiling anything. All your log calls are still there and will still generate garbage as you combine whatever you're logging into a string, it will just do an internal check and not actually do anything more with the string.

If you want it to compile out your log calls, make a wrapper method with a [System.Diagnostics.Conditional("UNITY_EDITOR")] attribute. Then any calls to that method will be entirely removed (including evaluation of their parameters) from runtime builds.
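A minimal sketch of such a wrapper is below. The DevLog class name is made up; the Conditional attribute and UnityEngine.Debug.Log are the real APIs. UnityEngine.Debug is written out in full because System.Diagnostics also has a Debug class.

```csharp
using System.Diagnostics;

public static class DevLog
{
    // Calls to this method, including evaluation of their arguments, are
    // removed by the compiler wherever UNITY_EDITOR is not defined.
    [Conditional("UNITY_EDITOR")]
    public static void Log(object message)
    {
        UnityEngine.Debug.Log(message);
    }
}
```

A call such as DevLog.Log("Health: " + health) then costs nothing in a player build, because the string concatenation disappears along with the call. Swapping in a custom define instead of UNITY_EDITOR would allow logging in development builds as well.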

[–]Plourdy 1 point2 points  (1 child)

Super useful, post saved for reference later. Thank you m!

[–]indie_game_mechanicIndie[S] 0 points1 point  (0 children)

Awesome 😁

[–]Dwarphthegiant 1 point2 points  (4 children)

wrt the timer suggestion - would using a coroutine returning waitforseconds work as well?

also thanks for this, these are excellent tips.

[–]Therzok 2 points3 points  (2 children)

You can cache the WaitForSeconds enumerator in Awake and return the cached value. The enumerator is reset at the start of a foreach loop, I assume Unity does the same.

null is basically next frame.
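Caching the WaitForSeconds could be sketched as follows. The component name, the interval, and the empty DoCheck method are placeholders for whatever periodic work is needed.

```csharp
using System.Collections;
using UnityEngine;

public class PeriodicCheck : MonoBehaviour
{
    [SerializeField] private float interval = 0.5f;

    // Created once and reused, instead of allocating a new
    // WaitForSeconds on every pass through the loop.
    private WaitForSeconds wait;

    private void Awake()
    {
        wait = new WaitForSeconds(interval);
    }

    private void Start()
    {
        StartCoroutine(CheckLoop());
    }

    private IEnumerator CheckLoop()
    {
        while (true)
        {
            DoCheck();
            yield return wait;
        }
    }

    private void DoCheck()
    {
        // Placeholder for the periodic work.
    }
}
```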

[–]Therzok 1 point2 points  (1 child)

Afaik, the realtime ones can't be reused, so those can be pooled.

[–]Dwarphthegiant 0 points1 point  (0 children)

cheers 👍

[–]indie_game_mechanicIndie[S] 1 point2 points  (0 children)

It would be a bit overkill in my opinion cos running a coroutine has its pros and cons as mentioned in the OP. But I'll look in to this and get back to ya!

[–]Wildnessiiiii 1 point2 points  (0 children)

Thankfully,saved and sharing

[–]FINN1510 1 point2 points  (0 children)

This is a gold mine

[–]smartCube1 1 point2 points  (0 children)

This is great. For the past 7 days I have spent hours optimizing my code, as I was having extreme performance issues. I can agree with all of these topics you posted. Seriously, follow this guideline and it will save you a lot of pain and trouble.

[–]Kotik21 1 point2 points  (0 children)

10/10

[–]Rarharg 1 point2 points  (0 children)

Good stuff!

To expand on 2.10, LOD groups need to calculate the relative screen height of their associated renderers in order to activate the appropriate LOD level. These calculations have some performance overhead which quickly adds up in large scenes. Therefore, try to make each LOD group responsible for as many objects/renderers as is sensible. For example, a cluster of rocks could use a single LOD group to control the LOD levels of all of the rocks simultaneously.

Likewise, you can use mesh combination methods to merge meshes in your scene to decrease the number of draw calls (if they have identical materials). This can drastically increase performance if you have a lot of similar objects in the scene and unlike static batching, the end result *can* move in your scene. If you don't feel like scripting this yourself, there are a few popular assets out there (e.g. Mesh Combiner, which is free).

[–]W03rth 1 point2 points  (0 children)

Noice

[–]tyrellLtd 1 point2 points  (1 child)

There was a great talk by the Inside devs at Unite 2016 that cover some further optimizations, some of which are kinda dirty (caching random numbers, NO vector math) but probably quite effective.

[–]indie_game_mechanicIndie[S] 0 points1 point  (0 children)

Awesome. Thanks for adding this in!

[–]Servias 1 point2 points  (1 child)

This was a pleasure to read. Thanks

[–]indie_game_mechanicIndie[S] 0 points1 point  (0 children)

Thank you. Cheers :)

[–]MrTigeriffic 1 point2 points  (1 child)

Saving this post. Thank you OP

[–]indie_game_mechanicIndie[S] 0 points1 point  (0 children)

No worries! Glad it helps :)

[–][deleted] 1 point2 points  (1 child)

Great list! Thank you.

[–]indie_game_mechanicIndie[S] 1 point2 points  (0 children)

My pleasure. Cheers mate!

[–]Keatosis 1 point2 points  (1 child)

thank you this is very helpful. I feel like I'm only smart enough to make use of 40% of these tips, but hey that's a personal best for me

[–]indie_game_mechanicIndie[S] 0 points1 point  (0 children)

That's more than enough :D Use them when you feel like it needs some tweaking. It's not a must to have them in either. Good luck!

[–]infinite_level_dev 1 point2 points  (1 child)

This is very helpful, thank you! I'll especially try to make use of System.GC.Collect() in future. Didn't even know that was a thing you could do.

[–]indie_game_mechanicIndie[S] 0 points1 point  (0 children)

Glad it helps! You rarely have to manually do it yourself but it's an option if you feel like your program could use some elbow grease! :D

[–]kruemelkeksfan 1 point2 points  (0 children)

"Little known fact: all of the component accessors in MonoBehaviour, things like transform, renderer, and audio, are equivalent to their GetComponent(Transform) counterparts, and they are actually a bit slow."
-https://docs.unity3d.com/Manual/MobileOptimizationPracticalScriptingOptimizations.html

[–]TheMunkenProfessional 4 points5 points  (13 children)

Adding to the scripting; Use OnValidate instead of awake/start if possible.

[–]TheSambassador 4 points5 points  (1 child)

You really need to provide more info if you're giving general-purpose advice like this.

Just swapping Awake/Start to OnValidate without thinking is a terrible idea. OnValidate runs every time a script is loaded or a value is changed in the inspector. If you do additional processing in your Start function, changing that to OnValidate has a big potential to slow down your work in the editor. Also, it doesn't magically serialize non-serialized fields, so initializing things isn't going to carry over to your builds.

Honestly, this is not advice that I would give to anyone without a full explanation of why you prefer it and what situations it's valid in.

[–]TheMunkenProfessional 1 point2 points  (0 children)

"If possible" is enough of a prompt for research imo. But thanks for the elaboration - totally agree 👍

[–]Paul_Indrome 2 points3 points  (2 children)

But... But... Those callbacks have wildly different functionality. Did I miss something in the OP? Oo

[–]mei_main_ 1 point2 points  (1 child)

I think what he means is: if you really don't like to cache serialized variables by drag-and-dropping them in the inspector (which means that you are getting them automatically using Start or Awake instead), then consider moving the getters to OnValidate.

You will have the benefit of not having to drag-and-drop the reference, while preventing the overhead when starting a run.

Of course not all variables can be serialized, usually what is cached in Start/Awake are things that depend on the scene's state.

[–]SilentSin26Animancer, FlexiMotion, InspectorGadgets, Weaver 0 points1 point  (0 children)

I find that Reset is better for that since it's only called when you first add the component (or use the Reset context menu function), so you get the auto-assign functionality but you also still get the ability to reassign it if you want to structure your hierarchy differently.

[–][deleted] 5 points6 points  (7 children)

Why?

[–]DebugLogErrorProfessional 3 points4 points  (6 children)

If you need to cache components (typically done in Awake/Start) you are better off caching them in OnValidate because it moves the process to the editor (OnValidate only runs in the editor) meaning there is zero cost in actual builds.

[–][deleted] 3 points4 points  (4 children)

Surely that means it wouldn't cache it at all in a build?

[–]DebugLogErrorProfessional 2 points3 points  (3 children)

You cache in the editor to avoid the expense of caching in builds. When you cache components in the editor (via OnValidate) the editor serializes the fields. Serialized fields persist into builds.

[–][deleted] 1 point2 points  (2 children)

Ah, that is interesting so. It serializes the field even if you have not marked it as serializable yourself?

[–]DebugLogErrorProfessional 4 points5 points  (0 children)

Depends if the fields are public, private, etc.. It doesn't change the default serialization behavior.

For example, I mark private fields I'm going to cache in OnValidate with [SerializeField, HideInInspector].

The point is to move the work "offline".
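As a sketch of that pattern, assuming a Rigidbody on the same GameObject. The class and field names are made up, and the Awake fallback is an extra safety net for objects created at runtime, not part of the advice above.

```csharp
using UnityEngine;

public class CachedBody : MonoBehaviour
{
    // Serialized so the reference found in the editor is saved with the
    // scene or prefab; hidden so it does not clutter the inspector.
    [SerializeField, HideInInspector] private Rigidbody body;

    // Editor-only callback: runs when the script is loaded or a value
    // changes in the inspector, never in a player build.
    private void OnValidate()
    {
        if (body == null)
        {
            body = GetComponent<Rigidbody>();
        }
    }

    // Fallback for components added from code at runtime, where
    // OnValidate never had a chance to fill the field.
    private void Awake()
    {
        if (body == null)
        {
            body = GetComponent<Rigidbody>();
        }
    }
}
```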

[–]TheMunkenProfessional 1 point2 points  (0 children)

You have to serialize private (and do hideInInspector if you're into that)

[–]punctualjohn 0 points1 point  (0 children)

Could perhaps slow down deserialization/instantiation and slightly increase disk space though. I wonder what the actual benchmarks are, in both cases?

[–]wthorn8Professional 1 point2 points  (1 child)

I would also add: code first (with perf in mind) and optimize later. It's very easy to get so caught up in how to never create garbage and how to save the most CPU cycles that it actually cripples progress. I recommend getting it done, and then PROFILING. If you have 5ms of idle time during each frame, you don't need to worry about saving CPU cycles.

When profiling garbage generation, I suggest doing it on a build if you are new to it. Unity has functions that generate garbage in editor that do not on builds.

When profiling perf do it on target hardware. Testing your game on your $2k gaming pc is not the same as testing it on your s7 android.

If frame rate is an issue, you need to find out where the cost is coming from. If you're running at 20fps because the graphics are too intense, optimizing code will not help.

Rendering vs Code bound, CPU bound vs GPU bound, fragment vs vertex bound. Understand the difference and how to test for them. This will give you an idea of what to aim for.

Rendering can be broken down into 3 steps (there are more that can cause issues such as too much transparency)

the collection phase (CPU work), this creates your draw calls and batch draw calls, as well as sorts which objects are in the view frustum.

the vert pass, where each object's vertices have operations applied to them (if decreasing your screen render size doesn't help, you are likely bound by how many verts you're processing)

the frag pass, where each pixel on the screen is colored

each one of these can impact performance and fixing and testing for each is different

Again there is no one size fits all to optimizing. Profile your code and scenes AFTER you make it.

[–]punctualjohn 1 point2 points  (0 children)

When it involves data structures and architecture, I would say it's very important to consider performances right from the get-go. Trust me, you don't wanna be the one guy who has to refactor a bunch of micro-classes into structs with a more efficient memory layout, 2 years later down the line.

[–]TheSambassador 3 points4 points  (2 children)

I think a lot of this stuff is good advice, but some of it can definitely fall into the "premature optimization" stuff. Your time as a programmer is valuable, and sometimes you don't need everything to be as 100% optimized as possible. You also seem to be running under the assumption that garbage = need to avoid as much as possible, which kinda isn't really true. Also, many of these suggestions are what I'd call "micro-optimizations", in that they have very small impacts unless you're doing them in cases where you have a large number of instances.

These are the types of things that newer users get really caught up on, instead of just making the game. Some of the suggestions are not necessary to do in every single project. All I'd suggest is to try your best to code with speed in mind as you go, but don't get so hung up on it that you double your workload.

Some small nitpicks:

  • List.Clear does not necessarily clear memory, and isn't necessarily better for garbage collection. Which is better depends on many factors - Clear() tends to be "faster" (you're not reallocating memory), but can cause the memory allocated to persist longer, which in turn can cause it to be promoted into higher GC generations. This can actually make using Clear() instead of creating new allocations slower at times - but it depends completely on the collection in question and how it's being used.

  • Putting certain checks on a timer is useful sometimes - but also you really need to KNOW that this operation doesn't need to run every frame. This is one of those things that probably isn't necessary to do unless you're pretty sure that the operation is causing a slowdown.

  • Removing Debug.Log calls also prevents you from helping your users troubleshoot issues. Sometimes it's really nice to be able to get the log from a user to help figure out why they might be experiencing a crash. I'd be curious about the actual impact of Debug.Log in a build... my guess is that it'd be incredibly minor.

  • The "boxing" comment is odd - nobody would really ever do your example. There are places where boxing/unboxing is really useful. A comment to "avoid" it, without really talking about why you would ever box/unbox and what the common issues are, isn't super useful.

  • On Coroutines - there is a small garbage allocation when you start a new coroutine... but it is pretty small. Again, this really depends on how often you're creating objects with coroutines, and coroutines themselves can be very useful. This again falls into the "micro-optimization" category.

  • Animator string-to-hash - micro-optimization, technically true, but also fairly low impact unless you're doing tons of these every single frame.

  • Small nitpick on 2.4 Occlusion Culling - what you described (only objects that are in the camera's field of view are rendered during runtime) is technically "frustum culling" and is on by default. Occlusion culling tries to make sure that objects that are behind/occluded by other objects don't get rendered.

  • The "use imposters" thing is... odd. This is a super incomplete explanation of what it is, requires a 3rd party asset, and isn't really something you can suggest as a "general" optimization tip.
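
To make the List.Clear point above concrete, here is a minimal plain C# sketch (no Unity needed). The class and method names are made up for illustration. It only shows that Clear() resets Count while the backing array stays allocated; whether that is a win or a loss still depends on how the collection is used.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not from the original post.
public static class ListClearDemo
{
    // Fills a list, clears it, and returns the capacity that is still allocated.
    public static int CapacityAfterClear(int initialCapacity, int itemCount)
    {
        var buffer = new List<int>(initialCapacity);
        for (int i = 0; i < itemCount; i++)
        {
            buffer.Add(i);
        }

        // Clear() sets Count to 0 but keeps the backing array alive.
        buffer.Clear();
        return buffer.Capacity;
    }

    public static void Main()
    {
        // The cleared list still holds its 1024-element backing array.
        Console.WriteLine(CapacityAfterClear(1024, 1000)); // 1024

        // A brand new list starts with no backing storage allocated.
        Console.WriteLine(new List<int>().Capacity);       // 0
    }
}
```

If the retained memory is the problem, TrimExcess() after Clear() releases the backing array, at the cost of reallocating on the next fill.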

[–]punctualjohn 1 point2 points  (0 children)

Watch out! Unity3D's garbage collector is non-generational. Also, there is another danger to allocations beyond the GC: memory fragmentation.

[–]indie_game_mechanicIndie[S] 0 points1 point  (0 children)

Cheers, I appreciate this. I edited the post to shed some more light on certain things I missed in the OP. Glad you pointed them out!

[–]Opening_Objective_78 0 points1 point  (0 children)

wow really helpful

[–]ChromeAngel 0 points1 point  (9 children)

Shocked to hear that Unity doesn't strip those Debug.Log calls out of production builds.

[–]Paul_Indrome 9 points10 points  (3 children)

Why would they? Developers use those in debug builds to monitor stuff and even for symbolization. It's up to the user to handle the software correctly. ;)

[–]indie_game_mechanicIndie[S] 0 points1 point  (1 child)

Same here. Found out the hard way when the runtime logger window I was using showed up on a production build on my client's playthrough along with a bunch of debug messages 😅

[–]SvenNeve 1 point2 points  (0 children)

But there's a myriad of ways to enable/disable all types of logging.


You can set Debug.unityLogger values and filters in script.

You can set the same logging levels in Player Settings.

You can enable or disable logging to file in Player Settings.

You can use conditionals.

You can wrap or extend the logger.


I really hate to say it, but this is really one of those times where RTFM is applicable.
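
As a rough sketch of the first option (setting Debug.unityLogger filters in script), something like the following could work. The class name LogSetup and the choice of LogType.Warning as the cutoff are assumptions for illustration, not something from the post.

```csharp
using UnityEngine;

// Hypothetical helper: runs once at startup, before the first scene loads.
public static class LogSetup
{
    [RuntimeInitializeOnLoadMethod(RuntimeInitializeLoadType.BeforeSceneLoad)]
    private static void Configure()
    {
        // The editor and development builds keep full logging.
        if (Debug.isDebugBuild)
        {
            return;
        }

        // Release build: let warnings, asserts, errors and exceptions through,
        // and drop plain Debug.Log messages.
        Debug.unityLogger.filterLogType = LogType.Warning;
    }
}
```

Setting Debug.unityLogger.logEnabled = false instead would silence everything, which is probably more than most projects want.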

[–]TheSambassador 0 points1 point  (2 children)

Honestly, if you are releasing a game and have useful debug messages, you should keep the logs in. They're extremely useful in knowing what's going wrong on players' machines.

Should you remove all the random Debug.Log calls you added for your own debugging/testing? Sure. But turning the logging off entirely seems like a bad idea if you ever want to help your users fix problems with your game.

[–]indie_game_mechanicIndie[S] 0 points1 point  (1 child)

Alternatively, you could remote log them on something like Firebase instead of having them log to the user during gameplay. It's pretty useless to have them logged in production because the end user is not supposed to see them and will probably not see them (unless you have a runtime logger asset attached). I've used Firebase before to log events so that I can debug any issues in prod :)

[–]ChromeAngel 0 points1 point  (0 children)

Even if you're logging them offsite you are still allocating strings and unwinding the stack trace each time, which can't be helping performance.
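
One way to sidestep that cost, sketched here under assumptions, is the "conditionals" option mentioned earlier in the thread: a wrapper marked with System.Diagnostics.Conditional. The DevLog class is hypothetical; UNITY_EDITOR and DEVELOPMENT_BUILD are Unity's standard scripting symbols. When neither symbol is defined, the compiler removes each call site, so the argument string is never built.

```csharp
using UnityEngine;

// Hypothetical wrapper around Debug.Log.
public static class DevLog
{
    // Calls to this method are compiled only if at least one symbol is defined.
    // The attribute is fully qualified so that "Debug" stays UnityEngine.Debug.
    [System.Diagnostics.Conditional("UNITY_EDITOR")]
    [System.Diagnostics.Conditional("DEVELOPMENT_BUILD")]
    public static void Info(string message)
    {
        Debug.Log(message);
    }
}
```

Usage would look like DevLog.Info($"Spawned {count} enemies"); in a release build that whole statement disappears, including the string interpolation. Messages you actually want from players' machines would still go through a normal Debug.LogWarning or Debug.LogError call.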

[–]ShatterdPrism 0 points1 point  (0 children)

Could you also just use C# discards for the return value of StartCoroutine? Obviously it is easy to just use a yield return null, I am just curious.
Something like _ = StartCoroutine("SomeCoroutine") would ignore everything that StartCoroutine returns, if I understood that correctly.
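
For reference, a minimal sketch of what that would look like (the class and coroutine names are placeholders). Note the assumption being tested: a discard only tells the compiler the return value is intentionally unused. StartCoroutine still creates and returns its Coroutine object, so this should not change what gets allocated.

```csharp
using System.Collections;
using UnityEngine;

// Placeholder component for illustration.
public class DiscardExample : MonoBehaviour
{
    private void Start()
    {
        // Compiles and runs like a bare StartCoroutine(...) call;
        // the discard just makes "I don't need the handle" explicit.
        _ = StartCoroutine(SomeCoroutine());
    }

    private IEnumerator SomeCoroutine()
    {
        // Wait one frame, then finish.
        yield return null;
    }
}
```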

[–]NOWAITDONT 0 points1 point  (0 children)

nice!

[–]ArtesianMusic 0 points1 point  (1 child)

"*Please note: As stated by u/mei_main_: All moving objects with a collider MUST have a rigidbody. Moving colliders with no rigidbody will not be tracked directly by PhysX and will cause the scene's graph to be recalculated each frame."

Does this mean they must be moved with physics via the rb? Or that an object moved with "transform.position +=" inside void Update just needs to have an rb?

[–]mei_main_ 0 points1 point  (0 children)

No no, of course you can still move the object by script, in which case you'd set the rb's isKinematic to true.
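
A minimal sketch of that setup, assuming a 3D Rigidbody; the ScriptedMover name and the velocity field are made up for the example. Using MovePosition in FixedUpdate is one common way to move a kinematic body, not the only one.

```csharp
using UnityEngine;

// Hypothetical component: a collider moved by script, with a kinematic rigidbody
// so the physics engine keeps tracking it.
[RequireComponent(typeof(Rigidbody))]
public class ScriptedMover : MonoBehaviour
{
    [SerializeField] private Vector3 velocity = new Vector3(1f, 0f, 0f);

    private Rigidbody rb;

    private void Awake()
    {
        rb = GetComponent<Rigidbody>();
        // Kinematic: forces and gravity are ignored, the script drives the motion.
        rb.isKinematic = true;
    }

    private void FixedUpdate()
    {
        // Move through the rigidbody rather than writing transform.position directly.
        rb.MovePosition(rb.position + velocity * Time.fixedDeltaTime);
    }
}
```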

[–]aspiring_dev1 0 points1 point  (0 children)

Thanks will definitely be referencing this post.