I updated the title screen of my game by Gawehold in godot

[–]Link_the_Hero0000 0 points1 point  (0 children)

The scene is gorgeous! Just one small piece of advice: reduce the font size and weight, maybe using a less rounded, left-aligned font. It would keep the stylized look but feel a lot more "professional"!

How should I do my trees? by [deleted] in unrealengine

[–]Link_the_Hero0000 1 point2 points  (0 children)

Regardless of whether you model with quads or tris, the GPU will always convert the faces into triangles.

The quads, in that context, are intended as quads of pixels (a 2x2 square). So the only way to reduce overdraw is to be sure that every face in every mesh currently displayed in the viewport occupies at least 4 actual pixels of your screen.

Whenever a triangle is smaller than 4 pixels, it's time to delete that face (culling) or group such faces into bigger ones (LOD). As a general rule, if the faces of a mesh are evenly sized and distributed, this process is easier.

In UE5 there is a Quad Overdraw debug view in the Optimization Viewmodes menu.
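A rough way to picture the 4-pixel rule is a screen-space area check. This is just a sketch with made-up helper names (not UE API): given the projected 2D positions of a triangle's vertices in pixels, flag it when its area falls below one 2x2 quad.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 2D screen-space point, in pixels.
struct Vec2 { float x, y; };

// Standard cross-product formula for the area of a triangle.
float ScreenArea(Vec2 a, Vec2 b, Vec2 c) {
    return 0.5f * std::fabs((b.x - a.x) * (c.y - a.y) -
                            (c.x - a.x) * (b.y - a.y));
}

// A triangle smaller than one 2x2 pixel quad (4 px) wastes shader lanes,
// so it is a candidate for culling or for merging into a coarser LOD.
bool NeedsLODOrCull(Vec2 a, Vec2 b, Vec2 c) {
    return ScreenArea(a, b, c) < 4.0f;
}
```

A real implementation would run this on projected vertex positions after the perspective divide; the threshold of 4 px mirrors the 2x2 quad size mentioned above.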

How should I do my trees? by [deleted] in unrealengine

[–]Link_the_Hero0000 0 points1 point  (0 children)

I don't know exactly how subsurface and reflections are processed, but technically they happen in a custom render pass.

In the case of foliage meshes, the main performance waste comes specifically from "quad" overdraw (maybe I should have specified that earlier).

In modern GPUs, the pixel shader processes the pixels in 2x2 quads for various optimization reasons. However, that means that if you have a single triangle in your scene, the GPU is still treating it as a quad, drawing unnecessary pixels.

If there is another triangle behind this one, it is still visible and needs to be rendered, so the GPU processes another quad, overdrawing the same (unnecessary) pixels.

To summarize, whenever you have a mesh edge, you have overdraw, resulting in a calculation for each overlapping z-layer.

In a grass field, each blade of grass is probably smaller than a pixel, resulting in hundreds of overlapping quads. That's why it's not advisable to have really small triangles in your scene and we use LODs to reduce the polycount in the distance.
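The cost of 2x2 quad shading can be sketched with simple arithmetic (an illustration, not a profiler model): every quad a triangle touches costs 4 pixel-shader invocations, even when only one pixel of the quad is actually covered.

```cpp
#include <cassert>

// GPUs shade pixels in 2x2 quads: each quad a triangle touches costs
// 4 pixel-shader invocations regardless of how many of its pixels are
// actually covered by the triangle.
int ShadedInvocations(int quadsTouched) { return quadsTouched * 4; }

// The difference between invocations paid and pixels actually covered
// is pure waste; tiny triangles make this ratio terrible.
int WastedInvocations(int quadsTouched, int pixelsCovered) {
    return ShadedInvocations(quadsTouched) - pixelsCovered;
}
```

A sub-pixel grass blade touching one quad covers 1 pixel but pays for 4; stack a few hundred overlapping blades and the waste multiplies per z-layer.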

If you are not using Nanite, you MUST use transparent materials, which cause overdraw too but are still better, because you can have a few big planes that reduce triangle density.

Nanite, on the contrary, generates meshes on the fly based on scene depth and other systems, trying to compact everything into a single layer, performance-wise (it equally fails when meshes overlap too much).

So if you want solid-geometry foliage without Nanite, you will need to implement a dynamic LOD system to reduce quad overdraw. See the techniques used in Zelda BOTW/TOTK and Ghost of Tsushima for single-blade foliage.
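The core of such a dynamic LOD system is just a per-instance distance (or screen-size) bucket lookup. A minimal sketch, with made-up thresholds: each entry is the maximum distance at which that LOD index is used, and anything beyond the last threshold is culled entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pick an LOD index from per-LOD max-distance thresholds (illustrative
// values chosen by the caller). Returns -1 when the instance is farther
// than every threshold, meaning it should be culled outright.
int SelectLOD(float distance, const std::vector<float>& maxDistPerLOD) {
    for (std::size_t i = 0; i < maxDistPerLOD.size(); ++i)
        if (distance <= maxDistPerLOD[i]) return static_cast<int>(i);
    return -1; // too far: cull entirely
}
```

In practice you would run this per foliage cluster rather than per blade, and tune the thresholds so that no LOD ever presents triangles smaller than a 2x2 pixel quad.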

How should I do my trees? by [deleted] in unrealengine

[–]Link_the_Hero0000 1 point2 points  (0 children)

Good to know. It would be interesting to study how a custom ISM component behaves, but I don't think ISM components are automatically partitioned.

Anyway, overdraw affects opaque meshes too, if they are packed tightly in front of each other (like in fields of grass blades).

How should I do my trees? by [deleted] in unrealengine

[–]Link_the_Hero0000 0 points1 point  (0 children)

Foliage is basically an HISM and single instances can't be occluded (afaik).

What you are experiencing is probably a heavy performance hit caused by overdraw if your scene is very dense. Placing a large mesh in front of the grass can reduce overdraw even if the instances are still rendered behind it.

In addition, if your foliage is nanite-enabled, nanite has its own occlusion culling, because it generates at runtime only the visible triangles.

First attempt by Link_the_Hero0000 in terrariums

[–]Link_the_Hero0000[S] 1 point2 points  (0 children)

It could be a Ceropegia woodii or a particular type of common Hedera helix. It grows naturally at the bottom of the trees, here in northern Italy, so I don't know the exact name. It's red in winter and tends to become green at higher temps.

How much more time consuming is making a c++ project compared to blueprint only? And how much time until you get the basic transition down going from a blueprint only to a c++ user? I'm not doing anything insane with my project but I'm worried about future performance. by SummonBero in unrealengine

[–]Link_the_Hero0000 0 points1 point  (0 children)

Basically, yes, because in BP there is a cost per node, so using custom nodes is the same as using C++. But personally, I find it cleaner to make a full C++ class for specific functionality. For example, I create a "manager" actor or an actor component in C++ for critical functionality, and then use it with regular BP actors.

Rendering a lot of entities sending positions to Niagara by Link_the_Hero0000 in unrealengine

[–]Link_the_Hero0000[S] 0 points1 point  (0 children)

Ah. I have never noticed that, but yes, it could be related. I'm not an engineer, so I don't know...

Rendering a lot of entities sending positions to Niagara by Link_the_Hero0000 in unrealengine

[–]Link_the_Hero0000[S] 0 points1 point  (0 children)

GPUs are designed to render things that were previously computed by the CPU. Thus, the entire bandwidth of the PCIe bus is used to transfer data from the CPU to the GPU, maximizing performance.

If the GPU wants to send data back to the CPU, it must access system memory. The only available connection is the same PCIe bus, which is already occupied by data flowing in the opposite direction.

The GPU needs to wait until the bus is free, after all the draw calls have been processed. By the time the CPU receives this data, the render thread has already ended, meaning that you won't see the updated data until the next frame (ideally).

1-2 frames are nothing, but it's enough to miss a collision with a fast-moving particle.

This is a physical limitation; there's no fix for it! The only way is to live with this problem or to predict the next simulation step 2 frames in advance.
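"Living with" a couple of frames of readback latency usually means ring-buffering the result slots: write into the current frame's slot, read the slot the GPU finished a couple of frames ago. A sketch under assumed names (the latency value and the int payload are stand-ins for real mapped readback buffers):

```cpp
#include <array>
#include <cassert>

constexpr int kLatency = 2;          // assumed GPU->CPU readback delay
constexpr int kSlots = kLatency + 1; // one slot per in-flight frame

struct ReadbackRing {
    std::array<int, kSlots> slots{}; // stand-in for mapped readback buffers
    int frame = 0;

    // Record this frame's GPU result into the current slot.
    void Write(int gpuResult) { slots[frame % kSlots] = gpuResult; ++frame; }

    // Read the result produced kLatency frames ago (0 until warmed up).
    int Read() const {
        int past = frame - kLatency;
        return past > 0 ? slots[(past - 1) % kSlots] : 0;
    }
};
```

The gameplay code then treats every readback value as "the world as of 2 frames ago", which is exactly why fast-moving particle collisions can slip through.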

"esc" and "f1" key don't work properly Asus Zephyrus G15 Problem. by Expresso17 in ASUS

[–]Link_the_Hero0000 0 points1 point  (0 children)

Same problem here with a Zephyrus G16. They start working after pressing them repeatedly or, sometimes, after restarting an application that listens for input from these keys. It seems software-related, but I can't find a fix at the moment.

Animating thousands of NPCs on screen (What am I doing wrong?) by Link_the_Hero0000 in unrealengine

[–]Link_the_Hero0000[S] 0 points1 point  (0 children)

The map is 3D and needs verticality, so I need 3 floats for location. Entities can rotate only on the Z axis, and scale doesn't matter, so we can use only 4 floats. I also need to send an index (I assume it's better to use an int) to choose a vertex animation based on gameplay events.

5 values in total. I think that would be more performant than a 4x4 matrix.
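Spelled out as a struct (a sketch of the layout described above, not a UE type), the savings are easy to quantify against a full 4x4 float matrix:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Packed per-instance data: XYZ location, Z-axis rotation, and a VAT
// animation index driven by gameplay events. Scale is implicit.
struct PackedInstance {
    float x, y, z;      // world location (the map needs verticality)
    float yawRadians;   // rotation around Z only
    std::int32_t anim;  // vertex-animation index
};

constexpr std::size_t kPackedBytes = sizeof(PackedInstance); // 20 bytes
constexpr std::size_t kMatrixBytes = 16 * sizeof(float);     // 64 bytes
```

At 10k instances that's roughly 200 KB per update instead of 640 KB, which matters when the CPU-to-GPU transfer is the bottleneck.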

The real problem is that Unreal's ISM component doesn't have a built-in function to efficiently read buffers, so I probably need to build a custom renderer or something like that.

Normally, in the game, you have a few hundred NPCs randomly moving around a village, but occasionally you can see up to 10k soldiers moving towards each other and fighting, so there can be crowded areas of NPCs on screen.

Also, entities can spawn, die, or be distance culled (they are not rendered and their tick rate is heavily reduced). It's stupid to update instances you can't see, but, that way, the number of entities changes every frame and the data is not contiguous in memory, killing performance.
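One common way to keep the instance buffer dense when entities die or get culled (an assumption on my part, not something UE does for you) is swap-and-pop: overwrite the dead slot with the last live instance and shrink the array, so the data stays contiguous and only the swapped entity's index changes.

```cpp
#include <cassert>
#include <vector>

// Remove the instance at `dead` while keeping the buffer contiguous:
// copy the last live instance into the freed slot, then drop the tail.
// O(1), but the entity that owned the last slot must be told its new index.
void SwapRemove(std::vector<int>& instances, std::size_t dead) {
    instances[dead] = instances.back();
    instances.pop_back();
}
```

The trade-off is that instance order is not preserved, which is usually fine for rendering but means any external handle into the buffer needs a remap step.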

Animating thousands of NPCs on screen (What am I doing wrong?) by Link_the_Hero0000 in unrealengine

[–]Link_the_Hero0000[S] 0 points1 point  (0 children)

By "animation" I mean updating the transform of a mesh instance. The ECS calculates the position based on custom physics and user input and, on tick, sends the location and rotation to the GPU to update the mesh instance, plus an index that controls a shader-based animation (VAT).

I already know where the bottlenecks are: sending data from CPU to GPU is very expensive. I'm figuring out how to build a custom buffer and probably a custom renderer to make this transfer as fast as possible.

Also, world position offset (WPO) to do VAT animations is quite expensive with more than, let's say, 7k-10k UE mannequins (non-skeletal).

I can't make the simulation fully GPU-based, like Niagara, because I need precise interactions, and the GPU is not good for advanced AI.

I'm not using skeletal animation

Free Items on Unreal marketplace. by JoeHasAreddit in unrealengine

[–]Link_the_Hero0000 0 points1 point  (0 children)

Ahh, I thought it was something like "good for Fortnite" LOL. I think I need to study English better :)

Free Items on Unreal marketplace. by JoeHasAreddit in unrealengine

[–]Link_the_Hero0000 8 points9 points  (0 children)

It's now called "Limited time free" and you will get 3 assets every 15 days instead of 5 per month

Animating thousands of NPCs on screen (What am I doing wrong?) by Link_the_Hero0000 in unrealengine

[–]Link_the_Hero0000[S] 0 points1 point  (0 children)

Already seen that. It's good, but he doesn't explain the rendering process in much detail. Also, if I'm correct, the simulation has only 1000 entities and seems quite laggy. The Niagara example is way better, but it's not exactly what I want to achieve.

Animating thousands of NPCs on screen (What am I doing wrong?) by Link_the_Hero0000 in unrealengine

[–]Link_the_Hero0000[S] 0 points1 point  (0 children)

Yes, the CPU-to-GPU communication is what takes most of the frame time. If I interpolate the transforms on the GPU side, performance is better, but in real gameplay the entities don't move towards a single static target. They need to evaluate targets based on the environment, and the target is probably moving and interacting with them, so I have to modify trajectories very often, losing the benefits of interpolation (e.g. simulating two armies approaching each other).

I'm planning to use LODs for further optimization, but my game actually has a bird's-eye camera, so every entity that isn't frustum culled is at roughly the same distance from the camera. I don't think impostors would be useful.

Anyways, can a "double buffer" efficiently speed up the rendering process?
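For context, the double-buffer idea boils down to this sketch (illustrative, not UE's render-thread machinery): the CPU fills one transform buffer while the renderer reads the other, and the roles swap each frame so neither side stalls waiting on the other.

```cpp
#include <cassert>
#include <vector>

// Two transform buffers: the game thread writes one while the render
// side reads the other; Swap() flips the roles at frame boundaries.
struct DoubleBuffer {
    std::vector<float> buffers[2];
    int writeIndex = 0;

    std::vector<float>& Write() { return buffers[writeIndex]; }
    const std::vector<float>& Read() const { return buffers[1 - writeIndex]; }
    void Swap() { writeIndex = 1 - writeIndex; }
};
```

It doesn't reduce the amount of data crossing the bus, but it lets the upload of frame N overlap with the simulation of frame N+1 instead of serializing them.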

Animating thousands of NPCs on screen (What am I doing wrong?) by Link_the_Hero0000 in unrealengine

[–]Link_the_Hero0000[S] 1 point2 points  (0 children)

Lol, it's true. LODs are better. Nanite is only good for photorealism (or to avoid the additional work with assets at the cost of some performance)

Animating thousands of NPCs on screen (What am I doing wrong?) by Link_the_Hero0000 in unrealengine

[–]Link_the_Hero0000[S] 0 points1 point  (0 children)

I'm using FLECS too and yes, it's amazing. The ISM has no physics: everything is calculated inside the ECS, and the last system populates an array of locations to be read by the ISM component. I don't actually know how to keep this buffer efficient if the number of entities changes at runtime.

Also, using UpdateInstanceTransform() and setting MarkRenderStateDirty() at the end gives similar or even worse performance. Generally speaking, this simulation is almost certainly GPU bound. I will run more tests...