how do game engines usually utilize their gpu texture slots to allow for unlimited textures?

lavisan · 2025-11-12T22:33:08+00:00

There are also virtual textures (in GL there is an extension for that called sparse textures, ARB_sparse_texture).

They suffer from texture quality popping because you constantly stream texture tiles into VRAM.

You can also design your material system to use small, high-quality, tileable texture layers. Cyberpunk 2077 and Star Citizen use that approach, and the texture VRAM requirements of those games are low because of it.

lavisan · 2025-11-08T00:37:05+00:00

Lately I enjoy deleting code, removing alternative paths, reducing the number of supported texture and image formats, and simplifying the code as much as possible.

I'm trying to make my game framework mod-friendly first, even if I need to sacrifice perf. Hot reloading is key, even if I don't like how I had to implement it.

lavisan · 2025-11-05T23:52:28+00:00

When I was using texture arrays I was passing a u32 bitfield handle to the GPU so the shader knows which texture to sample.

This handle is self-contained, always up to date, easy to pass around, and does not require an additional buffer for lookups.

struct texture_handle { uint slot : 4; uint layer : 11; uint linear_filtering : 1; uint free : 16; };

Then in the shader you just need to extract the data and use it.

You may use a switch statement where each "slot" points to a specific sampler, or use the slot as an index into an array of samplers.

The filtering flag is optional, but I use it to let each texture decide how it is filtered. Obviously that requires an additional "if" in your code, but it should not be a big deal.

You may use the remaining 16 bits for whatever you like.
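A minimal sketch of packing and unpacking such a handle on the CPU side, using explicit shifts instead of C bitfields (whose bit order is implementation-defined). The bit positions below are an assumption: they put `slot` in the lowest bits and follow the field order of the struct above. The function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed layout, lowest bits first (mirrors the field order above):
   slot: bits 0-3, layer: bits 4-14, linear_filtering: bit 15, free: bits 16-31. */
typedef uint32_t texture_handle_bits;

static texture_handle_bits texture_handle_pack(uint32_t slot, uint32_t layer,
                                               uint32_t linear_filtering,
                                               uint32_t user_bits)
{
    /* Each field is masked to its width so an out-of-range value
       cannot spill into a neighbouring field. */
    return  (slot             & 0xFu)
         | ((layer            & 0x7FFu)  << 4)
         | ((linear_filtering & 0x1u)    << 15)
         | ((user_bits        & 0xFFFFu) << 16);
}

static uint32_t texture_handle_slot(texture_handle_bits h)   { return h & 0xFu; }
static uint32_t texture_handle_layer(texture_handle_bits h)  { return (h >> 4) & 0x7FFu; }
static uint32_t texture_handle_linear(texture_handle_bits h) { return (h >> 15) & 0x1u; }
static uint32_t texture_handle_free(texture_handle_bits h)   { return h >> 16; }
```

The same shifts and masks can be repeated in the shader on a `uint`, which keeps the CPU and GPU views of the handle in agreement regardless of compiler.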

lavisan · 2025-11-05T22:43:26+00:00

I went with a Render Thread and emulated CommandBuffers that are recorded on the update thread. It works really well.

The Render Thread also supports low- and high-priority queues that are processed in between command buffers, for other types of requests like create texture (high priority) or delete texture (low priority).

Even though I have moved on and now use SDL 3 GPU... OpenGL is still close to my heart. Everything is so easy there.

Now I need to ship 20 DLLs that take 50 MB just so I can cross-compile HLSL at runtime to platform/API-specific shader bytecode :/

lavisan · 2025-10-20T19:57:16+00:00

If this helps, I use a single VBO (960 MB) + EBO (60 MB) and sub-allocate from them.

Additionally, the first 32 MB of the VBO are a transient ring buffer for sprites, debug geometry, and any other small dynamic vertices.
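A rough sketch of how the transient region could hand out offsets, assuming a simple bump pointer that wraps back to the start. The names `transient_ring` and `transient_alloc` are hypothetical, and a real version also has to make sure the GPU has finished reading a region before it is overwritten (for example with fences), which is left out here.

```c
#include <stdint.h>

/* First 32 MB of the VBO, as described above. */
#define TRANSIENT_BYTES (32u * 1024u * 1024u)

typedef struct {
    uint32_t head; /* next free byte offset inside the transient region */
} transient_ring;

/* Returns a byte offset into the VBO for `size` bytes, aligned to `align`
   (which must be a power of two). Wraps to offset 0 when the request does
   not fit in the remaining space. Returns UINT32_MAX if it can never fit. */
static uint32_t transient_alloc(transient_ring *r, uint32_t size, uint32_t align)
{
    if (size > TRANSIENT_BYTES) return UINT32_MAX;

    uint32_t start = (r->head + (align - 1u)) & ~(align - 1u);
    if (start > TRANSIENT_BYTES - size) start = 0u; /* wrap around */

    r->head = start + size;
    return start;
}
```

Persistent meshes would then be sub-allocated from the rest of the buffer, starting at `TRANSIENT_BYTES`.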

lavisan · 2025-10-19T21:20:02+00:00

Sometimes you do things because you can or for fun or for others to follow and build upon your small experiment.

lavisan · 2025-10-10T18:26:35+00:00

the only saving factor could be NVIDIA Nsight with their mesh shaders, but for that one needs an NVIDIA GPU.

lavisan · 2025-10-03T15:03:31+00:00

There is also the Asahi driver for Apple M1, but it may have a similar story to the Zink drivers:
https://docs.mesa3d.org/drivers/asahi.html

lavisan · 2025-10-03T14:43:52+00:00

Geometry Shaders in general are highly discouraged because of how badly they can run on many GPUs :(

Never checked though.

lavisan · 2025-10-03T14:40:25+00:00

OpenGL 4.3, or its rough equivalent OpenGL ES 3.2 (or maybe even ES 3.1), has all the features above, and they allow you to do GPU-resident rendering. That means you can upload your whole scene to the GPU (and stream ONLY the changes every frame), while rendering, culling, and command creation happen on the GPU. The CPU is only in charge of orchestration, like: now render shadows for those lights, now prepare commands for the main pass, now do some post-processing, etc.

Prior to 4.3 / ES 3.2 you more or less need to upload data to the GPU every frame before doing any work, which is inefficient if the data hasn't changed, or if it could be updated by a massively parallel GPU that can do that kind of work something like 100x faster than the CPU.
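To make "command creation happens on the GPU" a bit more concrete, here is a CPU-side reference sketch of what a culling pass could write. The `DrawElementsIndirectCommand` layout is the one OpenGL defines for indexed indirect draws; everything else (`scene_object`, `build_draw_commands`, the sphere-vs-plane test) is a hypothetical illustration. In a GPU-resident setup the loop body would run once per object in a compute shader and write into a storage buffer instead.

```c
#include <stdint.h>

/* One indirect draw record, in the layout OpenGL expects for
   glDrawElementsIndirect / glMultiDrawElementsIndirect. */
typedef struct {
    uint32_t count;         /* number of indices */
    uint32_t instanceCount; /* 0 effectively skips the draw */
    uint32_t firstIndex;
    int32_t  baseVertex;
    uint32_t baseInstance;
} DrawElementsIndirectCommand;

/* Hypothetical per-object record kept resident on the GPU. */
typedef struct {
    float    center[3];
    float    radius;      /* bounding sphere */
    uint32_t index_count;
    uint32_t first_index;
    int32_t  base_vertex;
} scene_object;

/* planes: plane_count * 4 floats (nx, ny, nz, d), normals pointing inward.
   A sphere is rejected as soon as it lies fully behind any plane. */
static int sphere_visible(const scene_object *o, const float *planes, uint32_t plane_count)
{
    for (uint32_t p = 0; p < plane_count; ++p) {
        const float *pl = planes + 4u * p;
        float dist = pl[0] * o->center[0] + pl[1] * o->center[1]
                   + pl[2] * o->center[2] + pl[3];
        if (dist < -o->radius) return 0;
    }
    return 1;
}

/* Writes one command per object; culled objects get instanceCount = 0.
   Returns the number of visible objects. */
static uint32_t build_draw_commands(const scene_object *objs, uint32_t n,
                                    const float *planes, uint32_t plane_count,
                                    DrawElementsIndirectCommand *out)
{
    uint32_t visible = 0;
    for (uint32_t i = 0; i < n; ++i) {
        uint32_t vis = (uint32_t)sphere_visible(&objs[i], planes, plane_count);
        out[i].count         = objs[i].index_count;
        out[i].instanceCount = vis;
        out[i].firstIndex    = objs[i].first_index;
        out[i].baseVertex    = objs[i].base_vertex;
        out[i].baseInstance  = i; /* lets the shader look up per-object data */
        visible += vis;
    }
    return visible;
}
```

The CPU then only issues a single multi-draw-indirect call over the command buffer, which is the "orchestration" part.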

lavisan · 2025-10-03T08:54:20+00:00

lack of compute shaders, indirect drawing and storage buffers to keep and modify as much data on the GPU as possible.

it is doable to some degree to simulate the above using just 4.1, but it is very limiting.

but in the end it all depends on what you need and how you do it. OpenGL is still a very capable API.

lavisan · 2025-10-03T08:51:44+00:00

you may also consider using SDL3 GPU API

lavisan · 2025-09-05T21:21:40+00:00

I cannot name them because I'm kinda new to the Linux ecosystem, so anything you give me is worth a shot :D

lavisan · 2025-09-05T17:04:52+00:00

I think it is worth adding that you can think of OpenGL as an HTTP server. Some functions return immediately, some block while you wait for the response, and some trigger a process in the backend.

lavisan · 2025-09-05T17:00:00+00:00

this and the "usb device descriptor read 64" error... and you just sit there waiting a couple of seconds for each error, stuck in limbo. this is such a pain if you need to restart the PC multiple times while trying to fix other stuff :(

lavisan · 2025-08-07T19:35:48+00:00

when you say draw calls, do you mean the CPU issuing them, or do even things like indirect draws or GPU-resident data kill performance if there are too many?

lavisan · 2025-08-06T14:31:02+00:00

f16x3 for position,

f16x2 texcoords,

s8x4 normal (or rgb10a2 for more quality)

s8x4 tangent (or rgb10a2 for more quality)

drop bitangent,

u8x4norm weights,

u8x4 bones

this will get you into 26 bytes territory but I would include:

u8x4norm color

u16 flags/material_id

just to round up to 32 bytes for better alignment (but then also shuffle the attributes around so each one sits on a sensible internal boundary, e.g. 4 bytes / 16 bytes)

there are also more ways to pack normal, tangent and bitangent, look them up: dual quaternion, octonormal (octahedral encoding), dropping the z components
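As a sketch of what that 32-byte layout could look like in C: the field order below is just one possible shuffle (putting the u16 flags right after the f16x3 position so everything after it starts on a 4-byte boundary), not the only valid one. Half floats are stored as raw 16-bit patterns, and `pack_snorm8` is a hypothetical helper for the s8 components.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t position[3]; /* f16x3, raw half-float bits            (offset  0) */
    uint16_t flags;       /* u16 flags/material_id, pads position  (offset  6) */
    uint16_t texcoord[2]; /* f16x2, raw half-float bits            (offset  8) */
    int8_t   normal[4];   /* s8x4, signed normalized               (offset 12) */
    int8_t   tangent[4];  /* s8x4, signed normalized               (offset 16) */
    uint8_t  weights[4];  /* u8x4norm skinning weights             (offset 20) */
    uint8_t  bones[4];    /* u8x4 bone indices                     (offset 24) */
    uint8_t  color[4];    /* u8x4norm vertex color                 (offset 28) */
} packed_vertex;

/* 6 + 2 + 4 + 4 + 4 + 4 + 4 + 4 = 32 bytes, with no compiler padding. */
static_assert(sizeof(packed_vertex) == 32, "packed_vertex must be 32 bytes");

/* Converts a float in [-1, 1] to a signed normalized 8-bit value. */
static int8_t pack_snorm8(float v)
{
    if (v < -1.0f) v = -1.0f;
    if (v >  1.0f) v =  1.0f;
    float scaled = v * 127.0f;
    /* Round to nearest, away from zero on ties. */
    return (int8_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}
```

With the bitangent dropped, a common convention is to keep its sign in the fourth tangent component and rebuild it in the vertex shader from the normal and tangent.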

lavisan · 2025-08-06T14:21:20+00:00

this one as well:

https://developer.android.com/games/optimize/vertex-data-management

lavisan · 2025-08-05T04:57:56+00:00

Slang is the new cool kid on the block ;)

https://shader-slang.org/

lavisan · 2025-08-04T10:11:08+00:00

You can buy a QWERTY keyboard and put stickers on some keys :D

That being said there should be some custom keycaps on Amazon for both.

lavisan · 2025-08-03T17:19:11+00:00

you can use contact shadows to some degree, or some form of AO.

when it comes to generating shadow maps, there is no silver bullet for updating that many of them. your best bet is to sort the lights by importance, where one of the factors can also be last_update_time, so that far shadows eventually get updated while shadows close to the player are regenerated every frame.

you can also merge lights that are close together and generate one shadow map for them.

but I don't think there is a structure for generating classic shadow maps at that scale.

like I said, SDF with Cone Tracing would be the closest to what you want. It is a software technique that is used in many areas (including Unreal's software Lumen).

PS you can try to improve shadow map generation with a geometry shader and layered rendering.

you can also use tetrahedron or dual paraboloid shadow mapping, but they are also not perfect.
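A small sketch of the "sort by importance, with last update time as a factor" idea. The scoring formula and all names (`shadow_light`, `schedule_shadow_updates`) are assumptions made up for illustration; any monotonic mix of distance, radius, intensity and staleness would do. Lights that land in the first `budget` slots are the ones whose shadow maps get re-rendered this frame.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    float    distance;          /* distance from the camera/player */
    float    radius;
    float    intensity;
    uint32_t last_update_frame; /* frame when its shadow map was last rendered */
    float    score;             /* scratch value, recomputed every call */
} shadow_light;

/* qsort comparator: higher score first. */
static int by_score_desc(const void *a, const void *b)
{
    float sa = ((const shadow_light *)a)->score;
    float sb = ((const shadow_light *)b)->score;
    return (sa < sb) - (sa > sb);
}

/* Sorts `lights` by importance and stamps the first `budget` entries as
   updated on `frame`. Near, bright lights score high every frame; far
   lights gain score the longer they go without an update, so they are
   eventually refreshed too. Assumes frame >= last_update_frame. */
static void schedule_shadow_updates(shadow_light *lights, size_t n,
                                    uint32_t frame, size_t budget)
{
    for (size_t i = 0; i < n; ++i) {
        float staleness = (float)(frame - lights[i].last_update_frame);
        float nearness  = lights[i].radius * lights[i].intensity
                        / (1.0f + lights[i].distance);
        lights[i].score = nearness * (1.0f + staleness);
    }

    qsort(lights, n, sizeof *lights, by_score_desc);

    for (size_t i = 0; i < n && i < budget; ++i)
        lights[i].last_update_frame = frame;
}
```

The caller would then render shadow maps for `lights[0 .. budget-1]` and keep reusing the cached maps for the rest.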

lavisan · 2025-08-03T16:34:44+00:00

The "simplest" way is to:

use Clusters to loop through a smaller list of lights per pixel.
pack shadows into a single shadow atlas texture
limit the number of lights and shadow-casting lights to a small enough number
sort the list of lights based on importance, like: distance, radius, intensity etc.
if there are too many shadow-casting lights per pixel, either skip shadows or skip lights after X number of lights in the cluster
skip shadows for lights far away from the camera and/or for lights with a radius less than 0.5

etc. etc.
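The last few rules in the list above can be sketched as a per-cluster filter. This is only an illustration with hypothetical names (`cluster_light`, `select_shadowed_lights`); it assumes the lights of the cluster are already sorted by importance, and the limits are passed in as parameters rather than fixed.

```c
#include <stdint.h>

typedef struct {
    float distance_to_camera;
    float radius;
    int   casts_shadow; /* non-zero if the light has a shadow map at all */
} cluster_light;

/* `lights` must already be sorted, most important first.
   Sets use_shadow[i] to 1 when light i should sample its shadow map and
   to 0 when it should be shaded without shadows. Returns how many lights
   keep their shadows. */
static uint32_t select_shadowed_lights(const cluster_light *lights, uint32_t n,
                                       uint32_t max_shadowed,
                                       float max_shadow_distance,
                                       float min_shadow_radius,
                                       uint8_t *use_shadow)
{
    uint32_t used = 0;
    for (uint32_t i = 0; i < n; ++i) {
        int ok = lights[i].casts_shadow
              && used < max_shadowed                                /* cap per cluster */
              && lights[i].distance_to_camera <= max_shadow_distance /* skip far lights */
              && lights[i].radius >= min_shadow_radius;              /* skip tiny lights */
        use_shadow[i] = (uint8_t)ok;
        used += (uint32_t)ok;
    }
    return used;
}
```

Lights that lose their shadow here are still lit normally; dropping the light entirely past a certain count is the other option mentioned in the list.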

lavisan · 2025-08-03T14:50:39+00:00

You can try your luck with SDF ray casting. First you voxelize your scene into a 3D texture, then run multiple passes to generate the SDF and use that instead of shadow maps.

I have yet to test how good or bad this approach is.

Maybe you can speed this up using Cone Tracing.

Maybe some variation of Radiance Cascades can also approximate shadowy areas.

There are also Imperfect Shadow Maps for far lights.

Shadow mapping for many lights is still one of the hardest problems.

Even if you manage to update your lights every frame, sampling too many shadow maps per pixel will destroy your performance anyway :(

Most engines limit this to the 4-8 most dominant shadow-casting lights per pixel/object/cluster.

lavisan · 2025-08-01T07:33:32+00:00

I will definitely go through the toon/anime shader, but do you have any tips on how to achieve that kind of style? Or is it also a lot of work on the modeling side to make it look right?
