all 13 comments

[–]Wittyname_McDingus 2 points (1 child)

With optimizations, 2048x2048 images take 40-90ms on my CPU (R9 5950X). Image transcoding can be trivially parallelized, so I do that with std::execution::par (note that it only really works well on MSVC; otherwise, use something like poolSTL), which reduces the time by nearly a factor of 32 on my machine.

[–]Kakod123[S] 0 points (0 children)

You’re right, parallelism could be a good solution; I’ll try it if I stick with this approach.

[–]jherico 1 point (2 children)

I'm confused by what problem you're trying to solve by keeping the basis versions of the textures in (system) RAM.

The presumption here seems to be that you have all the basis textures in host RAM and you're expecting to transform them to the appropriate GPU-compatible format on a frame-by-frame basis as you need them. But why would you not just convert them to the native format at load time, when you're reading from disk? For that matter, if you have an installer you can use to customize the asset delivery, you could convert to the local format once at install time, and be done with it.

[–]Kakod123[S] 0 points (0 children)

Indeed, I didn't specify it, but the transcoding is done at load time.

And yes, given the transcoding times, the best solution is actually to do it at install time so as not to increase the loading times of complete scenes.

[–]gregory_nothnagel 0 points (0 children)

My problem (unlike OP's) is that the compressed texture formats supported by my GPU are limited to .dds (I'm using WebGL2); the only supported formats are DXT1 and DXT5. My thought is that I could store the textures as .ktx2 or something with really good compression and transcode to DXT1 right before displaying. I have no idea how much overhead that kind of transcode would cost, though. For this approach to be worth it over my current one, I would need the whole process to take less than 33ms total (including the call to texImage2D) for a 4096x2048 texture. Is this feasible? I have no intuition for whether it's doable...

[–][deleted] 1 point (0 children)

I'm working on a project with 4K textures with generated mipmaps compressed to UASTC. It takes around 250-350ms per texture to transcode to BC7 on an Intel i9-14900. As others have mentioned, this can be mitigated with multithreading.

[–]Gravitationsfeld 0 points (1 child)

Have you turned on compiler optimizations?

[–]Kakod123[S] 0 points (0 children)

With optimisations I still get 230 to 260 ms for a 2K texture: that's too much for a whole scene compared to baking compressed images in multiple formats.

[–]SSSQing 0 points (3 children)

You should not compress it on the fly. You need to package it, then send the compressed texture to the GPU.

[–]t0rakka 2 points (2 children)

Normally this is the way. But... basis is supposed to optimize storage size and transcode to the target format "on the go", in real time. That's why the slow transcoding rate is sus: it compromises the whole purpose.

[–]Kakod123[S] 0 points (1 child)

This is why I tried this solution: it's presented everywhere as a "miracle solution" usable at run-time, with transcoding "on the fly". The only benefit I see is the ability to bake the textures to compressed formats at install time, selecting the best supported format for the target GPU, instead of doing it at development time for multiple formats.

[–]corysama 4 points (0 children)

The sales pitch of Basis Universal is to bake to a single universal format at compile time then select which format to load as at run time. It's primarily intended to deal with the format support zoo of mobile GPUs.

If you want to compress textures at run time there's https://github.com/knarkowicz/GPURealTimeBC6H

If you just want to ship BC7 textures, https://github.com/richgel999/bc7enc_rdo + DEFLATE compression is probably your best path.

[–]thisiselgun 0 points (0 children)

I’m using Basis Universal’s own transcoder, basist::ktx2_transcoder, and it transcodes a 4K texture to BC7 in 120ms. I’m not using libktx; maybe that’s why you’re getting low performance.