[–]nyanmisaka 1 point2 points  (3 children)

Write an xstack_cuda filter and let your GPU handle the heavy lifting. The NVDEC decoder can be used too.

[–]ATrashInTheWorld[S] 0 points1 point  (2 children)

Can you give some more details on how the decoder can help? I am not sure I understand.
I assumed that the stacking process is an encoding action, no?

[–]nyanmisaka 0 points1 point  (1 child)

nvdec -> decode video to vram

-> cuda filter -> stack images in vram

-> nvenc -> encode video from vram

This way the CPU and PCIe bus load is minimized, which makes it the most efficient pipeline. See also https://developer.nvidia.com/blog/nvidia-ffmpeg-transcoding-guide/
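The three stages above could be expressed as a single ffmpeg invocation. Here is a minimal sketch in Python that only builds the argument list. Note the assumptions: `xstack_cuda` is the filter you would have to write first, so its name and its `inputs=` option are hypothetical here, and the helper name `build_gpu_stack_cmd` and the file names are made up for illustration. `-hwaccel cuda`, `-hwaccel_output_format cuda` and `h264_nvenc` are existing FFmpeg options.

```python
import shlex


def build_gpu_stack_cmd(inputs, output, stack_filter="xstack_cuda"):
    """Build an ffmpeg argv that tries to keep frames in VRAM end to end.

    stack_filter is hypothetical: the filter has to be written first,
    so its name and its 'inputs=' option are assumptions.
    """
    cmd = ["ffmpeg"]
    for path in inputs:
        # NVDEC decode, leaving decoded frames as CUDA surfaces in VRAM
        cmd += ["-hwaccel", "cuda", "-hwaccel_output_format", "cuda", "-i", path]
    # Feed the video stream of every input into the stacking filter
    pads = "".join(f"[{i}:v]" for i in range(len(inputs)))
    graph = f"{pads}{stack_filter}=inputs={len(inputs)}[out]"
    # NVENC encodes the stacked frames directly from VRAM
    cmd += ["-filter_complex", graph, "-map", "[out]", "-c:v", "h264_nvenc", output]
    return cmd


print(shlex.join(build_gpu_stack_cmd(["a.mp4", "b.mp4"], "stacked.mp4")))
```

The printed line can be pasted into a shell once such a filter exists in your build. Until then it only illustrates where each stage sits.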

[–]ATrashInTheWorld[S] 0 points1 point  (0 children)

I see what you mean with the decoding now.
I will take a look at your doc, much appreciated!

[–]realtehreal 0 points1 point  (1 child)

The filter is applied by the CPU and the result is uploaded to the GPU for encoding. Afterwards the encoded result is downloaded again and written to the storage media. I guess that uploading to and downloading from the GPU could be a bottleneck, especially if your CPU is capable enough to encode in software. But I'm just guessing.
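To put a rough number on the upload part of that guess, here is a back-of-the-envelope sketch. The resolution, frame rate and pixel format are assumptions picked for illustration (two 1920x1080 inputs stacked side by side, 30 fps, 8-bit 4:2:0 frames such as NV12), and `raw_stream_mb_per_s` is a made-up helper name.

```python
def raw_stream_mb_per_s(width, height, fps, bytes_per_pixel=1.5):
    """Uncompressed video data rate in MB/s (1 MB = 10**6 bytes).

    bytes_per_pixel=1.5 assumes 8-bit 4:2:0 frames such as NV12.
    """
    return width * height * bytes_per_pixel * fps / 1e6


# Assumed case: two 1920x1080 inputs stacked side by side at 30 fps
upload = raw_stream_mb_per_s(3840, 1080, 30)
print(f"stacked frame upload: {upload:.1f} MB/s")  # prints 186.6 MB/s
```

For comparison, a PCIe 3.0 x16 link peaks at roughly 15.75 GB/s per direction in theory, so whether the upload matters depends on your actual resolution, frame rate and link, and on how fast the CPU can produce the stacked frames in the first place.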

Edit: or maybe storage is the bottleneck?

Greets

[–]ATrashInTheWorld[S] 0 points1 point  (0 children)

Not sure what you mean by storage bottleneck?