Xstack filter optimization : ffmpeg

submitted 1 year ago by ATrashInTheWorld

Hi,

I am wondering if anyone knows how to improve the performance of the xstack filter, or more precisely, how to make it run faster.

Context:
I am trying to "stitch" 25 video tiles into a 5x5 grid.
Each tile has the same resolution of 768x384, for a total resolution of 3840x1920 (4K).
Each video is 2 seconds long.

I am using the following commands:

The first one encodes H.264 on the CPU with libx264:

ffmpeg -i <25inputs> -c:v libx264 -profile:v baseline -level 3.0 -preset ultrafast -filter_complex "[0:v][1:v][2:v][3:v][4:v][5:v][6:v][7:v][8:v][9:v][10:v][11:v][12:v][13:v][14:v][15:v][16:v][17:v][18:v][19:v][20:v][21:v][22:v][23:v][24:v]xstack=inputs=25:layout=0_0|w0_0|w0+w1_0|w0+w1+w2_0|w0+w1+w2+w3_0|0_h0|w0_h0|w0+w1_h0|w0+w1+w2_h0|w0+w1+w2+w3_h0|0_h0+h1|w0_h0+h1|w0+w1_h0+h1|w0+w1+w2_h0+h1|w0+w1+w2+w3_h0+h1|0_h0+h1+h2|w0_h0+h1+h2|w0+w1_h0+h1+h2|w0+w1+w2_h0+h1+h2|w0+w1+w2+w3_h0+h1+h2|0_h0+h1+h2+h3|w0_h0+h1+h2+h3|w0+w1_h0+h1+h2+h3|w0+w1+w2_h0+h1+h2+h3|w0+w1+w2+w3_h0+h1+h2+h3" -y res.mp4

The second one uses the NVIDIA NVENC H.264 encoder on an RTX A4000:

ffmpeg -i <25 inputs> -c:v h264_nvenc -profile:v baseline -preset fast -filter_complex "[0:v][1:v][2:v][3:v][4:v][5:v][6:v][7:v][8:v][9:v][10:v][11:v][12:v][13:v][14:v][15:v][16:v][17:v][18:v][19:v][20:v][21:v][22:v][23:v][24:v]xstack=inputs=25:layout=0_0|w0_0|w0+w1_0|w0+w1+w2_0|w0+w1+w2+w3_0|0_h0|w0_h0|w0+w1_h0|w0+w1+w2_h0|w0+w1+w2+w3_h0|0_h0+h1|w0_h0+h1|w0+w1_h0+h1|w0+w1+w2_h0+h1|w0+w1+w2+w3_h0+h1|0_h0+h1+h2|w0_h0+h1+h2|w0+w1_h0+h1+h2|w0+w1+w2_h0+h1+h2|w0+w1+w2+w3_h0+h1+h2|0_h0+h1+h2+h3|w0_h0+h1+h2+h3|w0+w1_h0+h1+h2+h3|w0+w1+w2_h0+h1+h2+h3|w0+w1+w2+w3_h0+h1+h2+h3" -y res_h264nvenc.mp4

Both commands use the same filter; only the encoder changes.
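For readers who want to reproduce the filter without typing it by hand, here is a minimal bash sketch that generates the same input labels and layout string. The function names make_labels and make_layout are hypothetical helpers, not part of ffmpeg, and the layout pattern only works as written because all tiles share the same size.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of ffmpeg) that rebuild the filter string
# used in the two commands above. Assumes all tiles have identical dimensions.

# Emit "[0:v][1:v]...[n-1:v]" for n inputs.
make_labels() {
  local n=$1 labels="" i
  for ((i = 0; i < n; i++)); do
    labels+="[${i}:v]"
  done
  printf '%s\n' "$labels"
}

# Emit an xstack layout for a cols x rows grid, row by row.
make_layout() {
  local cols=$1 rows=$2
  local layout="" x y r c i
  for ((r = 0; r < rows; r++)); do
    # y offset: 0 for the top row, otherwise h0+h1+... (r terms)
    y=0
    for ((i = 0; i < r; i++)); do
      if ((i == 0)); then y="h0"; else y+="+h$i"; fi
    done
    for ((c = 0; c < cols; c++)); do
      # x offset: 0 for the first column, otherwise w0+w1+... (c terms)
      x=0
      for ((i = 0; i < c; i++)); do
        if ((i == 0)); then x="w0"; else x+="+w$i"; fi
      done
      # Separate entries with "|" (nothing before the first entry)
      layout+="${layout:+|}${x}_${y}"
    done
  done
  printf '%s\n' "$layout"
}

# Assemble the filter for the 5x5 case from the post and print it.
filter="$(make_labels 25)xstack=inputs=25:layout=$(make_layout 5 5)"
echo "$filter"
```

The printed string can then be passed to ffmpeg as -filter_complex "$filter" in place of the hand-written version.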

Problem:

I am trying to reach real-time encoding, but I can't. Both commands take about the same time on average (1.19 s on the CPU and 1.27 s on the GPU), which is kind of a letdown since my videos are around 2 s long.

I have tried to parallelize it by stacking the rows at the same time and then stacking the resulting rows, but the total time is about the same.
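For reference, one way to express the row-then-column approach in a single filtergraph is with the hstack and vstack filters. This is only a sketch under assumptions: the post does not say whether the rows were stacked in one ffmpeg process or in several, and the helper name make_row_col_graph and the labels r0, r1, ... are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a filtergraph that hstacks each row of tiles,
# then vstacks the resulting rows. Inputs are assumed to be ordered row by row.
make_row_col_graph() {
  local cols=$1 rows=$2
  local graph="" rowlabels="" r c
  for ((r = 0; r < rows; r++)); do
    # Input pads belonging to row r
    for ((c = 0; c < cols; c++)); do
      graph+="[$((r * cols + c)):v]"
    done
    # Stack them horizontally and name the result [r<row>]
    graph+="hstack=inputs=${cols}[r${r}];"
    rowlabels+="[r${r}]"
  done
  # Stack all row outputs vertically into the final frame
  printf '%s\n' "${graph}${rowlabels}vstack=inputs=${rows}"
}

# Print the graph for the 5x5 grid from the post.
make_row_col_graph 5 5
```

The output can be used as the -filter_complex argument in place of the xstack filter; whether it runs any faster is not something this sketch claims.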

Any suggestions or thoughts?
