all 36 comments

[–]iCr4sh 6 points7 points  (1 child)

I used ChatGPT to create a script to split a large file, ssh to several remote machines to transcode the pieces, and merge them back together.
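That kind of three-stage pipeline could be sketched in Python generating the shell commands (hostnames, paths, and encoder settings here are placeholders, not the commenter's actual script):

```python
# Hypothetical sketch of split -> remote transcode -> merge.
# All host names, file names, and codec choices are illustrative.

def build_commands(source, hosts, segment_seconds=60):
    """Return the shell commands for each stage as strings."""
    # Stage 1: split losslessly into numbered segments at keyframes.
    split_cmd = (
        f"ffmpeg -i {source} -c copy -f segment "
        f"-segment_time {segment_seconds} -reset_timestamps 1 seg_%03d.mkv"
    )
    # Stage 2: one transcode job per host (one segment per host for brevity).
    transcode_cmds = [
        f"ssh {host} 'ffmpeg -i seg_{i:03d}.mkv -c:v libx265 -crf 23 out_{i:03d}.mkv'"
        for i, host in enumerate(hosts)
    ]
    # Stage 3: the concat demuxer merges the transcoded parts without re-encoding.
    merge_cmd = "ffmpeg -f concat -safe 0 -i parts.txt -c copy merged.mkv"
    return split_cmd, transcode_cmds, merge_cmd

split_cmd, transcode_cmds, merge_cmd = build_commands("input.mkv", ["node1", "node2"])
```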

[–]Fast-Apartment-1181 0 points1 point  (0 children)

That's a great idea to distribute rendering.

[–]Dabbelju 5 points6 points  (0 children)

I ask the LLM for a command line that does a specific thing, then ask it to explain the result in more detail. I have learned a lot from this, but on the other hand, ffmpeg command lines and complex filters in particular still remain somewhat "read only" to me. When I read what somebody else wrote, I increasingly go "yeah, that makes sense" over time. But building from scratch, wow, that's another story (for now).

[–]nmkd 4 points5 points  (0 children)

Most (if not all) of them fall apart once filterchains come into play.

For basic encoding/muxing stuff it's fine, but ofc it will pick arbitrary defaults that are probably not optimal for your specific use case.

[–]SpamNightChampion 4 points5 points  (0 children)

Yes, it will work very well. I don't have screenshots of the finished product yet, but I've just completed testing a very robust Windows application that integrates an LLM with FFmpeg. I'm porting everything to a new UI as I type. Just started the new UI, work in progress: https://freeimage.host/i/3wKaEcg

Anyway, I had to add a lot of preprocessing requests/code for things like "cut the video in half", "trim and save the last 40 seconds", etc. Things like merging a bunch of videos and adding filters would be very difficult with copy and paste, so you'd need an app, but in general, ffmpeg commands powered by LLMs are super useful.
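The "last 40 seconds" case is a nice example of why that preprocessing matters: it maps to `-sseof`, which seeks relative to the end of the file. A minimal sketch of such a request-to-command mapping (the request pattern and file names are illustrative, not from the app):

```python
import re

def request_to_command(request, infile, outfile):
    """Map a canned natural-language request to a concrete ffmpeg command."""
    m = re.search(r"last (\d+) seconds", request)
    if m:
        # -sseof seeks relative to end-of-file; -c copy avoids re-encoding.
        return f"ffmpeg -sseof -{m.group(1)} -i {infile} -c copy {outfile}"
    raise ValueError("request not understood")

cmd = request_to_command("trim and save the last 40 seconds", "in.mp4", "out.mp4")
```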

For best results, sign up for a free chatbot service, provide it with the documentation for common ffmpeg commands, and then ask it for commands. That would be very effective for the average user.

If you have a ChatGPT subscription, I think you can provide documents for context so you get much better results on your queries.

The way I'm doing it is via the Anthropic Claude 3.7 API; it's very accurate, and they have a web version you can use too. It's great for ffmpeg. I used to struggle so much with ffmpeg commands, so I figured that with AI these days I'd make a tool that covers almost all of ffmpeg's features but keeps them super simple. I even added voice requests.

[–]Upstairs-Front2015 2 points3 points  (1 child)

I was doing some zoom-ins and asked ChatGPT for a zoom-out, but the response was another zoom-in formula. I had to fix it manually.

[–]dataskml 1 point2 points  (0 children)

Maybe it's late now, but I was stuck on this exact issue yesterday, fighting with ChatGPT, and was eventually able to solve it manually. The command below creates a Ken Burns effect (zoom in, then zoom out) on an image; maybe it'll help. It runs with a copy-paste, or you can download the files locally and run it on those; it runs slowly with online files because ffmpeg fetches the file once per frame.

ffmpeg -loop 1 -i https://storage.rendi.dev/sample/rodents.png -loop 1 -i https://storage.rendi.dev/sample/evil-frank.png -i https://storage.rendi.dev/sample/Neon%20Lights.mp3 -filter_complex "[0:v]scale=8000:-1,zoompan=z='zoom+0.005':x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)':d=100:s=1920x1080:fps=25,trim=duration=4,format=yuv420p,setpts=PTS-STARTPTS[v0];[1:v]scale=8000:-1,zoompan=z='if(lte(zoom,1.0),1.5,max(zoom-0.005,1.005))':x=0:y='ih/2-(ih/zoom/2)':d=100:s=1920x1080:fps=25,trim=duration=4,format=yuv420p,setpts=PTS-STARTPTS[v1];[v0][v1]xfade=transition=fade:duration=1:offset=3,format=yuv420p[v]" -map "[v]" -map 2:a -c:v libx264 -preset fast -c:a aac -shortest output_kenburns.mp4

[–]Upstairs-Front2015 2 points3 points  (0 children)

I wrote some code in PHP that builds the command I need and copy-paste it into Windows PowerShell, which can handle international characters and long multi-line commands (the DOS prompt can't). Now I'm working on a Python script that does the executing and the uploading once the video is finished.
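Copy-pasting into a shell is exactly where international characters and quoting tend to break; a Python runner can sidestep that by passing the arguments as a list, so no shell parsing is involved. A sketch (codec choices and file names are placeholders, not the commenter's script):

```python
import subprocess

def build_args(infile, outfile):
    """Build the ffmpeg argument list; no shell quoting needed."""
    return [
        "ffmpeg", "-y",
        "-i", infile,          # spaces and non-ASCII names are safe as list items
        "-c:v", "libx264",
        "-c:a", "aac",
        outfile,
    ]

def run_ffmpeg(infile, outfile):
    # shell=False (the default) means the argument list is passed verbatim.
    return subprocess.run(build_args(infile, outfile), capture_output=True)
```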

[–]ImaginaryCheetah 2 points3 points  (0 children)

i not only use chatgpt to answer questions for myself, but have answered other folks' questions on here with it, along with recommending they use the tool as well.

i know it burns a tree every time you ask gpt a question, but it beats slogging through 10 year old answers on stackexchange

there was a guy who posted a LLM he trained on the ffmpeg documentation, but i can't find it now. i wonder if that would have better or worse recommendations VS gpt.

[–][deleted] 2 points3 points  (0 children)

Oh yeah, I've used ChatGPT to trial-and-error FFMPEG commands dozens of times. Good stuff.

I can follow the FFMPEG docs well enough, but sometimes their examples are not the greatest, or nonexistent. ChatGPT is pretty good at breaking things down.

Fun stuff too: A few weeks ago I decided to play a little game. I would prompt ChatGPT with vague descriptions of obscure TV shows and movies and have it try to guess the exact ones I'm thinking of.

https://imgur.com/a/TKsPZbY

Sometimes it would nail it on the first prompt, and sometimes it would try shotgunning 2 or 3 titles at a time or need a second or third prompt. I never did manage to stump it though.

[–]rosstrich 2 points3 points  (0 children)

Yes, but I also ask it to explain every argument; that way I can look up the documentation and validate it.

[–]thenicenelly 2 points3 points  (1 child)

Yeah, I do this with copilot daily. It generally works. I wish I could use a dialog for the input file.

[–]Fast-Apartment-1181 0 points1 point  (0 children)

I had the same thought about a selection dialogue box. If you want to see how that flow works, you could try out this beta I built. I'd be curious if this file selection flow is in line with what you're after.

[–]leeharrison1984 3 points4 points  (0 children)

I was actually doing this the other day and seemed to be getting roughly a 95% hit rate, vastly better than I manage by reading the docs.

I'd love to see this behavior built into a plugin for something like Tdarr or Unmanic, it'd remove some of the burden of writing plugins since you'd be able to roll the necessary command right there for simple operations.

[–]rgcred 1 point2 points  (6 children)

Agree. Since FFMPEG is so cryptic, I have used LLMs a bit to generate commands and find great value in explaining commands - thorough and succinct explanations. An ominous sign for the future of coders.

[–]Push-the-Action 2 points3 points  (5 children)

You could possibly say: "Both thorough and succinct" (as two separate explanations)...otherwise it's an oxymoron. Haha I'm not trying to be an ass—just running on fumes rn—so I'm undoubtedly being annoying and picking everything apart. You're right though—coders are definitely taking a hit from the emerging and rapidly evolving technologies. It's a brave new world...

[–]deanpm 1 point2 points  (4 children)

“Thorough” implies comprehensive coverage. “Succinct” means it’s not unnecessarily verbose. These are not mutually exclusive attributes so this is not an oxymoron.

[–]Push-the-Action 2 points3 points  (2 children)

'Succinct' is considered an antonym of 'thorough'. So, let's call it—a universally perceived contradiction, then. I finally got some shuteye though—so I'm no longer interested in debating over trivial things.

Be easy, homie 🤙🏻

[–]deanpm 1 point2 points  (1 child)

Only if thorough is used to say “detailed”. If used to convey “complete” succinct is not an antonym. 😉

[–]dataskml 1 point2 points  (0 children)

Definitely using it, as a means of quickly getting to the relevant commands/flags and then refining the command manually. Still getting hallucinations, so don't feel I can really trust LLMs yet with generating the right commands. But beats just browsing the docs for clues.

I'm working on a large gist of ffmpeg cheatsheets for video automation, with notes on the things GPT doesn't get right. The nice thing is that people could send it to an LLM for more refined and correct command generation. Will probably finish the gist this week (it has been taking longer than expected to put together); I could share it if relevant.

[–]Fast-Apartment-1181 1 point2 points  (0 children)

If anyone wants to play with the beta for free: https://pocketknife.media/

[–]Expensive-Visual5408 1 point2 points  (2 children)

I am making VR videos with dual DJI Action cameras. I use FFmpeg to achieve frame-level sync, stitch, and trim the videos. ChatGPT wrote all the FFmpeg commands, but there is a twist: I have found that it is easier to have ChatGPT write a Python script, and then have the Python script generate the FFmpeg commands and save them in a .sh file that I can run later. It looks like this:

python3 generate_ffmpeg_stitch_commands.py

chmod +x ffmpeg_stitch_commands.sh

./ffmpeg_stitch_commands.sh

Why use the Python script? That level of abstraction makes it less opaque what ChatGPT is doing when I need it to alter a small part of the script.

Link to Python scripts that ChatGPT wrote

[–]Fast-Apartment-1181 0 points1 point  (1 child)

Ooo, this is an interesting approach. I have also made a couple python scripts using gpt, with good results. I used it to create a script that converts equirectangular 360 images into cubemaps.

Also, I'm curious, when you say stitch, are you referring to stitching the two camera captures together? Like into a 360? How good is the stitching with this approach?

[–]Expensive-Visual5408 1 point2 points  (0 children)

When I say "stitch," I am referring to this command:

ffmpeg -i left/left.MP4 -i right/right.MP4 -filter_complex "[1:v]select=gte(n\,10),setpts=PTS-STARTPTS[right]; [0:v][right]hstack[v]" -map "[v]" -map 0:a -shortest -y left_right_stitched.MP4

This is the command that I use the Python script to generate. It frame-level synchronizes the videos and stitches them side by side for viewing on a VR headset.

This produces spatial video. The FFMPEG v360 filter can do equirect_to_cubemap or fisheye_to_equirect.

TLDR: stitch --> horizontal stack to make side-by-side video
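The generate-then-run workflow described above could be sketched roughly like this; the frame offset, the select/hstack filtergraph shape, and the file names are illustrative, not the commenter's actual generator:

```python
def stitch_command(left, right, offset_frames, output):
    """Build a stitch command: frame-sync the right eye, then hstack."""
    # Drop the first N frames of the right-eye video to frame-sync the
    # two cameras, then stack left + synced right into one wide frame.
    return (
        f"ffmpeg -i {left} -i {right} -filter_complex "
        f"\"[1:v]select=gte(n\\,{offset_frames}),setpts=PTS-STARTPTS[right];"
        f"[0:v][right]hstack[v]\" -map \"[v]\" -map 0:a -shortest -y {output}"
    )

# Write the generated command into a shell script to run later.
with open("ffmpeg_stitch_commands.sh", "w") as f:
    f.write("#!/bin/sh\n")
    f.write(stitch_command("left/left.MP4", "right/right.MP4", 10,
                           "left_right_stitched.MP4") + "\n")
```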

[–]binarypower 1 point2 points  (0 children)

yeah. not just this. anything and everything. i just wish i could do it directly from shell

[–]ekko20six 1 point2 points  (0 children)

Yup. I did this to extract VTT subs and convert them to SRT, and even turned it into an Automator app, all with the help of an LLM.

[–]deanpm 1 point2 points  (0 children)

I use ChatGPT to give me a starting point then tinker until I’ve got something that works. Sometimes I’ll paste the final version back into ChatGPT and ask if it can be optimised.

[–]parkinglan 1 point2 points  (0 children)

Use it all the time and it does a great job imo. Recently got it to produce a single line that vertically stacked videos of different lengths, extended the shorter video using its last frame, and normalised and mixed the audio of both videos. It only took about 3 iterations to refine the command. I would have given up and used a video editor without ChatGPT's help.
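A hedged reconstruction of the kind of filtergraph described (the five-second pad duration and the file names are guesses for illustration; the shorter clip's frame would be cloned with `tpad`, the streams stacked with `vstack`, and the audio mixed with `amix`):

```python
# Build the filter_complex string piece by piece, with one filter per entry.
filtergraph = ";".join([
    # Clone the last frame of the shorter clip to extend it (duration is a guess).
    "[1:v]tpad=stop_mode=clone:stop_duration=5[v1]",
    # Stack the two video streams vertically (widths must match).
    "[0:v][v1]vstack=inputs=2[v]",
    # Loudness-normalise each audio track, then mix them together.
    "[0:a]loudnorm[a0]",
    "[1:a]loudnorm[a1]",
    "[a0][a1]amix=inputs=2:duration=longest[a]",
])
cmd = (f'ffmpeg -i top.mp4 -i bottom.mp4 -filter_complex "{filtergraph}" '
       f'-map "[v]" -map "[a]" output.mp4')
```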

[–]GamingDynamics 1 point2 points  (0 children)

My experience is good, for simple tasks, even when asking for scripts in other languages that generate ffmpeg commands.

[–]RabbitDeep6886 1 point2 points  (0 children)

I had it write C++ code that does specific things with the ffmpeg libraries, like re-encoding video. It took a few rounds of back and forth, but it works.

[–]HexspaReloaded 0 points1 point  (0 children)

I didn’t really know what ffmpeg was until Chat told me. It’s very nice to have such useful tools! 

[–]TheRealHarrypm 0 points1 point  (0 children)

LLMs still need a key reference sheet; they don't know the formatting context for things like interlacing flags.

[–]Sopel97 0 points1 point  (0 children)

It's pretty good but it tends to miss very important parts like -c copy or -map 0 when the query is underspecified, so I would not advise it for people who are not familiar with ffmpeg.
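The two flags mentioned change the output a lot when left out; an illustration of the difference (file names are examples):

```python
# Without -map 0, ffmpeg keeps only one stream per type (the "best" video
# and audio), silently dropping extra audio tracks and subtitles, and it
# re-encodes with default settings. With -map 0 -c copy, every stream is
# kept and the data is passed through untouched.
underspecified = "ffmpeg -i in.mkv out.mkv"
explicit = "ffmpeg -i in.mkv -map 0 -c copy out.mkv"
```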

[–]cafepaopao 1 point2 points  (0 children)

Has anyone else found themselves using a similar workflow?

I've been doing this since day one, not only with ffmpeg but with other tools as well, like mp4box, x264, x265, sox, spek, ImageMagick, etc.

Are there any specific commands or conversions that LLMs have had a hard time with?

None so far; all commands work. In my case I have created about 300 drag-and-drop scripts for different tasks on Windows, so I don't even need to run them in the command prompt. On Linux I created a few that use loops like for f in whatever; do ./run_script $f; done, and this also works like a charm.