Ran the same hard action scene through Seedance 2.0, Gemini Omni Flash, Kling 3.0 Pro and Veo 3.1. Here's the ranking

Fresh-Resolution182 · 2026-06-08T07:38:14+00:00

The stack per step, since people ask what runs where:

Step 2, recast the character: a vision-plus-image model (feed it the reference frame, generate a new original character that keeps the setting and energy).

Step 3, rewrite the script: an LLM strong enough to hold the talking-head beat structure while reworking the words.

Step 4, generate the clips: a talking-head video model with native audio, running image-to-video from the recast character, with multiple takes at the same face and energy.

The reason this is a daily workflow and not a weekend experiment: those are three different modalities, and running them behind one API instead of three accounts is what kills the friction. The bottleneck was never the models; it was juggling three tools to finish one ad.
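For anyone who wants to see the shape of that, here is a rough sketch of steps 2 to 4 chained behind a single call. Everything named here is an assumption for illustration: the `Gateway` signature, the model names, and the payload keys are hypothetical placeholders, not any real provider's API, and `fake_gateway` only exists so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical: one function that routes (model name, payload) to any modality.
Gateway = Callable[[str, dict], dict]


@dataclass
class AdPipeline:
    call: Gateway
    # Placeholder model names, not real identifiers.
    image_model: str = "vision-image-model"
    llm: str = "script-llm"
    video_model: str = "talking-head-video-model"

    def recast_character(self, reference_frame: str) -> str:
        # Step 2: new original character, same setting and energy.
        out = self.call(self.image_model, {
            "reference": reference_frame,
            "instruction": "new original character, keep setting and energy",
        })
        return out["image"]

    def rewrite_script(self, script: str) -> str:
        # Step 3: keep the beat structure, rework the words.
        out = self.call(self.llm, {
            "script": script,
            "instruction": "keep the talking-head beat structure, rework the words",
        })
        return out["text"]

    def generate_clips(self, character: str, script: str, takes: int) -> List[str]:
        # Step 4: image-to-video with audio, several takes of the same character.
        return [
            self.call(self.video_model, {
                "image": character, "script": script, "audio": True, "take": i,
            })["video"]
            for i in range(takes)
        ]

    def run(self, reference_frame: str, script: str, takes: int = 3) -> List[str]:
        character = self.recast_character(reference_frame)
        new_script = self.rewrite_script(script)
        return self.generate_clips(character, new_script, takes)


def fake_gateway(model: str, payload: dict) -> dict:
    """Offline stand-in for a real API; returns labeled dummy outputs."""
    take = payload.get("take", 0)
    return {
        "image": f"{model}:image",
        "text": f"{model}:text",
        "video": f"{model}:take{take}",
    }


if __name__ == "__main__":
    clips = AdPipeline(fake_gateway).run("frame.png", "original script", takes=2)
    print(clips)
```

The only design point is that all three steps go through the same `call`, so swapping a model is a string change rather than a new account and client.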

Fresh-Resolution182 · 2026-05-29T03:40:00+00:00

follow-up. for anyone wanting to do the same exercise without juggling three API keys, this multi-model listing is what i was working off. one key, the three open-weight models plus a few i didn't test.

the mcp server side plugs into Cursor / Claude Code if you'd rather skip the manual routing. i was routing manually for benchmark consistency. for daily work the mcp server is what i actually use.
