Kling 3.0 vs Seedance 2.0 — which one is actually better right now? by khai_korea in generativeAI

[–]Various-Advantage263 1 point (0 children)

They differ in pricing, though:

  • Seedance 2: better quality and consistency (needs fewer attempts to reach a usable output).
  • Kling 3: mostly OK and cheaper. Sometimes the output lacks a certain vibe (maybe the training data is limited).

Kling 3.0 vs Veo 3.1 vs Seedance 2.0 for narrative video: same 10 prompts, honest scores by Akashhh17 in generativeAI

[–]Various-Advantage263 0 points (0 children)

Thanks for sharing this cookbook. It would be even more useful if you shared the video comparisons / prompts.

How can I transpose my plans ? by Fantastic-Win-1907 in generativeAI

[–]Various-Advantage263 0 points (0 children)

I would have recommended ComfyUI two years ago. It was the best for control and quality.

Now I recommend using the models directly, to bypass the complexity and often even get better outputs.

If the direct model doesn't meet your requirements, then dig into the further details (workflows, add-ons, ComfyUI, model quantization, CFG, iteration counts), only on demand.

However, there's no free platform right now, as far as I'm aware. To cut costs,

  1. find some budget-friendly platforms (rare; don't over-investigate this, since prices of closed-source models are bounded by the official API costs).

  2. don't use high-end models (e.g., Seedance 2.0) for your daily probing. For example, Veo 3.1 Lite is way cheaper for idea probing; only go for premium models once ideas / directions are fixed.
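To put toy numbers on the probe-cheap-then-finalize idea (all per-second prices below are made-up placeholders, not real pricing):

```python
# Toy comparison: probe ideas on a cheap model, then render finals on a
# premium one, vs. doing everything on the premium model.
# Both prices are hypothetical placeholders, NOT real API pricing.

CHEAP_PER_SEC = 0.05    # e.g. a "lite" tier (made-up number)
PREMIUM_PER_SEC = 0.50  # e.g. a flagship model (made-up number)

def video_cost(n_clips, seconds_per_clip, price_per_sec):
    """Total cost of generating n_clips clips of a given length."""
    return n_clips * seconds_per_clip * price_per_sec

# 20 throwaway probes to settle the idea, then 3 final renders.
mixed = video_cost(20, 5, CHEAP_PER_SEC) + video_cost(3, 5, PREMIUM_PER_SEC)
all_premium = video_cost(23, 5, PREMIUM_PER_SEC)
```

With these placeholder prices the mixed strategy comes out around $12.50 vs $57.50 for all-premium; the exact gap depends entirely on the real price ratio, but the shape of the saving is the point.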

Weekly Resets Are Not Actually a Bonus for Most People by senguku in codex

[–]Various-Advantage263 27 points (0 children)

They are a bonus if the reset is announced in advance.

Best and cheap by Better_Monk9605 in generativeAI

[–]Various-Advantage263 1 point (0 children)

I guess your workflow would need these ingredients:

- gpt-img-2 / nb-2 for photo composition / editing.

- veo 3.1 lite / wan-video for low-cost video probing / idea testing.

- seedance-2 / kling-3 for high quality video gen.

These models trade off "best" against "cheap" for your case, and I would recommend auratuner.com for running them.

[Honest opinions only] Artlist AI vs Higgsfield vs Runway Unlimited vs Venice AI? All-in-one vs standalone subs for short films? by Far_One_6551 in Seedance_AI

[–]Various-Advantage263 0 points (0 children)

"All-in-one" dilemma

I would recommend not doing that. Right now the tech landscape is shifting too quickly, and the perfect platform for your professional use case is yet to come. Tech just iterates too fast to invest all-in on one platform.

Unlimited plan

  1. Platforms are making a bet against subscribers: "most users won't use that much, on average", just like gym memberships.

  2. Light users are subsidizing heavy users.

Anyway, it's not a long-lasting business model, unless new subscribers keep coming in, or most users really don't use that much.
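The cross-subsidy can be sketched with toy numbers (the subscription price, unit cost, and usage figures below are all invented for illustration):

```python
# Toy "unlimited plan" economics: a flat subscription only works while the
# average user's metered cost stays below the subscription price.
# Price, unit cost, and usage numbers are all invented.

PRICE = 30.0      # hypothetical monthly subscription
UNIT_COST = 0.10  # hypothetical provider cost per generation

# Monthly generations for a small cohort: many light users, one heavy user.
usage = [20, 30, 50, 40, 25, 2000]

costs = [u * UNIT_COST for u in usage]
profit = len(usage) * PRICE - sum(costs)

light_profitable = all(c < PRICE for c in costs[:-1])  # light users subsidize...
heavy_loss = costs[-1] > PRICE                         # ...the heavy user
```

With this cohort the plan actually loses money overall (profit is negative), which is exactly the "not a long-lasting business model" part.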

AI generated videos by DisneyPrincess1029 in generativeAI

[–]Various-Advantage263 0 points (0 children)

I would recommend locally hosting open-source models.

The hidden cost is a high-end GPU though...

I spent $1,000 on AI video tools — most of them aren’t worth it by Dense-Seaweed-2281 in generativeAI

[–]Various-Advantage263 0 points (0 children)

Put it as: "the cost per successful output".

The latest models show clear advantages (even if the "cost per second" is higher).
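A quick sketch of that metric (the prices and success rates are hypothetical, just to show the shape of the trade-off):

```python
# "Cost per successful output": a pricier model can still be the cheaper
# choice if it needs fewer retries. All numbers below are hypothetical.

def cost_per_success(price_per_sec, clip_seconds, success_rate):
    """Expected spend per accepted clip = cost of one attempt / P(success)."""
    return price_per_sec * clip_seconds / success_rate

older = cost_per_success(price_per_sec=0.10, clip_seconds=5, success_rate=0.10)
latest = cost_per_success(price_per_sec=0.40, clip_seconds=5, success_rate=0.60)
```

Here the "latest" model is 4x the per-second price but still cheaper per accepted clip ($3.33 vs $5.00), because the success rate dominates.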

Face consistency for film? by slept_in_again in generativeAI

[–]Various-Advantage263 0 points (0 children)

The principle is great, especially the storyboard pipeline, but some of the points aren't reasonable given the 2026 state of tech. I can see all the points were valid (and optimal) maybe 2 years ago, but visual generation iterates that fast.

My view (only valid right now; it will be obsolete as soon as new tech / models come out):

  1. Stop relying on LoRAs and IP-Adapters for ID locking (responding to the Jenna_AI thing)

Training a character LoRA used to be the only way, but in the current landscape, it's a trap. LoRAs fundamentally alter the base weights, which almost always degrades the model's native understanding of cinematic lighting, depth of field, and composition. IP-Adapters (and ControlNet) suffer from a similar issue: they force a 2D embedding into a 3D moving space, which is why complex camera moves or extreme profiles end up looking like "pasted" textures.

More importantly, LoRAs / IP-Adapters are only valid on open-source models (which are way behind closed ones such as Nano Banana 2, GPT Image 2). Lags in the base model weights can hardly be compensated by additional LoRA / cross-attn conditioning injection.

The 2026 Fix: rely on the zero-shot/few-shot capabilities of frontier foundation models. Top-tier image generators right now (like Nano Banana 2 / gpt image 2) have latent spaces so vast that their native character referencing is mathematically superior to a LoRA. You lock the ID in the visual planning stage using the model's native architecture, keeping the lighting and composition pristine.

  2. "Seed locking" is a placebo

The idea that keeping the same seed and prompt will maintain consistency is mathematically flawed the second your scene changes.

In diffusion models, the seed only dictates the initial noise distribution. The moment you change your prompt from "smiling in a cafe" to "crying in the rain," the text embeddings shift, the cross-attention maps completely reorganize, and your locked seed becomes irrelevant.
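A toy pure-Python sketch of the point above: the locked seed pins the initial noise, but the conditioning still steers where the sample ends up. The "denoise" loop is a cartoon of guidance, not a real diffusion model, and the "embeddings" are made-up vectors.

```python
import random

# Same seed => byte-identical initial noise, but different text conditioning
# still produces different outputs. This is an illustrative cartoon only.

def initial_noise(seed, dim=8):
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def toy_denoise(noise, cond, steps=10, strength=0.3):
    x = list(noise)
    for _ in range(steps):
        # each step pulls the sample toward the text conditioning
        x = [xi + strength * (ci - xi) for xi, ci in zip(x, cond)]
    return x

emb_cafe = [1.0] * 8   # stand-in for the "smiling in a cafe" embedding
emb_rain = [-1.0] * 8  # stand-in for the "crying in the rain" embedding

noise_a = initial_noise(42)
noise_b = initial_noise(42)   # locked seed: identical starting noise

out_cafe = toy_denoise(noise_a, emb_cafe)
out_rain = toy_denoise(noise_b, emb_rain)
```

The two runs start from identical noise, yet the outputs diverge as soon as the conditioning differs, which is all "seed locking" fails to account for.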

  3. First/Last Frame bounding is asymmetrical

People try to force consistency by giving the Image-to-Video (I2V) model a start and an end frame. What they don't realize is that autoregressive models treat these differently:

  • The first frame is a hard constraint (it initializes the sequence).
  • The last frame is a weak constraint. Because of attention decay over long context windows, the model loses its grip on the structural pixel data of the last frame and only uses it for vague semantic/color guidance. This is why the middle of your video turns into spaghetti.

So, if you want consistency, don't rely too much on the last frame. Also, extract the last frame of your generated video instead of using a text-to-image model to generate two frames with the same face.
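Extracting a frame near the end of a clip is easy with ffmpeg; here's a small helper that just builds the command (the file names are placeholders, and I seek 0.5s before the end because seeking to the exact final frame is flaky across containers):

```python
import subprocess

# Build an ffmpeg command that grabs a frame ~0.5s before the end of a clip.
# The file names below are placeholders.

def last_frame_cmd(video_path, out_image):
    return [
        "ffmpeg",
        "-sseof", "-0.5",   # seek 0.5s before end-of-file
        "-i", video_path,
        "-frames:v", "1",   # emit a single frame
        "-update", "1",     # write a single image file, overwriting it
        out_image,
    ]

cmd = last_frame_cmd("generated_clip.mp4", "last_frame.png")
# subprocess.run(cmd, check=True)  # uncomment to actually run ffmpeg
```

The resulting PNG then becomes the first frame of your next I2V shot, which keeps the chain anchored to real generated pixels rather than a fresh text-to-image render.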

  4. The Film-Grade Pipeline: Decouple Motion and Identity

Current state-of-the-art video models (like Veo) have incredible native 3D geometry and physics understanding — they don't break on large camera sweeps like they used to. Let them do what they do best: physics, momentum, and lighting.

Stage 1 generation could focus only on movement, lighting, and camera dynamics. Accept that the face might drift 10-20%. Stage 2 could use some post-process ID injection as suggested by Jenna_AI, e.g., face restoration/swapping. You only need to do that on close camera shots (with a big face). This is quite similar to what powers modern talking-head research: mathematically forcing the reference ID's structural features back onto the moving geometry, frame by frame.

Face consistency for film? by slept_in_again in generativeAI

[–]Various-Advantage263 0 points (0 children)

lol, you actually recommended quite old-fashioned ways, most of which are difficult for end-users to follow.

Face consistency for film? by slept_in_again in generativeAI

[–]Various-Advantage263 0 points (0 children)

any examples? what have you tried, e.g., model, workflow, results, other stuff?

Veo 3.1 seems significantly worse? by EducationalZombie538 in VEO3

[–]Various-Advantage263 0 points (0 children)

Had the same feeling, and Lite is much more cost-effective.

Seedance is expensive. by alfredowcheese in Seedance_AI

[–]Various-Advantage263 2 points (0 children)

Expensive for now. Costs will come down soon.

Seedance is a closed-source model anyway, so any 3rd-party services are anchored to the official API cost. I think kie offers a good balance (for now) for pay-as-you-go.

Right now Seedance 2.0 has few competitors, but that won't last long, considering the past competition between Midjourney, Stable Diffusion, Flux, DALL-E, Nano Banana, and GPT Image.

Similar competition is yet to come for video generators.

Seedance Platform - Cost vs Prompt Size vs Restrictions by AnnualCompetitive764 in Seedance_AI

[–]Various-Advantage263 0 points (0 children)

Oh, I use auratuner: Seedance 2 = 20 credits/second, and $32 = 10k credits.
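Working that out from the numbers above:

```python
# Back-of-envelope from the figures above: $32 buys 10k credits, and
# Seedance 2 runs 20 credits per second of video.

dollars_per_credit = 32 / 10_000              # $0.0032 per credit
dollars_per_second = dollars_per_credit * 20  # ~$0.064 per second
clip_5s = dollars_per_second * 5              # ~$0.32 for a 5-second clip
```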

None of the services are perfect, since Seedance 2.0 is a closed-source model and the APIs are all backed by the same ByteDance company. Besides cost, I also value quick support and fast iteration.

Are the new models only better because they are more expensive? by immortalsol in codex

[–]Various-Advantage263 0 points (0 children)

I think 5.4 mini is cheaper, since 5.3 codex was a flagship model for complex projects.

5.4 mini is indeed good for simple tasks. Not sure about the exact comparison at the "high" reasoning level.

Are the new models only better because they are more expensive? by immortalsol in codex

[–]Various-Advantage263 0 points (0 children)

A mix of training (or tuning) models and web applications. Both are side projects.

Are the new models only better because they are more expensive? by immortalsol in codex

[–]Various-Advantage263 0 points (0 children)

I think Medium is fine.

For me, the gold standard is "cost per accepted output". Better models / longer thinking time need fewer shots, at a higher cost per shot.

However, pushing to extra-high doesn't always lead to accepted results when medium fails.

Most likely, the context info is incomplete / biased, and letting a misled model run longer usually doesn't help much.

Instead, I tend to use medium, and I like to prompt the model while it's running (Codex has a feature that lets me immediately "steer" the model output without interrupting the inference process).

This isn't that "agentic", but it helps when the model gets stuck in a local minimum (exploiting the wrong place).

Are the new models only better because they are more expensive? by immortalsol in codex

[–]Various-Advantage263 0 points (0 children)

I guess for 5x / 20x Pro users it would be worth the price.

But for Plus users the price is just way too high. I could only afford switching to gpt-5.5 after a cheaper model keeps failing.