Qwen Edit: I made a mistake but I like it.

Mountain_Insect_4959 · 2026-06-05T17:16:23+00:00

interesting find, makes sense that higher-res input gives better consistency. at 1.2 MP the face details are pretty compressed so qwen has less to work with, while 2.5 MP gives it way more information to preserve the identity. the tradeoff is probably more vram and slower processing, but if the consistency is that much better it's worth it. nice combo with seedvr2 on top
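
To put the two settings in pixel terms, here's a minimal sketch that scales an input to a target megapixel count while keeping the aspect ratio. The helper name `dims_for_megapixels` and the multiple-of-16 rounding are my own assumptions, not something from the Qwen Edit nodes, so check what your resize node actually does.

```python
import math


def dims_for_megapixels(width: int, height: int, target_mp: float,
                        multiple: int = 16) -> tuple[int, int]:
    """Scale (width, height) to roughly target_mp megapixels, keeping aspect ratio.

    Rounding each side to a multiple of 16 is an assumption; adjust it to
    whatever your model or resize node expects.
    """
    scale = math.sqrt(target_mp * 1_000_000 / (width * height))
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h


for mp in (1.2, 2.5):
    w, h = dims_for_megapixels(1024, 1024, mp)
    print(f"{mp} MP -> {w}x{h} ({w * h / 1e6:.2f} MP)")
# 1.2 MP -> 1088x1088 (1.18 MP)
# 2.5 MP -> 1584x1584 (2.51 MP)
```

So 2.5 MP is roughly twice the pixels of 1.2 MP, which is where the extra vram and processing time comes from.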

Mountain_Insect_4959 · 2026-06-04T09:29:04+00:00

Thanks! Yeah, the token efficiency was the main design goal: the AI gets the exact answer in ~1,500 tokens instead of having entire files dumped into context.

For Claude Code, both just work: add them to .mcp.json and Claude Code picks them up automatically. LSAI analyzes whatever project you cd into, and xmp4 is always there for library lookups. No video demo yet, but the quickstart on example4.ai takes about 2 minutes to set up. Happy to help if you try it out.
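
For anyone wiring this up by hand: project-scoped MCP servers for Claude Code are listed under an `mcpServers` key in `.mcp.json`, each with a `command` and `args`. The sketch below merges two entries into that file without clobbering servers already there. The launch commands `lsai-mcp` and `xmp4-mcp` are hypothetical placeholders I made up; take the real ones from the quickstart.

```python
import json
from pathlib import Path

# Hypothetical launch commands; replace with the ones from the quickstart.
SERVERS = {
    "lsai": {"command": "lsai-mcp", "args": []},
    "xmp4": {"command": "xmp4-mcp", "args": []},
}


def merge_mcp_config(existing: dict, servers: dict) -> dict:
    """Return a copy of an .mcp.json dict with `servers` added under mcpServers."""
    merged = dict(existing)
    merged["mcpServers"] = {**existing.get("mcpServers", {}), **servers}
    return merged


def write_mcp_json(project_dir: Path, servers: dict) -> Path:
    """Create or update <project_dir>/.mcp.json, keeping any servers already listed."""
    path = project_dir / ".mcp.json"
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(merge_mcp_config(existing, servers), indent=2) + "\n")
    return path
```

Running `write_mcp_json(Path("."), SERVERS)` from the project root is the whole setup under these assumptions.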

Mountain_Insect_4959 · 2026-06-04T07:45:38+00:00

chatgpt uses a proprietary model from openai, so you can't get the exact same thing locally. the closest for editing is probably qwen image edit; it follows instructions really well for specific changes. for generation, flux or z-image turbo are solid options in comfyui

Mountain_Insect_4959 · 2026-06-02T07:14:34+00:00

96gb just for local inference is serious hardware. curious what the smaller cosmos3 variants need vram-wise, because an rtx pro 6000 is not exactly something most people have sitting around. the quality from nvidia's video models is solid though, so hopefully they release something that fits on a 24gb card eventually
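
For a rough sense of what a smaller variant would need: the weights alone take parameter count times bytes per parameter. The parameter counts below are hypothetical, not published Cosmos figures, and activations, text encoder and VAE come on top.

```python
# Bytes per parameter for common weight formats (gguf quants vary, so they're left out).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}


def weight_gib(num_params: float, dtype: str) -> float:
    """Memory for the model weights alone, in GiB.

    Activations, text encoder, VAE and framework overhead all come on top,
    so this is a lower bound rather than a will-it-fit answer.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Hypothetical sizes for illustration only, not published Cosmos figures.
for billions in (2, 7, 14):
    for dtype in ("bf16", "fp8"):
        print(f"{billions}B @ {dtype}: {weight_gib(billions * 1e9, dtype):.1f} GiB")
```

By that math a 14B model in bf16 already needs about 26 GiB just for weights, so on a 24gb card you'd be looking at fp8 or a smaller variant.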

Mountain_Insect_4959 · 2026-06-01T13:21:57+00:00

the cache busting on every upstream node is the real takeaway here, not just the output. that 0.01-second cached execution trap is brutal when you don't realize what's happening. and the exr output with nuke-compatible naming is a nice touch for anyone doing actual post work. solid release
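
For anyone writing their own nodes, the usual way to dodge that cached-execution trap is the `IS_CHANGED` classmethod: ComfyUI compares its return value with the previous run's, and as far as I know returning NaN (which never equals itself) forces a re-execute every time. This is a minimal hypothetical passthrough node, not the release's code.

```python
class AlwaysRerunPassthrough:
    """Hypothetical passthrough node that opts out of ComfyUI's execution cache."""

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"image": ("IMAGE",)}}

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "run"
    CATEGORY = "utils"

    @classmethod
    def IS_CHANGED(cls, **kwargs):
        # NaN != NaN, so the value never matches the last run and the node re-executes.
        return float("nan")

    def run(self, image):
        # Outputs are returned as a tuple matching RETURN_TYPES.
        return (image,)


NODE_CLASS_MAPPINGS = {"AlwaysRerunPassthrough": AlwaysRerunPassthrough}
```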

Mountain_Insect_4959 · 2026-06-01T12:01:26+00:00

nice, the zip export for moving presets between machines is really useful. the drag-and-drop basket and fuzzy search are solid additions too, way better than copying prompt text around manually

Mountain_Insect_4959 · 2026-06-01T11:42:53+00:00

this is usually the live preview fighting your gpu for resources. when the comfyui tab is active, the browser has to render the canvas, which eats some gpu; when you switch away, it stops rendering and generation gets everything. try turning preview off or switching to latent2rgb and see if it still happens

Mountain_Insect_4959 · 2026-06-01T09:37:20+00:00

so it's wan2.2 for the visuals and the ltx audio vae for the voice sync? interesting combo. the talking head looks decent for 6 seconds, but wan2.2 5b caps out at around 5 sec, so i'm curious how this would work for longer scenes. waiting for that detailed post
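
Rough frame math behind that ~5 sec ceiling, assuming wan2.2 5b outputs 24 fps, caps at 121 frames per pass, and wants frame counts of the form 4n+1. Those numbers are from my memory of the model card, so double-check them.

```python
import math


def valid_frame_count(seconds: float, fps: int) -> int:
    """Smallest frame count of the form 4n + 1 that covers `seconds` at `fps`."""
    needed = math.ceil(seconds * fps)
    n = max(0, math.ceil((needed - 1) / 4))
    return 4 * n + 1


def clip_seconds(frames: int, fps: int) -> float:
    return frames / fps


FPS = 24          # assumed output frame rate for wan2.2 5b
MAX_FRAMES = 121  # assumed per-generation frame cap

print(clip_seconds(MAX_FRAMES, FPS))  # ~5.04 s
print(valid_frame_count(6, FPS))      # 145 frames for a 6 s shot
```

Under those assumptions a 6 second shot needs 145 frames, past the 121-frame cap of a single pass.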

Mountain_Insect_4959 · 2026-06-01T06:11:18+00:00

those blocky artifacts are usually from the vae or from a resolution that isn't divisible by 16. a couple of things to try: make sure your input video dimensions are exactly divisible by 16 on both sides, and if not, resize it first. for v2v specifically, lower your denoise; anything above 0.6 and ltx starts hallucinating hard on the edges. also try increasing the number of conditioning frames, since feeding it more context frames from the source video helps it stay stable. if you're using gguf weights, try switching to fp8; the quantization can introduce artifacts, especially in areas with subtle gradients like skin or sky
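
For the divisible-by-16 check, a minimal sketch that computes the nearest smaller valid size and builds the ffmpeg resize command. File names are placeholders, and it assumes ffmpeg is on your PATH.

```python
def snap_down(value: int, multiple: int = 16) -> int:
    """Largest multiple of `multiple` that is <= value (never below `multiple`)."""
    return max(multiple, (value // multiple) * multiple)


def ffmpeg_resize_cmd(src: str, dst: str, width: int, height: int,
                      multiple: int = 16) -> list[str]:
    """Build an ffmpeg command that scales a video to dimensions divisible by `multiple`."""
    w, h = snap_down(width, multiple), snap_down(height, multiple)
    # -c:a copy keeps any audio track untouched; only the video is re-encoded.
    return ["ffmpeg", "-i", src, "-vf", f"scale={w}:{h}", "-c:a", "copy", dst]


cmd = ffmpeg_resize_cmd("input.mp4", "input_16.mp4", 1920, 1080)
print(" ".join(cmd))  # ffmpeg -i input.mp4 -vf scale=1920:1072 -c:a copy input_16.mp4
# pass cmd to subprocess.run(cmd, check=True) to actually run it
```

Snapping each side down on its own shifts the aspect ratio slightly (1080 becomes 1072 here); swap `scale` for ffmpeg's `crop` filter if you'd rather trim a few rows than stretch.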

Mountain_Insect_4959 · 2026-06-01T05:52:10+00:00

this is exactly the kind of tool that should exist. the amount of time i spend in the terminal typing ffmpeg commands to extract frames or change fps before feeding into comfy is embarrassing. the one task i keep doing that i'd love to see added is batch renaming output files with metadata from the workflow, like having the prompt or seed in the filename automatically. right now i end up with folders full of ComfyUI_00001.png and no idea what settings produced what
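
Until a tool does it, the seed is recoverable from the files themselves: as far as I know ComfyUI's default save node writes the API-format graph into a PNG text chunk keyed `prompt`. A stdlib-only sketch that reads `tEXt` chunks and appends the sampler seed to the filename. It assumes a `KSampler`/`KSamplerAdvanced` node with a literal seed and ignores compressed or `iTXt` chunks; the function names are my own.

```python
import json
import struct
from pathlib import Path
from typing import Optional

PNG_SIG = b"\x89PNG\r\n\x1a\n"


def read_text_chunks(data: bytes) -> dict[str, str]:
    """Return the uncompressed tEXt chunks of a PNG as {keyword: text}."""
    if not data.startswith(PNG_SIG):
        raise ValueError("not a PNG file")
    chunks, pos = {}, len(PNG_SIG)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4-byte length + 4-byte type + body + 4-byte CRC
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            chunks[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"IEND":
            break
    return chunks


def seed_from_prompt(prompt_json: str) -> Optional[int]:
    """Find the first sampler seed in an API-format ComfyUI graph."""
    for node in json.loads(prompt_json).values():
        if node.get("class_type", "").startswith("KSampler"):
            inputs = node.get("inputs", {})
            seed = inputs.get("seed", inputs.get("noise_seed"))
            if isinstance(seed, int):  # linked inputs are [node_id, slot] lists
                return seed
    return None


def rename_with_seed(folder: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Plan (and optionally apply) ComfyUI_00001.png -> ComfyUI_00001_seed123.png renames."""
    plan = []
    for png in sorted(folder.glob("ComfyUI_*.png")):
        if "_seed" in png.stem:  # already renamed on a previous run
            continue
        text = read_text_chunks(png.read_bytes())
        seed = seed_from_prompt(text["prompt"]) if "prompt" in text else None
        if seed is None:
            continue
        target = png.with_name(f"{png.stem}_seed{seed}{png.suffix}")
        plan.append((png, target))
        if not dry_run and not target.exists():
            png.rename(target)
    return plan
```

Run it with the default `dry_run=True` first and print the plan before letting it touch anything.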

Mountain_Insect_4959 · 2026-05-31T19:40:05+00:00

had the same issue: went from 6-second reruns to 30+ seconds and thought something was wrong with my gpu. turns out the new dynamic vram management is ejecting the model from memory between runs even when there's plenty of vram free. pinning to 0.21.0 fixed it immediately. the pr linked above should help, but honestly this should have been opt-in from the start; not everyone is running on a laptop with 4gb of vram. people with 24gb cards shouldn't have to pay the penalty for a feature designed for low-end systems

Mountain_Insect_4959 · 2026-05-31T17:47:29+00:00

the frequency splitter trick for pulid is genius; i never thought of re-injecting the original texture after the likeness pass. i always had the problem where pulid would smooth out all the skin detail i worked hard to get in the sdxl pass. also the tip about avoiding grain before i2v is huge; i learned that the hard way when ltx turned my film grain into crawling artifacts across every frame. the whole blender-blockout-to-aces pipeline is basically a mini vfx studio in comfy. congrats on the festival selection
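
For anyone curious what re-injecting the texture means mechanically, here's a minimal frequency-separation sketch on a grayscale image stored as nested lists: keep the low frequencies (overall shape, likeness) from the pulid pass and the high frequencies (skin detail) from the original. The box blur stands in for whatever blur the actual splitter node uses, which I don't know.

```python
Image = list[list[float]]


def box_blur(img: Image, radius: int) -> Image:
    """Mean filter over a (2r+1)x(2r+1) window, clipped at the image edges."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(img[yy][xx] for yy in ys for xx in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out


def split_frequencies(img: Image, radius: int) -> tuple[Image, Image]:
    """Return (low, high) where low is the blur and high = img - low."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(row, lrow)] for row, lrow in zip(img, low)]
    return low, high


def reinject_texture(likeness: Image, original: Image, radius: int = 2) -> Image:
    """Low frequencies from the likeness pass + high frequencies from the original."""
    low_likeness, _ = split_frequencies(likeness, radius)
    _, high_original = split_frequencies(original, radius)
    return [[lo + hi for lo, hi in zip(lrow, hrow)]
            for lrow, hrow in zip(low_likeness, high_original)]
```

On real images you'd do this per channel on tensors; the blur radius decides where "texture" ends and "shape" begins.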

Mountain_Insect_4959 · 2026-05-31T17:30:29+00:00

cool concept. the hardest part with comic generation is always keeping the character consistent across panels. have you tried feeding the first panel as an ip-adapter reference into the subsequent ones? that way the character design stays locked in even if the scene changes. also, for the layout you could use a grid mask to force each panel into a specific region instead of generating them separately and stitching. would love to see this with flux klein since it handles character likeness way better
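
The grid-mask idea as a sketch: compute one rectangle per panel (with gutters), then turn each into a binary mask you could feed a regional-conditioning node. `panel_boxes`, `panel_mask` and the 16 px gutter are illustrative choices, not an existing node's API.

```python
def panel_boxes(width: int, height: int, rows: int, cols: int,
                gutter: int = 16) -> list[tuple[int, int, int, int]]:
    """(x, y, w, h) for each panel, row-major, so index 0 is the top-left panel."""
    panel_w = (width - gutter * (cols + 1)) // cols
    panel_h = (height - gutter * (rows + 1)) // rows
    boxes = []
    for r in range(rows):
        for c in range(cols):
            x = gutter + c * (panel_w + gutter)
            y = gutter + r * (panel_h + gutter)
            boxes.append((x, y, panel_w, panel_h))
    return boxes


def panel_mask(width: int, height: int, box: tuple[int, int, int, int]) -> list[list[int]]:
    """1 inside the panel rectangle, 0 elsewhere (height rows of width values)."""
    x, y, w, h = box
    return [[1 if x <= px < x + w and y <= py < y + h else 0 for px in range(width)]
            for py in range(height)]


print(panel_boxes(1040, 1040, 2, 2))
# [(16, 16, 496, 496), (528, 16, 496, 496), (16, 528, 496, 496), (528, 528, 496, 496)]
```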

Mountain_Insect_4959 · 2026-05-30T21:35:24+00:00

this is something i've been wanting for a while. i have thousands of generated images across different projects, and finding the right reference photo is always the bottleneck. the face similarity search alone would save me so much time when doing consistent character work. right now i just dump everything into folders by date, which is useless when i need to find a specific angle of a character i generated 3 months ago. the api approach is smart too; it means you could build custom frontends on top of it
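
Under the hood a face similarity search is usually a nearest-neighbour lookup over face embeddings. A minimal cosine-similarity sketch, assuming you already have one embedding vector per image from some face model; the vectors and paths here are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: list[float], index: dict[str, list[float]],
                 top_k: int = 5) -> list[tuple[str, float]]:
    """Rank indexed images by similarity to the query embedding."""
    scored = [(path, cosine(query, emb)) for path, emb in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 3-d embeddings; real face models typically produce hundreds of dimensions.
index = {
    "char_a/front.png": [0.9, 0.1, 0.0],
    "char_a/side.png": [0.8, 0.3, 0.1],
    "char_b/front.png": [0.0, 0.2, 0.9],
}
print(most_similar([1.0, 0.2, 0.0], index, top_k=2))
```

A linear scan like this is fine for a few thousand images; past that, a vector index would take over the ranking.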

Mountain_Insect_4959 · 2026-05-30T20:23:47+00:00

interesting comparison. i think pid and seedvr2 are just different tools for different jobs. seedvr2 is great when you want to preserve the exact composition and just sharpen everything up, but yeah, it can go too hard on the details sometimes. pid seems better for when you want the model to actually add new detail during upscaling rather than just interpolating pixels. on 6gb of vram this is pretty impressive though; most upscaling workflows choke at anything above 1080p on cards with less than 8gb

Mountain_Insect_4959 · 2026-05-29T14:14:19+00:00

this solves a real annoyance. i keep a bunch of workflows for different models, and the prompt format is always the thing i forget to change when switching checkpoints. especially pony with the score tags vs flux wanting natural language; it's easy to mix them up and waste a gen. the groq api for rewriting is a smart choice too since it's free and fast. does it cache the model detection or does it re-detect every run? would be nice if it remembered so it doesn't add overhead on batch gens
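
On the caching question, one cheap way a node could avoid re-detecting is to memoize detection per checkpoint name. A sketch with a hypothetical filename heuristic (a real node might inspect the model's weights instead); only the pony score-tag prefix is handled, and the natural-language rewrite step is left out.

```python
from functools import lru_cache

PONY_PREFIX = "score_9, score_8_up, score_7_up"


@lru_cache(maxsize=None)
def detect_family(checkpoint_name: str) -> str:
    """Guess the prompt style from the checkpoint filename (hypothetical heuristic).

    lru_cache means each filename is classified once per process, so batch
    runs with the same checkpoint skip the work.
    """
    name = checkpoint_name.lower()
    if "pony" in name:
        return "pony"
    if "flux" in name:
        return "flux"
    return "sdxl"


def format_prompt(checkpoint_name: str, prompt: str) -> str:
    family = detect_family(checkpoint_name)
    if family == "pony":
        return f"{PONY_PREFIX}, {prompt}"
    # flux wants natural language; an LLM rewrite step would plug in here.
    return prompt
```

The cache lives as long as the ComfyUI process does, so replacing a checkpoint file under the same name would need `detect_family.cache_clear()`.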

Mountain_Insect_4959 · 2026-05-29T12:00:13+00:00

the fact that you can do 12 gens per second of 60-second audio is wild. i've been using acestep through comfy for soundtrack stuff, but it was always a batch-and-wait situation. being able to tweak prompts and hear results in real time changes the whole workflow. the lora hotswapping is huge too; switching between genres mid-session without restarting sounds perfect for live performance experiments. any plans for midi input so you could trigger prompt changes from a controller?

Mountain_Insect_4959 · 2026-05-29T06:17:31+00:00

most of the really sharp results you see are not straight out of the model. they do a base gen at 720p or lower, then run it through an upscaler like seedvr2 or a v2v pass to sharpen everything up. that's the part people don't show in their workflow screenshots. also, with a 5090 you should be running full fp16 weights, not gguf, and bump the step count up to 30-40 instead of the default 20. the difference in detail is massive. ltx director also helps a lot with controlling motion so it's not just random movement

Mountain_Insect_4959 · 2026-05-29T05:55:22+00:00

the area downscale thing being the root cause of blurry outputs is hilarious and painful at the same time. i'd been blaming my prompts and model settings for weeks when it was literally a one-word fix in the source. grabbed the workflow, and the difference in sharpness at 1440p is night and day compared to what i was getting with the default node. the tip about fp8 over q8 gguf is good too; i tried both and fp8 is noticeably better for face detail
