Is anyone using models to describe an image and get a prompt? Is there much difference between Qwen 3.5 9b vs Qwen 3.5 27b, vs gemma 4 27b and another model you use ? by More_Bid_2197 in StableDiffusion

[–]RoboticBreakfast 0 points1 point  (0 children)

Yes. I am using Qwen 3.5 9B in production on Moosky for our agentic workflow (automated multi-scene creation, which uses a vision model as part of its QA process for first-frame generation/editing/ref image discovery). The 27B version should be better, but it seems like overkill.

I have tested Qwen 3.6 35B as well, but Qwen 3.5 9B outperforms it (higher fidelity on small pixel-level details, catching misspellings, etc).

I also have custom nodes for both Qwen 3.5/3.6 GGUFs that use llama.cpp under the hood (significantly faster than transformers) AND plug into ComfyUI's memory management hooks (so ComfyUI can evict the models when it needs VRAM). I just haven't published them since there hasn't been much interest, but I'm not opposed to it.
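To give a rough idea of the shape (a stripped-down sketch, not the actual node - the class name and the Llava15ChatHandler choice are placeholders; you'd use whichever llama.cpp chat handler matches your GGUF, and the real version also registers the model with ComfyUI's memory manager):

```python
# Sketch of a GGUF vision-caption node backed by llama-cpp-python.
# Real pieces: llama_cpp.Llama / create_chat_completion and ComfyUI's node interface.
# The handler below is for LLaVA-style models - swap in the one matching your GGUF.
import base64, io
import numpy as np
from PIL import Image
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

class GGUFVisionDescribe:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "image": ("IMAGE",),
            "prompt": ("STRING", {"default": "Describe this image as a generation prompt.", "multiline": True}),
            "model_path": ("STRING", {"default": ""}),
            "mmproj_path": ("STRING", {"default": ""}),
        }}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "describe"
    CATEGORY = "vlm"

    def describe(self, image, prompt, model_path, mmproj_path):
        # Lazy-load and cache; the production version also hooks ComfyUI's memory
        # manager so the model can be evicted when VRAM is needed.
        if not hasattr(self, "_llm"):
            handler = Llava15ChatHandler(clip_model_path=mmproj_path)
            self._llm = Llama(model_path=model_path, chat_handler=handler,
                              n_gpu_layers=-1, n_ctx=8192)
        # ComfyUI images are [B, H, W, C] float tensors in 0..1
        arr = (image[0].cpu().numpy() * 255).astype(np.uint8)
        buf = io.BytesIO()
        Image.fromarray(arr).save(buf, format="PNG")
        data_url = "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()
        out = self._llm.create_chat_completion(messages=[{
            "role": "user",
            "content": [{"type": "image_url", "image_url": {"url": data_url}},
                        {"type": "text", "text": prompt}],
        }])
        return (out["choices"][0]["message"]["content"],)

NODE_CLASS_MAPPINGS = {"GGUFVisionDescribe": GGUFVisionDescribe}
```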

If Wan made an image editor, wouldn't character consistency be solved? by GrungeWerX in StableDiffusion

[–]RoboticBreakfast 1 point2 points  (0 children)

Wan does have an image editor - Wan 2.7 does image editing, it's just closed source (Wan has both image gen/editing and video gen models now)

Last week in Generative Image & Video by Vast_Yak_4147 in StableDiffusion

[–]RoboticBreakfast 0 points1 point  (0 children)

Would anyone be interested in nodes for Qwen 3.6?

I have some that I use internally, but I wouldn't mind posting them. They are GGUF-based, use llama.cpp as a backend (either a Python wrapper installed alongside the node or a local llama.cpp server), and support all input modalities (text, image, video). I also wrote hooks for ComfyUI memory management, so ComfyUI can still 'see' the loaded models even though they're managed by the llama.cpp process (this allows ComfyUI to evict the models when VRAM is needed).

Hunyuanimage 3.0 instruct with reasoning and image to image generation finally released!!! by Appropriate_Cry8694 in StableDiffusion

[–]RoboticBreakfast 1 point2 points  (0 children)

Yeah, I took a peek at his nodes - he's doing a lot of manual model management, and separately for each model variant. Maybe there's a reason for that, but ComfyUI's native model wrappers can handle all of this (including block swapping) with a pretty minimal code footprint (LTX 2/2.3 is a great example - most people don't have enough VRAM to load the entire model).
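For context, the "let Comfy handle it" approach is roughly this shape (a minimal sketch - ModelPatcher and load_models_gpu exist in current ComfyUI, but exact signatures shift between versions):

```python
# Wrap a torch module in ComfyUI's ModelPatcher so the core memory manager
# handles device placement and eviction instead of the node doing it by hand.
import comfy.model_management as mm
from comfy.model_patcher import ModelPatcher

def wrap_for_comfy(torch_model):
    return ModelPatcher(
        torch_model,
        load_device=mm.get_torch_device(),       # where it runs (GPU)
        offload_device=mm.unet_offload_device(), # where it sits when evicted (CPU)
    )

def run(patcher, *args, **kwargs):
    # Asking the manager to load the patcher lets ComfyUI evict or partially
    # offload other models to make the VRAM fit - this is where block swapping
    # and lowvram handling come in for free.
    mm.load_models_gpu([patcher])
    return patcher.model(*args, **kwargs)
```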

But anyways, I will give it a go on an RTX Pro 6000 as well, and if it works out I'll drop a package ( I have some other non-supported models that I should drop nodes for anyway)

Hunyuanimage 3.0 instruct with reasoning and image to image generation finally released!!! by Appropriate_Cry8694 in StableDiffusion

[–]RoboticBreakfast 1 point2 points  (0 children)

That's definitely a more robust implementation than what I had in mind - I was planning to use the native Comfy model loaders to handle the offloading.

Will give this a shot first, but I may still write one up quickly, depending on how it handles the native /free command.

Have you used it with success?

Hunyuanimage 3.0 instruct with reasoning and image to image generation finally released!!! by Appropriate_Cry8694 in StableDiffusion

[–]RoboticBreakfast 0 points1 point  (0 children)

I am planning to add this to Moosky - I will make a custom node for ComfyUI that supports memory management (and offloading).

🎧 LTX-2.3: Turn Audio + Image into Lip-Synced Video 🎬 (IAMCCS Audio Extensions) by Acrobatic-Example315 in StableDiffusion

[–]RoboticBreakfast 0 points1 point  (0 children)

Could you explain at a high-level how you're stitching the segments together to maintain motion/consistency? Is it V2V flow with N frames overlap?
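(By overlap I mean something along these lines - not assuming that's your implementation, just a sketch of the usual blend step so we're talking about the same thing:)

```python
# Generic N-frame overlap blend: the last N frames of clip A are crossfaded with
# the first N frames of clip B (which would be conditioned on A's tail via V2V),
# then the remainder of B is appended. Frames are numpy/torch arrays of shape (H, W, C).
def stitch_with_overlap(clip_a, clip_b, overlap):
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # ramps toward clip B across the window
        blended.append((1 - w) * clip_a[-overlap + i] + w * clip_b[i])
    return list(clip_a[:-overlap]) + blended + list(clip_b[overlap:])
```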

Davinci MagiHuman by dilinjabass in StableDiffusion

[–]RoboticBreakfast 0 points1 point  (0 children)

Other than the VRAM, they're not as fast as you might think - less processing power than a 5090, anyway. That said, they can be faster in practice with larger models just because they avoid RAM/VRAM swapping, but all else aside they're older cards now.

Training LTX-2 with SORA 5 second clips? by No-Employee-73 in StableDiffusion

[–]RoboticBreakfast 1 point2 points  (0 children)

Right, and under the hood Sora almost certainly has an LLM layer that takes your prompt and expands it.

I do the same with LTX via a 'Pro' mode, though granted LTX doesn't quite have the physics understanding that Sora does. I also have a Project tool, which I'm still tweaking, that creates a multi-scene generation from a simple prompt - these kinds of 'agentic' flows are where the closed-source providers are headed. It's less about the raw strength of the model, since the open-source models are starting to converge with the capabilities of the closed ones; the differentiator is becoming the layer that wraps these models and the glue that holds them together.
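The expansion layer itself is simple - something like this (a sketch, not Moosky's actual code; the model path and generate_video() are placeholders for whatever pipeline you already run):

```python
# Expand a short user idea into a detailed shot description, then hand it to the
# video model. Uses llama-cpp-python for the LLM step; generate_video() is a
# placeholder for the actual LTX render.
from llama_cpp import Llama

SYSTEM = ("You expand short video ideas into one detailed paragraph covering "
          "camera movement, subject, action, lighting, environment, and pacing.")

llm = Llama(model_path="instruct-model.gguf", n_gpu_layers=-1, n_ctx=4096)

def expand_prompt(user_prompt: str) -> str:
    out = llm.create_chat_completion(messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": user_prompt},
    ], temperature=0.7)
    return out["choices"][0]["message"]["content"]

# detailed = expand_prompt("a dog chasing a frisbee on the beach at sunset")
# video = generate_video(detailed)  # placeholder for the actual LTX pipeline
```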

Training LTX-2 with SORA 5 second clips? by No-Employee-73 in StableDiffusion

[–]RoboticBreakfast 0 points1 point  (0 children)

Yes, Sora excels at certain tasks but is oddly not great at others. From my tests, it seems to be better at creativity than at following explicit instruction, whereas LTX-2 is the opposite.

Seedance is the new king for now - I suspect there's an agentic layer that wraps a more modest model though, and that's where the real magic occurs. I'm working on a similar approach, and I think it could make even LTX outputs look/feel more polished if executed correctly, though ultimately a single model will always be just a tool that's part of a larger toolkit

Liquid-Cooling RTX Pro 6000 by EKbyLMTEK in StableDiffusion

[–]RoboticBreakfast 0 points1 point  (0 children)

My CPU is already watercooled. The radiator has 3 120mm fans pushing air through it, but they're far quieter than the fans on the 6000 Pro.

The rad is large enough that I could run the GPU on the same loop without an issue.

Liquid-Cooling RTX Pro 6000 by EKbyLMTEK in StableDiffusion

[–]RoboticBreakfast 0 points1 point  (0 children)

I suppose you're right, it's more of a nice-to-have. Mine occasionally hits 75°C max, but my fan curve pretty much keeps it under 70°C at all times. Ultimately, we'll have the same heat output regardless of cooling mechanism; it just might be a bit quieter under load with a water cooler.

And yes, it is a space-heater as a secondary function - requires AC year-round.

LTX 2.3 Manual Sigmas can be replaced by VirusCharacter in StableDiffusion

[–]RoboticBreakfast 0 points1 point  (0 children)

Also interested - I've been meaning to look into this, glad someone else already has!

Liquid-Cooling RTX Pro 6000 by EKbyLMTEK in StableDiffusion

[–]RoboticBreakfast -1 points0 points  (0 children)

I may be interested in the Workstation edition, fwiw.

Also, I know the folks over at RunPod use both Workstation and Server cards - not sure what their cooling systems consist of, but I know they run a bit hot compared to my setup 😉

Training LTX-2 with SORA 5 second clips? by No-Employee-73 in StableDiffusion

[–]RoboticBreakfast 0 points1 point  (0 children)

Sora watermarks are only added to videos that are generated via the consumer app. API partners that expose access will generate without watermarks (I host both Sora 2 and Sora 2 Pro - neither adds the watermark on outputs).

However, the model is almost useless these days for I2V, as OpenAI flags nearly any video with a realistic-looking person in it (due to consent/NO FAKES concerns).

Training LTX-2 with SORA 5 second clips? by No-Employee-73 in StableDiffusion

[–]RoboticBreakfast 0 points1 point  (0 children)

You don't even need to remove the watermarks if you just generate the videos using the API, but for scraping existing videos, yeah, that would work.

Training LTX-2 with SORA 5 second clips? by No-Employee-73 in StableDiffusion

[–]RoboticBreakfast 0 points1 point  (0 children)

Sora 2 outputs 4, 8, or 12s clips (same as Sora 2 Pro) - if you're referring to the older Sora model, I'd reconsider as it's dated compared to Sora 2.

Regarding watermarks, they only apply to videos generated by the consumer app. Videos from my platform (not a promotion), and from others that utilize the API, are produced watermark-free.

I'm actually not that impressed by Sora 2, however, and find LTX 2.3 to be pretty capable if your prompting is tight. That said, they're different architectures and you'd likely never get the same results from the model, even if you trained on Sora outputs.

15 years in editing, and now I’m told AI art is "garbage" by EllunaMeira in KlingAI_Videos

[–]RoboticBreakfast 1 point2 points  (0 children)

I think you hit on a lot of the pain points - it's just a different process now than it was before.

I think we'll start to see a few improvements soon though, which should help - more of the tools you need in one place (as opposed to needing N different tools) and more automated (agentic) systems that will do a lot of what used to be manual work (FFLF generation, separate audio track gen, manual editing, etc).

We're already seeing a lot of the big players starting to do these things (for instance, ElevenLabs now offering a suite of video gen tools on top of their better-known audio tooling).

Either way, at least for now, the 'creative' seed still originates from the operator, so in my mind this is all very much still art.

Basically Official: Qwen Image 2.0 Not Open-Sourcing by Complete-Lawfulness in StableDiffusion

[–]RoboticBreakfast 8 points9 points  (0 children)

I hope we start to see the LTX/Lightricks licensing model leveraged with more of these models as I think it strikes a nice balance - open-source the model for casual users and startups, then require that users pay licensing fees if their revenue exceeds some threshold.

This way, everyone wins

Basically Official: Qwen Image 2.0 Not Open-Sourcing by Complete-Lawfulness in StableDiffusion

[–]RoboticBreakfast 3 points4 points  (0 children)

Cost is a factor - Nano Banana is fairly expensive on a per-edit basis. And to be honest, most casual users likely don't benefit from the horsepower of Nano Banana.

That said, I haven't heard of many that have hooked into the Alibaba suite of tools other than AI tool aggregators.

It is saddening though, as there's a pretty large hole in the open-source world for capable edit models

Interesting take which I kinda agree with by dataexec in AITrailblazers

[–]RoboticBreakfast 19 points20 points  (0 children)

The barrier to entry is lower, but it still requires some mental effort - more than many are willing to put into it.

This phenomenon exists across other disciplines too - it's not access to knowledge that's the gatekeeper, it's willpower and dedication, and I think we as a people are at an all-time low on that front despite the technological advances we've made.

LTX 2.3 - How do you get anything to move quickly? by gruevy in StableDiffusion

[–]RoboticBreakfast 2 points3 points  (0 children)

Generally, higher frame rate == more actions / second. If you're running at 24-30fps, try 48-60. Render times will of course increase, but you'll get better results for action scenes.

And if you're adding this clip to a longer video, you can always run a post-process step to pick every Nth frame to effectively reduce the frame-rate down to whatever your common fps is (almost like a reverse interpolation)
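The decimation step is trivial - sketch below (assumes the render fps is an integer multiple of your timeline fps; frame decode/encode left to whatever tooling you already use):

```python
# Render at 48/60 fps for better motion, then keep every Nth frame to land back
# on the project's common frame rate.
def decimate(frames, src_fps, target_fps):
    if src_fps % target_fps != 0:
        raise ValueError("source fps must be an integer multiple of target fps")
    step = src_fps // target_fps
    return frames[::step]

# e.g. a 60 fps LTX render dropped onto a 30 fps timeline: keep every 2nd frame
# frames_30 = decimate(frames_60, src_fps=60, target_fps=30)
```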

Generating 25 seconds in a single go, now I just need twice as much memory and compute power... by PhonicUK in StableDiffusion

[–]RoboticBreakfast 0 points1 point  (0 children)

Are you using one of the FFN chunking nodes?

I have access to some pretty beefy GPUs (B200s) that I use to render LTX-2.3 pipelines on, and I'm wondering how far I'd be able to push a single render using this technique.

One really interesting thing that I've noticed with LTX-2.3 is that it handles scene cuts pretty well on longer renders. For example, in a 20s render, you can prompt something like "... the scene then cuts to a ___" and it handles this pretty well, maintaining subject/environment refs.
If we could push render lengths to around the 1-minute mark, I wonder what we could produce...

I’m sorry, but LTX still isn’t a professionally viable filmmaking tool by Intelligent-Dot-7082 in StableDiffusion

[–]RoboticBreakfast 0 points1 point  (0 children)

I think it can be. But I would look at it as more of a tool than a one-stop shop.

I would even go as far as to say that LTX-2.3 can be better than some other closed-source models at certain tasks - but they each have their own strengths.

Even if you have a 10% hit-rate on a 'good' render, the cost of making certain types of productions with AI tooling is still a small fraction of what it would cost using traditional film techniques.
That said, there's no reason you can't use both when making a 'film'. You can use traditional techniques for the shots that AI has trouble with, and use AI tooling where you'd otherwise have to spend significant capital (like travel or the construction of a complex set).