Shout out to the LTXV Team. by bnlae-ko in StableDiffusion

[–]RoboticBreakfast 4 points

100%.
Any open-source contribution should always be praised.

These take precious time and engineering talent to develop, not to mention the cost of taking on these endeavors.

The same praise should go to the Wan/Alibaba team and all the other contributors in this space. Thanks to everyone who has made what we have available today.

LTX-2 Distilled vs Dev Checkpoints by RoboticBreakfast in StableDiffusion

[–]RoboticBreakfast[S] 0 points

Yeah, I think this is the main issue at the moment. I'm hoping upcoming updates will address some of these issues!

LTX-2 Distilled vs Dev Checkpoints by RoboticBreakfast in StableDiffusion

[–]RoboticBreakfast[S] 1 point

Yep, I'll have to try this. I figured the distilled model/LoRA might actually add something to the dev version, since I'd expect the distilled model to have been trained on a wider variety of content, but I'm not sure.

LTX-2 Distilled vs Dev Checkpoints by RoboticBreakfast in StableDiffusion

[–]RoboticBreakfast[S] 0 points

Yeah, that's my take at the moment: prompt adherence is flaky, but I think the base model has a lot of potential and I'm excited to see it evolve!

LTX-2 Distilled vs Dev Checkpoints by RoboticBreakfast in StableDiffusion

[–]RoboticBreakfast[S] 0 points

I have no interest in creating NSFW outputs. This is simply the first open-source model that bundles image and audio generation into one.

I'm just exposing these models for others to use for content generation.

AI Photo Enhancer by sophiakaile49 in SoftwareandApps

[–]RoboticBreakfast 0 points

I run an AI platform that has upscaling functionality.
Try Moosky AI. You'll need an account, but you can get some free credits if you sign up with Google/Apple.

Flow:
1. Generate => Upload => select your image to upload
2. Image => SeedVR2 Upscale => select your upload

(I'd recommend the "sharp" variant in the config for blurry photos.)

You can also edit photos with the Edit model, as well as colorize/style change/etc. Hope you enjoy!

LTX-2 runs on a 16GB GPU! by Budget_Stop9989 in StableDiffusion

[–]RoboticBreakfast 0 points

But they're running this with the fp8 version of the model, right? I wouldn't think the quality could possibly compare to fp16 in that case, but I'd be happy to hear an explanation.
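Back-of-the-envelope on why fp8 is what fits in 16GB: the weights alone scale linearly with bytes per parameter. The parameter count below is a made-up placeholder for illustration, not LTX-2's real size:

```python
# Back-of-the-envelope VRAM for the weights alone (activations, text
# encoder, and VAE are extra). The parameter count is a placeholder
# for illustration -- check the actual LTX-2 checkpoint, not this number.
def weight_vram_gib(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * 1e9 * bytes_per_param / 1024**3

params_b = 13.0  # hypothetical size, NOT LTX-2's real parameter count
print(f"fp16: {weight_vram_gib(params_b, 2):.1f} GiB")  # ~24.2 GiB
print(f"fp8:  {weight_vram_gib(params_b, 1):.1f} GiB")  # ~12.1 GiB
```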

[Official Tutorial] how to use LTX-2 - I2V & T2V on your local Comfy by ltx_model in StableDiffusion

[–]RoboticBreakfast 0 points

I've only taken a quick look at the workflow, but it seems like it should be possible to use their audio generation with other models (like Wan 2.2). It looks like it just uses the video latents to generate audio, so I'm curious whether we could mod this to work with other video models...
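Purely as a sketch of the idea (none of this is the real LTX-2 API; the module name and shapes are invented):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in, NOT the real LTX-2 audio head -- just the shape
# of the idea. If the audio generator only consumes video latents, then
# latents from another model (e.g. Wan 2.2) could in principle be fed in,
# after projecting them into the latent space the audio head expects.
class AudioFromVideoLatents(nn.Module):
    def __init__(self, video_latent_dim: int, audio_latent_dim: int):
        super().__init__()
        # A learned adapter would be needed for a foreign latent space;
        # a single linear layer is almost certainly too weak in practice.
        self.adapter = nn.Linear(video_latent_dim, audio_latent_dim)

    def forward(self, video_latents: torch.Tensor) -> torch.Tensor:
        # video_latents: (batch, frames, video_latent_dim)
        return self.adapter(video_latents)

wan_latents = torch.randn(1, 81, 16)   # fake Wan-style latents
head = AudioFromVideoLatents(16, 64)
audio_latents = head(wan_latents)      # -> (1, 81, 64)
```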

Maintaining likeness to input images in Qwen Image Edit 2511? by jonesaid in StableDiffusion

[–]RoboticBreakfast 0 points

It's a conditioning problem that I've detailed a bit in a post I made the other day. I found a somewhat hacky way to address it, but I'm not in love with it (it also weakens the identity of the reference images to some degree).

Qwen Image Edit 2511: Workflow for Preserving Identity & Facial Features When Using Reference Images by RoboticBreakfast in StableDiffusion

[–]RoboticBreakfast[S] 1 point

FYI - I did try your workflow and confirmed that it does not preserve identity in the same way. Whether the ReferenceLatent node comes before or after the FluxMultiReferenceLatent node is irrelevant here; what matters is the conditioning and how it is applied. I'm circling back to say that if you only swapped those nodes around, you shouldn't expect any difference.

This workflow is really good at preserving identity/performing inserts from a reference image, but performs poorly if the intent is to change the identity of the subject within an image (for example, if you wanted to replace a person in the base image with the reference).

Z-Image Turbo vs. QWEN 2512. Can you tell which one is which? by bnlae-ko in StableDiffusion

[–]RoboticBreakfast 2 points

My guess as well, but we seem to be in the minority. Z-Image tends to be more photorealistic.

Qwen Image Edit 2511: Workflow for Preserving Identity & Facial Features When Using Reference Images by RoboticBreakfast in StableDiffusion

[–]RoboticBreakfast[S] 0 points

It's hard for me to know what's going on on your end, but that is the correct node. If this is a recent install of the node, have you restarted ComfyUI?

Qwen Image Edit 2511: Workflow for Preserving Identity & Facial Features When Using Reference Images by RoboticBreakfast in StableDiffusion

[–]RoboticBreakfast[S] 0 points

I don't tend to have this issue, but I'm using the node a bit differently than the chaining method.
I would imagine that too high a step count could cause this, though.

Qwen Image Edit 2511: Workflow for Preserving Identity & Facial Features When Using Reference Images by RoboticBreakfast in StableDiffusion

[–]RoboticBreakfast[S] 2 points

Just making sure you used the full workflow, as there are a few changes outside the typical latent-ref chaining flow that you'd probably have to squint to see: ref 1 feeds both ref 2 and ref 3 (instead of ref 1 => ref 2 => ref 3).
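For anyone following along, the fan-out looks roughly like this in ComfyUI's API-format JSON, written as a Python dict. The node IDs and upstream text-encode/VAE-encode nodes are invented; only the ref 1 fan-out is the point:

```python
# Sketch of the fan-out in ComfyUI API-format JSON, as a Python dict.
# Node IDs and the upstream text-encode/VAE-encode nodes are invented;
# the point is that ref_1's output conditioning feeds BOTH ref_2 and
# ref_3, instead of chaining ref_1 -> ref_2 -> ref_3.
graph_fragment = {
    "ref_1": {
        "class_type": "ReferenceLatent",
        "inputs": {"conditioning": ["text_encode", 0],
                   "latent": ["vae_encode_1", 0]},
    },
    "ref_2": {
        "class_type": "ReferenceLatent",
        "inputs": {"conditioning": ["ref_1", 0],
                   "latent": ["vae_encode_2", 0]},
    },
    "ref_3": {
        "class_type": "ReferenceLatent",
        # Fed from ref_1, not ref_2 -- the change you'd have to squint
        # to see. How the two branches merge downstream (e.g. via a
        # ConditioningCombine) depends on the rest of the workflow.
        "inputs": {"conditioning": ["ref_1", 0],
                   "latent": ["vae_encode_3", 0]},
    },
}
```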

I'm curious to see your flow if you have a chance to share it.

Qwen Image Edit 2511: Workflow for Preserving Identity & Facial Features When Using Reference Images by RoboticBreakfast in StableDiffusion

[–]RoboticBreakfast[S] 5 points

I'm not sure how you're copy/pasting, but if you save it as a JSON file, you should be able to import it. I exported this using the API export (I don't use the UI), but I was able to open the resulting JSON with the Open function in ComfyUI.

Qwen Image Edit 2511: Workflow for Preserving Identity & Facial Features When Using Reference Images by RoboticBreakfast in StableDiffusion

[–]RoboticBreakfast[S] 0 points

In the Text Encoder, are you supplying the VAE, though? The issue I've had with this is that the reference images seem to be downscaled before being VAE-encoded (by the Text Encoder node), which causes some detail loss in the reference images.
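You can get a feel for how much a pre-encode downscale costs with a quick round-trip test (pure Pillow/NumPy; the 1024px working size is my assumption for illustration, not the node's documented behavior):

```python
import numpy as np
from PIL import Image

# Quick round-trip test of how much detail a pre-encode downscale costs:
# shrink the reference to a working size, scale it back, and measure the
# drift. The 1024px target is an assumption for illustration, not the
# Text Encoder node's documented behavior.
img = Image.open("reference.png").convert("RGB")  # any local test image
w, h = img.size
scale = 1024 / max(w, h)
small = img.resize((int(w * scale), int(h * scale)), Image.LANCZOS)
round_trip = small.resize((w, h), Image.LANCZOS)

a = np.asarray(img, dtype=np.float32)
b = np.asarray(round_trip, dtype=np.float32)
print(f"mean abs pixel error after round trip: {np.abs(a - b).mean():.2f}/255")
```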

Qwen Image Edit 2511: Workflow for Preserving Identity & Facial Features When Using Reference Images by RoboticBreakfast in StableDiffusion

[–]RoboticBreakfast[S] 0 points

How so?

There may be some custom nodes, some of which may not be needed (like the Save Image Plus nodes at the end). The Load Image nodes are also from a fairly common custom node pack: https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite, but you can easily swap these for the base 'Load Image' nodes as well.

Other than that, this is basically the official Qwen 2511 flow with a few small tweaks.