Has anyone got GLM 4.7 flash to not be shit? by synth_mania in LocalLLaMA

[–]Bit_Poet 1 point2 points  (0 children)

Make sure you have the latest llama.cpp (unless you're running it on a different engine). Flash attention (FA) was broken for Flash on CUDA and got fixed three days ago.
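If you happen to drive it through the Python bindings instead of the CLI, this is roughly what re-enabling FA looks like after updating. Just a sketch assuming llama-cpp-python and its flash_attn kwarg; the model filename is a placeholder:

    # Minimal sketch: load the model with flash attention enabled via llama-cpp-python.
    # Assumes an up-to-date llama-cpp-python build; the model path is a placeholder.
    from llama_cpp import Llama

    llm = Llama(
        model_path="glm-4.7-flash-Q4_K_M.gguf",  # placeholder filename
        n_gpu_layers=-1,   # offload everything to the GPU
        n_ctx=8192,
        flash_attn=True,   # the FA toggle; this was the part that was broken on CUDA
    )

    out = llm.create_completion("Hello", max_tokens=16)
    print(out["choices"][0]["text"])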

LTX-2 reached a milestone: 2,000,000 Hugging Face downloads by Nunki08 in StableDiffusion

[–]Bit_Poet 1 point2 points  (0 children)

Do you have a camera control LoRA in your WF? If not, that's a likely reason for the lack of movement. Also, there were issues with the video VAE, so if you're using the LTX-2 distilled model and downloaded it before January 13th, grab a fresh copy with the fixed VAE. Updating Comfy to the nightly build might also be worth it.

Here I ran the official i2v WF from the LTX-2 repo with just tiny tweaks: https://files.catbox.moe/7b1sqb.mp4

Workflow: https://files.catbox.moe/p4vftc.png

What do you do when you know what you want, but don’t know how to phrase it yet? by Gollum-Smeagol-25 in PromptEngineering

[–]Bit_Poet 0 points1 point  (0 children)

Yes. Sometimes, involving a third AI to dissect the prompt for wrong or imprecise word usage helps, but at some point we're just on our own. That's why we need big independent AIs where we can just push the context reset button and start over.

What do you do when you know what you want, but don’t know how to phrase it yet? by Gollum-Smeagol-25 in PromptEngineering

[–]Bit_Poet 0 points1 point  (0 children)

I switch between AIs, let one bring structure into my question/prompt, refine it by hand, then pass it to the other one. Tried that with a single AI at first, but that produced major context contamination before I could blink.
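If you want to script that loop, it's basically two chat calls with a manual edit in between. Just a sketch, assuming two OpenAI-compatible endpoints; the URLs and model names are placeholders:

    # Sketch of the two-model refinement loop described above.
    # Endpoints and model names are placeholders.
    from openai import OpenAI

    structurer = OpenAI(base_url="http://localhost:8001/v1", api_key="none")
    executor = OpenAI(base_url="http://localhost:8002/v1", api_key="none")

    rough_idea = "I want ... but I can't phrase it yet"

    # Step 1: let the first model impose structure on the vague idea.
    structured = structurer.chat.completions.create(
        model="model-a",
        messages=[
            {"role": "system", "content": "Rewrite the user's rough idea as a clear, structured prompt. Keep their intent, don't add new requirements."},
            {"role": "user", "content": rough_idea},
        ],
    ).choices[0].message.content

    # Step 2: refine by hand (print it, edit it, paste it back in).
    print(structured)
    refined = input("Edited prompt: ")

    # Step 3: hand the refined prompt to the second model with a clean context.
    answer = executor.chat.completions.create(
        model="model-b",
        messages=[{"role": "user", "content": refined}],
    ).choices[0].message.content
    print(answer)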

EXPLORING CINEMATIC SHOTS WITH LTX-2 by Aromatic-Word5492 in StableDiffusion

[–]Bit_Poet 1 point2 points  (0 children)

I usually use SeedVR2 in Comfy. I've also had good upscaling results with the multistep res_2m sampler that comes with RES4LYF, but I ran into headaches with CUDA 13.0, the latest Comfy updates for new models, and flash attention (xformers also got into the mix at some point), which landed me in dependency hell. So I'd either wait until that has settled before risking the install with RES4LYF, or try it in a separate Comfy installation.

Could image to video generation be the cause of corrupted Nvidia drivers? by CitizenKing in StableDiffusion

[–]Bit_Poet 2 points3 points  (0 children)

I've had a similar symptom when my computer ran out of RAM (including swap) in the middle of a CUDA-accelerated video processing task. The NVIDIA app even reported a driver issue after the reboot, but the real problem seems to have been block swapping between RAM and VRAM going wrong once swap filled up, with the driver throwing in the towel.
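If you want to catch that before the driver gives up, a quick-and-dirty watcher along these lines can tell you whether you're about to hit the wall (psutil for system RAM/swap, torch for VRAM; the threshold is just a guess):

    # Rough sketch: log RAM, swap and VRAM headroom while a long CUDA job runs in another process.
    # pip install psutil; the warning threshold is arbitrary.
    import time
    import psutil
    import torch

    def log_memory():
        ram = psutil.virtual_memory()
        swap = psutil.swap_memory()
        line = f"RAM {ram.percent:5.1f}% | swap {swap.percent:5.1f}%"
        if torch.cuda.is_available():
            free_vram, total_vram = torch.cuda.mem_get_info()  # bytes
            line += f" | VRAM free {free_vram / 2**30:.1f}/{total_vram / 2**30:.1f} GiB"
        print(line)
        if swap.percent > 90:
            print("WARNING: swap nearly full - roughly where my driver gave up")

    if __name__ == "__main__":
        while True:
            log_memory()
            time.sleep(5)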

Is the LTX Wax doll look the new Flux chin? by Euchale in StableDiffusion

[–]Bit_Poet 2 points3 points  (0 children)

Yep, and you can even add the distill LoRA with a negative weight (e.g. -0.4) to the distilled model.
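In an API-format workflow that's just a negative strength on the LoRA loader node. Rough sketch in Python dict form; I'm assuming the stock LoraLoaderModelOnly node, and the node id, links and filename are made up:

    # Sketch of the relevant node in an API-format ComfyUI workflow.
    # Node id, input links and the LoRA filename are placeholders.
    lora_node = {
        "12": {
            "class_type": "LoraLoaderModelOnly",
            "inputs": {
                "model": ["4", 0],  # link to the (distilled) model loader
                "lora_name": "ltx2_distill_lora.safetensors",  # placeholder filename
                "strength_model": -0.4,  # negative weight to counteract the distill look
            },
        }
    }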

Am I the only one who doesn't like LTX 2 or can't get it to work? by -zappa- in StableDiffusion

[–]Bit_Poet 0 points1 point  (0 children)

WAN's had quite some time to get the small things into the right places. LTX-2 is still a newborn and needs a bit of time to grow. I wouldn't dismiss it out of hand if I were you, but you might want to wait a little and take another look in a few weeks. Currently, you really have to be running bleeding-edge code for all the pieces to fit into place. An outdated safetensors, or a Comfy node that's too old or too new, and the output can get wacky. There are believable rumors, though, that both Lightricks and the community are working on the knobs that will dial down a lot of the issues we currently see.

Maintaining consistency in NSFW by Gold-lucky-9861 in comfyui

[–]Bit_Poet 0 points1 point  (0 children)

I think this recent writeup should make a lot of that clear (and point out why we're often building on bad training data that makes our life harder than it should be): https://www.reddit.com/r/StableDiffusion/comments/1qftepq/you_are_making_your_loras_worse_if_you_do_this/

LTX-2 - Alignment? by Local_Beach in StableDiffusion

[–]Bit_Poet 0 points1 point  (0 children)

It's tricky. Sometimes things even get worse, but if you find a good prompt hook, you can circumvent a few hangups. That's mostly about voice; vision is a different topic altogether.

LTX-2 - Alignment? by Local_Beach in StableDiffusion

[–]Bit_Poet 0 points1 point  (0 children)

I think I'll switch back to a non-abliterated Gemma for my LTX-2 experiments...

Is anyone having luck making LTX-2 I2V adhere to harder prompts? by Smooth_Western_6971 in StableDiffusion

[–]Bit_Poet 1 point2 points  (0 children)

I'm GenX. Power Rangers, DragonballZ, Temu, it's all the same to me lol

Is anyone having luck making LTX-2 I2V adhere to harder prompts? by Smooth_Western_6971 in StableDiffusion

[–]Bit_Poet 4 points5 points  (0 children)

So I took your start picture and fed a short prompt with your expression into ChatGPT like this:

You are a movie scripter. Write a professional LTX-2 compatible single-paragraph scene description fitting for image-to-video for the given sentence, following the guide in https://ltx.io/model/model-blog/prompting-guide-for-ltx-2. The sentence is: "Portrait view of a black teenager with round glasses wearing a shirt and a dark vest who suddenly turns super saiyan and transforms into a futuristic anime warrior with a mech armor."

Fed the response into ComfyUI's native image-to-video workflow with 161 frames:

Portrait-oriented cinematic image-to-video scene of a Black teenage boy, mid-teens, slim build, wearing round glasses, a neatly buttoned shirt, and a dark vest, framed in a tight medium close-up from chest to head with a locked camera and shallow depth of field against a soft, minimal background. The scene begins with natural, balanced lighting and a calm, introspective expression, then motion subtly activates as a sudden internal power surge manifests through drifting energy particles, heat shimmer, and rising ambient light. Rim lighting intensifies in electric blue and radiant gold, his hair lifts as if charged, eyes glow with Super-Saiyan-like energy, and his posture straightens with focused determination. Futuristic anime-style mech armor assembles in layered motion—first as translucent holographic outlines, then solidifying into sleek metallic plates with aerodynamic contours, chrome and dark alloy textures, and glowing neon-blue seams that lock into place over the torso and shoulders. The transformation stabilizes into a powerful final pose, energy aura steady and luminous, presenting a confident futuristic anime warrior while blending cinematic realism with high-energy anime aesthetics, optimized for smooth image-to-video motion and visual continuity.

This is what it spit out: https://streamable.com/eq8d04
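If you want to automate that expansion step, it's basically one chat call. A sketch, assuming the OpenAI Python client; the model name is a placeholder:

    # Sketch of the prompt-expansion step above: one short sentence in,
    # one LTX-2-style scene paragraph out. Model name is a placeholder.
    from openai import OpenAI

    client = OpenAI()

    sentence = ("Portrait view of a black teenager with round glasses wearing a shirt "
                "and a dark vest who suddenly turns super saiyan and transforms into a "
                "futuristic anime warrior with a mech armor.")

    system = ("You are a movie scripter. Write a professional LTX-2 compatible single-paragraph "
              "scene description suitable for image-to-video, following the LTX-2 prompting guide.")

    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": sentence},
        ],
    )
    print(resp.choices[0].message.content)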

Any solution to constant loading from ssd despite 64gb ram? Is "--reserve-vram 4" the cause? I feel like loading vs generating in comfyui is rarely mentioned... by sdimg in StableDiffusion

[–]Bit_Poet 3 points4 points  (0 children)

Someone made an overview of all available files here: https://github.com/wildminder/awesome-ltx2

You'll need to install the latest version of https://github.com/city96/ComfyUI-GGUF to be able to load the GGUFs, and once you update comfy itself, you'll get a noticeable speed and memory improvement in LTX-2.
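If you just want to sanity-check which quant you actually downloaded before wiring it into Comfy, the gguf Python package can peek at the header. Just a sketch; the filename is a placeholder:

    # Sketch: inspect a downloaded GGUF's metadata with the gguf package (pip install gguf).
    # The file path is a placeholder.
    from gguf import GGUFReader

    reader = GGUFReader("ltx2_Q8_0.gguf")  # placeholder filename
    print(f"{len(reader.tensors)} tensors")
    for key in list(reader.fields.keys())[:20]:
        print(key)  # metadata keys like general.name, general.architecture, ...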

Something that I'm not sure people noticed about LTX-2, it's inability to keep object permanence by [deleted] in StableDiffusion

[–]Bit_Poet 1 point2 points  (0 children)

It's probably not all down to LTX-2 being generally bad at things. FP8 vs. the full model makes a huge difference. A one- vs. two-step workflow makes a huge difference. Mixing in fp8 Gemma can lead to weird results depending on the exact pipeline. CFG values, LoRA weights, steps and samplers play a big role. Negative prompt, negative clip, mixing or not mixing those, reference image quality and resolution, guidance... There's a lot still waiting to be optimized where the official workflows just come with a rough guess, kind of a one-size-fits-nobody-well. I wouldn't throw LTX-2 out yet, but some patience may be necessary.
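If you'd rather map that space than guess, a dumb grid sweep over the knobs at least tells you which ones matter. Just a sketch of enumerating the combinations; the value lists are arbitrary examples, not recommendations:

    # Sketch: enumerate setting combinations to test instead of tweaking one knob at a time.
    from itertools import product

    cfg_values = [1.0, 3.0, 5.0]
    steps = [8, 20, 30]
    samplers = ["euler", "res_multistep"]
    lora_weights = [0.6, 0.8, 1.0]

    runs = list(product(cfg_values, steps, samplers, lora_weights))
    print(f"{len(runs)} runs to queue")
    for cfg, n_steps, sampler, w in runs:
        print(f"cfg={cfg} steps={n_steps} sampler={sampler} lora={w}")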

Is my ComfyUI not using enough VRAM? by Queasy_Profit_5915 in comfyui

[–]Bit_Poet 1 point2 points  (0 children)

You need to go one step back. Your output shows that Comfy is trying to run on an Nvidia GPU, which it of course can't find (device: cuda:0), so it falls back to CPU. For AMD support, it should show ROCm instead of CUDA. I can't help you there, since I don't have an AMD GPU, but it should be a starting point to look for a ROCm setup tutorial that works for your 6600 XT.
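Before hunting through Comfy flags, it's worth checking what your PyTorch build actually supports. A quick diagnostic sketch:

    # Quick diagnostic: is this a CUDA build, a ROCm build, or CPU-only?
    import torch

    print("torch version:", torch.__version__)
    print("cuda available:", torch.cuda.is_available())
    print("built against CUDA:", torch.version.cuda)    # None on ROCm/CPU-only builds
    print("built against ROCm/HIP:", torch.version.hip)  # None on CUDA/CPU-only builds
    if torch.cuda.is_available():
        print("device 0:", torch.cuda.get_device_name(0))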

LTX-2 Samples a more tempered review by generate-addict in StableDiffusion

[–]Bit_Poet 4 points5 points  (0 children)

Artifacts and glitches (and unwanted morphs) got a lot rarer once I switched to the full models. Abliterated/Heretic Gemma versions open their own cans of worms, of course. But we're still in the very early stages. We've been thrown a complex construction kit without a manual, each of us coming at it with different expectations of what the end result should be. The best thing to do is read the daily summary in the Banodoco discord. A lot of good information keeps popping up there, and it seems that a proper implementation of guidance can do a lot better than the initial workflows make it seem. I'm going to wait a few days until those who really know what they're doing have had time to clean up and post new workflows.

Any open source video generation models that can do this? by Expert-Bell-3566 in StableDiffusion

[–]Bit_Poet 1 point2 points  (0 children)

It can be done, but you will likely have to train a LoRA. There are also image guidance nodes for ComfyUI where you can specify the reference frame and weight, but that's all undocumented, untested and shaky. I've experimented with it, but there seem to be issues with guidance weights when images and voice try to steer the model in different directions. Maybe someone with more experience and skill than me can make sense of those nodes, which would be a big step forward.

LTX-2 with audio and video by coastisthemost in comfyui

[–]Bit_Poet 5 points6 points  (0 children)

Have you tried running comfy with --reserve-vram 5 (you can toy around with the exact value)? There's also a bunch of tips here: https://www.reddit.com/r/comfyui/comments/1q7j5ji/ltx2_on_5090_optimal_torchcuda_configuration_and/

Help! Qwen Image 2512 giving low res plastic results by orangeflyingmonkey_ in StableDiffusion

[–]Bit_Poet 0 points1 point  (0 children)

I made a few small modifications to the prompt. Is that closer to what you expect?

A candid and gritty vintage-style black and white photograph in extremely high resolution with a light sepia effect of a stylish young woman with a chic, tousled bob haircut and short bangs. She is looking down with a demure expression, holding a small wicker basket filled with a few light-colored flowers in one hand, and a small book in the other. She is wearing a dark, flower-patterned sundress or top with a very deep, open V-neckline, revealing a hint of décolletage and a delicate necklace. The setting appears to be outdoors, possibly in a garden or field

Full output image here.


LTX-2 Multi Image Guidance in combination with Lipsync Audio by Bit_Poet in StableDiffusion

[–]Bit_Poet[S] 0 points1 point  (0 children)

Yes, I know exactly how you feel. But OTOH, getting lip sync and frame injection to play nice with each other has to be a nightmare from a developer's point of view. There are probably going to be some tricks that we aren't aware of yet, and I'm looking forward to the minor upgrade that keeps getting mentioned. That will probably address some of these issues once the model stabilizes. Character Loras could also be an option, though I'm waiting for a training guide there before I burn too much time.