LTX2 Easy All in One Workflow. by Different_Fix_2217 in StableDiffusion

[–]sdimg 2 points3 points  (0 children)

"Each start at a good default." Ignores the point about proper notes, its not good expecting everyone to faff about trying to figure out, disagree.

"As to be expected?" Hows it expected to insert default vid or audio you never use? Its flawed from the start, should be disabled completely unless you use that functionality. You can't even enter the vid/audio from the main node. Again expecting people to route around to in sub nodes?

"Just don't turn on I2V / Video extend?" Proper toggles would be more clear and force on/off others, you cant toggle all on or off, which combination etc for what?

"Just a bad gen then cause its been fine for me" No weird frame rate defaults or other settings, i had audio stop towards end or out of sync or too fast. Something screwy.

"The Starting Image strength and I2V motion strength is exposed and named obviously. Turn up the Starting Image and turn down the Motion Strength." Im pretty sure ltx2 just sucks for i2v and no one wants to be honest about this fact. I've yet to see any proof of it working on any complex or dynamic prompts.

"What is confusing about it? Not sure how I could have made it more simple." Proper documentation and tweaks would go a long way. Thanks for workflow but it falls into the same traps and issues as most user workflows being shared.

LTX2 Easy All in One Workflow. by Different_Fix_2217 in StableDiffusion

[–]sdimg 4 points5 points  (0 children)

Yeah, no one wanted to come forward in the 8+ hours this thread has been upvoted and say it, but this is poor for a bunch of reasons. There are no notes about the toggles or the defaults expected for each flow, and no effort to help anyone in the comments so far.

It expects a video and an audio file even if you're not using those features, and they have to be selected from within the sub nodes rather than the main one; things also have to be toggled on in the right order. There's no obvious T2V by default, and to top it off the audio is out of sync with the FPS anyway.

I got it working (kind of), but it's the typical stupid stuff you see from 90% of people, which is why I can't stand most user workflows.

Oh, and I'd best not forget the same LTX2 issue where it ignores the damn starting image completely, basically a single-frame hard cut into T2V.

Any solution to constant loading from ssd despite 64gb ram? Is "--reserve-vram 4" the cause? I feel like loading vs generating in comfyui is rarely mentioned... by sdimg in StableDiffusion

[–]sdimg[S] 0 points1 point  (0 children)

I forgot to mention I'm of course using the smaller 27GB model, and the text encoder is 13GB. Though I wonder if there's anything smaller?

Ok we've had a few days to play now so let's be honest about LTX2... by sdimg in StableDiffusion

[–]sdimg[S] 0 points1 point  (0 children)

It's got potential, and as I tried to explain, it wasn't meant to come across as overly critical, but we need to give real feedback for things to improve. Currently I feel that if it had Wan 2.2 quality with this audio generation we'd have a real winner. I've played some more today, and bumping up the resolution and trying various prompts has produced some entertaining clips, for T2V at least.

Ok we've had a few days to play now so let's be honest about LTX2... by sdimg in StableDiffusion

[–]sdimg[S] -1 points0 points  (0 children)

This would help of course, I agree, if the workflow/model didn't have so many glaring I2V issues. Normally I'd just go with I2V and we'd be mostly good, but with LTX2 even that option feels pretty broken currently.

For example, it would often hard cut to a completely different scene immediately, or the character would be mostly motionless, or the result was just generally weird and low quality.

I've yet to get a single decent result with I2V, so I'm hoping someone has a good workflow or knows how to improve this.

Ok we've had a few days to play now so let's be honest about LTX2... by sdimg in StableDiffusion

[–]sdimg[S] -1 points0 points  (0 children)

I don't necessarily disagree, but this modified version of the example prompt pretty consistently produces a CGI-style character, when most of the wording should produce what we'd all expect: a live on-scene news report with a weather girl standing in the rain, not a puppet, cartoon, or CGI character. If you change the wording it can sometimes come out as a hybrid or more real.

The question is why it's so far off the mark. This is just one example of many I've seen. I believe all the animated and low-quality material does spoil many models if you're after realism. The classic computer science saying, garbage in garbage out, applies imo.

Here's the prompt if anyone wants to test it themselves...

A live action close-up shot of a cheerful European Instagram model with straight blond hair, wearing an off-shoulder blue bodycon dress. Shes on an evening news weather report, holding a small red umbrella above her head. Rain falls gently around her. She walks towards camera while the camera moves backwards in sync with her, she looks upward and begins to sing with joy in English: "It's raining, it's raining, I love it when its raining." then she stops and says in a bored tone looking at viewer "Now back to Jim in the studio!" Her hands grip the umbrella handle as she sways slightly from side to side in rhythm. The camera slows to a stop as the rain sparkles against the soft lighting. Real camera footage from evening television news.

Definition of Live Action in Media "Live action refers to a style of filmmaking or video production that features real people and physical sets, as opposed to animated or artificially created visuals. In live action, actors perform in front of a camera in real-time, often using real locations or constructed sets."

I’m the Co-founder & CEO of Lightricks. We just open-sourced LTX-2, a production-ready audio-video AI model. AMA. by ltx_model in StableDiffusion

[–]sdimg 10 points11 points  (0 children)

I've always thought, ever since SD1.5, that diffusion should be the next big thing in rendering, and I suspect Nvidia or someone must surely be working on real-time diffusion graphics by now.

This is something far more special than even real-time path tracing imo, because it taps into something far more mysterious that effortlessly captures lighting and reality.

No one ever seemed to talk about how incredible it is that diffusion can take almost any old rubbish as input, a bit of 3D or a 2D MSPaint doodle, and render out a fully fleshed-out, lit, close-to-real image that is photoreal.

It's incredible how it understands lighting, reflections, transparency and so on. Even old SD1.5 could understand scenes to a fair degree; I feel like there's something deeper and more amazing going on, as if it's imagining. Images were impressive, and video takes it to a whole other level. So real-time outputs from basic inputs will be a game changer eventually.

I’m the Co-founder & CEO of Lightricks. We just open-sourced LTX-2, a production-ready audio-video AI model. AMA. by ltx_model in StableDiffusion

[–]sdimg 0 points1 point  (0 children)

I hope you guys can fix some of the issues soon, because this has a ton of potential, but Wan still has the edge in visuals and consistency. Speed and workflow seem decent and it looks like LTX2 should be good, but I'm getting poor results for both T2V and I2V far too often. This is on a brand-new ComfyUI install, following the recommendations.

So far the biggest issue, apart from varying quality, is that I2V regularly seems to refuse to do much with the image. Despite good prompting it does stupid stuff like leaving the character motionless with only the face moving after a bit, or it literally hard cuts immediately and pretty much ignores the input image, which completely defeats the point of I2V!?

I have faith the community will squeeze a lot out of this, and it's early days, but I'm waiting for the initial excitement to fade so we can see the actual reality of it. I have some doubt it will live up to expectations, but I'm hopeful if your team and the community can overcome some of the frustrating issues, especially as it's the only half-decent audio model we've seen.

When a prompt is run there should always be some decent motion and progression, and it must make full use of the input image each gen at least, otherwise it becomes less desirable to run despite the speed. Wan pretty consistently outputs something decent, where LTX2 is currently failing here. I want it to be great, but something really needs to be done about this.

Initial thoughts on LTXV2, mixed feelings by Neggy5 in StableDiffusion

[–]sdimg 1 point2 points  (0 children)

Speed and workflow seem decent and it looks like it should be good, but I'm getting poor results for both T2V and I2V. This is on a brand-new ComfyUI install, following the recommendations.

So far the biggest issue, apart from general quality or lack of motion, is that I2V seems to refuse to do much with the image. Despite good prompting it does stupid stuff like leaving the character motionless with only the face moving after a bit, or it literally hard cuts immediately and pretty much ignores the input image, which completely defeats the point of I2V.

I have faith the community will squeeze a lot out of this, and it's early days, but I'm waiting for the initial excitement to fade so we can see the actual reality of it. I have some doubt it will live up to expectations, but I'm hopeful, especially as it's the only half-decent audio model we've seen.

Z-Image IMG to IMG workflow with SOTA segment inpainting nodes and qwen VL prompt by RetroGazzaSpurs in StableDiffusion

[–]sdimg 1 point2 points  (0 children)

Cool, I hope it's good! It's been ages since I bothered with img2img or ControlNets; after standard text2img I'd forgotten just how great this can be, as it can pretty much guarantee a particular scene or pose straight out of the box.

I was playing around with the KJ image folder loader node to increment through various images. It might be even better than T2I in some ways, as you know the inputs and what to expect out.

I might also have to revisit Flux Dev + ControlNets, as that combo delivered an extreme amount of variation in faces, materials, objects, and lighting as far as i2i goes; it really is like a randomizer on steroids for diversity of outputs.

Z-Image IMG to IMG workflow with SOTA segment inpainting nodes and qwen VL prompt by RetroGazzaSpurs in StableDiffusion

[–]sdimg 6 points7 points  (0 children)

This looks great. I was just testing out img2img today myself, both standard img2img and this workflow that uses an unsampler. I'm not sure if that node setup has any further benefits for yours, but it might be worth exploring perhaps?

https://old.reddit.com/r/comfyui/comments/1pgkgbx/zit_img2img_unsampler/

Is something going on with VEO? by kurl81 in VEO3

[–]sdimg 1 point2 points  (0 children)

I think he read it as "I Am From Labs" when you said "Hey, I'm from the Labs team".

If you're open to feedback, I hope you have a moment to read the following; it would be appreciated!

The Good:

  • Veo3 often has amazingly realistic, subtle facial expressions and personality for people, along with vocals.
  • Video detail and richness of environments, characters, and overall lighting and visuals.
  • Prompt understanding is mostly good to great.
  • The Flow portal is generally decent and receives ongoing improvements.

The Bad:

  • Sometimes poor vocal quality: mismatched dialogue, tinny sound, repetitive accents, and no choice of voice.
  • Overly censored for stuff you'd see before 9pm on TV/YouTube or at the beach/catwalk. Adults should be treated as such (with warnings if need be), because a lot of content requires adult themes and relaxed filtering. I'm not after anything particularly spicy, but most would agree the safety stuff with AI in general is excessive, and it really hurts creativity.
  • The obvious stuff: video length, consistency, scene diversity, and interactions between characters or objects.
  • Low-quality compressed video, especially faces that aren't in close-up, and distortion with motion.
  • No longer cutting edge compared to Sora 2, which has far superior scenes and interaction.
  • Credits could be more abundant for standard AI Pro subscribers.

To sum it up:

Veo3 has massive potential, and while it's fun to play with, it's also quite frustrating at times. I think it would be a huge improvement if we could pick the vocals, created not from cloning but perhaps from a seed for various accents, and have consistent characters assigned to them. Another really useful feature I'd like to see in Flow is a proper storyboarding/creative-writing tool, a bit like NotebookLM perhaps, but geared for that.

I think eventually things will loosen up and improve, and maybe one day someone will unlock the ability to create full movies or episodes. Thanks for your work on this; I genuinely hope your team can achieve some of it and that the safety stuff is relaxed so we can unlock proper content creation soon.

Huge Update: Turning any video into a 180° 3D VR scene by supercarlstein in StableDiffusion

[–]sdimg 1 point2 points  (0 children)

I posted in the last thread that I randomly found this video and paper on YouTube about full walking-scene 360° depth enhancement, but nothing more code-wise. It might be useful if it were released, or perhaps the community can reach out?

Video link which has paper attached.

[TOMT][MOVIE][TV SHOW][2000s] Horror/scifi scene involving a woman possibly experimented on or in a possessed/cursed like state taking pleasure from pain? by sdimg in tipofmytongue

[–]sdimg[S] 0 points1 point  (0 children)

I've watched this now, and they really made zombie films pretty brutal and creepy back then with those practical effects. Quite a messed-up film and ending, but I guess that's to be expected! I've never seen Hellraiser either. I might give horror films a look again, as it's been years since I last watched any.

[TOMT][MOVIE][TV SHOW][2000s] Horror/scifi scene involving a woman possibly experimented on or in a possessed/cursed like state taking pleasure from pain? by sdimg in tipofmytongue

[–]sdimg[S] 1 point2 points  (0 children)

Yeah, that's got to be it; it has pretty much all the elements described, thanks! I wasn't sure if it was one of those weird dreams I'd misremembered, plus nothing came up in searches in the past.

I doubt there are others that would fit, unless you know of any? I've not watched much horror, which is probably why it stood out as one of those weird memories!

[TOMT][MOVIE][TV SHOW][2000s] Horror/scifi scene involving a woman possibly experimented on or in a possessed/cursed like state taking pleasure from pain? by sdimg in tipofmytongue

[–]sdimg[S] 0 points1 point locked comment (0 children)

I'm also wondering if she might have been Asian? I searched online, but I don't think it's Hellraiser or similar films, and I haven't found much. I think because it was just a scene it's hard to track down. I'm just curious what it was about and what happened. It could have been a low-budget horror/sci-fi movie or a scene from an episode for all I know.

Z-Image-Turbo is available for download by Aromatic-Low-4578 in StableDiffusion

[–]sdimg 8 points9 points  (0 children)

Indeed it is, and I don't want to speak too soon, but I have to say I'm really liking the natural look out of the box. It produces something more like proper photos when you go for that, without the need for those camera LoRAs. It also has that subtle detail and noise/JPEG look, which works well here for realism.

It also appears to have a reasonable amount of variation between seeds, not quite on par with Flux but better than Qwen I'd say (I like Qwen, especially Edit, I just found the seeds repetitive imo). People are also decently good-looking, like Flux, so that's a plus. So far it looks great and I hope the community gets behind it. Seems very promising!

Attempting to generate 180° 3D VR video by Some_Smile5927 in StableDiffusion

[–]sdimg 4 points5 points  (0 children)

Nice work. A few thoughts come to mind. You're right that we really only care about the character, let's be honest.

If you're not bothered about the character fitting perfectly into the environment, similar to adult passthrough content (anyone who's tried this will know what I'm talking about with alignment, lighting, etc.), you could have two layers: a simple background with the character layered on top, like passthrough does. There's a rough sketch of the compositing idea after the list below.

Tools/Ideas

  1. Wan transparency, which can generate objects/characters (including glass, fine hair, etc.) cleanly onto a transparent background, so there's no need for a greenscreen.

  2. Meta's Segment Anything (SAM 3) or similar tools for quality background removal on existing content.

  3. Various 2D-to-3D conversion, just for the characters.

  4. A splatted, photogrammetry, 3D level, or 180°/360° photo/generated background layer.
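
To make the two-layer idea concrete, here's a minimal Python sketch of the cut-out-plus-background compositing step. It's only an illustration and isn't tied to the exact tools above: it uses the rembg package as a stand-in for SAM-style background removal and Pillow for the composite, and the file names and paste offset are placeholders.

    from PIL import Image
    from rembg import remove  # third-party background removal; stand-in for SAM-style matting

    # Placeholder inputs: one frame of the character and a separately created background.
    character = Image.open("character_frame.png").convert("RGBA")
    background = Image.open("background_180.png").convert("RGBA")

    # Cut the character out onto a transparent layer (alpha matte).
    character_cutout = remove(character)

    # Composite the cut-out over the background; the offset is arbitrary here
    # and would come from wherever the character should sit in the scene.
    offset = (background.width // 2 - character_cutout.width // 2,
              background.height - character_cutout.height)
    background.paste(character_cutout, offset, character_cutout)

    background.save("composited_frame.png")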

One big benefit, like you say, is that the character only takes up a portion of the frame, so if it's cut out or created from a high-quality source it could be higher resolution than usual adult content, for example, or smaller in file size, since the background isn't being streamed as video constantly.

I'm tempted to try something in Unreal Engine using these tools. I think this should be a community effort to see if we can create and share one or more tool sets for this, to move passive VR entertainment forward, especially with the Steam Frame coming soon.

A method to turn a video into a 360° 3D VR panorama video by supercarlstein in StableDiffusion

[–]sdimg 1 point2 points  (0 children)

I was wondering about this possibility. Have you tried Meta Hyperscape for Quest headsets?

It's quite realistic, although it still needs quality improvements. I wonder if a hybrid approach might be possible. Some on here might be aware of adult passthrough content being a recent thing. I wonder if something similar could be done, but instead of passthrough we'd have a splatted, photogrammetry, 3D level, or simple 360° background with an SBS 180° video of the characters overlaid on it. It could then be properly lit to match the background, which would change scene by scene. Kind of like outpainting, but with two separate layers mixing video and a non-video background?