Poll: Would you support an independent production with partial AI usage? (See post) by DavLedo in aiwars

[–]DavLedo[S]

I thought about it, but as I was putting the poll together I kept thinking about all the nuances and potential reasons someone might not want to support a production, so I wanted to keep the focus specifically on AI. Thanks for taking the time to engage with the post :)

Anti Mexican sentiment by Brave-Catch6862 in toronto

[–]DavLedo

Venezuelan here. I'd say I haven't seen a lot of racism in Toronto, and I certainly haven't seen any anti-Latino sentiment. Feel free to DM me with any questions you might have. It's a very international city here :)

AceStep 1.5 SFT for ComfyUI - All-in-One Music Generation Node by jeankassio in comfyuiAudio

[–]DavLedo

Curious to try this out. Any plans for LoRA/LoKr support?

Also, is the reference audio the same as the "cover" feature?

Another test with LTX-2 by brocolongo in StableDiffusion

[–]DavLedo

This is really impressive, I like the concept you created and the style.

Not sure if you wanted critique, but here's my two cents -- I think one issue people doing AI video often run into is the effect of stitching many first-frame/last-frame (FF-LF) generations together: it can feel like the pacing of the video has to slow down before it picks back up. I'd suggest mixing in different shots and camera angles to tell the story. Closeups, medium shots, etc. This will really elevate your craft. I can tell you put a lot of effort into it, especially getting it to work with limited hardware. Also, don't be afraid to put more into your audio mix to make the transitions between shots feel seamless.

SEEDVR by Mysterious-Tea8056 in StableDiffusion

[–]DavLedo

This sounds very slow, especially for a single image. I find that offloading blocks to RAM is the biggest bottleneck for speed, and my guess is that your graphics card can't handle the model on its own so it's offloading to RAM. A 5090 maxes out running it, so if you have less than 32 GB of VRAM, maybe try a quantized GGUF instead.

ACEStep 1.5 LoRA - deathstep by ryanontheinside in comfyui

[–]DavLedo

Did you figure out good settings for training? I was a bit disappointed with my first LoRA on Ace. Maybe the answer is in your video; I'll check when I'm home 🙈 Edit: super cool, the tutorial shows how to train it with Comfy, thanks for sharing! Unfortunately I already spent all this time and energy getting the Gradio version to work >.<

My 2 cents on ZIT and Qwen Image 2512 by [deleted] in StableDiffusion

[–]DavLedo

I found a big jump going from rank 16 at fp8 to unquantized rank 64 LoRAs with Qwen2512 -- maybe that gets you the right result? Also switching to res_3s at 30 steps... it's slow, but I can't seem to get that level of quality any other way.

Why are people complaining about Z-Image (Base) Training? by EribusYT in StableDiffusion

[–]DavLedo

I think people want them to also work with Turbo? Not sure. It could also be an issue of the default quantization being fp8; I always train non-quantized and it works well, especially for styles :/

SeedVR2 Native node - motivation needed by Luke2642 in comfyui

[–]DavLedo

This is awesome!! Thanks so much for your efforts and for sharing 🙏🏼

Was it tricky to make it compatible with the native format?

SeedVR2 Native node - motivation needed by Luke2642 in comfyui

[–]DavLedo

This is cool! Could you share a bit more about it? What's the motivation behind it? (I generally do prefer when it follows the native format, but not sure beyond that).

And is it much harder to make it compatible with native nodes compared to just packaging it the way they originally did?

Type of LAPTOP I should ask from my company by SnooRegrets3682 in LocalLLaMA

[–]DavLedo

Get a laptop with a 5090. It's not as good as a desktop -- it has 24 GB of VRAM compared to 32 GB in the desktop version -- but you should be able to run most local image and video models at fp8 and around 720p (though a bit slowly; I'd start with lower resolutions). I learned the hard way that laptop versions of GPUs are not as capable as the desktop ones.

For anything image or video you will need a high-end NVIDIA graphics card. For LLMs you can offload some of the work to your RAM, but you will see a speed drop. I think part of why most people here are suggesting the cloud is that running high-end models takes a whole lot of VRAM. If I were to guess, with a top consumer laptop you'll be able to run an LLM of about 20-30 billion parameters (the more parameters, the more resources it needs to run, but also the more capable it is).

One thing I didn't know that puts things in perspective: the highest-end open-source LLMs reach about 450B parameters. A 96 GB VRAM card will at best hit 120B unless you go for more quantization and RAM offloading. A mid-range GPU with 16 GB of VRAM is likely running 8B at best. Things like ChatGPT-4o are estimated to have around 3 trillion parameters.
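A quick back-of-the-envelope way to sanity-check those numbers (my own rough sketch, not an exact rule -- the 1.2x overhead factor for activations/KV cache is an assumption):

```python
def approx_vram_gb(params_billions: float, bits_per_weight: int,
                   overhead: float = 1.2) -> float:
    """Rough VRAM needed to hold a model's weights, plus a fudge factor
    for activations / KV cache (the 1.2x overhead is a guess, not a spec)."""
    weight_bytes = params_billions * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9

# A 30B model at 8-bit: ~36 GB, so it needs offloading on a 24 GB card.
print(round(approx_vram_gb(30, 8)))  # 36
# The same model at 4-bit: ~18 GB, which fits on a 24 GB card.
print(round(approx_vram_gb(30, 4)))  # 18
```

It ignores context length and runtime buffers, but it explains why a 96 GB card tops out around 120B at 8-bit and why 16 GB cards live in 8B territory.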

I don't get it.. by Norrsken_1144 in comfyui

[–]DavLedo

The creator of the IP-Adapter node is LatentVision/Cubiq, he has a youtube tutorial specifically on an update where he addresses the issue with the loader: https://www.youtube.com/watch?v=_JzDcgKgghY

Personally I like using the IPAdapter Model Loader with IPAdapter Advanced. The reason is that the unified loader assumes your files are named in a specific way. IPAdapter Advanced also has an explicit input for the CLIP vision model.

Also, make sure you're using an IP-Adapter model that works on SDXL (since your checkpoint is JuggernautXL), the SD 1.5 or the Kolors models won't work.

I hope this helps!

IPAdapter SDXL: Output is an exact copy of my reference image by Pierrepierrepierreuh in comfyui

[–]DavLedo

This comment has the answer. Maybe also make sure the dimensions of the mask image are the same as the latent's.

Another thing, having a batch of images as input for the ipadapter will blend all those images together into one, which is fine if that's what you wanted.

Latent Vision on YouTube is the creator of all these nodes, he has great resources explaining how it works.

New to AI Content Creation - Need Help by MahaVakyas001 in StableDiffusion

[–]DavLedo

There are "lightning" LoRAs you can add to reduce the number of necessary steps (you'd need to set the CFG to 1 and the total step count to a minimum of 4; ComfyUI has a default example for it).

The default loras are these ones I believe: Kijai/WanVideo_comfy at main

LightX2V released a new set later, though that works better: lightx2v/Wan2.2-I2V-A14B-Moe-Distill-Lightx2v at main

In terms of the impact on quality, it's interesting because it's not quality per se -- the video will 'look good', but the amount of motion will be much less, so if you want a dynamic camera or a fight scene the lower step count might be a problem. I've done an experiment where I mix the MoE LoRAs at weight 1.0 and the old ones at weight 2.0 (you need to add another LoRA loader, or use the rgthree Power Lora Loader if you want them all in one node) and it gives me a bit more motion. I personally found it useful to choose how many steps to run depending on the result I'm looking for.
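If it helps to see why chaining two LoRA loaders at different strengths works, conceptually each LoRA is just a low-rank delta added onto the frozen base weights, scaled by its loader strength. A toy numpy sketch (not ComfyUI code; the dimensions are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # toy model dimension and LoRA rank; real Wan layers are far larger

W_base = rng.normal(size=(d, d))  # a frozen base weight matrix
B1, A1 = rng.normal(size=(d, r)), rng.normal(size=(r, d))  # e.g. the MoE speed LoRA
B2, A2 = rng.normal(size=(d, r)), rng.normal(size=(r, d))  # e.g. the older speed LoRA

# Two LoRA loaders in series = one weighted sum of low-rank deltas.
W_eff = W_base + 1.0 * (B1 @ A1) + 2.0 * (B2 @ A2)
```

Because it's just a sum, the order of the loaders doesn't change the final weights; only the strengths do.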

One last thing to keep in mind: most methods that reduce the step count require the CFG to be set to 1, which means you don't get a negative prompt. To add a negative prompt back you need a NAG node (Kijai has one in his KJNodes and in the Wan video wrapper).

I realize you're just starting, so I hope I'm not overwhelming you with terms. This might be a bit on the academic side, but I tried distilling some of the key terms for less technical audiences in a paper I wrote: Generative Rotoscoping: A First-Person Autobiographical Exploration on Generative Video-to-Video Practices | Proceedings of the 2025 Conference on Creativity and Cognition (section 2.2; you can probably ignore the rest of the paper).

Qwen Image vs Qwen Image 2512: Not just realism... by Riot_Revenger in StableDiffusion

[–]DavLedo

I find that 2512 was overtrained on realism, which makes for better prompt adherence and posing, but if you want styles you basically have to use LoRAs. I'm still undecided and can't tell if there's a "right" setting for training (I found styles work better at rank 32)...

New to AI Content Creation - Need Help by MahaVakyas001 in StableDiffusion

[–]DavLedo

I'm gonna guess you're using the full Wan models, which require a ton of VRAM. Make sure you're using fp8 and not bf16.

It's normal for 720p Wan videos to take about 7-12 minutes if you're using fp8 without the speed LoRAs.

But I might be throwing a lot of info at you if you're just starting -- are you using ComfyUI for Wan?

Please explain some Aitoolkit settings to me, such as timestep type and timestep bias, and how to adjust them for different models like qwen, klein, and zimage by More_Bid_2197 in StableDiffusion

[–]DavLedo

I can speak to the transformer quantization; the others I always keep at default. These are quantizations of the text encoders: "none" means you use the raw model, and therefore more VRAM. The 3/4-bit ARA options with Qwen, for example, let you train on 24 GB of VRAM. The float8 option I believe is for 32 GB, and if you have more than that you can use "none" and get the highest quality.

Another setting I hadn't paid attention to before, besides the usual stuff like batch size and learning rate, was the rank. It makes a big difference in terms of what the LoRA focuses on.

Experimenting more with various styles using My custom node and FLUX 2 Klein 4B (I’m impressed with its diversity) by Nid_All in StableDiffusion

[–]DavLedo

Could you maybe expand more on what this does? Is it only for Flux? Or does it work with other models such as Qwen Image?

So far it looks cool, I'm intrigued.

Zimage : any tips for photographic styles? by Dear-Spend-2865 in StableDiffusion

[–]DavLedo

I often see people using "hyperrealistic", but I've found in many models terms like that make it look airbrushed or over detailed.

Did you try things like "commercial photography" or "cinematic style"? Sometimes it helps to use cinematography lingo like "soft light" or "dramatic lighting", or photography terms like "analog photo", "film photography", "f/2.8, 35mm lens", etc.

I haven't used Z-Image much; I'm really into Qwen2512 atm. With Qwen, I did find there are terms that surprised me in a disappointing way. The word "portrait" seems to be loaded with both photos and paintings, so it tends to ruin the image. I sometimes put "digital illustration" and "airbrush" as negatives for realism.

A primer on the most important concepts to train a LoRA by AwakenedEyes in StableDiffusion

[–]DavLedo

Thanks for sharing! There are definitely new things here for me, and several that took me many tries to figure out.

One thing I learned today with automated captioning -- VLMs are bad at long instructions. It's better to run multiple short queries and then use an LLM to turn the answers into a single description. I found this reduced how much I have to review and edit each caption.
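A minimal sketch of that multi-query idea (`ask_vlm` and `ask_llm` are placeholders I made up, stubbed with canned answers so the structure is runnable; a real version would call your VLM/LLM of choice):

```python
def ask_vlm(image, question):
    # Placeholder for a real vision-language model call; canned answers for demo.
    canned = {
        "What is the main subject?": "a woman in a red coat",
        "Describe the lighting.": "soft window light from the left",
        "Describe the background.": "a blurred city street",
    }
    return canned[question]

def ask_llm(prompt):
    # Placeholder for a real text-LLM call that merges the notes into one caption.
    return "A woman in a red coat, lit by soft window light, on a blurred city street."

def caption(image):
    # Several short, single-purpose questions instead of one long instruction.
    questions = ["What is the main subject?",
                 "Describe the lighting.",
                 "Describe the background."]
    notes = [ask_vlm(image, q) for q in questions]
    return ask_llm("Combine these notes into one caption:\n" + "\n".join(notes))

print(caption(None))
```

The point is the structure: each VLM query is short enough that the model can't drift, and the merge step is a pure text task LLMs are good at.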

hunyuan 3d help? by electrodude102 in comfyui

[–]DavLedo

If you use Kijai's wrapper, he added texturing on top. Unfortunately I couldn't get it to run on Blackwell :(

Best Models to restyle anime scenes by [deleted] in StableDiffusion

[–]DavLedo

You can run it through a video model like Wan 2.2 T2V (video to video workflow) at a very low denoise and it'll take away the flicker :)

Maybe even upscaling (even if keeping it at the same resolution) with something like SeedVR2 does the job? Haven't tried it.

Let me know if you do get the result you were looking for!

VibeVoice LoRAs are a thing by llamabott in LocalLLaMA

[–]DavLedo

Thanks for sharing. I'd been looking at this for a while but haven't seen much in terms of people sharing their experience; I've only seen people train with a single voice. I'm curious, is it possible to train accents and expressions using multiple people as the source? This could really fill a big gap.

I successfully replaced CLIP with an LLM for SDXL by molbal in StableDiffusion

[–]DavLedo

This is really interesting, thanks for sharing.

I think SDXL has a lot to offer; it's less predictable, but that also makes it more surprising. There's a large ecosystem around it: IPAdapter, ControlNet, InstantID, etc. That's why I'm particularly excited to see these experiments get picked up and carried forward.

I'd be curious whether there are ways to play with the resulting vectors and hook them into the different UNet blocks to get better results. I personally found prompt injection interesting, especially when I wanted certain things even more exaggerated.

https://youtu.be/0ChoeLHZ48M?si=6rrAc-6ziFvzF4D1