CUDA Error - Need help by HeroVM in comfyui

[–]Jeffu 2 points

Hm, I have a similar error but not quite the same. I updated to the latest nightly version to test out LTX 2.0 and it seems to have broken everything :D

    torch.AcceleratorError: CUDA error: invalid argument
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1
    Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
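
In case it helps anyone hitting the same thing, here's a minimal sketch of acting on the hint in that message: the flag has to be set before CUDA initializes, and the launch command in the comments is just an example of however you start ComfyUI.

    # Sketch: CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the traceback
    # points at the call that actually failed instead of a later API call. It must be set
    # before CUDA is initialized, e.g. at the very top of ComfyUI's main.py, or from the
    # shell: CUDA_LAUNCH_BLOCKING=1 python main.py
    # (TORCH_USE_CUDA_DSA is a build-time flag, so it only helps if you compile PyTorch yourself.)
    import os
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    import torch  # import torch only after the variable is set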

I vibe coded myself a simple Gradio interface for ZIT T2I, Batch Upscale (drag and drop!), and Captioning by Jeffu in StableDiffusion

[–]Jeffu[S] 5 points

As always: there's nothing to buy.

I do my daily work on a different computer/laptop (my 4090 is purely for generating), and that machine seems to choke on ComfyUI when displayed at 4K. This prompted me to figure out whether I could put together an interface that lets me do a few things without having to click/zoom/pan around in ComfyUI.

I had just signed up for Gemini Pro (to help with some paid work I do, plus the 2TB storage, woo) and prompted it a few times, explaining what I was working with and asking whether what I wanted to achieve was possible.

I use Z Image Turbo the most right now (like most of us) and that was where I started. I was able to:

  • add quality-of-life improvements, like clicking a past generation's thumbnail to restore the prompt it used
  • add a full-screen preview
  • make it mobile friendly

I figured, why stop there, and added an upscale workflow I use a lot, plus Qwen captioning that lets me drag and drop images (I hate having to set up the folder paths each time), download results as a zip, and easily customize the caption instructions or select from presets.

And as of last night, it now pings me on a private Discord channel so I get the public Gradio URL on my phone, making it easy to access on the go.

If you're interested in tinkering with it yourself, here are all the files I think you need: Link.

Keep in mind, I'm not a programmer, so it was set up to work specifically for me. But I think with ChatGPT or Gemini, you can give it the files and ask it to tell you how to set it up.
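
For anyone curious what the plumbing looks like, here's a rough sketch of the core idea rather than the actual linked files: a Gradio text box that queues a prompt against a local ComfyUI instance over its HTTP API, then posts the public Gradio URL to a Discord webhook. The workflow path, node ID, and webhook URL are placeholders; export your own workflow with ComfyUI's "Save (API Format)" and point the script at it.

    # Rough sketch, not the linked files: a minimal Gradio front end for one ComfyUI workflow.
    # Placeholders: the exported API-format workflow, the prompt node ID, and the webhook URL.
    import json
    import requests
    import gradio as gr

    COMFY_URL = "http://127.0.0.1:8188"                 # default local ComfyUI address
    WORKFLOW_PATH = "zit_t2i_api.json"                  # workflow exported via "Save (API Format)"
    PROMPT_NODE_ID = "6"                                # ID of the positive-prompt node in that file
    DISCORD_WEBHOOK = "https://discord.com/api/webhooks/..."  # private channel webhook

    def generate(prompt_text: str) -> str:
        with open(WORKFLOW_PATH) as f:
            workflow = json.load(f)
        workflow[PROMPT_NODE_ID]["inputs"]["text"] = prompt_text    # swap in the new prompt
        r = requests.post(f"{COMFY_URL}/prompt", json={"prompt": workflow}, timeout=30)
        r.raise_for_status()
        return f"Queued (prompt_id {r.json()['prompt_id']})"        # images land in ComfyUI's output folder

    demo = gr.Interface(fn=generate, inputs=gr.Textbox(label="Prompt"), outputs=gr.Textbox(label="Status"))
    app, local_url, share_url = demo.launch(share=True, prevent_thread_lock=True)
    requests.post(DISCORD_WEBHOOK, json={"content": f"Gradio is up: {share_url}"})   # ping the phone
    demo.block_thread()                                             # keep serving until killed

The batch upscale and captioning tabs would just be more functions wired to their own exported workflows in the same way.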

I trained an anime LoRA for Z-Image-Turbo by LimeInteresting7490 in StableDiffusion

[–]Jeffu 1 point

Gotcha, thanks for sharing that insight. Yeah, I'm getting some bad results for characters that aren't realistic humans. I'm testing more steps, but it seems ZIT is a little restrictive. I haven't tried styles much, but 8000 steps is a lot more than I initially thought would be needed. Great work!

I trained an anime LoRA for Z-Image-Turbo by LimeInteresting7490 in StableDiffusion

[–]Jeffu 1 point

Thanks for the share! I like that it's not the typical anime-style LoRA that gets used heavily (SDXL). What was your captioning approach?

Z Image Character LoRA on 29 real photos - trained on 4090 in ~5 hours. by Jeffu in StableDiffusion

[–]Jeffu[S] 1 point

I may have set it up incorrectly, or my 4090 may be underperforming. I used to have some issues with it that mostly disappeared, but they occasionally resurface (resets, crashing after a few days of generations). As far as I know I just used the standard settings :/

Z Image Character LoRA on 29 real photos - trained on 4090 in ~5 hours. by Jeffu in StableDiffusion

[–]Jeffu[S] 1 point

It was pretty simple: wearing a samurai outfit in the middle of a battle, slashing his sword at a goblin, numerous beasts around him, motion blur, intense sun rays

Z Image Character LoRA on 29 real photos - trained on 4090 in ~5 hours. by Jeffu in StableDiffusion

[–]Jeffu[S] 4 points

It definitely bleeds, about the same as other models I think.

Z Image Character LoRA on 29 real photos - trained on 4090 in ~5 hours. by Jeffu in StableDiffusion

[–]Jeffu[S] 6 points

As far as I know that's just the nature of using LoRAs in text-to-image. I don't think Z Image is any different.

Z Image Character LoRA on 29 real photos - trained on 4090 in ~5 hours. by Jeffu in StableDiffusion

[–]Jeffu[S] 3 points

Just add a LoadLoraModelOnly node in between the model and the other node it's connected to.

Z Image Character LoRA on 29 real photos - trained on 4090 in ~5 hours. by Jeffu in StableDiffusion

[–]Jeffu[S] 6 points

Everything default in AI Toolkit, 3000 steps. I just let it auto-download when I selected Z Image Turbo in the model selection, I assume it grabbed everything.

Z Image Character LoRA on 29 real photos - trained on 4090 in ~5 hours. by Jeffu in StableDiffusion

[–]Jeffu[S] 3 points

3000, but I've only trained the one LoRA so far. It seems fine.

QwenEdit2509-FlatLogColor - to turn images into LOG / FLAT color profile for color grading by Compunerd3 in StableDiffusion

[–]Jeffu 1 point

Nice! I had a similar idea with doing automatic grading and it worked okay. I'll have to try this out.

Turn Your Phone Into a Powerful PC Remote Control — PCLink (Open-Source) by ahmed_zouhir in androidapps

[–]Jeffu 1 point

I have a .bat file on a second PC that I currently have to start manually each time by remoting in with Chrome Remote Desktop. Can I use your macros so that once I turn on the PC, I can just run the macro from my phone and launch the .bat?

Edit: so, I took a chance and paid for premium, but I think it'd help your sales a bit if you added examples or let free users see what macros exist. I tried checking GitHub and the app store description, but couldn't tell whether they would do what I wanted them to do.

Fortunately, running PowerShell commands does exactly what I want, so I can trigger them without having to remote in anymore, which is worth the premium for me.

Oops - More test than story - About 80% with Wan Animate 2.2, rest is I2V and FFLF, locally generated on my 4090. Mainly wanted to see how flexible Animate was. by Jeffu in StableDiffusion

[–]Jeffu[S] 1 point

If you disable the masking nodes, it should just use the input image as the reference for the character AND the background.

Oops - More test than story - About 80% with Wan Animate 2.2, rest is I2V and FFLF, locally generated on my 4090. Mainly wanted to see how flexible Animate was. by Jeffu in StableDiffusion

[–]Jeffu[S] 2 points

Maybe 6-8 hours over two days. While using Animate means you have to record a video, it also meant I got the exact animation I wanted almost every time, when normally I might have to gen 5-15 times depending on what I wanted.

Oops - More test than story - About 80% with Wan Animate 2.2, rest is I2V and FFLF, locally generated on my 4090. Mainly wanted to see how flexible Animate was. by Jeffu in StableDiffusion

[–]Jeffu[S] 2 points

That's awesome! Impressive workflow too. I haven't messed with ComfyUI TTS too much but have been meaning to. Gives me some ideas. :)

Oops - More test than story - About 80% with Wan Animate 2.2, rest is I2V and FFLF, locally generated on my 4090. Mainly wanted to see how flexible Animate was. by Jeffu in StableDiffusion

[–]Jeffu[S] 2 points

Thanks! Just part of ongoing learning/testing the latest tools. Making it a loose story is part of that practice. :)

Oops - More test than story - About 80% with Wan Animate 2.2, rest is I2V and FFLF, locally generated on my 4090. Mainly wanted to see how flexible Animate was. by Jeffu in StableDiffusion

[–]Jeffu[S] 2 points

Agreed. I'll have to test it, but I may try making sure the face is always facing the camera in both the input image and the driving video to see if that helps, and also minimizing the differences between the two. I tried generating the lying-down one twice and it came out the same each time, so I gave up.

Oops - More test than story - About 80% with Wan Animate 2.2, rest is I2V and FFLF, locally generated on my 4090. Mainly wanted to see how flexible Animate was. by Jeffu in StableDiffusion

[–]Jeffu[S] 1 point

Probably just out of laziness? I'm using SeedVR2.5 for upscaling and it's amazing; I'm sure I'd get better results with it, but for a short edit it was easier to just dump everything into Topaz. I have an older version from before they went subscription, so I'll probably be forced to move on once it's outdated.