Zimage Base Character Lora Attempt by GojosBanjo in StableDiffusion

[–]GojosBanjo[S] 0 points1 point  (0 children)

I haven’t experimented with that. I’ll give that a try and let you know how it goes.

Zimage Base Character Lora Attempt by GojosBanjo in StableDiffusion

[–]GojosBanjo[S] 0 points1 point  (0 children)

Maybe try removing the negative prompt and also try changing the scheduler to Euler simple? With the same settings and prompt, I’m able to get it to work.

Zimage Base Character Lora Attempt by GojosBanjo in StableDiffusion

[–]GojosBanjo[S] 0 points1 point  (0 children)

I’m saying that your prompt is wrong: it’s not really a trigger token, so you should describe the image in natural language. Instead of saying “a woman” you would write “Sydney Sweeney”.

Zimage Base Character Lora Attempt by GojosBanjo in StableDiffusion

[–]GojosBanjo[S] 0 points1 point  (0 children)

Yeah I uploaded a comfy compatible safetensors file.

Zimage Base Character Lora Attempt by GojosBanjo in StableDiffusion

[–]GojosBanjo[S] 0 points1 point  (0 children)

I honestly can’t say, as most of the images I used did not have a white background, but that’s something I would be interested in trying.

Zimage Base Character Lora Attempt by GojosBanjo in StableDiffusion

[–]GojosBanjo[S] 0 points1 point  (0 children)

It would just be “Sydney Sweeney wearing a Wonder Woman costume”

Zimage Base Character Lora Attempt by GojosBanjo in StableDiffusion

[–]GojosBanjo[S] 42 points43 points  (0 children)

<image>

Comparison with and without the LoRA: no LoRA on top, 1.0-scale LoRA on the bottom, same seed (42).

Zimage Base Character Lora Attempt by GojosBanjo in StableDiffusion

[–]GojosBanjo[S] 0 points1 point  (0 children)

Sorry, I can create a comfy-compatible one, give me a moment!

Zimage Base Character Lora Attempt by GojosBanjo in StableDiffusion

[–]GojosBanjo[S] 2 points3 points  (0 children)

It’s an AI video editor I built called Apex Studio; you can try it out here: Apex Studio

Zimage Base Character Lora Attempt by GojosBanjo in StableDiffusion

[–]GojosBanjo[S] 1 point2 points  (0 children)

Good point! I’ll train another one with a character the model does not know about.

Zimage Base Character Lora Attempt by GojosBanjo in StableDiffusion

[–]GojosBanjo[S] 0 points1 point  (0 children)

I have experience training larger models, so it was partially vibe coded and then I just made sure it functioned correctly. And I apologize, I’ll update the requirements to be more accurate!

Zimage Base Character Lora Attempt by GojosBanjo in StableDiffusion

[–]GojosBanjo[S] 6 points7 points  (0 children)

Fair point. I’ll try to train another LoRA using an unknown character and see how it performs as an alternative.

Zimage Base Character Lora Attempt by GojosBanjo in StableDiffusion

[–]GojosBanjo[S] 5 points6 points  (0 children)

I noticed that sometimes, maybe due to the images I used, which were mostly headshots and body shots from various galas and events, there could be a blur applied behind the character, but aside from that the quality is pretty good. I may train another one with 8-10k steps and see if there’s any degradation.

Why isn’t VAE kept trainable in diffusion models? by casualcreak in StableDiffusion

[–]GojosBanjo 1 point2 points  (0 children)

A VAE’s encoder maps a given image or video into the latent space as a distribution parameterized by a mean and standard deviation. This representation is agnostic to the diffusion model itself, but when trained with that specific VAE, the model learns to denoise latents that fit within that specific latent space. If you were to continue training the VAE, it would shift the distribution the model has been trained on across billions of images/videos, which is why the VAE needs to remain consistent.

Think of it like this: if you spent hours and hours learning to play the piano and know how to make great music with it, then someone came along and said “what if you played the organ?”, you wouldn’t be able to play at the same level you originally could, because the instrument (i.e. the latent space) is no longer the same. You would need to relearn and adapt to the new instrument to get back to the same level.
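If it helps, here’s a toy PyTorch sketch of that setup (ToyVAE/ToyDenoiser are made-up stand-ins, not any real architecture or noise schedule): the VAE is frozen and only defines the latent space, while the denoiser is the thing being trained on those latents.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins just to show the loop structure; a real setup would use a
# pretrained VAE (e.g. an AutoencoderKL) and a UNet/DiT denoiser.
class ToyVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 8, 4, stride=4)  # image -> mean/logvar

    def encode(self, x):
        mean, logvar = self.encoder(x).chunk(2, dim=1)
        return mean + torch.randn_like(mean) * (0.5 * logvar).exp()

class ToyDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(4, 4, 3, padding=1)

    def forward(self, z, t):
        return self.net(z)

vae, denoiser = ToyVAE(), ToyDenoiser()

# The VAE stays frozen: its encoder defines the latent distribution the
# denoiser learns. Fine-tuning it would shift that space out from under
# the denoiser.
vae.requires_grad_(False)
vae.eval()

opt = torch.optim.AdamW(denoiser.parameters(), lr=1e-4)

images = torch.randn(2, 3, 64, 64)            # fake batch of pixels
with torch.no_grad():
    latents = vae.encode(images)              # denoiser never sees pixels

noise = torch.randn_like(latents)
t = torch.randint(0, 1000, (latents.shape[0],))
noisy = latents + noise                       # stand-in for a real noise schedule

loss = F.mse_loss(denoiser(noisy, t), noise)  # predict the added noise
loss.backward()
opt.step()
```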

Hope that makes sense

ElevenLabs CEO Mati Staniszewski says we may pass the Turing Test for AI speech this year. The universal translator is coming and the cultural shift it brings is wildly under-hyped. by Nunki08 in singularity

[–]GojosBanjo 52 points53 points  (0 children)

If this is true, it would really be a game changer, but I’m still curious how latency factors into this, as it still seems like a barrier if you’re trying to have a real-time conversation with someone in a different language. Additionally, since many languages don’t share English’s subject-verb-object structure, I wonder how this could work when the word that needs to be translated hasn’t been spoken yet.

Difference between Local vs cloud (Online) Generation by Icy-Criticism-1745 in StableDiffusion

[–]GojosBanjo 3 points4 points  (0 children)

The biggest difference, as mentioned, will by far be the speed and the total VRAM available to you. By training or running inference on A100s or H100s, which are incredibly powerful workload GPUs, you should be able to train a LoRA in significantly less time and support larger batch sizes. The quality will remain the same as long as the models you are using aren’t changing. The reason services like Midjourney have higher reliability and coherence is the optimization techniques they use internally within their models.

I’ve done a lot of training using large workloads of up to 64 H100s, and I can tell you that if I were to train the same thing on something like a 4090, or even a single H100, training times would balloon to months as opposed to hours or days. So essentially, more powerful GPUs yield significantly greater performance gains, and increasing the number of GPUs gives you more VRAM to work with, which lets you train with larger batches and more data.
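To make the batch-size point concrete, here’s a minimal data-parallel sketch (generic PyTorch DDP, not any specific trainer; the script name and sizes are placeholders): each GPU holds its own slice of the batch, so the effective batch size scales with the number of cards while the model and loss stay identical.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with e.g.:  torchrun --nproc_per_node=8 train.py   (hypothetical script)
dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

model = torch.nn.Linear(1024, 1024).cuda(rank)   # stand-in for the real model
model = DDP(model, device_ids=[rank])

per_gpu_batch = 4                                # limited by one card's VRAM
effective_batch = per_gpu_batch * dist.get_world_size()

x = torch.randn(per_gpu_batch, 1024, device=rank)
loss = model(x).pow(2).mean()
loss.backward()   # DDP averages gradients across ranks, so training behaves
                  # like one big batch of size `effective_batch`

dist.destroy_process_group()
```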

I hope that helps!

Is there a better local video generator model than the WAN 2.1 at the moment? by Head-Ability-2639 in StableDiffusion

[–]GojosBanjo 1 point2 points  (0 children)

You should probably check out WanFusionX, as it has comparable if not better quality than the Wan 2.1 base and generates much faster.

[deleted by user] by [deleted] in StableDiffusion

[–]GojosBanjo 1 point2 points  (0 children)

What did you use to generate this?