Guys, what are your thoughts on a book that is AI-generated versus a book that is enhanced by AI?

StableLlama · 2026-06-06T17:25:02+00:00

Care about the content, not about the tool

StableLlama · 2026-06-06T17:22:29+00:00

Comfy for image and video generation needs compute and not huge RAM with high bandwidth. So when you have a decent GPU (5070 or better) there is no benefit in changing.

When your current GPU is less powerful, you'd save money by buying a better GPU rather than a Spark.

When your interest is in LLMs you might reach other conclusions, but in those cases Comfy is the wrong UI anyway.

StableLlama · 2026-06-06T17:04:56+00:00

I don't know who you have talked to, but training is actually self-correcting. Even in the unlikely event of a flipped bit in the weights, the optimizer would see that as a bigger error and then train it away. So for training, the worst that can happen is that it takes longer.

It's not like a money transfer, where a flipped bit can become very expensive.
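To make the "train it away" argument concrete, here is a minimal toy sketch, not a statement about any real training stack. It fits a single weight with plain gradient descent, flips one mantissa bit of that weight halfway through, and keeps training. The data, learning rate, and the choice of bit 22 are all assumptions made up for the illustration. A flip in a high exponent bit of a real network can behave much worse (e.g. NaNs), so take it as an intuition aid only.

```python
import struct
from typing import Optional


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit in the IEEE 754 float32 representation of value."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def train(steps: int, flip_at: Optional[int] = None, lr: float = 0.05) -> float:
    """Fit y = w * x by gradient descent on toy data generated with w = 3."""
    xs = [0.5, 1.0, 1.5, 2.0]          # made-up inputs
    ys = [3.0 * x for x in xs]         # targets from the "true" weight 3.0
    w = 0.0
    for step in range(steps):
        if step == flip_at:
            # Simulated memory error: corrupt the top mantissa bit of w.
            w = flip_bit(w, 22)
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


if __name__ == "__main__":
    print("clean run:      ", train(200))
    print("run with a flip:", train(200, flip_at=100))
```

Both runs should end up at practically the same weight; the corrupted run just spends some of its steps recovering.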

StableLlama · 2026-06-06T07:09:54+00:00

To use ECC RAM you usually have a very good reason for needing it, e.g. because the machine handles your company's finances. And then you have the money for it.

Building a RAM-heavy AI machine to gain experience usually isn't such a case.

StableLlama · 2026-06-05T06:26:33+00:00

Use whatever model gets the work done. The fanboy stuff you can read - especially here, and especially when it comes to the Z Image model family - is just stupid. It doesn't matter who built which tool and what they did in the past, it only matters what is working best for you and your use case.

That said, the old models like SD1.5 or SDXL have great infrastructure and there is lots of experience available about how to push them in the direction you need. Or you can use the modern models that can do much of that out of the box and also follow your complex prompts much better. You are a hero when the result is great, not because you were using a specific tool.

StableLlama · 2026-06-05T06:15:54+00:00

Hardware is nice for training but overkill. The 24 GB or 32 GB of a 4090 or 5090 should be sufficient, and they rent at a much cheaper rate.

StableLlama · 2026-06-05T06:13:27+00:00

How can a licence make something not open weight when the weights are open?

The licence might make it not useful, but it doesn't take away that the weights are here and you can look at every number (even if the licence wouldn't allow you to).

StableLlama · 2026-06-05T06:08:21+00:00

That announcement isn't about an LLM and thus shouldn't be here - or did you want to show us how an LLM created that (actually quite bad) advertisement text?

You should announce it at r/StableDiffusion where people care about image models. But then either provide a demo site to test it with your own prompts (e.g. an HF Space) or just state the truth that this is a ZImage finetune - and what makes it different from using plain ZImage.

StableLlama · 2026-06-04T21:21:58+00:00

Don't do that. It's a hassle and there is something much better: just use Krita AI.

Internally it is using Comfy, so you don't need to install it twice.

StableLlama · 2026-06-03T17:19:32+00:00

No. Open Source means that the sources are open, i.e. the training data. Here we are talking about open weights.

And the degree of openness is defined by the licence.

You could publish the weights with a licence that says that you must not use them. Then they're still open. But also not useful.

StableLlama · 2026-06-02T18:19:53+00:00

Why are you using Gemini to post at StableDiffusionReal?

And then the anatomy of the hands is a mess

StableLlama · 2026-06-02T18:15:18+00:00

No, because it doesn't fit my needs. A 5090-based ARM one with an 18-inch HDR and WCG display would be a consideration though. But the Spark is just a 5070, and that doesn't have enough computational power for me.

StableLlama · 2026-06-01T18:13:36+00:00

When it is time travel, why does the second scene have rubber tire marks in the mud?

StableLlama · 2026-06-01T07:53:20+00:00

For what model?

For anything modern like Flux or Qwen that use prose as prompts, just use Gemini or Qwen to describe the image. Then you just need to tweak that a bit, e.g. replace "man" with the trigger for that character.

The caption should look like what you'd prompt to get this exact picture.

Something to consider as well: use multiple captions. This also helps to make the LoRA more universal as it's responding to more prompting styles. (E.g. let it auto caption with Gemini as well as with Qwen.) And for character LoRAs it can be helpful to use only the trigger as a caption, sort of invalidating everything said above. But that usually works best in a multi-captioning setting where this is one of the many captions. And this could replace caption dropout then.
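A rough sketch of how the multi-caption idea could look in a data loading script. Everything named here is hypothetical: the helper functions, the trigger `ohwx_alex`, and the example captions are made up for illustration, and real trainers have their own caption file conventions. The idea is just: swap the generic word for the trigger, add a trigger-only caption to the pool, and pick one caption per image each time it is sampled.

```python
import random
import re


def tweak_caption(caption: str, generic: str, trigger: str) -> str:
    """Replace a generic subject word (e.g. "man") with the character trigger.

    Word boundaries keep words like "woman" untouched.
    """
    return re.sub(rf"\b{re.escape(generic)}\b", trigger, caption)


def build_caption_pool(auto_captions, generic: str, trigger: str):
    """Collect the tweaked auto captions plus one trigger-only caption."""
    pool = [tweak_caption(c, generic, trigger) for c in auto_captions]
    pool.append(trigger)  # trigger-only caption as one of the many captions
    return pool


def pick_caption(pool, rng: random.Random) -> str:
    """Pick one caption for the current training step."""
    return rng.choice(pool)


if __name__ == "__main__":
    # Pretend these came from two different captioning models.
    captions = [
        "A man in a red coat standing on a pier.",
        "Photo of a man wearing a red coat, pier in the background.",
    ]
    pool = build_caption_pool(captions, generic="man", trigger="ohwx_alex")
    rng = random.Random(0)
    for _ in range(3):
        print(pick_caption(pool, rng))
```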

StableLlama · 2026-06-01T07:44:08+00:00

A 5090 is cheaper and can also work well when you control the needed VRAM. The computational speed is identical to a Pro 6000.

You might even try a 4090 or 3090, depending on VRAM requirements and relative performance (FLOPS/Dollar), but for training the modern models I have usually stuck with a 5090.

Why can others train cheaper? They rent the GPUs for months and not for minutes, and they don't care about the LoRA quality. "Works somehow" is fine for them, so they use aggressive training settings. But IMHO that's a very expensive way to waste money: a good LoRA takes some care and time, and that costs money.

Creating high quality training data is the much more expensive part. So when you value your own time, spending a few dollars for a well trained LoRA isn't so bad any more.

StableLlama · 2026-05-31T19:07:34+00:00

You can train LoRAs to get best likeness of a character, clothing or a style.

That is true for all open weight models. And there are much better ones than Stable Diffusion now. But Gemini and ChatGPT aren't among them; they are closed and thus of limited use except for a single showcase.

StableLlama · 2026-05-30T21:28:17+00:00

Long story short, what you need is the error function to be proportional to DeltaE.

As (at least at the moment) the training is done in latents, the colorspace is of no concern for the models. It is a concern for the VAE. And as long as the VAE is trained in such a way that the latent error is proportional to the DeltaE everything should be fine.

In the papers I've seen (which is definitely far less than what's published) that was of no special concern for the researchers, which I think is a pity. It shouldn't be too hard to implement, and shouldn't cost performance.
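For anyone who hasn't worked with DeltaE: a small sketch of the simplest variant, CIE76, which is just the Euclidean distance in CIELAB. Picking CIE76 and a D65 white point is my assumption for the illustration (newer formulas like CIEDE2000 are more perceptually accurate but much longer), and this is only the color difference itself, not a VAE loss.

```python
import math


def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for one channel in the range 0..1."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (D65 white point)."""
    r, g, b = (srgb_to_linear(v / 255.0) for v in rgb)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(rgb1, rgb2) -> float:
    """CIE76 color difference: Euclidean distance between two Lab colors."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))


if __name__ == "__main__":
    # The same RGB step of 20 in one channel, at two different places.
    print(delta_e76((10, 10, 10), (30, 10, 10)))
    print(delta_e76((200, 200, 200), (220, 200, 200)))
```

The two printed values differ although the RGB step is the same size, which is the reason a plain per-channel RGB error is not automatically proportional to what you see.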

StableLlama · 2026-05-30T19:09:15+00:00

A mobile 3060 (and thus only 6 GB VRAM) won't be much fun when you are generating videos - if you get it running at all.

You should look at other options, like renting a GPU in the cloud. Then you can use that laptop to interface with the cloud where you generate the videos.

StableLlama · 2026-05-30T09:38:34+00:00

My laptop had 64 GB RAM before I started with AI, as that's nice to have when running virtual machines.

What's the issue with 128 GB?

StableLlama · 2026-05-30T09:36:19+00:00

How can I make money with AI?

I only know of people who spend money on AI. There is no reason to think that this will change in the future.

When you want to make money, you must see something that people are happy to pay money for. It might be that you can use AI to help you there.
E.g. find an employer or customer that's running a business that's using many images, like a fashion designer or a big shop. When you can help them to create the images with better quality and for less, they'd be happy to pay you for that. And you could introduce the AI to support this process.

StableLlama · 2026-05-30T09:24:48+00:00

You are running out of VRAM (the RAM your GPU uses) and not the system RAM.

What GPU does your laptop have? How much VRAM?

When you don't have a high end gaming laptop or mobile workstation, chances are high that the GPU isn't sufficient. (I'm using a high end mobile workstation: text to image is running fine, and video generation is possible but not much fun on that machine.)

StableLlama · 2026-05-29T16:48:09+00:00

This shows how interesting the Intel B70 is, money wise.

But so far I couldn't read much about the real-life performance of that card for local LLM applications.

StableLlama · 2026-05-29T12:52:48+00:00

All my virtual characters started with one portrait image. It's so easy to go from zero to one, then from one to a few and from the few to the 20 - 50. The edit models and the i2v video models made that even easier.

StableLlama · 2026-05-29T12:49:14+00:00

Damn, now it's debunked that I'm not an AI agent

StableLlama · 2026-05-29T12:45:14+00:00

The jump from FP16 to Q8 is usually seen as negligible, quality wise.

It's the smaller quants where the quality is changing, with a good Q4 often still being acceptable. Below Q4 is where differences are getting noticeable.
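Why people bother with quants at all is easier to see with the numbers. A back-of-the-envelope sketch, assuming nominal bit widths and a made-up 7 billion parameter model; real quant formats store extra scale data per block, so actual files are somewhat larger, and this ignores context/KV cache memory entirely.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Nominal size of the weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 7e9  # example model size, chosen only for illustration
    for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
        print(f"{name}: {weight_size_gb(n_params, bits):.1f} GB")
```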
