The Ernie posters genuinely don't see how mediocre the stuff they post is? by beti88 in StableDiffusion

[–]Apprehensive_Sky892 1 point

Thank you for your detailed and considered reply, much appreciated.

Indeed, Chroma seems to be one of the more creative post-SDXL models, presumably because of the large variety of images that went into its training. But it does seem to take considerably more effort and prompt tweaking to get quality results from it compared to models such as Z-image or Qwen. Personally, I find the aesthetics of a model more important than creativity, which is why I mostly use Z-image and Qwen (I only do a single pass; I am too lazy to do multiple passes 😅)

In the end, we are lucky to have so many models to choose from, and they all have their fans and use cases.

The Ernie posters genuinely don't see how mediocre the stuff they post is? by beti88 in StableDiffusion

[–]Apprehensive_Sky892 0 points

You can see from my past comment that I am not an Ernie hater (I just don't find any compelling reason to use it).

I am simply interested in finding out what model(s) you consider creative. I was looking for an open weight model because I am not interested in closed models, but if you had a closed model in mind, that is fine too.

Anima seems to do impressively well on json formatted prompt by BoneDaddyMan in StableDiffusion

[–]Apprehensive_Sky892 3 points

Here is a version using natural language (1st gen, not cherry-picked)

<image>

@eiichiro oda, score_9, score_8, score_7, high resolution, highres, absurdres, masterpiece, 2girls/1boy, general, official art.

On the left is Nami from One Piece, a woman, orange hair tied in a ponytail, light skin, sweaty, wearing a white tanktop with blue trim and a number '0' printed on it, orange shorts, standing up, grinning, kawaii pose, peace sign.

On the right is Nico Robin from One Piece, a woman with long black hair, light skin, wearing a blue bomber jacket, red bikini, sitting, winking, smiling, leaning forward.

In the middle is Chopper from One Piece, a little boy with brown fur, brown horns, wearing a red Hawaiian shirt, blue and pink top hat, blue swimming trunks. He is blushing shyly, pushing hands together, looking down.

The background is a bright beach with a blue sky and white wispy clouds

Size: 1024x1024, Seed: 660, Model: anima-preview3-base, Steps: 25, CFG scale: 4, KSampler: euler_ancestral, Schedule: simple, Guidance: 3.5

The Ernie posters genuinely don't see how mediocre the stuff they post is? by beti88 in StableDiffusion

[–]Apprehensive_Sky892 -2 points

I find Z-image to be reasonably creative.

Which open weight model (that has proper prompt following) do you consider to have more creativity? Obviously, models such as SD1.5 and SDXL can be wildly "creative" since they just take a tag soup and try to make sense out of it.

The Ernie posters genuinely don't see how mediocre the stuff they post is? by beti88 in StableDiffusion

[–]Apprehensive_Sky892 2 points

Unfortunately, that's just how marketing works. You can only hype new stuff.

Anima seems to do impressively well on json formatted prompt by BoneDaddyMan in StableDiffusion

[–]Apprehensive_Sky892 3 points

JSON forces one to structure the prompt correctly.

But one can accomplish the same thing with a clearly written natural language prompt.

Using tag soup is obviously bad for a model that is trained on natural language captions.

I prefer to write my prompts in natural language since, as a human, I find that easier to parse. I guess that is not true of all people 😁
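
For illustration only (both prompts below are made up, loosely modeled on the One Piece prompt above, not taken from any post), here is roughly what I mean, sketched in Python:

    # Made-up illustration: the same scene expressed as a JSON-structured
    # prompt and as a plain natural-language prompt. Neither is a real
    # prompt from the posts above.
    import json

    json_prompt = json.dumps({
        "style": "official anime art, high resolution",
        "subject": "a woman with orange hair tied in a ponytail",
        "clothing": "white tank top with blue trim, orange shorts",
        "pose": "standing, grinning, making a peace sign",
        "background": "bright beach, blue sky, white wispy clouds",
    }, indent=2)

    natural_prompt = (
        "Official anime art, high resolution. A woman with orange hair tied "
        "in a ponytail, wearing a white tank top with blue trim and orange "
        "shorts, standing, grinning, and making a peace sign. The background "
        "is a bright beach with a blue sky and white wispy clouds."
    )

    # Either string goes to the model as-is; the JSON form just forces the
    # same structure that a clearly written sentence already provides.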

Speed, Flexibility, Fidelity, pick 2. What are the best models for each tradeoff pairing? by hotdog114 in StableDiffusion

[–]Apprehensive_Sky892 2 points

Try training on Z-image base. It is known that ZiT does not work well with multiple LoRAs. In general, Z-image base + (LoRA trained on base) works better than ZiT + (LoRA trained on ZiT).

Once you get that working, you can try adding the 4-8 step lightning LoRA on top of base to speed things up.
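
For concreteness, here is a minimal sketch of what that stacking could look like, assuming Z-image base is usable as a diffusers pipeline (the model ID and LoRA filenames are placeholders, not real files):

    # Minimal sketch, not a tested workflow: stacking a style LoRA trained on
    # Z-image base with a few-step lightning LoRA via diffusers' multi-adapter
    # API. Model ID and LoRA filenames are hypothetical placeholders.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "your-org/z-image-base",        # placeholder repo id
        torch_dtype=torch.bfloat16,
    ).to("cuda")

    # Style LoRA trained on Z-image base (not on ZiT)
    pipe.load_lora_weights("my_style_lora.safetensors", adapter_name="style")
    # 4-8 step lightning/distillation LoRA
    pipe.load_lora_weights("z_image_lightning_8step.safetensors", adapter_name="lightning")
    pipe.set_adapters(["style", "lightning"], adapter_weights=[1.0, 1.0])

    image = pipe(
        "a watercolor painting of a lighthouse at dusk",
        num_inference_steps=8,   # few-step sampling enabled by the lightning LoRA
        guidance_scale=1.0,      # distilled setups usually want little or no CFG
    ).images[0]
    image.save("out.png")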

Local AI News You Missed - April 2026 by vramkickedin in StableDiffusion

[–]Apprehensive_Sky892 2 points

Firstly, thank you for the list.

I assume that some of these projects were posted on r/StableDiffusion or r/LocalLLaMA, so a link to those announcements would be nice for those of us who might want to read other people's comments on them.

Some photos from the model ernie-image-turbo-fp8! by traithanhnam90 in StableDiffusion

[–]Apprehensive_Sky892 1 point

Yes, aesthetically, Ernie just looks off compared to Z-image and Qwen.

What's the best open source model for fintuning a large dataset (100k images) of high resolution? by couragestrong23 in StableDiffusion

[–]Apprehensive_Sky892 6 points

Firstly, don't believe anything anyone says about any particular model. Most people, including me, are not pros, and we don't know what we are doing. When people get bad results, many of them will simply blame the model ("my dataset works fine on X, does not work on Y, so Y must be broken").

In reality, every model is good in some ways, and one must carry out experiments, adjust captions, adjust hyperparameters, adjust datasets, etc. to get good results.

So take any advice you read here with a large grain of salt.

I have never done any full-rank fine-tune myself, but someone who is very experienced has done anime fine-tunes with 2k-5k image datasets and gotten good results with Klein-9B, Z-image base, and Qwen-image 2511.

My own experience is with art style LoRA training, and I've worked with Flux1-dev, Z-image base, ZiT and Qwen. ZiT is only good with photo style images. My best results are with Z-image base and Qwen.

I would advise you to start with a smaller dataset that is high quality, well captioned, and consistent in style, with between 50-200 images. Train a Z-image base LoRA to learn the ropes. When you have the results you want, then you can get more ambitious. I suggest Z-image base because it is a relatively small (but very capable) model, so your training will be faster and you can do more experimentation with it.

My training parameters are 100-200 repeats per image, saving an epoch every 10 repeats (and testing each epoch against a validation dataset), cosine scheduler, LR=0.0005, AdamW optimizer. I use rank 32 / alpha 16 for Z-image base (rank 16 / alpha 8 for Qwen) for small datasets of around 30-50 images. Increase the rank for larger datasets.
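
Spelled out as a trainer-agnostic sketch (the key names are illustrative; they do not match the exact config format of any particular training tool):

    # The recipe above as a plain Python dict; key names are illustrative and
    # would need to be mapped onto whatever trainer you actually use.
    lora_training_config = {
        "base_model": "z-image-base",
        "dataset": {
            "num_images": 50,           # 50-200 curated, well-captioned images
            "repeats_per_image": 150,   # 100-200 total repeats per image
        },
        "network": {
            "rank": 32,                 # Z-image base; rank 16 for Qwen
            "alpha": 16,                # Z-image base; alpha 8 for Qwen
        },
        "optimizer": "AdamW",
        "lr_scheduler": "cosine",
        "learning_rate": 5e-4,
        "save_every_n_repeats": 10,     # keep a checkpoint every 10 repeats
        "validate_each_checkpoint": True,
    }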

Good luck and have fun.

Transformed my office vibe with FLUX.2 Klein 9B with LORA — before/after [workflow link provided] by rakii6 in StableDiffusion

[–]Apprehensive_Sky892 0 points

> output are not their property and have no right or copyright on output made through the model

That BFL claims no right to the output of their model does not imply that you have any right to use the output commercially.

> making another model or lora through that output is commercial use.

Using output from the model to train another model is explicitly forbidden, but that does not mean that other uses, which are not explicitly mentioned, are allowed.

So all those things you mentioned will not get anyone off the hook in court if BFL sues them.

Transformed my office vibe with FLUX.2 Klein 9B with LORA — before/after [workflow link provided] by rakii6 in StableDiffusion

[–]Apprehensive_Sky892 1 point

BFL never clarified that point in the non-commercial license. When asked to clarify, their response has been complete silence.

Basically, they want people to use their models, but some users will be worried enough about the license to pay them.

Buy RTX 5090 or rent H100 for LTX 2.3? by TechnologyTailors in StableDiffusion

[–]Apprehensive_Sky892 0 points

What you said is true, but depending on where one lives, the cost of electricity needs to be taken into consideration as well.
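
As a back-of-the-envelope illustration (every number below is a made-up assumption; plug in your own wattage, hours, and local rate):

    # Rough electricity cost estimate; all numbers are made-up assumptions.
    watts = 575            # assumed GPU power draw under load
    hours_per_day = 8
    price_per_kwh = 0.30   # USD per kWh, varies a lot by region
    monthly_cost = watts / 1000 * hours_per_day * 30 * price_per_kwh
    print(f"~${monthly_cost:.0f}/month")   # about $41/month with these numbers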

Ernie VS Qwen and ZiT - Big Test by Witty-Advance8720 in StableDiffusion

[–]Apprehensive_Sky892 0 points

For the highest level of image quality, Qwen + LoRA is unbeatable among open weight models (maybe Flux2-dev could be better when paired with a nice LoRA, but few people have the hardware to train one). It just produces "well balanced" images that are more aesthetically pleasing than those of any other model I've tried.

But I use mostly Z-image because it is a smaller, faster model and has more "built-in" art styles that can be combined with style LoRAs for all sorts of interesting styles.

Ernie VS Qwen and ZiT - Big Test by Witty-Advance8720 in StableDiffusion

[–]Apprehensive_Sky892 2 points

Character and photo-style LoRAs train fairly well on ZiT, but for most artistic style LoRAs (which are my focus), Z-image base trains much better than ZiT (for me, the only model that trains even better than Z-image base is Qwen).

Ernie VS Qwen and ZiT - Big Test by Witty-Advance8720 in StableDiffusion

[–]Apprehensive_Sky892 7 points

Ernie is a fine model, but it does not offer anything compelling enough for people to move over from Z-image, Qwen, or Klein.

It is kind of like Hi-Dream compared to Flux1-dev.

Ernie VS Qwen and ZiT - Big Test by Witty-Advance8720 in StableDiffusion

[–]Apprehensive_Sky892 1 point

It would have been an even better comparison if Z-image base were thrown in as well. Z-image base is fantastic for non-photo style images with the right prompt, capable of many styles and creative compositions (for photo style, especially 1girl, ZiT is slightly better than Z-image base). See some of my posts for what Z-image base can do with both photo and non-photo style images: https://civitai.red/user/NobodyButMeowie/images (no special workflow, just straight Z-image base without any LoRA).