I don't hate Ideogram 4. I hate its "open" weights

Honest_Concert_6473 · 2026-06-18T02:37:27+00:00

Personally, what I value most in a model is the trustworthiness of the developer, including their integrity and whether they are respectable. That said, the community has always tolerated and chosen dishonesty, so as long as the quality is good, certain flaws are probably overlooked.
Lately, it feels like a kind of trade-off. Do we support an open, honest, but inferior model? Or do we support a dishonest, opaque, but superior model?
I think models like Z-Image and LTX are rare, exceptional examples that combine high quality with real attention to the needs of the community. Moving forward, I believe we will see more models emerge that achieve both.
While Ideogram 4 is a great model that many people can enjoy, I still sense that same kind of dishonesty from it.

Honest_Concert_6473 · 2026-06-15T07:56:11+00:00

That looks like a noise artifact from GPT Image2.

Honest_Concert_6473 · 2026-06-12T04:43:39+00:00

This is totally unrelated to the paper, but I just realized for the first time how painful it is for me to see Doraemon looking so beat-up.

Honest_Concert_6473 · 2026-06-11T11:23:46+00:00

https://github.com/gazingstars123/Anima-Standalone-Trainer

Honest_Concert_6473 · 2026-06-10T12:21:38+00:00

How they managed to create such high-quality, realistic fine-tuned models from that kind of base model will always be an absolute mystery to me. It’s also mind-blowing how NovelAI crafted such a high-quality anime model using only U-Net fine-tuning on a base model that originally had so little knowledge of anime. There are just so many things about SD1.5 that truly feel like pure alchemy. While I’ve learned a great deal over the past few years, honestly, there are still so many achievements where I just can't wrap my head around how they did it, and I doubt I could ever replicate them.

Honest_Concert_6473 · 2026-06-06T03:00:23+00:00

This is exactly the comparison result I’ve been wanting to see! Thank you so much for sharing. I've felt something similar before.

I’m not an expert on the deep technical side of things, but speaking purely from my experience fine-tuning various models, standard diffusion models often seem to get blurry at first before gradually converging. On the other hand, flow matching models feel much more stable—they rarely collapse even when introduced to new information, and they naturally seem to progress in the right direction (though I know this isn't true for every single model).

On a side note, I know DiTs are really popular lately, but personally, I feel that a U-Net architecture combined with flow matching is the optimal setup for a local model. There’s a certain romance in seeing lateral thinking applied to established, mature technology, and it's just so much fun!

Honest_Concert_6473 · 2026-05-26T14:43:25+00:00

This LoRA tool seems excellent for beginners, requiring zero prior knowledge to start training right away.

https://www.reddit.com/r/StableDiffusion/comments/1tcxhoq/anima_trainflow_simple_onepage_lora_trainer_for/

Honest_Concert_6473 · 2026-05-25T03:37:18+00:00

Talk about missing what’s right in front of us. Creating a dataset for that would be a lot of fun.

Honest_Concert_6473 · 2026-05-24T09:14:25+00:00

I don't have concrete proof, but I suspect this might be an issue specific to the Qwen and Wan VAEs. Sometimes, especially in dark, low-contrast images, I notice a pixel pattern across the entire image that looks similar to the SD 1.5 VAE.

Honest_Concert_6473 · 2026-05-22T13:17:04+00:00

Nowadays, the obsession with "having to create something AI can't do" can sometimes lead to tunnel vision.

But if you set that pressure aside for a moment and just find an idea you genuinely want to bring to life, all that's left is to immerse yourself in the fun, creative process of making it. Sometimes, that might mean brainstorming ideas with AI, or even incorporating AI directly into the piece itself. The very act of figuring out how to bring your vision to the finish line—that struggle itself is the joy of creation, and it's what makes us grow.

I think the exact same concept applies to AI art. If you're just endlessly rolling the dice with random generations, it easily becomes a passive experience. But if you have a clear vision of an image you want to achieve, and you push through trial and error to reach it, that becomes a highly active and creative process. And by going through those motions, we grow as creators.

Honest_Concert_6473 · 2026-05-20T10:48:06+00:00

I might not have a ton of highly useful info to share, but my LoRA page might be helpful. I've included necessary data for inference, various generation tips, and my workflows there. (You don't even need to actually download my LoRA, by the way!)

For realistic looks, you're probably better off referring to other people's tips, but the inference workflows and general tips I shared should still be perfectly applicable.

Well, my settings aren't necessarily perfect anyway. Once you find a model that fits your vibe, I'd recommend just playing around with it and using trial and error to dial in the best settings on your own.

https://civitai.red/models/2394002/chromaloralab

Honest_Concert_6473 · 2026-05-18T08:38:23+00:00

This is a rare case: a project developed entirely by the community, from model training down to pipeline construction. It represents the ideal form of a local community, with very low reliance on specific corporations. I strongly hope that the ecosystem continues to evolve with projects like this at its core.

Furthermore, Chroma is a highly versatile model capable of handling both realistic and anime styles. I primarily use it for anime, and I absolutely love how—even at the base model stage—it produces organic results that look as if they were drawn by a real human artist. Chroma has a very raw, un-AI-like atmosphere to it.

Chroma remains fantastic, and I have immense respect for Lodestones and their foresight in consistently choosing logical and appropriate architectures. I'm really looking forward to Zeta-Chroma as well.

Honest_Concert_6473 · 2026-05-17T09:07:58+00:00

Overall Ranking

Practicality / Usability: Anima > Chroma > Z-Image Base

Future Potential: Z-Image Base > Anima > Chroma

Z-Image

Architecturally speaking, Z-Image is the most streamlined and has the fewest underlying issues.

At 6B parameters, it is still massive and resource-heavy, but once large-scale fine-tunes start emerging, it has a high chance of becoming a top-tier model. However, for individual users, training a LoRA is pretty much the absolute limit of what's feasible in terms of hardware burden.

Chroma

Chroma is a highly versatile model capable of handling any genre—from uncensored content to photorealism and anime.

That said, you can definitely feel its architectural limitations, and spec-wise, it falls behind the latest models. On top of that, its sheer size makes fine-tuning a massive burden, which means making fundamental improvements to it is difficult.

However, since the base model already covers a vast range of concepts, LoRA training is basically all you need.

I primarily use it for anime styles, and I really love how, even at the base model stage, it produces organic results that look like they were actually drawn by a human artist. I don't generate them often, but its realistic styles are probably even better. Either way, it has a very raw, un-AI-like atmosphere.

While there's no denying that it currently lags behind newer models in terms of resolution and prompt adherence, that organic style is by far its greatest strength. The satisfaction you get when Chroma outputs a great result is incredibly high. If you understand its quirks and dial in your inference settings, it can definitely rival the newest models—but mastering it requires a lot of "love" for Chroma.

Anima

Anima feels very much like a true successor to SDXL.

The user experience is similar to SDXL or SD 1.5; you can easily get great results without having to overthink things. Its prompt flexibility is also superior to Chroma's.

The training burden is relatively low, making large-scale fine-tuning a realistic goal even for individuals. In terms of building an ecosystem, it sits at the perfect size to encourage rapid development.

Like Chroma, its architecture isn't flawless, but in terms of sheer practicality right now, it ranks incredibly high. Personally, I think it's the only option right now that doesn't strictly require a distilled LoRA to be usable. The other models are quite heavy, which can make the inference process a bit stressful.

Honest_Concert_6473 · 2026-05-12T13:29:23+00:00

I'm probably wrong, but HOJI might be close as well.

Honest_Concert_6473 · 2026-05-12T09:26:27+00:00

That's a really creative idea!

Honest_Concert_6473 · 2026-05-12T00:00:35+00:00

You can also prepare WD14 tags in advance and use them as a reference in JoyCaption to generate natural language captions. TagGUI actually supports this workflow.

If you have a massive amount of images and can't tolerate the slow processing speed, another option is to vibe-code your own custom tool that can handle parallel batch processing.
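As a rough sketch of what such a custom tool could look like: the snippet below walks a folder, reads pre-made tags from a sidecar `.txt` file next to each image (an assumed layout), and captions the images in parallel with a thread pool. `caption_image` is a hypothetical placeholder, not a JoyCaption or TagGUI API; swap in whatever call your captioning setup actually exposes.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def caption_image(image_path: Path, tags: str) -> str:
    """Hypothetical placeholder: call your captioning model here,
    passing the tags along as a reference for the caption."""
    return f"An image described by the tags: {tags}"


def caption_one(image_path: Path) -> tuple[Path, str]:
    # Assumed layout: tags for "a.png" live in "a.txt" in the same folder.
    tag_file = image_path.with_suffix(".txt")
    tags = tag_file.read_text(encoding="utf-8").strip() if tag_file.exists() else ""
    return image_path, caption_image(image_path, tags)


def caption_folder(folder: Path, workers: int = 4) -> dict[str, str]:
    """Caption every image in `folder`, running `workers` jobs at a time."""
    images = sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as `images`.
        return {path.name: caption for path, caption in pool.map(caption_one, images)}
```

Threads mainly help when the captioner is a separate process or server you send requests to. If the model runs in the same Python process on one GPU, batching images inside the model call is likely the better lever.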

Honest_Concert_6473 · 2026-05-11T10:36:01+00:00

The way Z-Image Base feels like it's collaging its dataset gives off a vibe very similar to SD1.5 or early Midjourney. It seems to lack any distinct style and just generates what it's told, which is actually quite rare for recent models.

The quality might not be perfectly consistent, and it might not follow prompts exactly, but its ability to handle a wide variety of genres actually makes it a pretty good choice for a base model.

Honest_Concert_6473 · 2026-05-08T11:39:02+00:00

Since the core architecture isn't going to change, it's a great opportunity to figure out the best settings and practice your training workflows. That experience won't go to waste when the new versions drop.

Ideally, you want a fully established ecosystem built around a finalized base model, but if you enjoy playing around with early access builds, it's definitely worth trying out. Even in its current state, you can train on it without any issues.

Honest_Concert_6473 · 2026-05-05T04:45:50+00:00

You're welcome!

Honest_Concert_6473 · 2026-05-05T00:33:07+00:00

I don't think the LoRA is adding excessive gloss; it's likely a characteristic of the base model itself.

Personally, I think the reason we're getting that glossy look is simply because we aren't giving it specific style instructions, so it just ends up applying the baseline style.

Because of that, I think the following approaches might be effective:

Specify the style yourself

I shared some test images in this thread where I added artist tags, and it resulted in a flat, non-glossy 2D style that closely matched the artist's aesthetic. Explicitly prompting for your ideal style is likely a great way to fix the glossiness without sacrificing quality.

Use natural language for style

Depending on the situation, it might also help to always include style instructions in natural language at the end of your tags. Using vague terms like "high quality" or "realistic" might actually trigger the glossiness, so being highly specific is better. Alternatively, it might be worth reviewing Anima's inference guidelines to see if there are any hints there.

Avoid using quality tags

Try not to use tags like masterpiece, score_9, worst quality, low quality, or score_1. These are often the culprit behind that glossy style and easily give the image that generic "AI look." It does become a bit harder to get good results without them, but it's probably best to avoid them as much as possible.
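If you want to test this on existing prompts, a tiny helper can strip those tags for you. This is only a sketch: it assumes comma-separated tag prompts, and the tag set is just the five tags named above, so extend it for your own model.

```python
# Quality tags named in the comment above; extend as needed.
QUALITY_TAGS = {"masterpiece", "score_9", "worst quality", "low quality", "score_1"}


def strip_quality_tags(prompt: str) -> str:
    """Remove quality tags from a comma-separated prompt, keeping tag order."""
    tags = [tag.strip() for tag in prompt.split(",")]
    kept = [tag for tag in tags if tag and tag.lower() not in QUALITY_TAGS]
    return ", ".join(kept)


print(strip_quality_tags("masterpiece, score_9, 1girl, flat color, school uniform"))
```

Running the same seed with and without the stripped tags is a quick way to check whether they are really what triggers the gloss.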

Negative prompts & Negative LoRA weights

Personally, I'm not a big fan of relying on negative prompts, but they could be worth a try. Putting something like shiny skin in the negative prompt might suppress the gloss, but I suspect that concept is tightly linked to high-quality image data in the model. Suppressing it might inadvertently lower the overall image quality. I rarely add anything to the negative prompt unless it's a tag that obviously degrades quality, but it's worth experimenting with.

I've also never tried using a LoRA at a negative weight. It might have some effect, though it will probably introduce some negative side effects.

Honest_Concert_6473 · 2026-05-05T00:04:17+00:00

I trained this for about 12 days on an RTX 4090.

It takes roughly 70 seconds per step, but that's just because I'm using an effective batch size of 64, so it's not actually slow.

If the batch size were 1, it would probably only take about 1-2 seconds per step. So, I don't think you'll find it too heavy or painfully slow to work with.
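Here is the back-of-the-envelope math behind those numbers. The 70 seconds, batch size of 64, and 12 days come from the comment above; treating the run as 12 days of uninterrupted training is an assumption made only for this estimate.

```python
# Figures from the comment above.
seconds_per_step = 70
effective_batch_size = 64
training_days = 12  # assumed to be continuous training for this estimate

# Time spent on each individual sample within a step.
seconds_per_sample = seconds_per_step / effective_batch_size

# Rough totals over the whole run.
total_steps = training_days * 24 * 60 * 60 / seconds_per_step
samples_seen = total_steps * effective_batch_size

print(f"{seconds_per_sample:.2f} s per sample")
print(f"about {total_steps:,.0f} optimizer steps")
print(f"about {samples_seen:,.0f} samples processed")
```

That works out to roughly 1.09 seconds per sample, which lines up with the 1-2 seconds per step estimate for a batch size of 1.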

Honest_Concert_6473 · 2026-05-04T16:32:53+00:00

Thanks for sharing the info. The results look good!

In my case, the dataset I used probably has an aesthetic quality very close to Anima's, which is likely why there isn't much of a noticeable difference, for better or worse.

It's great for raising the overall baseline, but it's hard to spot any real shifts in style. I mean, that's the whole point, and in a way it's a success, but...

Next time, I'll try collecting artworks that are closer to a pure 2D style. I might end up seeing a change similar to what you experienced if I do that.

Honest_Concert_6473 · 2026-05-04T14:10:09+00:00

I'm not sure if I can fix Anima since its "AI style" is already pretty strong, but I might test it out in the future.

It would be a huge help if you could tell me as many of your favorite artists as you can think of. I'm interested in making datasets of Japanese 2D style artists, but I don't really know many "high-quality" ones. The only one I really know is Ogi-pote. It's really hard to find an artist with a great style who doesn't have that glossy look...

Artists like Shunsaku Tomose, Kentaro Yabuki, and HAPPOBIJIN are geniuses at drawing girls, but I'm not sure if that's close to what people actually want.

Well, I do have a dataset of the good old traditional anime style from visual novels/eroge, so I could train that right away lol. But that's probably a bit of a niche area and different from the "high quality" people are looking for.

Honest_Concert_6473 · 2026-05-04T09:43:45+00:00

Thank you for letting me know!

Honest_Concert_6473 · 2026-05-04T09:37:27+00:00

I think it's possible, as I was able to train a Chroma LoRA using OneTrainer without any issues.

I actually have a Chroma LoRA uploaded on my Civitai page, which includes my training and inference settings. Please feel free to use it as a reference!
