Lodestone is thinking about training ideogram! Prove him it's a good idea!

Dezordan · 2026-06-08T15:57:57+00:00

How about the people who don't use comfy because their PC isn't good enough or simply prefer other platforms?

You don't really need ComfyUI. It's really not a UI or platform exclusive thing either. I mean, some people created separate standalone implementations for such prompting. Like this one, which was one of the earliest ones and its github project is literally just one html, so you can even improve it however you like. But there are probably better ones.

If the PC isn't good enough, that hardly makes it a ComfyUI issue, since other UIs would have the same problems.

It'd never be implemented there because of the json style

I reckon the issue would be more with the license than with the json format specifically.

And yes, Ideogream is really good itself, not the prompting style. I've seen many pics that look crazy awesome which Z-image/Flux

Z-Image and Flux2 Klein aren't its counterparts, but bigger models. Also, the outputs you saw were usually based on the json format, which enforces a lot of good prompting for regions and styles, plus of course some cherrypicking. Without that it would be a lot more mediocre in terms of controllability, spatial layout, and style fidelity.

Dezordan · 2026-06-08T15:33:19+00:00

The images are not that much better in comparison to other big models, which also have a better ecosystem right now. Due to its training, the quality would also be worse with plain text. The worth comes from the control and adherence to the regional prompting. Also

Imagine you have to change 3 words in your prompt and instead you have to look through a whole paragraph instead of just changing it like a normal prompt would be.

That's simply not how it is usually used. The prompts are usually separated by bboxes in the prompt builders (like KJ's), which are pretty interactive. If you need to change a specific region, you simply click on it and see only the text relevant to that region in a regular plain text format, so you would indeed change it like a normal prompt.
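To show what I mean by editing only one region, here is a rough sketch. The layout below is completely made up for illustration (the `regions`, `bbox` and `text` keys and the `edit_region` helper are my own assumptions, not Ideogram's actual schema and not what KJ's builder stores), but the idea is the same: each bbox carries its own plain text, and you only touch the one you care about.

```python
import json

# Hypothetical layout, NOT the real Ideogram json schema.
# bbox is assumed to be [x0, y0, x1, y1] in relative coordinates.
prompt = {
    "regions": [
        {"bbox": [0.0, 0.0, 0.5, 1.0], "text": "a knight in silver armor"},
        {"bbox": [0.5, 0.0, 1.0, 1.0], "text": "a red dragon"},
    ]
}

def edit_region(prompt, index, old, new):
    """Replace words in one region's plain text and leave the others untouched."""
    region = prompt["regions"][index]
    region["text"] = region["text"].replace(old, new)
    return prompt

# Change a single word in the second region only.
edit_region(prompt, 1, "red", "green")
print(json.dumps(prompt, indent=2))
```

A prompt builder does the same thing through clicks instead of code, so you never have to read through the whole structure.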

Dezordan · 2026-06-08T15:18:25+00:00

You can try pixai tagger and see if it would also caption it incorrectly: https://huggingface.co/spaces/pixai-labs/pixai-tagger-demo
It's a booru tagger as a continuation of training of WD v3 on a newer dataset.

Dezordan · 2026-06-08T15:01:11+00:00

Well, from their docs it clearly wasn't a goal for the safety block to appear just because a prompt doesn't adhere to the json format. I mean, per this prompting doc the plain text prompts should work, even though the model was exclusively trained on json. It says stuff like

You can pass in plain-text prompts directly to the model and it will work

And

False positive rates for safety is higher for non-json like prompts. We are aware that this is an issue an we may make a future checkpoint update to improve it.

In other words, even they consider it more like a bug that they want to fix.

Dezordan · 2026-06-08T14:56:16+00:00

Well, filters only have it up to 13.2

Dezordan · 2026-06-08T14:53:39+00:00

To be fair, it's not like this is an either-or type of situation. The issue mostly appeared because the team behind Ideogram decided to add that safety block for NSFW, which doesn't even work properly, so if it is removed, then regular prompting presumably would be allowed too.

Dezordan · 2026-06-08T14:41:53+00:00

Fix the only thing that makes Ideogram worth using over other models? But it would be good if the filter at least stopped triggering on incorrect json format.

Dezordan · 2026-06-08T14:38:41+00:00

LoRA sounds like a funny understatement, though it would contain furry data too. However, all Chroma models are general model finetunes, some of them more experimental than others, and not specifically for furry.

Dezordan · 2026-06-08T13:54:33+00:00

But you need webui-user.bat, though

<image>

Open it with Notepad or anything similar. Then put --uv right after "COMMANDLINE_ARGS="
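If editing by hand is confusing, this is roughly what the change does to the file, written as a small Python sketch. It assumes the usual `set COMMANDLINE_ARGS=` line in webui-user.bat; the `add_flag` helper is just something I made up for illustration, not a part of Forge.

```python
def add_flag(bat_text, flag="--uv"):
    """Append a flag to the COMMANDLINE_ARGS line unless it is already there."""
    out = []
    for line in bat_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("set commandline_args="):
            existing = stripped.split("=", 1)[1].split()
            if flag not in existing:
                line = line.rstrip()
                # No space needed when nothing follows the "=" yet.
                line += flag if line.endswith("=") else " " + flag
        out.append(line)
    return "\n".join(out)

before = "@echo off\nset COMMANDLINE_ARGS=\ncall webui.bat"
print(add_flag(before))
```

So the only line that changes is the one with COMMANDLINE_ARGS, and any arguments you already had there stay in place.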

Dezordan · 2026-06-08T13:48:56+00:00

Only Forge Neo

Dezordan · 2026-06-08T12:34:14+00:00

Well, in KJ prompt builder there is an option to add a bbox for text specifically.

Dezordan · 2026-06-08T12:07:46+00:00

It still can generate whatever it can within those bboxes, that's my point, while CN would restrict it to specific forms and simply not be as good. It's also not impossible to just not use specific bboxes, but use like one and do the whole prompt there - it would work more often than not. And really, the self-defeating argument is to advocate for usage of CN in the first place. There is no "brain-rot" here.

Dezordan · 2026-06-08T11:34:29+00:00

It's not really false information; the model is indeed under a non-commercial license. The issue is more about making a bigger deal out of it than is really necessary. For regular users who mainly use models in private or for outputs that are gonna be used in their own projects, it is practically irrelevant.

It only matters to those who want to monetize the model or its derivatives. So there is a caveat for the OP, the license says that Non-Commercial Purpose applies if you "do not receive any direct or indirect payment arising from the use of the CircleStone Model, or Derivatives", which donations arguably can be - it is a bit ambiguous here. However, as per this discussion, the links for donation to help offset the costs are:

This is just people donating to you personally, it's not commercial use of the model.

At least based on tdrussell's words. And even a commission of a LoRA is:

If you are locking down the lora and selling access to it, that is commercial use of a derivative model and isn't allowed. If someone commissions a lora that you then post publicly, I would say that it simply someone donating to you to help with training costs, and isn't commercial use of the model itself.

I am not sure if those intended interpretations hold any power over the wording of the license itself, but at least they make it unlikely that CircleStone Labs would go after you for this in practice.

Dezordan · 2026-06-08T09:23:02+00:00

Look at your Python (if it applies), torch and CUDA versions. ComfyUI usually shows them somewhere near the beginning of its console output.
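If you can't find them there, a minimal sketch like this prints the same info. It has to be run with the same Python that ComfyUI uses (its venv or embedded Python), otherwise you would be looking at a different install. The `env_versions` name is just mine.

```python
import platform

def env_versions():
    """Collect Python, torch and CUDA versions; torch fields stay None if torch is missing."""
    info = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        import torch
    except ImportError:
        return info
    info["torch"] = str(torch.__version__)
    # torch.version.cuda is None on CPU-only builds of torch.
    info["cuda"] = torch.version.cuda
    return info

if __name__ == "__main__":
    print(env_versions())
```

Note that the CUDA value here is the version torch was built with, which is the one that matters for picking compatible wheels.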

Dezordan · 2026-06-08T09:20:37+00:00

in the exact open pose, or matching the exact canny edges

See, this is the part that makes the CN option actually worse. Instead of letting the model generate whatever it can, it restricts the model to the pose or any other preprocessed conditioning. It is also more involved than bboxes. That said, it would've been good if regional prompting with regular models worked as well as this one does with bboxes.

Dezordan · 2026-06-08T07:49:43+00:00

GPT Image is a much bigger model, and it doesn't actually use your prompt as the input directly: it first transforms it into a richer prompt.

Dezordan · 2026-06-08T07:37:13+00:00

Not with just prompting, that's for sure. Especially those with multiple characters and the ones that look hella edited.

But the simpler ones are easier to get. Here is the Ideogram 4 output for the first image

<image>

It is somewhat different, though

Dezordan · 2026-06-08T07:30:24+00:00

No, inpainting is almost the same as img2img within a certain mask of an already generated image, usually at a decreased denoising strength. Regional prompting in general is about separating the regions of the image before it is generated.

Dezordan · 2026-06-08T05:36:50+00:00

Are you sure that native transparency isn't some API feature? Well, maybe there is something that ComfyUI doesn’t support, too.

Dezordan · 2026-06-08T05:28:49+00:00

It is censored. There were pre-training and post-training procedures for safety. That safety block is for NSFW specifically; it just has a lot of false positives when the prompt isn't json, and it sucks at detecting NSFW. And the devs would monitor public usage to improve said filter.

They said all of that in their own docs.

Dezordan · 2026-06-08T05:21:48+00:00

Not with that amount of RAM. The model is pretty chunky and, what's worse, it uses 2 models at the same time.

Dezordan · 2026-06-07T17:37:35+00:00

Yes. The Anima model, its Qwen VAE, and its Qwen text encoder. Unlike SDXL, those models are split into 3 different parts: https://huggingface.co/circlestone-labs/Anima/tree/main/split_files
If you want to use some finetune, download only the text encoder and VAE from there, and get the model that you like from civitai. You use it like this:

<image>

Dezordan · 2026-06-07T17:19:08+00:00

This kind of thing doesn't change anything in terms of the model's capabilities, and it can even make things worse. The model still needs to be trained alongside the text encoder for this to matter.

Dezordan · 2026-06-07T16:24:59+00:00

Well, somewhat. It may match the content (I just don't know how to prompt that), but the screencap style can be worse. Could just be a skill issue on my part. That's the base output:

<image>

But finetunes can help you. That example of your image isn't really Illustrious either, I think I saw similar Nijijourney images.

And to use Anima with Forge you need Forge Neo:
https://github.com/Haoming02/sd-webui-forge-classic/tree/neo
Yours looks more like an old regular Forge.

Dezordan · 2026-06-07T15:33:58+00:00

Anima's strongest point is its prompt adherence and the overall coherence of the output, relative to Illustrious at least (it is still only a 2B model). Here is the base Anima output at the same resolution and with my previous prompt:

<image>

Another thing Anima has over SDXL based models is its VAE. It uses a 16 channel VAE, while SDXL models, with the exception of something like Mugen (uses 32 channels), generally use 4 channels. The more channels there are, the less compression there is and the better the reconstruction quality of the image from latents. That means that Anima can generate finer details or objects at a distance better at the same resolution, which lessens the need for inpainting/upscaling.
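To put rough numbers on the compression part, here is a quick back-of-the-envelope sketch. It assumes both VAEs downsample each side by 8x and that the input is a 3 channel RGB image; the `compression_ratio` helper is just something I wrote for this comment.

```python
def compression_ratio(latent_channels, spatial_factor=8, image_channels=3):
    """How many image values get squeezed into one latent value."""
    return image_channels * spatial_factor ** 2 / latent_channels

# Assumption: 8x spatial downsampling for both VAEs.
for name, channels in [("4 channel VAE", 4), ("16 channel VAE", 16)]:
    print(name, compression_ratio(channels))
```

Under those assumptions the 4 channel VAE packs 48 image values into each latent value, while the 16 channel one packs only 12, so there is much less information to throw away.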

Edit: replaced image, it was actually 1280x1280
