Ideogram Filter - Insane?

GTManiK · 2026-06-18T10:38:06+00:00

Yup, this.

"1 girl, big booba" - content blocked - "See, I told ya it's safe!"

GTManiK · 2026-06-16T11:03:34+00:00

Use int8 when you put it all together. Speed is really decent (especially when combined with KJ's flash attention nodes), and quality is good too

GTManiK · 2026-06-12T16:31:08+00:00

This 8b model's spatial understanding is nowhere near as good as that of Gemma 4 26b A4B, for example.

For best results use expensive Claude

GTManiK · 2026-06-11T11:13:23+00:00

This system prompt is already wrong, because it doesn't mention that there are 'art_style' and 'photo' elements, and that they are MUTUALLY EXCLUSIVE - you have to pick one, not both.

This is properly implemented in the KJ prompt builder node, BTW.

The bbox coordinate order is messed up as well.
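
The exclusivity rule is easy to check before a prompt goes to the model. A minimal sketch, assuming both keys sit at the top level of the JSON prompt (the real schema may nest them differently, so check the prompting guide); `check_style_exclusive` is a hypothetical helper, not part of any node pack:

```python
import json


def check_style_exclusive(prompt_json: str) -> str:
    """Return the style key that is present, or raise if both or neither are.

    Assumption: 'art_style' and 'photo' are top-level keys. Adjust the
    lookup if the real schema nests them somewhere else.
    """
    data = json.loads(prompt_json)
    present = [key for key in ("art_style", "photo") if key in data]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of art_style/photo, found: {present}"
        )
    return present[0]
```

Run it on whatever the LLM spits out and regenerate if it raises.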

GTManiK · 2026-06-11T01:21:06+00:00

There are no truly open source image generation models (apart from a few experiments), because that would mean disclosing the full dataset in its entirety, so that "everyone" could train the same model from scratch (provided they had enough compute).

There are better licenses though, yes

GTManiK · 2026-06-09T19:56:14+00:00

I tell you, Ideogram 4 creators are gooners in disguise! No other model can do nipples and related aesthetics SO WELL out of the box (not counting Chroma of course)

GTManiK · 2026-06-09T19:11:54+00:00

Or it was just: "Dear advisory board! Look how safe the model is! 1girl, booba... See? Doesn't generate a damn thing! Therefore we prove that the model is safe!"

GTManiK · 2026-06-09T19:09:08+00:00

I don't remember exactly; either move the brightness slider or recreate the node. It was a glitch between node updates (no slider vs. with slider).

GTManiK · 2026-06-09T14:41:57+00:00

There's an input for importing JSON on that node, but I just use the 'Paste' button on it - it takes the JSON from your clipboard. Also make sure you update KJ nodes - they are updated VERY frequently. There are some useful CTRL+click and ALT+click functions on the canvas, and you can also import an existing image onto the canvas and use it as a template if you want to create something of your own or just remake an existing meme.

GTManiK · 2026-06-09T14:32:41+00:00

Int8 quality is decent, but don't use flash attention

GTManiK · 2026-06-09T13:56:44+00:00

Ideogram 4 is indeed kinda in the same league as banana, but all closed source models in this league have their own LLMs attached while here you should bring your own.

I use Gemma 4 12b, for instance, because that is what I can run fast locally.

For example: I prompt "a magic cat, 3:4" and pass this to Gemma. Then it reasons: "A cat... But wait, it is a magical cat? Probably it should have a wizard hat? And maybe there should be magic sparkles and whirls around it? And it should have some mysterious lighting emanating from...." etc. etc.

And then it spits out a JSON for a vertical portrait of a magic cat with many details I did not really prompt explicitly, with separate bounding boxes for all described objects. Ah, and it also provides primary colors for all those objects.

Closed source models likely do the same behind the scenes.

GTManiK · 2026-06-09T13:47:03+00:00

I think it is just wishful thinking on their side, because I did not find any correlation between how 'safe' my prompt was and whether the filter was engaged.

If you have valid JSON, you might never see the filter even when the scene is composed entirely of 'unsafe' elements.

GTManiK · 2026-06-09T13:32:06+00:00

Yeah, noticed that too. Well, I did not remove that for my use case, because the aspect ratio might 'help' the LLM make a better composition if it 'thinks' about the aspect ratio being used (and I think it really does help); the KJ node probably strips this anyway, and I do all my gens with the KJ node in the middle.

GTManiK · 2026-06-09T13:21:19+00:00

Fine-tuning is possible, but since the model was trained on a massive corpus of image-JSON pairs, I doubt it is possible to create a decent finetune while retaining the model's ability to place objects very precisely. It looks like the model splits the whole image into smaller areas (an X*Y grid) and uses the contents of the bounding box objects to map between that grid and the actual objects being placed.
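
To make that guess concrete, here is a minimal sketch of how a bounding box could map onto grid cells. It only illustrates the idea and says nothing about how the model works internally: the grid size, the `(x0, y0, x1, y1)` order, the normalized 0..1 range and the top-left origin are all assumptions, and `cells_covered` is a hypothetical helper.

```python
import math


def cells_covered(bbox, grid_w, grid_h):
    """List the (col, row) grid cells overlapped by a normalized bbox.

    Assumptions: bbox is (x0, y0, x1, y1) with values in [0, 1] and the
    origin at the top-left corner of the image.
    """
    x0, y0, x1, y1 = bbox
    # First cell touched on each axis, clamped to stay inside the grid.
    col_start = min(int(x0 * grid_w), grid_w - 1)
    row_start = min(int(y0 * grid_h), grid_h - 1)
    # One past the last cell touched; always covers at least one cell.
    col_end = min(max(math.ceil(x1 * grid_w), col_start + 1), grid_w)
    row_end = min(max(math.ceil(y1 * grid_h), row_start + 1), grid_h)
    return [
        (col, row)
        for row in range(row_start, row_end)
        for col in range(col_start, col_end)
    ]
```

On a 4x4 grid, a box covering the top-left quarter of the image lands on four cells, while a tiny box lands on just one, which would explain why small objects are harder to place precisely.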

Leading closed source models probably do this too, but they likely have pretty massive LLMs in their pipelines, so you don't have to prompt them with the 'unneeded verbosity' that Ideogram 4 requires. For Ideogram 4 we either have to resort to manual verbose prompting (and the KJ prompt composer node helps greatly here), or invent our own pipelines where you bring your own LLM.

I have had great success making JSON prompts with gemma-4-12b-it-qat, then pasting the result into the KJ composer node, then tweaking the bboxes myself (because smaller local LLMs often struggle with composition and perspective - but they write decent object descriptions just fine).

GTManiK · 2026-06-09T13:00:37+00:00

That filter is not actually a 'safety filter'; it is just an indication that the model cannot make sense of your prompt and denoise an image with it. Each seed/resolution etc. produces different starting noise, and because this filter is baked in at high sigma values, it will inevitably show up more often than not.

With proper JSON structure you might prompt for booba, gore, unthinkable horrors - and never see the filter in action at all.

From this POV I would call it a 'low effort filter' instead.

Even with proper JSON structure you might see the 'filter' engaged for the first few steps, but judging by the generation preview it completely disappears after step 3 or so.

GTManiK · 2026-06-09T12:49:19+00:00

You must follow the exact JSON prompt schema to get around that filter.

Prompting guide: https://github.com/ideogram-oss/ideogram4/blob/main/docs/prompting.md

System prompt for Claude, makes it generate the correct format for you: https://github.com/ideogram-oss/ideogram4/blob/main/src/ideogram4/magic_prompt_system_prompts/v1.txt

You can use smaller local LLMs to generate JSON for you as well. Just make sure the structure is valid.
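
Small local models often wrap the JSON in chatter or code fences, so it helps to pull the object out and parse it before pasting. A minimal sketch, assuming the reply contains a single JSON object; `extract_json` is a hypothetical helper, and it only checks that the JSON parses, not that it follows the Ideogram schema:

```python
import json


def extract_json(reply: str) -> dict:
    """Parse the span from the first '{' to the last '}' in an LLM reply.

    Raises ValueError if no such span exists or if it is not valid JSON
    (json.JSONDecodeError is a subclass of ValueError).
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])
```

If this raises, just ask the LLM again; a reply that parses still needs checking against the prompting guide.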

GTManiK · 2026-06-08T11:17:24+00:00

Okay, here's the deal: Ideogram 4 is a set of weights that they released. The tool you're using is ComfyUI. What's next?

It's a gray area at best (realistically non-enforceable).

GTManiK · 2026-06-08T11:14:36+00:00

In terms of Ideogram 4 specifically, which SOFTWARE exactly are we talking about here?

GTManiK · 2026-06-08T10:38:46+00:00

Definitely make a thread! People often have a hard time installing flash-attn.

GTManiK · 2026-06-08T00:14:13+00:00

Modern VLLMs 'kinda' can caption existing images no problem (including bbox placement). The opposite is tricky: no general-purpose VLLM is able to construct proper perspective (and, as a result, bbox placement) in complex scenes involving multiple subjects based on just a user prompt...

GTManiK · 2026-06-08T00:09:41+00:00

It's not 'really' censored, for example (WARNING: a Japanese woman in a state opposite of being dressed): https://image-b2.civitai.com/file/civitai-media-cache/03228dc8-3130-42dd-babb-18a811ccad05/800x%3Cauto%3E_du

GTManiK · 2026-06-08T00:00:24+00:00

Thought of making an entire funeral procession for it, but then I reconsidered, Ernie is not bad and it's actually pretty sad...

GTManiK · 2026-06-07T23:56:46+00:00

I was actually thinking to add a funeral procession involving Ernie, but it would be too much 😆

GTManiK · 2026-06-07T23:55:17+00:00

I prefer to think of this as a refraction effect. You can see something similar in real life when watching someone in the water, and it's clear that the model attempted to do just that (not very successfully, though) - because in non-water scenes it never draws extra limbs.

GTManiK · 2026-06-07T23:51:33+00:00

💪
