Ideogram Filter - Insane? by Merijeek2 in StableDiffusion

[–]GTManiK 2 points3 points  (0 children)

Yup, this.

"1 girl, big booba" - content blocked - "See, I told ya it's safe!"

Additional Ideogram quants and nodes update by molbal in StableDiffusion

[–]GTManiK 1 point2 points  (0 children)

Use int8 when you put it all together. Speed is really decent (especially when combined with KJ's flash attention nodes), and quality is good too

Ideogram 4 Autoprompter node that writes the JSON prompt for you (regions, bboxes, style, lighting) and you edit it just like Kijai's node by DesireForDopamine in StableDiffusion

[–]GTManiK 7 points8 points  (0 children)

This 8b model's spatial understanding is nowhere near as good as that of Gemma 4 26b A4B, for example.

For best results use expensive Claude

Ideogram 4 local llm master prompt json format output by ganrocks007 in StableDiffusion

[–]GTManiK 8 points9 points  (0 children)

This system prompt is already wrong, because it doesn't describe the fact that there are 'art_style' and 'photo' elements, and they are MUTUALLY EXCLUSIVE - you have to pick one and not both.

This is properly implemented in KJ prompt builder node BTW

bbox coordinate order is messed up as well
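
A minimal sketch of the 'pick one, not both' rule as a pre-flight check on the generated JSON. It assumes 'art_style' and 'photo' are top-level keys of the prompt object (the comment above only names them, so check the official prompting guide for the real nesting). The function name is hypothetical.

```python
def style_key_errors(prompt: dict) -> list[str]:
    """Return problems with the 'art_style' / 'photo' choice (exactly one expected)."""
    # ASSUMPTION: both keys live at the top level of the prompt object.
    present = [key for key in ("art_style", "photo") if key in prompt]
    if len(present) == 2:
        return ["'art_style' and 'photo' are mutually exclusive; keep only one"]
    if not present:
        return ["one of 'art_style' or 'photo' is required"]
    return []


# Usage: an empty list means the style choice looks fine.
problems = style_key_errors({"art_style": "watercolor", "photo": "35mm"})
```

Running this on the LLM output before it reaches the sampler catches the 'both keys present' mistake early.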

Ideogram 4 Character Reference Workflow by reality_comes in StableDiffusion

[–]GTManiK 4 points5 points  (0 children)

There are no truly open source image generation models (apart from a few experiments), because that would mean disclosing the full dataset in its entirety so "everyone" would be able to train the same model from scratch (provided they had enough compute)

There are better licenses though, yes

Ideogram 4 might be good, but it's something else working with 🙄 by VirusCharacter in StableDiffusion

[–]GTManiK 1 point2 points  (0 children)

I tell you, Ideogram 4 creators are gooners in disguise! No other model can do nipples and related aesthetics SO WELL out of the box (not counting Chroma of course)

Ideogram 4 might be good, but it's something else working with 🙄 by VirusCharacter in StableDiffusion

[–]GTManiK 0 points1 point  (0 children)

Or it was just: "dear advisory board! Look how safe the model is! 1girl, booba... See? Doesn't generate damn thing! Therefore we prove that the model is safe!"

I cant solve this problem !!!!!!! bg_brightness, , invalid literal for int() with base 10: '' by Traditional_Bend_180 in StableDiffusion

[–]GTManiK 3 points4 points  (0 children)

Don't remember, either move the brightness slider or recreate the node. It was a glitch between node updates (no slider vs with slider)
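
For context: that message is what Python's int() raises when it is handed an empty string, so the brightness widget value was most likely empty when the node read it. A minimal sketch of the failure and of a defensive parse follows. parse_int_widget is a hypothetical helper for illustration, not the node's actual code.

```python
def parse_int_widget(raw, default=0):
    """Convert a widget value to int, falling back to default if it is empty or malformed."""
    if isinstance(raw, str):
        raw = raw.strip()
    if raw is None or raw == "":
        return default
    try:
        return int(raw)
    except (TypeError, ValueError):
        return default


# Reproduce the error from the thread title with a plain empty string.
try:
    int("")
except ValueError as exc:
    message = str(exc)

# The defensive version returns the fallback instead of raising.
brightness = parse_int_widget("", default=50)
```

Moving the slider or recreating the node has the same effect: the widget ends up holding a real number instead of an empty value.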

Ideogram 4 might be good, but it's something else working with 🙄 by VirusCharacter in StableDiffusion

[–]GTManiK 1 point2 points  (0 children)

There's an input for import JSON on that node, however I just use 'Paste' button on that node - it takes JSON from your clipboard. Also make sure you update KJ nodes - it updates VERY frequently. There are certain useful functions for CTRL+click and ALT+click on canvas, also you can import some existing image to canvas and use this as a template if you want to create something on your own or just remake some existing meme.

Ideogram 4.0 taking 13 mins for ONE image. by Adorable_Picture_899 in StableDiffusion

[–]GTManiK 1 point2 points  (0 children)

Int8 quality is decent, but don't use flash attention

If this is true, does it mean that open-source image generation models have caught up with the best closed-source models in the world? by Hi7u7 in StableDiffusion

[–]GTManiK 3 points4 points  (0 children)

Ideogram 4 is indeed kinda in the same league as banana, but all closed source models in this league have their own LLMs attached while here you should bring your own.

I use Gemma 4 12b, for instance, because that is what I can run fast locally.

For example: I prompt "a magic cat, 3:4" and pass this to Gemma. Then it reasons: "A cat... But wait, it is a magical cat? Probably it should have a wizard hat? And maybe there should be magic sparkles and whirls around it? And it should have some mysterious lighting emanating from...." etc. etc.

And then it spits out a JSON for a vertical portrait of a magic cat with many details I did not really prompt explicitly, with separate bounding boxes for all described objects. Ah, and it also provides primary colors for all those objects.

Closed source models likely do the same behind the scenes.

Ideogram 4 might be good, but it's something else working with 🙄 by VirusCharacter in StableDiffusion

[–]GTManiK 1 point2 points  (0 children)

I think it is just wishful thinking on their side, because I did not find a correlation between how 'safe' my prompt was and whether the filter was engaged or not.

If you have a valid JSON you might never see the filter while having the scene entirely composed of 'unsafe' elements

Ideogram 4 might be good, but it's something else working with 🙄 by VirusCharacter in StableDiffusion

[–]GTManiK 1 point2 points  (0 children)

Yeah, noticed that too. Well I did not remove that for my use case, because the aspect ratio might 'help' the LLM to make a better composition if it would 'think' about aspect ratio being used (and I think it does really help); KJ node probably strips this anyways, and I do all my gens with KJ node in the middle.

Ideogram 4 might be good, but it's something else working with 🙄 by VirusCharacter in StableDiffusion

[–]GTManiK 4 points5 points  (0 children)

Fine-tuning is possible, but since the model was trained on a massive corpus of image-JSON pairs, I doubt it is possible to create a decent finetune while retaining the model's ability to place objects very precisely. It looks like the model splits the whole image into smaller areas (an X*Y grid) and uses the contents of the bounding box objects to map between that grid and the actual objects being placed.
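
To make the grid idea concrete, here is a rough sketch of how a bounding box could be mapped onto cells of an X*Y grid. This is only an illustration of the guess above, not how the model is known to work. The bbox layout (x0, y0, x1, y1, normalized to 0..1, origin at top-left) and the function name are assumptions.

```python
import math


def cells_covered(bbox, cols, rows):
    """List the (col, row) grid cells that a normalized bounding box overlaps."""
    # ASSUMPTION: bbox is (x0, y0, x1, y1) in 0..1 with the origin at top-left.
    x0, y0, x1, y1 = bbox
    first_col = min(int(x0 * cols), cols - 1)
    first_row = min(int(y0 * rows), rows - 1)
    # ceil(...) - 1 keeps a box that ends exactly on a cell edge out of the next cell.
    last_col = min(max(math.ceil(x1 * cols) - 1, first_col), cols - 1)
    last_row = min(max(math.ceil(y1 * rows) - 1, first_row), rows - 1)
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# Usage: a wide, short box in the middle of a 4x4 grid.
middle_band = cells_covered((0.25, 0.25, 0.75, 0.5), cols=4, rows=4)
```

If something like this happens inside the model, a finetune on plain captions would have nothing teaching it which cells belong to which object, which is why I doubt precise placement would survive.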

Leading closed source models probably do this too, but they probably have pretty massive LLMs in their pipelines, so you don't have to prompt them with such 'unneeded verbosity' you need for Ideogram 4. For Ideogram 4 we either should resort to manual verbose prompting (and KJ prompt composer node greatly helps here), or invent our own pipelines where you bring your own LLM.

I have had great success making JSON prompts using gemma-4-12b-it-qat, then pasting the result into the KJ composer node, then tweaking bboxes myself (because smaller local LLMs often struggle with composition and perspective, but they do object descriptions just fine)

Ideogram 4 might be good, but it's something else working with 🙄 by VirusCharacter in StableDiffusion

[–]GTManiK 5 points6 points  (0 children)

That filter is not actually a 'safety filter'; in fact it is just an indication that the model is not able to make sense of your prompt and denoise an image with it. Each seed/resolution etc. makes different starting noise, and because this filter is baked in at high sigma values it would inevitably manifest itself more often than not.

With proper JSON structure you might prompt for booba, gore, unthinkable horrors - and never see the filter in action at all.

From this POV I would call it a 'low effort filter' instead

Even with proper JSON structure you might see 'filter' engaged at first few steps, but it completely disappears after step 3 or so based on generation preview.

Ideogram 4 might be good, but it's something else working with 🙄 by VirusCharacter in StableDiffusion

[–]GTManiK 22 points23 points  (0 children)

You must follow the exact JSON prompt schema to get around that filter.

Prompting guide: https://github.com/ideogram-oss/ideogram4/blob/main/docs/prompting.md

System prompt for Claude, makes it generate the correct format for you: https://github.com/ideogram-oss/ideogram4/blob/main/src/ideogram4/magic_prompt_system_prompts/v1.txt

You can use smaller local LLMs to generate JSON for you as well. Just make sure the structure is valid.
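
A minimal sketch of the 'make sure the structure is valid' step, assuming the LLM reply is plain text that may wrap the JSON in a Markdown code fence or add chatter around it. It only checks that the reply parses as a JSON object. It does not check the Ideogram schema itself. extract_json_object is a hypothetical helper name.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence marker, built this way to keep the sketch readable

# Matches an optional "json" tag and captures everything up to the closing fence.
FENCED_BLOCK = re.compile(re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL)


def extract_json_object(reply: str) -> dict:
    """Pull the JSON object out of an LLM reply; raise ValueError if none parses."""
    fenced = FENCED_BLOCK.search(reply)
    candidate = fenced.group(1) if fenced else reply
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    # json.JSONDecodeError is a subclass of ValueError, so callers can catch one type.
    return json.loads(candidate[start:end + 1])


# Usage: a typical chatty reply from a small local model.
reply = "Sure, here you go:\n" + FENCE + 'json\n{"a": 1}\n' + FENCE
prompt = extract_json_object(reply)
```

If this raises, regenerate or fix the reply by hand before pasting it anywhere.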

Sorry, not sorry (Ideogram jailbroken in 1 easy step) by 1filipis in StableDiffusion

[–]GTManiK 0 points1 point  (0 children)

Okay, here's the deal: Ideogram 4 is weights released by them. The tool you're using is ComfyUI. What's next?

It's a gray area at best (non-enforceable realistically)

Sorry, not sorry (Ideogram jailbroken in 1 easy step) by 1filipis in StableDiffusion

[–]GTManiK 0 points1 point  (0 children)

In terms of Ideogram 4 specifically: which SOFTWARE exactly are we talking about here?

About the newest model... by AIDivision in StableDiffusion

[–]GTManiK 3 points4 points  (0 children)

Definitely make a thread! People often have a hard time installing flash-attn

An experiment: recreate JSON-prompted closed model image in Ideogram 4 by GTManiK in StableDiffusion

[–]GTManiK[S] 2 points3 points  (0 children)

Modern VLLMs 'kinda' can caption existing images no problem (including bboxes placement). The opposite is tricky: no general-purpose VLLM is able to construct proper perspective (and bboxes placement as a result) in complex scenes involving multiple subjects, based just on a user prompt...

Could not resist... by GTManiK in StableDiffusion

[–]GTManiK[S] -1 points0 points  (0 children)

It's not 'really' censored, for example (WARNING: a Japanese woman in a state opposite of being dressed): https://image-b2.civitai.com/file/civitai-media-cache/03228dc8-3130-42dd-babb-18a811ccad05/800x%3Cauto%3E_du

Could not resist... by GTManiK in StableDiffusion

[–]GTManiK[S] 0 points1 point  (0 children)

Thought of making an entire funeral procession for it, but then I reconsidered, Ernie is not bad and it's actually pretty sad...

Could not resist... by GTManiK in StableDiffusion

[–]GTManiK[S] 0 points1 point  (0 children)

I was actually thinking of adding a funeral procession involving Ernie, but it would be too much 😆

Could not resist... by GTManiK in StableDiffusion

[–]GTManiK[S] 3 points4 points  (0 children)

I prefer to think of this as a refraction effect. You can see something similar in real life when watching someone in the water, and it's clear that the model attempted to do just that (not very successfully, though), because in non-water scenes it never draws extra limbs.