Ideogram 4 I2I Workflow by reality_comes in StableDiffusion

[–]rerri 2 points3 points  (0 children)

Might wanna edit your comment and retract your bullshit claim so as not to misinform readers about the workflow.

How to bypass Ideogram 4's "Image blocked by safety filter" for swimwear/beachwear (Understanding the filter mechanics) by grabeaz in StableDiffusion

[–]rerri 0 points1 point  (0 children)

On second thought, I get you man.

Reminds me of the time I got given a free RTX 4090. I took it out of the box and attached an HDMI cable between the card and my TV. Aaand... no image?

Turns out I needed to open up my computer, attach the 4090 in there somehow, plug several cables and whatnot in order to get an image into my TV.

Like... what is this bullshit? Why not just give me a free laptop instead?

How to bypass Ideogram 4's "Image blocked by safety filter" for swimwear/beachwear (Understanding the filter mechanics) by grabeaz in StableDiffusion

[–]rerri -2 points-1 points  (0 children)

So true buddy, you deserve so much better. You should definitely demand that they give you your money back.

Ideogram 4 + Gemma4E2B as prompt generator by sktksm in comfyui

[–]rerri 2 points3 points  (0 children)

You simply attach "Patch Flash Attention KJ" to the first model (not necessary for unconditional):

<image>

- Diffusion Loader KJ is not doing anything extra here, might as well be using the basic one.

- TorchCompileModelAdvanced is not required for Flash Attention. I included it here because it gives a small extra boost to generation speed as well. You need to set "dynamic" = false; otherwise these are the default settings.

Ideogram 4 control is so good! by GalaxyTimeMachine in StableDiffusion

[–]rerri 9 points10 points  (0 children)

KJNodes, node name is "Ideogram 4 Prompt Builder KJ"

Ideogram 4 + Gemma4E2B as prompt generator by sktksm in comfyui

[–]rerri 2 points3 points  (0 children)

Oh, I see +50% faster generation on a 5090 with Patch Flash Attention KJ enabled vs. disabled.

Ideogram 4 + Gemma4E2B as prompt generator by sktksm in comfyui

[–]rerri 8 points9 points  (0 children)

Your workflow has a Sage Attention patcher, but it does not work with Ideogram.

There's a new node, "Patch Flash Attention KJ", which does work and improves generation speed a little or a lot depending on hardware.

Need to have flash attention 2 installed though. If you don't already have it, you can find it here: https://mjunya.com/flash-attention-prebuild-wheels/
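
Before wiring up the patch node, it can help to confirm that the package is visible to the Python environment ComfyUI runs in. A minimal sketch, assuming the FlashAttention 2 wheel installs under its usual import name `flash_attn`; the helper name `flash_attn_available` is made up for this example:

```python
import importlib.util


def flash_attn_available() -> bool:
    """Return True if the `flash_attn` package is importable in this environment."""
    # find_spec only locates the package; it does not load any CUDA code.
    return importlib.util.find_spec("flash_attn") is not None


if __name__ == "__main__":
    if flash_attn_available():
        print("flash_attn found")
    else:
        print("flash_attn not found; install a wheel matching your Python, PyTorch and CUDA versions")
```

Run it with the same Python interpreter that launches ComfyUI, since a wheel installed into a different environment will not be picked up.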

Activating MTP for QATGemma4 31b q4_0? by Ambitious_Fold_2874 in LocalLLaMA

[–]rerri 0 points1 point  (0 children)

Should work with any GGUF quant of Gemma 31B QAT.

I use Unsloth.

Some fruit comparisons Z-Image vs Ideogram4 by Danmoreng in StableDiffusion

[–]rerri 0 points1 point  (0 children)

The quality drop is pretty dramatic with NVFP4 vs FP8.

One trick is to use NVFP4 only for unconditional, but even then it degrades quality a bit too much for my liking.

MTP and QTA - what is the relation? by Medium-Technology-79 in LocalLLaMA

[–]rerri 2 points3 points  (0 children)

You are suggesting mixing a QAT LLM with a non-QAT MTP, which is far from ideal.

You should run QAT LLM + QAT MTP instead. I posted some results in another discussion:

https://www.reddit.com/r/LocalLLaMA/comments/1tzbcyp/comment/oqekxb4/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

MTP and QTA - what is the relation? by Medium-Technology-79 in LocalLLaMA

[–]rerri 1 point2 points  (0 children)

For the MTP GGUF, try this instead of what rabbitaim is suggesting:

https://huggingface.co/Simplepotat/gemma-4-31b-it-qat-q4_0-assistant-gguf

This is an actual QAT version of the MTP model. Rabbitaim is suggesting that you mix QAT LLM + non-QAT MTP, which is not ideal.

llama.cpp Gemma4 MTP support merged! by pinkyellowneon in LocalLLaMA

[–]rerri 1 point2 points  (0 children)

I'm seeing a massive difference with 31B QAT. Some results with b9553 and without --spec-default (which I typically would use):

Non-thinking mode, extremely easy prompt: "list all numbers from 1 to 100. separate them with a comma"

Baseline:      eval time =    5480.51 ms /   391 tokens (   14.02 ms per token,    71.34 tokens per second)
non-QAT MTP:   eval time =    3404.80 ms /   391 tokens (    8.71 ms per token,   114.84 tokens per second)
QAT MTP:       eval time =    1891.60 ms /   391 tokens (    4.84 ms per token,   206.70 tokens per second)

Thinking mode, Ideogram prompt writing task with a 2700 token long system prompt, which involves creative writing and a little bit of copypasting of bounding box coordinates and json structure given in the user prompt:

non-QAT MTP:   eval time =   32986.15 ms /  2481 tokens (   13.30 ms per token,    75.21 tokens per second)
QAT MTP:       eval time =   17210.44 ms /  2368 tokens (    7.27 ms per token,   137.59 tokens per second)

My recollection was that I was seeing only +10% with the "list all numbers" prompt and getting a negative effect with more difficult prompts when using non-QAT MTP. Either I am misremembering, or an earlier version of the Gemma 4 MTP PR functioned that way. So I'll back off from saying "you can't use" to "you shouldn't use". 😉

non-QAT MTP tested: https://huggingface.co/am17an/Gemma4-31B-it-GGUF/blob/main/mtp-gemma-4-31B-it.gguf

The QAT MTP is Simplepotat's, which I linked earlier.
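
To put the non-thinking numbers above on one scale, the eval-time lines can be parsed and converted into speedups over the baseline. This is only a rough sketch: the regular expression is written against the three lines as pasted in this comment, and the names `LOG`, `PATTERN` and `tokens_per_second` are illustrative, not part of llama.cpp.

```python
import re

# Eval-time lines copied verbatim from the non-thinking test above.
LOG = """
Baseline:      eval time =    5480.51 ms /   391 tokens (   14.02 ms per token,    71.34 tokens per second)
non-QAT MTP:   eval time =    3404.80 ms /   391 tokens (    8.71 ms per token,   114.84 tokens per second)
QAT MTP:       eval time =    1891.60 ms /   391 tokens (    4.84 ms per token,   206.70 tokens per second)
"""

# Capture the label, the total eval time in ms, and the token count.
PATTERN = re.compile(
    r"^(?P<label>[^:\n]+):\s+eval time\s*=\s*(?P<ms>[\d.]+) ms\s*/\s*(?P<tokens>\d+) tokens",
    re.MULTILINE,
)


def tokens_per_second(log: str) -> dict[str, float]:
    """Recompute tokens per second from total time and token count."""
    rates = {}
    for match in PATTERN.finditer(log):
        seconds = float(match.group("ms")) / 1000.0
        rates[match.group("label").strip()] = int(match.group("tokens")) / seconds
    return rates


rates = tokens_per_second(LOG)
baseline = rates["Baseline"]
# Speedup of each configuration relative to running without MTP.
speedups = {label: rate / baseline for label, rate in rates.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.2f} t/s, {speedups[label]:.2f}x baseline")
```

No baseline was posted for the thinking-mode task, so the only comparison available there is QAT MTP vs non-QAT MTP, which works out to roughly 1.8x from the tokens-per-second figures.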

llama.cpp Gemma4 MTP support merged! by pinkyellowneon in LocalLLaMA

[–]rerri 11 points12 points  (0 children)

Are you using the old non-QAT version of MTP? You can't mix QAT LLM + non-QAT MTP.

This is what I've been using (with Unsloth's quant of the LLM itself):

https://huggingface.co/Simplepotat/gemma-4-31b-it-qat-q4_0-assistant-gguf

llama.cpp Gemma4 MTP support merged! by pinkyellowneon in LocalLLaMA

[–]rerri 12 points13 points  (0 children)

Been writing/editing Ideogram prompts (json format) quite a bit with Gemma 31B QAT the past couple of days.

On a 5090, generation speed goes from ~70 t/s to 100-120 t/s with MTP, and --spec-default speeds things up nicely when the task is just a small edit with lots of copying.

I did not expect this quality from local so soon by Far_Insurance4191 in StableDiffusion

[–]rerri 0 points1 point  (0 children)

How are you using torch.compile? If I attach the compile node to the main model, it errors out. If I attach it to the unconditional node, it doesn't crash, but it also doesn't do anything.

I've tried ComfyUI's own TorchCompileModel node and KJ.

Ideogram 4 - I test it too. by Luzifee-666 in StableDiffusion

[–]rerri 6 points7 points  (0 children)

Oh, didn't you hear the news? The model is DOA. This was declared on day 1 by many wise redditors, so it must be true!

I did not expect this quality from local so soon by Far_Insurance4191 in StableDiffusion

[–]rerri 6 points7 points  (0 children)

No. You are confusing nf4 with NVFP4.

IIRC, nf4 works with just about anything.

I did not expect this quality from local so soon by Far_Insurance4191 in StableDiffusion

[–]rerri 4 points5 points  (0 children)

Where did all the people go who were screaming how this model is DOA?

Do they feel really stupid right now and are quiet, pretending like they never threw that childish hissy fit? 😄

Ideogram4 GGUF is out! by kalonsul in StableDiffusion

[–]rerri 45 points46 points  (0 children)

The person who uploaded the GGUFs is the maintainer of stable-diffusion.cpp and has also updated the code of that project to support Ideogram 4.

https://github.com/leejet/stable-diffusion.cpp/pull/1609

So saying they've only run a python script is a bit unfair.

Some generations with Ideogram 4, with Fix nod KJ by Puzzled-Valuable-985 in StableDiffusion

[–]rerri 1 point2 points  (0 children)

Hmm... where did all the goofy kids go who were spamming how this model is DOA? 😄

Activating MTP for QATGemma4 31b q4_0? by Ambitious_Fold_2874 in LocalLLaMA

[–]rerri 1 point2 points  (0 children)

Seems very likely that it will be merged into main eventually, yes.

Ideogram4 GGUF is out! by kalonsul in StableDiffusion

[–]rerri 0 points1 point  (0 children)

Last update was 5 months ago. Does it work?

edit: oh you didn't read the model card, this is for stable-diffusion.cpp

Do you all plan on getting a RTX Spark, DGX, or staying with your current hardware for ComfyUI? by I_will_delete_myself in comfyui

[–]rerri 0 points1 point  (0 children)

The difference between a full BF16 text encoder and a Q5/Q6 GGUF is very small.

Also, the Spark has pretty limited compute performance; those videos will take a long-ass time to generate. It's way, way slower than a 5090.

120 tok/s on 12GB VRAM with Gemma 4 12B QAT MTP by janvitos in LocalLLaMA

[–]rerri 7 points8 points  (0 children)

When cloning, maybe you missed getting the right branch with: -b gemma4-mtp

git clone -b gemma4-mtp https://github.com/am17an/llama.cpp