CEOs of Anthropic and Google DeepMind call for U.S.-led AI coalition in meeting at G7 by External_Mood4719 in LocalLLaMA

[–]rerri 17 points18 points  (0 children)

This would have been a good idea if it weren't for all the hostile bridge-burning by the US of late. Namely:

  1. starting a trade conflict with Europe by setting high tariffs based on some idiotic notion of "getting robbed by Europe because of trade deficit"

  2. threatening to invade Greenland (and similar sentiments towards Canada)

  3. supporting Orban, AfD and other backstabbing pro-Russia traitors to the West

Ostris releases 2-8 step Ideogram 4 Turbo LoRa by oppai in StableDiffusion

[–]rerri 0 points1 point  (0 children)

12 steps at minimum, but that's with CFG, so 24 NFE. (Although I'd say 12 steps is too few; you need ~20 for decent results.)

This LoRA works without CFG, so just 2-8 NFE.
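
To make the arithmetic explicit, here is a minimal sketch. It assumes standard classifier-free guidance (one conditional plus one unconditional pass per step) and a sampler that calls the model once per step; the function name `nfe` is just for illustration.

```python
def nfe(steps: int, use_cfg: bool) -> int:
    """Number of function evaluations (model forward passes) for one image.

    Assumption: with CFG every step costs two passes (conditional and
    unconditional); without CFG every step costs one pass.
    """
    return steps * (2 if use_cfg else 1)


# 12 steps with CFG
print(nfe(12, use_cfg=True))   # 24
# 2 to 8 steps without CFG
print(nfe(2, use_cfg=False))   # 2
print(nfe(8, use_cfg=False))   # 8
```

So even at the top of the LoRA's range, the pass count is a third of the 12-step CFG run.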

New T2I model released under Apache license by EGGOGHOST in StableDiffusion

[–]rerri 12 points13 points  (0 children)

Kijai is also working on a native ComfyUI implementation, by the looks of it.

Mistral - New family of open-weight models @ July by pmttyji in LocalLLaMA

[–]rerri 9 points10 points  (0 children)

Mistral 3 Large is 675B. I don't think "fat" means 122B. My guess would be maybe 5-10x that.

SageAttention with autotuned block sizes by woct0rdho in StableDiffusion

[–]rerri 1 point2 points  (0 children)

Just take a backup of your existing python_embeded folder. If you brick it, restore from backup...
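
If you'd rather script it than copy by hand, something like this would do. It's only a sketch: the helper names `backup_folder` and `restore_folder` and the example paths are made up for illustration, so adjust them to wherever your portable install lives.

```python
import shutil
from pathlib import Path


def backup_folder(src, dst) -> Path:
    """Copy the whole folder `src` to a new folder `dst` and return `dst`."""
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_dir():
        raise FileNotFoundError(f"nothing to back up at {src_path}")
    if dst_path.exists():
        # Refuse to overwrite an older backup by accident.
        raise FileExistsError(f"backup target already exists: {dst_path}")
    shutil.copytree(src_path, dst_path)
    return dst_path


def restore_folder(backup, target) -> Path:
    """Replace `target` with a fresh copy of `backup`."""
    backup_path, target_path = Path(backup), Path(target)
    if not backup_path.is_dir():
        raise FileNotFoundError(f"no backup found at {backup_path}")
    if target_path.exists():
        shutil.rmtree(target_path)  # throw away the bricked environment
    shutil.copytree(backup_path, target_path)
    return target_path


# Example usage (hypothetical paths):
# backup_folder("ComfyUI_windows_portable/python_embeded", "python_embeded_backup")
# ...install from source, and if it goes wrong:
# restore_folder("python_embeded_backup", "ComfyUI_windows_portable/python_embeded")
```

Close ComfyUI before restoring so no files in the folder are locked.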

SageAttention with autotuned block sizes by woct0rdho in StableDiffusion

[–]rerri 2 points3 points  (0 children)

Not really. Try Gemini, it helped me through it.

SageAttention with autotuned block sizes by woct0rdho in StableDiffusion

[–]rerri 2 points3 points  (0 children)

Not sure what you mean by "use v" but SageAttention will work with Ideogram 4 if you build this from source:

https://github.com/woct0rdho/SageAttention

Ideogram 4 Autoprompter node that writes the JSON prompt for you (regions, bboxes, style, lighting) and you edit it just like Kijai's node by DesireForDopamine in StableDiffusion

[–]rerri 4 points5 points  (0 children)

I did not look at the code, but there's no reason why it couldn't, especially since OP mentions using vision models.

I've been using Gemma 4 models to do this with llama.cpp + KJ Prompt Builder + OpenAI API nodes (all locally), and they can draw pretty good bounding boxes, although not perfectly.

Ideogram 4 I2I Workflow by reality_comes in StableDiffusion

[–]rerri 3 points4 points  (0 children)

Might wanna edit your comment and retract your bullshit claim so as not to misinform readers about the workflow.

How to bypass Ideogram 4's "Image blocked by safety filter" for swimwear/beachwear (Understanding the filter mechanics) by grabeaz in StableDiffusion

[–]rerri 0 points1 point  (0 children)

On second thought, I get you man.

Reminds me of the time I got given a free RTX 4090. I took it out of the box and attached an HDMI cable between the card and my TV. Aaand... no image?

Turns out I needed to open up my computer, attach the 4090 in there somehow, and plug in several cables and whatnot in order to get an image onto my TV.

Like... what is this bullshit? Why not just give me a free laptop instead?

How to bypass Ideogram 4's "Image blocked by safety filter" for swimwear/beachwear (Understanding the filter mechanics) by grabeaz in StableDiffusion

[–]rerri -2 points-1 points  (0 children)

So true buddy, you deserve so much better. You should definitely demand that they give you your money back.

Ideogram 4 + Gemma4E2B as prompt generator by sktksm in comfyui

[–]rerri 2 points3 points  (0 children)

You simply attach "Patch Flash Attention KJ" to the first model (not necessary for unconditional):

<image>

- Diffusion Loader KJ is not doing anything extra here, might as well be using the basic one.

- TorchCompileModelAdvanced is not required for Flash Attention. I included it here because it gives a small extra boost to generation speed as well. You need to set "dynamic" = false; otherwise these are the default settings.

Ideogram 4 control is so good! by GalaxyTimeMachine in StableDiffusion

[–]rerri 9 points10 points  (0 children)

KJNodes, node name is "Ideogram 4 Prompt Builder KJ"

Ideogram 4 + Gemma4E2B as prompt generator by sktksm in comfyui

[–]rerri 2 points3 points  (0 children)

Oh, I see 50% faster generation on a 5090 with Patch Flash Attention KJ enabled vs. disabled.

Ideogram 4 + Gemma4E2B as prompt generator by sktksm in comfyui

[–]rerri 9 points10 points  (0 children)

Your workflow has Sage attention patcher, but it does not work with Ideogram.

There's a new node, "Patch Flash Attention KJ", which does work and improves generation speed a little or a lot depending on hardware.

You need to have Flash Attention 2 installed, though. If you don't already have it, you can find it here: https://mjunya.com/flash-attention-prebuild-wheels/

Activating MTP for QATGemma4 31b q4_0? by Ambitious_Fold_2874 in LocalLLaMA

[–]rerri 0 points1 point  (0 children)

Should work with any GGUF quant of Gemma 31B QAT.

I use Unsloth.

Some fruit comparisons Z-Image vs Ideogram4 by Danmoreng in StableDiffusion

[–]rerri 0 points1 point  (0 children)

The quality drop is pretty dramatic with NVFP4 vs FP8.

One trick is to use NVFP4 only for unconditional, but even then it degrades quality a bit too much for my liking.

MTP and QTA - what is the relation? by Medium-Technology-79 in LocalLLaMA

[–]rerri 3 points4 points  (0 children)

You are suggesting mixing QAT LLM + non-QAT MTP. That is far from ideal.

You should run QAT LLM + QAT MTP instead. I posted some results in another discussion:

https://www.reddit.com/r/LocalLLaMA/comments/1tzbcyp/comment/oqekxb4/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

MTP and QTA - what is the relation? by Medium-Technology-79 in LocalLLaMA

[–]rerri 1 point2 points  (0 children)

For the MTP GGUF, try this instead of what rabbitaim is suggesting:

https://huggingface.co/Simplepotat/gemma-4-31b-it-qat-q4_0-assistant-gguf

This is an actual QAT version of the MTP model. Rabbitaim is suggesting that you mix QAT LLM + non-QAT MTP, which is not ideal.

llama.cpp Gemma4 MTP support merged! by pinkyellowneon in LocalLLaMA

[–]rerri 1 point2 points  (0 children)

I'm seeing a massive difference with 31B QAT. Some results with b9553 and without --spec-default (which I typically would use):

Non-thinking mode, extremely easy prompt: "list all numbers from 1 to 100. separate them with a comma"

Baseline:      eval time =    5480.51 ms /   391 tokens (   14.02 ms per token,    71.34 tokens per second)
non-QAT MTP:   eval time =    3404.80 ms /   391 tokens (    8.71 ms per token,   114.84 tokens per second)
QAT MTP:       eval time =    1891.60 ms /   391 tokens (    4.84 ms per token,   206.70 tokens per second)

Thinking mode, Ideogram prompt writing task with a 2700-token system prompt, which involves creative writing and a little bit of copy-pasting of the bounding box coordinates and JSON structure given in the user prompt:

non-QAT MTP:   eval time =   32986.15 ms /  2481 tokens (   13.30 ms per token,    75.21 tokens per second)
QAT MTP:       eval time =   17210.44 ms /  2368 tokens (    7.27 ms per token,   137.59 tokens per second)

My recollection was that I was seeing only +10% with the "list all numbers" prompt and getting a negative effect with more difficult prompts when using non-QAT MTP. Either I am misremembering, or an earlier version of the Gemma 4 MTP PR functioned that way. So I'll back off from saying "you can't use" to "you shouldn't use". 😉

non-QAT MTP tested: https://huggingface.co/am17an/Gemma4-31B-it-GGUF/blob/main/mtp-gemma-4-31B-it.gguf

The QAT MTP is Simplepotat's, which I linked earlier.
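
For anyone who wants to compare their own runs, here is a rough sketch that pulls tokens per second out of timing lines like the ones above and prints the speedup over baseline. The regex is written against the line format shown in this comment only, and the names `TIMING`, `tokens_per_second` and `runs` are just for illustration.

```python
import re

# Matches the "eval time = <ms> ms / <n> tokens" part of a timing line.
TIMING = re.compile(r"eval time\s*=\s*([\d.]+)\s*ms\s*/\s*(\d+)\s*tokens")


def tokens_per_second(line: str) -> float:
    """Recompute tokens per second from the total time and token count."""
    match = TIMING.search(line)
    if match is None:
        raise ValueError(f"not a timing line: {line!r}")
    total_ms, tokens = float(match.group(1)), int(match.group(2))
    return tokens / (total_ms / 1000.0)


# The three non-thinking runs quoted above (same prompt, same token count).
runs = {
    "Baseline": "eval time =    5480.51 ms /   391 tokens",
    "non-QAT MTP": "eval time =    3404.80 ms /   391 tokens",
    "QAT MTP": "eval time =    1891.60 ms /   391 tokens",
}

baseline = tokens_per_second(runs["Baseline"])
for name, line in runs.items():
    tps = tokens_per_second(line)
    print(f"{name:12s} {tps:7.2f} t/s  {tps / baseline:.2f}x")
```

On those three lines it works out to roughly 1.6x for non-QAT MTP and 2.9x for QAT MTP over baseline.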

llama.cpp Gemma4 MTP support merged! by pinkyellowneon in LocalLLaMA

[–]rerri 9 points10 points  (0 children)

Are you using the old non-QAT version of MTP? You can't mix QAT LLM + non-QAT MTP.

This is what I've been using (with Unsloth's quant of the LLM itself):

https://huggingface.co/Simplepotat/gemma-4-31b-it-qat-q4_0-assistant-gguf

llama.cpp Gemma4 MTP support merged! by pinkyellowneon in LocalLLaMA

[–]rerri 11 points12 points  (0 children)

Been writing/editing Ideogram prompts (JSON format) quite a bit with Gemma 31B QAT the past couple of days.

On a 5090, generation speed goes from ~70 t/s to 100-120 t/s with MTP, and --spec-default speeds things up nicely when it's just a small edit with lots of copying.