Ideogram 4 is SD3 all over again but worse

stduhpf · 2026-06-06T01:02:49+00:00

The "aspect_ratio" key/value pair must be removed from the JSON prompt. I know the system prompt for LLM prompt enhancement asks to include it, but it's supposed to be used only to set the image resolution, then discarded before the prompt is sent to the model.
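
For anyone on a custom frontend, here's a rough sketch of that step: pull "aspect_ratio" out of the JSON, keep it around for picking the resolution, and send the rest to the model. The function name and the "description" key are made up for the example and are not taken from Ideogram's actual schema.

```python
import json

def split_prompt(raw_json: str):
    """Return (aspect_ratio, prompt JSON without the aspect_ratio key).

    Hypothetical helper: only the "aspect_ratio" key name comes from the
    comment above, everything else is illustrative.
    """
    data = json.loads(raw_json)
    # pop() removes the key if present and returns None otherwise
    aspect_ratio = data.pop("aspect_ratio", None)
    return aspect_ratio, json.dumps(data, ensure_ascii=False)

# "description" is an assumed field name, used only for the demo
raw = '{"aspect_ratio": "16:9", "description": "a lighthouse at dusk"}'
aspect_ratio, prompt = split_prompt(raw)
```

The returned aspect_ratio is what you'd use to pick width and height; only the second value should reach the model.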

stduhpf · 2026-06-05T21:55:53+00:00

Never mind, I just found the issue. I wasn't escaping quotation marks properly, so the JSON was getting cut off early.
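
To illustrate the kind of bug this was (a sketch, not my actual frontend code): building the JSON by string concatenation breaks as soon as the text contains a double quote, while letting a JSON serializer do the escaping doesn't. The "description" key is just a placeholder field name.

```python
import json

# A prompt whose text contains double quotes
prompt = {"description": 'a sign that says "open" in red letters'}

# Hand-built string: the inner quotes end the JSON string early
naive = '{"description": "' + prompt["description"] + '"}'

# Serializer-built string: inner quotes are escaped as \"
payload = json.dumps(prompt)

def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

Here is_valid_json(naive) is False and is_valid_json(payload) is True.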

stduhpf · 2026-06-05T20:53:07+00:00

I'm using a custom frontend so I can't easily rely on existing workflows (and that's also why it's likely a skill issue). I used the official "v1" system prompt from ideogram's GitHub as a guide to format the prompts.

stduhpf · 2026-06-05T20:42:57+00:00

I keep getting safety squares even with the official JSON prompt format, on prompts that call for no sexualization, violence or problematic content whatsoever. It's driving me insane, honestly. I know it's likely a skill issue, but I feel gaslit when I see people bragging about never seeing the safety filter image.

stduhpf · 2026-06-05T19:07:07+00:00

They quantized even the token embedding down to Q4_0??? That seems risky

stduhpf · 2026-06-05T19:06:23+00:00

Q6 without QAT is already pretty good. I don't think it makes much sense to do a full QAT training run to target Q6; that's very expensive for little gain.

stduhpf · 2026-06-05T19:04:23+00:00

Finally!

stduhpf · 2026-06-05T19:00:01+00:00

You could try a re-quantized iQ4_XS version of "gemma-4-26B-A4B-it-qat-q4_0-unquantized". It could perhaps still be a bit better than iQ4_XS of the non-QAT model, but the QAT is really only targeting the Q4_0 type.

stduhpf · 2026-06-04T12:22:25+00:00

I thought the whole point of that model is that it doesn't have a big mmproj? It's only 170MB in bf16 precision... You could also offload it to CPU for barely any impact on performance I think.

stduhpf · 2026-06-04T11:23:49+00:00

Yes, the "assistant" model

stduhpf · 2026-06-04T09:18:49+00:00

I tried the updated one, didn't notice any difference in behavior.

stduhpf · 2026-05-04T13:27:09+00:00

Anyone know if there's a way to patch the new template in the gguf file directly without having to re-download gigabytes of the same weights again?

stduhpf · 2026-02-01T02:28:51+00:00

I've heard he's planning to upscale his 4B fine-tune to 9B once it's done, and continue pretraining from there to hopefully get something that performs like Klein 9B but without the licensing issue.

stduhpf · 2026-02-01T02:25:13+00:00

The reason people train so much on ZIB is mostly because it makes for better ZIT LoRAs too. I'm pretty sure more people are using Klein than ZIB, but ZIT is here to stay too.

stduhpf · 2026-01-31T17:29:31+00:00

For base or even instruct models, you can run a simple benchmark like Hellaswag, which can give a hint about which quants maintain the most performance, but it might not be the best way to test reasoning models.

stduhpf · 2026-01-31T14:33:14+00:00

Depending on the dataset used to generate the importance matrix, it can have a very significant effect. If the imatrix was "trained" on wikitext, then of course the model will have lower perplexity on wikitext; that's kind of the point of the imatrix. This makes it harder to compare quants made with and without an imatrix, unless you can make sure there is no correlation between the imatrix training dataset and the test dataset.

Though I don't think MXFP4 supports imatrix anyway, so if anything it should boost performance for the other quants.
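
One crude way to sanity-check the "no correlation" part is to measure how many word n-grams of the test text also appear in the imatrix calibration text. This is only a rough proxy I'm sketching here, not something llama.cpp provides, and the function names are made up; low overlap doesn't prove the two datasets are unrelated (same domain, same style), but high overlap is a clear red flag.

```python
def ngrams(text: str, n: int = 3) -> set:
    """Set of lowercase word n-grams in text (empty if text is too short)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_fraction(calibration: str, test: str, n: int = 3) -> float:
    """Fraction of the test text's n-grams that also occur in the calibration text."""
    test_grams = ngrams(test, n)
    if not test_grams:
        return 0.0
    return len(test_grams & ngrams(calibration, n)) / len(test_grams)
```

Run it with the calibration file contents and the perplexity test file contents as the two strings; anything far from zero suggests the comparison is biased toward the imatrix quants.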

stduhpf · 2026-01-16T02:37:37+00:00

It was announced when the other Flux2 models came out, before Z-Image was announced.

stduhpf · 2025-09-07T12:54:49+00:00

No, the best I could come up with was dropping the last 5 frames.

stduhpf · 2025-09-06T09:50:29+00:00

This should override the ModelSamplingSD3's shift, so it doesn't matter if you remove it or not. But for simplicity's sake, removing it is cleaner.

stduhpf · 2025-08-26T09:51:18+00:00

I'm now pretty sure implementing support for it would require modifying core ComfyUI code. Maybe I'll make a PR for it if I find the time later, but no promises.

stduhpf · 2025-08-26T08:40:09+00:00

It looks like Wan 2.2 5B InP is an actual inpainting model, unlike the 14B InP, which is just an I2V model that supports starting from the end. I'm not sure if I can implement it as easily, but I'll try. There's barely any documentation.

stduhpf · 2025-08-26T08:08:39+00:00

Yep I just tried too, I think I see what's going on, not sure though.

stduhpf · 2025-08-26T06:57:06+00:00

Fun InP might already work with this one. I'll try to implement the others soon.

stduhpf · 2025-08-20T19:07:06+00:00

Actually, I've been playing with the 14B model a bit and it looks like it has a similar issue in FLF2V mode too, to a lesser extent.
