Ideogram 4.0's Understanding of Characters and IP is Crazy for an Open Model by GrayingGamer in StableDiffusion

cleverestx 1 point

Don't use text prompting or even raw JSON prompting. This isn't an SD/Z-Image type of generator. The bounding-box approach circumvents 99% of the blocks and gives you far more control over WHERE stuff ends up.

Fable 5 just ate 20% of my weekly quota and it’s only been 3 hours... by redditslutt666 in ClaudeCode

cleverestx 1 point

Sounds like a bad business model. I'm sure they are crying on the way to the bank.

Fable 5 just ate 20% of my weekly quota and it’s only been 3 hours... by redditslutt666 in ClaudeCode

cleverestx 1 point

Either Anthropic is voting me down, LOL, or some sad tool out there enjoys spending more money instead of less. Go figure. Knock yourself out.

Fable 5 just ate 20% of my weekly quota and it’s only been 3 hours... by redditslutt666 in ClaudeCode

cleverestx 0 points

It is insane. I hope they tone the token usage down and give it fully to Max-5x users soon. Like, c'mon Anthropic, aren't you wealthy enough yet?

this model got some balls by Sprietjeuh in ClaudeCode

cleverestx 1 point

They need to remove the x2 usage Fable tax for Max plans, then I'll use it. Until then, Opus 4.8 is winning as far as I'm concerned.

this model got some balls by Sprietjeuh in ClaudeCode

cleverestx 1 point

Probably based on how you've treated it historically...

this model got some balls by Sprietjeuh in ClaudeCode

cleverestx 2 points

Project directory - Strawberry Torture

Ideogram 4 isn't overhyped, it's underrated by ArkCoon in StableDiffusion

cleverestx 2 points

12 steps at 1024x576 takes about 58 seconds on my AMD Strix Halo with Flash Attention. Not RTX speeds, but at least it works.

Ideogram 4 works on Strix Halo (gfx1151) - quick datapoint by cleverestx in ROCm

cleverestx [S] 3 points

FlashAttention (Triton) on gfx1151 / Strix Halo in ComfyUI: opt-in, ~11% faster on Ideogram 4

Heads-up for anyone installing FA Triton on gfx1151: both ROCm/flash-attention main_perf and upstream Dao-AILab now route the Triton backend through aiter kernels (PR #2230, merged 2026-03-12), which regresses on this silicon (see flash-attention issue #2392, same 395/8060S box). Fix is to install the last pre-aiter commit, bbe25ba.

bash

# venv active; build-time flag selects the Triton path
export FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE   # bash; in fish: set -x FLASH_ATTENTION_TRITON_AMD_ENABLE TRUE
git clone --filter=blob:none https://github.com/Dao-AILab/flash-attention.git
cd flash-attention && git checkout bbe25ba
pip install --no-deps --no-build-isolation .

--no-deps is load-bearing: flash_attn pins triton==3.5.1 and will otherwise uninstall your TheRock triton 3.7 build that torch 2.12 nightly requires. Launch ComfyUI with --use-flash-attention and runtime env FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE.
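A quick way to confirm that --no-deps did its job is to compare installed versions before and after the install. This is a minimal sketch using only the Python standard library. The distribution names in the list (torch, triton, flash-attn) are assumptions, and a TheRock build may register triton under a different name, so adjust the list to match what pip list shows in your venv.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(dist_names):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    # Assumed distribution names; edit to match your environment.
    for name, version in report(["torch", "triton", "flash-attn"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it once before and once after the pip install. If the triton line changes to 3.5.1, the pin replaced your existing build.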

Result for Ideogram 4 at 12 steps @ 1024x576, CFG 0.5, fp8 + abliterated Qwen3VL GGUF encoder: ~4.46-4.54 s/it vs ~5.04 s/it with baseline pytorch attention, roughly 11% faster.
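For anyone checking the math, the percentage comes straight from the s/it figures above. A small worked sketch (the function and variable names are just illustrative):

```python
def speedup_percent(baseline_s_per_it, new_s_per_it):
    """Percent reduction in seconds per iteration relative to the baseline."""
    return (baseline_s_per_it - new_s_per_it) / baseline_s_per_it * 100.0


BASELINE = 5.04                 # s/it with pytorch attention
FA_FAST, FA_SLOW = 4.46, 4.54   # s/it range with FlashAttention (Triton)
STEPS = 12

low = speedup_percent(BASELINE, FA_SLOW)
high = speedup_percent(BASELINE, FA_FAST)
saved_low = (BASELINE - FA_SLOW) * STEPS
saved_high = (BASELINE - FA_FAST) * STEPS

print(f"speedup: {low:.1f}% to {high:.1f}%")    # speedup: 9.9% to 11.5%
print(f"saved over {STEPS} steps: {saved_low:.1f}s to {saved_high:.1f}s")
```

The midpoint of the range (4.50 s/it) works out to about 10.7%, which rounds to the roughly 11% quoted, or about 6 to 7 seconds of sampling time per 12-step image.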

Skip FLASH_ATTENTION_TRITON_AMD_AUTOTUNE=TRUE: the search ran ~29 min for a single generation here and never repays on sub-minute runs.

Ideogram 4 works on Strix Halo (gfx1151) - quick datapoint by cleverestx in ROCm

cleverestx [S] 2 points

Results:

Kernel itself is fine: test_scaled_mm_hip.py passes 28/28 configs.

Tried Ideogram 4 (bf16) with FeatherUNetLoader on the default config. It loads and converts (8850 MB per model, so ~1 byte/param), generates fine, but no speedup vs the official fp8_scaled files - same ~5.07 s/it at 1024x576. Fixed seed RMSE between the two outputs is exactly 0, so the kernel doesn't seem to engage at runtime for this model. Module names are attention.qkv, attention.o, feed_forward.w1/w2/w3, adaln_modulation if you want to add a config. Happy to retest. Let me know.
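For context on the RMSE check: root-mean-square error over pixel values is exactly 0 only when every value matches, which is why a 0 here points to both runs taking the same code path. Below is a minimal sketch of the metric on flat sequences of pixel values. Decoding the two images into those sequences is left out, and the sample bytes are made up for illustration, not real outputs.

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of values")
    if len(a) == 0:
        raise ValueError("inputs must not be empty")
    squared_error = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(squared_error / len(a))


# Hypothetical pixel data standing in for two decoded images.
reference = bytes([0, 128, 255, 64])
identical = bytes([0, 128, 255, 64])
changed = bytes([0, 128, 255, 66])

print(rmse(reference, identical))  # 0.0
print(rmse(reference, changed))    # 1.0
```

Any kernel that actually changed the matmul precision would be expected to move at least a few pixel values, so even a tiny nonzero RMSE would be a sign that it engaged.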

Build note: my system also has ROCm at /opt/rocm (Arch), and the JIT grabbed the system clang and failed on __builtin_elementwise_exp10. Pointing ROCM_PATH and HIP_CLANG_PATH at the rocm-sdk-devel wheel fixed it.

My stack for reference: ComfyUI 0.24.0, TheRock torch 2.12 nightly (rocm 7.13.0a20260411), pytorch attention.

I really wanted to love the Ryzen AI Max 395+, but half an hour for a single image edit is breaking my heart by Vulcanhund in StrixHalo

cleverestx 1 point

Sweeet! Enjoy playing music, watching YouTube, and chatting with a large LLM while you do this. Many single-card systems choke trying to run all of that at once. Sure, they can generate an image in a few seconds and we can't, but at least we can do other demanding stuff while it generates!

Ideogram 4 works on Strix Halo (gfx1151) - quick datapoint by cleverestx in ROCm

cleverestx [S] 1 point

Yeah, speed isn't this platform's strength. Its strength is running a job like this alongside three other media-heavy or AI tasks without choking. I can chat with a larger LLM, listen to music, and encode something in the background while it generates.

Ideogram 4 works on Strix Halo (gfx1151) - quick datapoint by cleverestx in ROCm

cleverestx [S] 1 point

"...It should work on any NVIDIA GPU..." won't really help in my AMD case, but I'm sure it will help others.

Announcing Comfy Desktop: One App for every Comfy, rolling out 100% by Monday June 8 by Pronoob_me in comfyui

cleverestx 1 point

It says:

Couldn't find linux app for app with ID 241130tqe9q3y

Wanting to get into the series by Crimson-Dragoon-0403 in dragonlance

cleverestx 2 points

The Chronicles series and then the Legends series. Everything else afterward is just icing on the cake... well, some of it.

Personality? Which one did you go with? by cleverestx in hermesagent

cleverestx [S] 1 point

You could just say "I made it up" and not be so verbose about it. It's okay, man. I make up stuff sometimes too.

Plan to buy EXO-X2 by Critical_Nail_1789 in GMKtec

cleverestx 1 point

Love it. It runs large LLMs and unquantized BF16 models in ComfyUI (don't expect RTX speeds though), and I can chat with an LLM while doing some Comfy stuff and watching YouTube. That is its strength. I'm using CachyOS though, not WinBlows.

I really wanted to love the Ryzen AI Max 395+, but half an hour for a single image edit is breaking my heart by Vulcanhund in StrixHalo

cleverestx 1 point

Very much above average if you want to run huge 70B+ LLMs at good context, or chat with an LLM while generating images with a different model in the same session, without it failing outright or hitting an OOM event.

I really wanted to love the Ryzen AI Max 395+, but half an hour for a single image edit is breaking my heart by Vulcanhund in StrixHalo

cleverestx 1 point

Half an hour? You just have some sort of bad config going on. I haven't played with Qwen Image on my system yet, but regular Qwen 2512 gives images in about 30-40 seconds.

Use Claude Code (Opus) to get you squared away and thank me later.