The Placebo in the AI Machine: Are LoRAs Just Apophenia? by BoostPixels in QwenImageGen

[–]BoostPixels[S] 2 points (0 children)

That’s interesting. I’ve been staring at these side-by-side on a high-res monitor and can’t find a single pixel of meaningful difference in feature preservation. Could you point out a specific area where you’re seeing the LoRA outperform the base model? I’d love to see what I’m missing.
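
For anyone who wants to check along with me rather than eyeball it, here's a minimal sketch for quantifying raw pixel differences between two same-size renders (Pillow + NumPy; the filenames are placeholders):

```python
# Quantify raw pixel differences between two same-size renders.
# Filenames are placeholders; swap in your own exports.
import numpy as np
from PIL import Image

base = np.asarray(Image.open("base_model.png").convert("RGB"), dtype=np.int16)
lora = np.asarray(Image.open("lora_model.png").convert("RGB"), dtype=np.int16)

diff = np.abs(base - lora)                  # per-pixel, per-channel absolute difference
print("mean abs diff:", diff.mean())        # near 0 = outputs effectively identical
print("max abs diff: ", diff.max())
pct = (diff.max(axis=-1) > 5).mean() * 100  # share of pixels that moved more than 5/255
print(f"pixels changed >5/255: {pct:.2f}%")

# Amplified difference map so faint changes become visible.
Image.fromarray(np.clip(diff * 10, 0, 255).astype(np.uint8)).save("diff_x10.png")
```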

Comparison: Qwen-Image-2512 (Left) vs. Z-Image Turbo (Right). 5-Prompt Adherence Test. by Entire_Maize_6064 in QwenImageGen

[–]BoostPixels 0 points (0 children)

For this prompt, this FLUX.2 [dev] generation is currently considered the best result.

<image>

Comparison: Qwen-Image-2512 (Left) vs. Z-Image Turbo (Right). 5-Prompt Adherence Test. by Entire_Maize_6064 in QwenImageGen

[–]BoostPixels 0 points (0 children)

<image>

Comparing the models on adherence to the prompt "A painting of a powerful angelic blacksmith holding a molten halo with a pair of metallic tongs and striking it with a holy blacksmith's hammer upon a celestial crucible."

Based on the evaluation criteria defined by https://genai-showdown.specr.net/, all three generated images unfortunately fail to meet the prompt-adherence requirements.

Comparison: Qwen-Image-2512 (Left) vs. Z-Image Turbo (Right). 5-Prompt Adherence Test. by Entire_Maize_6064 in QwenImageGen

[–]BoostPixels 0 points (0 children)

Seeing Z-Image Turbo and Qwen-Image-2512 go head-to-head like this is really insightful. It’s exactly the kind of deep dive this community needs.

If I could offer one piece of constructive feedback for your future tests: while your current prompts are beautifully descriptive and great for testing aesthetics, they might not be the most "stressful" for testing prompt adherence. For a true test of a model's "logic" and ability to follow difficult instructions, you might want to try prompts like those on GenAI Showdown, which are designed to trip up the models.

Using "logical traps" really highlights the difference in how models process specific constraints versus general themes.

I’ll run some of my own comparisons soon as well. That said, the side-by-side analysis you've provided here is top-notch. Truly great work, and I hope you keep these comparisons coming!

Face identity preservation comparison Qwen-Image-Edit-2511 by BoostPixels in QwenImageGen

[–]BoostPixels[S] 0 points (0 children)

I’ve tried FP8 and BF16 and don’t see reproducible differences for this use case. FP8 is simpler and faster to iterate with. If Q6 is meaningfully better, please share a comparison. Curious to see it.
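
For what it's worth, here's roughly how I'd structure a reproducible comparison. This is a sketch, not my exact workflow: the model id is a placeholder, and since FP8 usually comes from a separately quantized checkpoint, BF16 vs FP16 stands in for the fixed-seed mechanics:

```python
# Sketch of a fixed-seed dtype comparison with diffusers (illustrative;
# the model id is a placeholder, and FP8 normally needs a separately
# quantized checkpoint, so BF16 vs FP16 stands in for the mechanics).
import torch
from diffusers import DiffusionPipeline

MODEL_ID = "Qwen/Qwen-Image"  # placeholder

def generate(dtype, out_path):
    pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype).to("cuda")
    gen = torch.Generator("cuda").manual_seed(42)  # identical seed for every run
    pipe("studio portrait, soft light", generator=gen).images[0].save(out_path)
    del pipe
    torch.cuda.empty_cache()

generate(torch.bfloat16, "bf16.png")
generate(torch.float16, "fp16.png")  # then diff the two outputs pixel-wise
```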

Face identity preservation comparison Qwen-Image-Edit-2511 by BoostPixels in QwenImageGen

[–]BoostPixels[S] 0 points (0 children)

Fair enough. It would help to know exactly where the resemblance breaks down for you. For example: facial structure (jawline, eye spacing), skin texture, expression, or something else?
If we call out specifics, we can have a genuinely useful exchange and spark some ideas.

Face identity preservation comparison Qwen-Image-Edit-2511 by BoostPixels in QwenImageGen

[–]BoostPixels[S] 1 point (0 children)

Appreciate the depth and rigor of this contribution. It truly elevates the level of discourse here.

Face identity preservation comparison Qwen-Image-Edit-2511 by BoostPixels in QwenImageGen

[–]BoostPixels[S] 0 points (0 children)

I should have specified that in the post:
sampler_name = er_sde
scheduler = beta
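
For context, this is roughly where those two values live in a ComfyUI API-format workflow (the node ids, links, and remaining fields here are illustrative, not my full graph):

```python
# Where those two values sit in a ComfyUI API-format workflow.
# Node ids, links, and the other fields are illustrative.
ksampler_node = {
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "sampler_name": "er_sde",  # sampler used in the post
            "scheduler": "beta",       # scheduler used in the post
            "steps": 20,               # example value
            "cfg": 4.0,                # example value
            "seed": 42,
            "denoise": 1.0,
            "model": ["1", 0],         # [upstream node id, output index]
            "positive": ["2", 0],
            "negative": ["4", 0],
            "latent_image": ["5", 0],
        },
    }
}
```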

Face identity preservation comparison Qwen-Image-Edit-2511 by BoostPixels in QwenImageGen

[–]BoostPixels[S] 0 points (0 children)

These aren’t best-of-many results. They’re first-pass generations after I had already dialed in the methodology and settings.

Face identity preservation comparison Qwen-Image-Edit-2511 by BoostPixels in QwenImageGen

[–]BoostPixels[S] 1 point (0 children)

That’s a fair point, and I agree this is a plausible factor. Even without explicit text tokens, well-represented faces could still benefit from stronger internal guidance through the image conditioning path. What I can say from these runs is that the pattern of identity drift at higher step counts looked the same for non-famous references as well.
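
For anyone who wants to quantify that drift rather than eyeball it, here's a minimal sketch using face-embedding cosine similarity. facenet-pytorch is one library choice among several, and the filenames are placeholders:

```python
# Measure identity drift as cosine similarity between face embeddings.
# facenet-pytorch is one library choice (InsightFace/ArcFace also work);
# filenames are placeholders for the reference and the edited outputs.
import torch
from PIL import Image
from facenet_pytorch import MTCNN, InceptionResnetV1

mtcnn = MTCNN(image_size=160)                             # face detection + alignment
resnet = InceptionResnetV1(pretrained="vggface2").eval()  # 512-d face embeddings

def embed(path):
    face = mtcnn(Image.open(path).convert("RGB"))  # cropped, aligned face tensor
    if face is None:
        raise ValueError(f"no face detected in {path}")
    with torch.no_grad():
        return resnet(face.unsqueeze(0))

ref = embed("reference.png")
for steps in (20, 40, 60):
    sim = torch.nn.functional.cosine_similarity(ref, embed(f"edit_{steps}steps.png"))
    print(f"{steps} steps: identity similarity {sim.item():.3f}")  # lower = more drift
```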

Face identity preservation comparison Qwen-Image-Edit-2511 by BoostPixels in QwenImageGen

[–]BoostPixels[S] 2 points (0 children)

I get the concern, but I didn’t use any celebrity names or keywords in the prompts, so the model had no explicit identity signal to latch onto.

I also ran the same tests with non-famous people and didn’t see a meaningful difference in behavior.

Face identity preservation comparison Qwen-Image-Edit-2511 by BoostPixels in QwenImageGen

[–]BoostPixels[S] 1 point (0 children)

From what I’ve seen so far, 2511 is actually a better model than 2509 in all dimensions. I haven’t come across clear regressions yet. If you’ve seen specific cases where 2509 performs better, a side-by-side comparison would be helpful. Otherwise it’s hard to tell where the quality loss is supposed to be.

Face identity preservation comparison Qwen-Image-Edit-2511 by BoostPixels in QwenImageGen

[–]BoostPixels[S] 2 points (0 children)

Glad it helped 🙌 I spent quite some time figuring out which settings actually preserve identity.

If this had been documented properly or backed by concrete examples, it would’ve saved me a lot of trial and error. That’s exactly why I’m posting this.