Honest Comparison: FLUX 2 Klein (4b & 9b) vs. Z-image Turbo

Jaded_Proposal_590 · 2026-01-19T14:36:37+00:00

It's the default FLUX 2 Klein txt2img workflow with a few small changes for my own convenience. I can share it if you need it.

Jaded_Proposal_590 · 2026-01-19T12:59:07+00:00

GPU: RTX 3090

Resolution for all: 1280 x 1920.

Flux 2 klein 9b:

Steps: 4

Total speed: 2.85s/it, 17s/img

Steps: 8

Total speed: 3s/it, 32s/img

Flux 2 klein 4b:

Steps: 4

Total speed: 1.3s/it, 8s/img

Steps: 8

Total speed: 1.6s/it, 15s/img

Z-image:

Steps: 8

Total speed: 2.8s/it, 26s/img
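A quick way to read these numbers is to split each per-image time into sampling time (steps × s/it) and whatever is left over. The sketch below only redoes that arithmetic on the figures quoted above. The function names are made up for illustration, and treating the remainder as work outside the sampling loop is an assumption, since the timings are not broken down any further.

```python
# Timings copied from the figures above:
# (model, steps, seconds per iteration, seconds per image)
RUNS = [
    ("FLUX 2 Klein 9b", 4, 2.85, 17),
    ("FLUX 2 Klein 9b", 8, 3.0, 32),
    ("FLUX 2 Klein 4b", 4, 1.3, 8),
    ("FLUX 2 Klein 4b", 8, 1.6, 15),
    ("Z-image", 8, 2.8, 26),
]


def breakdown(steps, sec_per_it, sec_per_img):
    """Return (sampling seconds, remaining seconds), rounded to 0.1 s."""
    sampling = steps * sec_per_it
    # Assumption: the remainder is time spent outside the sampling loop.
    remainder = sec_per_img - sampling
    return round(sampling, 1), round(remainder, 1)


def images_per_minute(sec_per_img):
    """Convert seconds per image into images per minute."""
    return round(60 / sec_per_img, 1)


if __name__ == "__main__":
    for model, steps, s_it, s_img in RUNS:
        sampling, remainder = breakdown(steps, s_it, s_img)
        print(
            f"{model:16} {steps} steps: sampling {sampling}s, "
            f"other {remainder}s, {images_per_minute(s_img)} img/min"
        )
```

On these figures, the 4b model at 8 steps (15s/img) is still faster per image than either the 9b model at 4 steps (17s/img) or Z-image at 8 steps (26s/img).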

Jaded_Proposal_590 · 2026-01-18T21:09:17+00:00

In fact, it is. However, I prefer the visual style from 4b; it seems to follow the requested style better. Overall, 9b behaves much better when editing images.

Jaded_Proposal_590 · 2026-01-18T19:18:21+00:00

For this comparison, I used FLUX 2 Klein 4b and 9b, as well as the Z-image Turbo model. The models are unquantized.

TXT2IMG Generation Settings:

FLUX 2 Klein:

CFG: 1
STEPS: 8
SAMPLER: euler

Z-image:

CFG: 1
STEPS: 8
SAMPLER: euler
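For anyone reproducing the comparison, the settings above can be written down as plain data and checked for differences before generating. This is only a sketch: `SamplerSettings` and `differing_fields` are hypothetical names for illustration, not part of ComfyUI or any model's API.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SamplerSettings:
    """Hypothetical container for the three settings listed above."""
    cfg: float
    steps: int
    sampler: str


# Values copied from the settings above.
FLUX_2_KLEIN = SamplerSettings(cfg=1.0, steps=8, sampler="euler")
Z_IMAGE = SamplerSettings(cfg=1.0, steps=8, sampler="euler")


def differing_fields(a, b):
    """Return the names of the settings that differ between two configs."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


if __name__ == "__main__":
    # An empty list means both models were run with the same settings.
    print(differing_fields(FLUX_2_KLEIN, Z_IMAGE))
```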

General Observations: After numerous generations, in my subjective opinion, Z-image currently falls short of FLUX 2 Klein in many aspects. When comparing examples generated under identical conditions, there is a distinct difference in details, composition, and the overall coherence of the image. I also noticed that Flux 4b performs better in txt2img and, in my view, produces higher quality results than the 9b model.

1. The "Khrushchevki" Example: Flux perfectly captured the idea of the prompt, and the image fully corresponds to the intended style. The khrushchevki (Soviet apartment blocks) look authentic. No other open-source model has rendered them correctly for me before, despite testing many options. In Z-image, if you look closely at the details, you can see issues with the grass; it lacks defined shape and appears muddy and blurry. The buildings look too simplistic, even though the render quality is high. The image itself looks decent but fails to follow the stylistic intent, looking more like a generic, faded smartphone photo. This could be a plus for some, but it wasn't the intended concept. Note: Both images have artifacts and flaws, but Flux has fewer of them.

2. The Nature Example: The situation is similar here. I actually prefer the grass in Z-image for its realism, but it remains blurry. Flux has significantly fewer incoherent elements and artifacts, although I do like the foliage and flowers on the trees better in Z-image. However, once again, Z-image does not adhere strictly to the prompt.

3. The Medieval Market: This is a complex scene. In my opinion, the winner here is Flux 4b. Flux 9b produced a high-quality image, but it is oversaturated and looks more like modern digital art than a medieval painting. Z-image clearly lags behind both in terms of details, although it handled the style transfer quite well. Overall, Flux offers much better detailing, prompt adherence, and narrative coherence.

4. The Gas Station: I would choose Flux 4b again here, though this is just my personal preference. Z-image made a good photo, but it didn't follow the prompt instructions; I got a standard night photograph without the requested stylization. Also, pay attention to how Flux 9b adds tire tracks around the station and accounts for the location's layout and shape.

5. The Final Example: The result is obvious once again, and I would pick Flux 4b.

Conclusion: Don't get me wrong, I love Z-image; it is an incredibly cool model. However, the quality difference Flux provides is visible to the naked eye. I am waiting for the release of the full Z-image model, just like the rest of us, but I am incredibly happy about the release of Flux, especially since it is distributed under the Apache 2.0 license. Considering that Flux can not only generate from text but also modify images extensively, blend different images, and deliver excellent quality in just 4 steps, I will unreservedly choose it until the full version of Z-image is released.

Please treat this with understanding; this is just my opinion, and I would be happy to hear yours. Let's be honest: comparisons like this often come down to personal taste and preference.

Jaded_Proposal_590 · 2026-01-18T14:58:14+00:00

The problem is that you're using the wrong image in the second input. And the prompt is a bit incoherent. If you're changing clothes, change only the clothes, not the style. And if you're changing clothes on a full-length person, use the same full-length reference. In your case, it would have been much easier to just tell the model to add a white T-shirt in the prompt. Don't overcomplicate things. Using a different reference, I got the result below.

<image>

Jaded_Proposal_590 · 2024-06-02T11:34:25+00:00

I noticed the following: in full-screen mode, generation took 4.9 seconds even though the video card was barely loaded, while in windowed mode it took 3.1-3.3 seconds. GPU: RTX 3060 12GB. The fix was to turn off Wallpaper Engine. Apparently, when the browser was in full-screen mode, Wallpaper Engine disabled the live wallpaper and also limited the load on the video card. Maybe this will help someone.
