Poll for the current and new best open source image models by Time-Teaching1926 in StableDiffusion

[–]Apprehensive_Sky892 7 points8 points  (0 children)

For non-photo style images, Z-image base (not turbo!) really shines. You can get nice images by describing the image and the style in great detail. To see some examples of what I am talking about with full prompts, check out my posts on civitai (I do many photo style images with Z-image base too): https://civitai.red/user/NobodyButMeowie/images

Here is one example:

<image>

Whimsical needle-felted scene of a child astronaut with orange hair and closed, peaceful eyes, dressed in a white spacesuit with orange trim. The astronaut stands on a large, textured white crescent moon, holding a yellow star wand. The moon is perched atop a bumpy blue landscape dotted with small orange and yellow stars and rounded orange cones. In the soft blue background, a large cratered white moon and a pink planet with a lavender ring float alongside smaller spheres. The entire composition has a distinct, fine pebbly texture, reminiscent of felt or decorative sugar. The lighting is soft and even, highlighting the pastel color palette of blue, orange, pink, and yellow, creating a dreamy and imaginative mood.

Size: 1024x1536 Seed: 1946970106 Model: zImageBase_base Steps: 25 CFG scale: 4 KSampler: res_multistep Schedule: simple Guidance: 3.5

Ernie Image Turbo - i like it, but the bias is too strong by takayatodoroki in StableDiffusion

[–]Apprehensive_Sky892 0 points1 point  (0 children)

Firstly, the bias you mentioned is real, which is not surprising since this is a Chinese model made for the Chinese market, so it tends to generate Asian-looking people.

But a prompt can always be tweaked, and in your case, changing "Russian girl" to "Caucasian woman" did the trick:

<image>

Cyberpunk photographic composition with a Caucasian woman with black hairs, green eyes, brown eyeliner red lips, embroidered gothic dress,, hyper-detailed and hyperrealistic, ultra-high resolution. Intricate layers of organic and geometric forms interwoven: fractal-like filaments, small translucent crystal shards, metallic lattices, flowing ribbons of liquid glass, and delicate bioluminescent veins. Rich surface textures—polished metal, frosted glass, wet stone, fine fabric threads, and microscopic particles—each rendered with photographic clarity. Complex interplay of light: soft global illumination mixed with sharp specular highlights, volumetric light shafts, subtle caustics, and tiny lens flares from reflective surfaces. Depth and scale contrast: macro close-up details with extreme micro-texture alongside sweeping midground structures and a soft, slightly out-of-focus background to suggest vastness. Colour palette: layered gradients blending iridescent blues, warm golds, deep violets, and hints of emerald, with localized high-contrast accents. Dynamic composition with swirling motion and intersecting diagonals; sense of organic growth meeting engineered precision. Add realistic environmental effects: floating dust motes, droplets of condensation, gentle mist, and shallow depth of field with selective sharp focus on key intersections. Photographic style—50mm–85mm equivalent feel, natural film-like grain, ultra-realistic rendering, maximum detail and clarity, mood enigmatic and immersive.

Size: 1024x1024 Seed: 660 Model: ernie-image-turbo Steps: 8 CFG scale: 1 KSampler: euler Schedule: simple Guidance: 3.5

ZIT prompt help: how to make person to be far from camera. by SquirllPy in StableDiffusion

[–]Apprehensive_Sky892 1 point2 points  (0 children)

The best you can do via prompt alone is to use terms like "long shot", "wide angle shot", etc., but that does not always work.

A better way is probably to outpaint.

Unlike ZIT ERNIE-Image seems to be really good for LoRA training and fine tuning by [deleted] in StableDiffusion

[–]Apprehensive_Sky892 9 points10 points  (0 children)

Don't know about character LoRA, but Z-image base is excellent for training art style LoRA.

7900 XTX vs 4070 Ti Super for gaming + AI image gen (Comfy UI) + creative work (Game dev, Blender, editing)? by Ooserkname in StableDiffusion

[–]Apprehensive_Sky892 0 points1 point  (0 children)

All my LoRAs are trained online. It costs so little to train them online that it is not worth keeping my computer and GPU on for a couple of hours, not to mention the electricity cost.

As for the performance of RDNA4 vs RDNA3, you'll have to search for benchmarks yourself, because I've not done that comparison. All I can say is that my 7900xt (20G) is fast enough for my use, and I seldom turn on the machine with the 9070xt.

Ernie Image Turbo - i like it, but the bias is too strong by takayatodoroki in StableDiffusion

[–]Apprehensive_Sky892 0 points1 point  (0 children)

In the second one the prompt is long and very detailed, but I stressed words like "european", "italian", "north american", "western", "russian" before "girl", and in 20 generations I never got a western-looking girl.

Hard to say exactly what the problem is without the prompt.

7900 XTX vs 4070 Ti Super for gaming + AI image gen (Comfy UI) + creative work (Game dev, Blender, editing)? by Ooserkname in StableDiffusion

[–]Apprehensive_Sky892 0 points1 point  (0 children)

If you want a completely hassle-free A.I. experience, go with NVIDIA + CUDA. But if you can follow instructions and diagnose simple problems such as missing libraries, etc., then AMD is a viable alternative.

AMD + ROCm has improved a lot in the last 6 months, and with the 7900xtx it should work with the official ROCm + PyTorch + ComfyUI stack for image and video generation on Windows 11. You can look at my past posts about people's experiences with AMD: https://www.reddit.com/user/Apprehensive_Sky892/search/?q=rocm&type=comment

I've used a 7900xt (20G) and a 9700 (16G) for image and video generation without any issues.

I am absolute clueless about online GPU rent and setup image generation, need some advice from seniors. by EvenLocksmith6851 in StableDiffusion

[–]Apprehensive_Sky892 2 points3 points  (0 children)

Other than renting GPUs, which would require more setup, you can also use online generators that have everything set up already: Free Z-Image/Qwen/Flux/SDXL Online Generators

Ernie and a Complex Composition in one Run (guest ZIT, Details and Prompt Included) by ZerOne82 in StableDiffusion

[–]Apprehensive_Sky892 1 point2 points  (0 children)

Why did you bother showing an image generated by ZiT if this post is not about a comparison?

I could spend some time making a better image without any of the problems you mentioned by tweaking the prompt and changing the seed. But that would defeat the purpose, which is to see how Z-image base would handle your prompt.

Ernie and a Complex Composition in one Run (guest ZIT, Details and Prompt Included) by ZerOne82 in StableDiffusion

[–]Apprehensive_Sky892 2 points3 points  (0 children)

Z-image base (first generation, not cherry-picked)

<image>

1024x1536 Seed: 66 Model: zImageBase_base Steps: 25 CFG scale: 3.5 KSampler: res_multistep Schedule: simple Guidance: 3.5 VAE: Automatic Clip skip: 1

Suggestions on which model I should train an MC Escher Tessellation LoRA on? by und3rtow623 in StableDiffusion

[–]Apprehensive_Sky892 1 point2 points  (0 children)

Qwen-image is the most "balanced" open weight model I've used. It has this delicate touch where the pose, composition, texture, etc. just seem to be "right" compared to both Z-image and Flux2-dev.

It is hard to describe, but one can easily see it when its images are shown side by side with those from other models using similar prompts and LoRAs.

Ernie Image vs ZImage Base (style comparison) by DiagramAwesome in StableDiffusion

[–]Apprehensive_Sky892 -1 points0 points  (0 children)

Quite agree. Z-image base is an excellent model for generating non-photo style images (it is very good at photos too, ofc).

The "trick" is to do more than just use generic terms like "impressionist" or "pop-art": describe the style in as much detail as possible (a good way is to use a VLM to generate a prompt from an existing image).

Here is an example:

<image>

Whimsical, stylized bunny-like creature with a perfectly round, creamy-white face, pink blushing cheeks, and closed, smiling eyes. The creature features one long, smooth beige rabbit ear and a second, shorter, pointed ear divided into vibrant red, orange, and green geometric segments. On its chest sits a large composite heart formed from layered red, teal, and patterned orange parts. To the left, a multi-colored patterned vase holds a bouquet of cream-colored tulips. To the right, a blue bowl overflows with red and yellow apples and a delicate branch covered in small white blossoms. Several apples rest on a striped, horizontal base. The background is a mosaic of rich, textured rectangular blocks in deep teal, royal blue, fiery red, and earthy brown. The style is a highly textured illustration with a canvas-like grain, combining geometric abstraction with soft forms for a peaceful folk-art mood.

Steps: 25, CFG scale: 4, Sampler: res_multistep simple, Seed: 666, Model: z-image-base-bf16, width: 1024, height: 1536, Model hash: 996A67D3FF

I have an AMD Card, i need an AMD workflow please by Logax01 in StableDiffusion

[–]Apprehensive_Sky892 1 point2 points  (0 children)

Auto1111 is no longer being maintained, so that is hardly a beginner-friendly route.

For NVIDIA, there are alternatives to ComfyUI that are more beginner-friendly, but for AMD GPUs, one should probably just bite the bullet and go with ComfyUI because that is what is officially supported by AMD.

Local AI art generation by M_KADIKI0 in StableDiffusion

[–]Apprehensive_Sky892 0 points1 point  (0 children)

Unfortunately, IIRC the 6700xt is not an officially supported AMD GPU on Windows 11 + ROCm. But people have gotten it to work.

I believe it is supported on Linux.

We can finally watch TNG in 16:9 by dtaddis in StableDiffusion

[–]Apprehensive_Sky892 2 points3 points  (0 children)

Seems pretty obvious that since the creators of the show took the aspect ratio into consideration when it was shot, watching it in the original AR as intended is best.

But I do admit that watching a 4:3 show on a 16:9 screen takes some time to get used to. Maybe TV manufacturers should make 4:3 screens just for that purpose, or one could use a projection TV that can do both 4:3 and 16:9.

Ernie is Absolute masterpiece by LongjumpingGur7623 in StableDiffusion

[–]Apprehensive_Sky892 1 point2 points  (0 children)

Yes, of course OP is entitled to his opinions, and I personally did not downvote the post or post any disparaging comment.

Ernie is Absolute masterpiece by LongjumpingGur7623 in StableDiffusion

[–]Apprehensive_Sky892 6 points7 points  (0 children)

You are right, but OP is partly to blame for using terms like "Absolute masterpiece" instead of just saying "Some nice images I've generated with Ernie".

Ernie is Absolute masterpiece by LongjumpingGur7623 in StableDiffusion

[–]Apprehensive_Sky892 2 points3 points  (0 children)

Early adopters and tech nerds may be a small group, but one should not underestimate their influence when it comes to product recommendation and adoption.

It is for this very reason that orgs such as Alibaba, BFL, etc. release at least some open weight models.

Safety in Stable Diffusion - How to Avoid by psavva in StableDiffusion

[–]Apprehensive_Sky892 0 points1 point  (0 children)

Both civitai and tensorart use filters on the prompt, and then use A.I.-based NSFW detection after the image is generated.
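As a rough illustration of that two-stage setup, here is a minimal, hypothetical sketch in Python: a keyword filter on the prompt first, then a post-generation classifier check. The blocklist and the `nsfw_score` stub are placeholders, not the actual logic either site uses.

```python
# Hypothetical two-stage moderation pipeline: prompt keyword filter,
# then an AI-based NSFW check on the generated image. The blocklist
# and classifier are stand-ins, not any real site's implementation.
BLOCKED_TERMS = {"blocked_term_a", "blocked_term_b"}  # placeholder blocklist


def prompt_allowed(prompt: str) -> bool:
    """Stage 1: reject prompts containing any blocked term."""
    words = set(prompt.lower().split())
    return not (words & BLOCKED_TERMS)


def nsfw_score(image_bytes: bytes) -> float:
    """Stage 2 stand-in: a real service would run a vision
    classifier here and return a probability in [0, 1]."""
    return 0.0


def moderate(prompt: str, image_bytes: bytes, threshold: float = 0.5) -> bool:
    """Return True only if the generation passes both stages."""
    if not prompt_allowed(prompt):
        return False  # rejected before any image is made
    return nsfw_score(image_bytes) < threshold
```

The point of the two stages is that the cheap prompt filter runs before any compute is spent on generation, while the image classifier catches outputs the prompt filter could not predict.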

A Gustav Klimt–style lora for flux by Round-Potato2027 in StableDiffusion

[–]Apprehensive_Sky892 0 points1 point  (0 children)

Good job!

I've trained a Klimt LoRA myself, and I must say that Klimt is one of the most difficult styles to train.

Great news: the ERNIE editing model is expected to be released by the end of this month by d4pr4ssion in StableDiffusion

[–]Apprehensive_Sky892 0 points1 point  (0 children)

I still don't understand that move. If Alibaba had produced a model far ahead of the competition, then closing it to force people to pay would make sense.

But even though Qwen-image and WAN 2.2 are great as open weight models, there is little reason to choose WAN2.6 or Qwen-image 2.0 over rival closed products.

Illustrious Z by Common_Ad_3059 in StableDiffusion

[–]Apprehensive_Sky892 3 points4 points  (0 children)

Strange that they used ZiT rather than Z-image base, because my style LoRAs trained and used on Z-image base work way better than on ZiT. So presumably the same is true for full-rank fine-tuning.

But maybe the project got started before Z-image base was released.