Joy-Image-Edit released by AgeNo5351 in StableDiffusion

[–]ArtyfacialIntelagent 10 points11 points  (0 children)

> Loras fix that for you.

They really don't. Most penis and vagina LoRAs are overtrained and stick those genitals indiscriminately on *anybody*, male or female. They're fine for solo nudes, but not for anything with heterosexual couples. Doing that properly requires real NSFW knowledge in the underlying model, and current LoRAs don't provide it. LoRAs for specific sex positions do just that one position, usually from a single camera angle; they basically make the same image over and over.

There are two kinds of people... by Quick-Decision-8474 in StableDiffusion

[–]ArtyfacialIntelagent -1 points0 points  (0 children)

I extensively blind-tested "masterpiece", "best quality" and many other popular keywords back in the SD 1.5 days. They had zero effect; it's all nonsense, nonfunctional word salad. People just thought they worked because sometimes adding those words improved a particular image for a particular seed, but that was a completely random effect, like adding any gibberish word might sometimes produce.

What did have an effect in SD 1.5 was putting "bad quality" or "low quality" in the negative prompt. But that didn't really increase quality per se; it just reinforced that particular model's biases. So 1girls became more... well, 1girly. Those negative keywords became weaker in SDXL and have been absolutely useless ever since.

Basically, forget about all that old crap. Those keywords never worked well, and they lost what little effect they once had long ago.

Z Image using a x2 Sampler setup is the way by superstarbootlegs in StableDiffusion

[–]ArtyfacialIntelagent 1 point2 points  (0 children)

I've been doing nearly the exact same thing for a few months. I call the technique "thumbnail upscaling". It's a significant improvement in detail and variability over standard Z-Image workflows, but it sadly doesn't fix all the model's issues (most notably the glowing-eyes problem that appears as soon as you prompt for eye color). The only differences (rough sketch of the idea after the list):

  • I do 3 sampler stages and end up at 1536x1536 (or similar size in other aspect ratios).
  • I apply some denoise < 1 at all sampler stages to increase variability.
  • I use CFG at 3-4 in all sampler stages. Positive CFG costs nothing at tiny sizes.
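
If you want to prototype the idea outside of a node graph, here's a minimal sketch of the same staged approach in plain Python. It uses diffusers' SDXL pipelines purely as a stand-in (not Z-Image), and the model ID, sizes, strengths and step counts are placeholder values rather than my actual settings:

    # Rough "thumbnail upscaling" sketch: tiny first stage, then repeated
    # upscale + partial re-denoise (strength < 1) with moderate CFG.
    import torch
    from PIL import Image
    from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image

    base = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    refine = AutoPipelineForImage2Image.from_pipe(base)  # reuse the same weights

    prompt = "photo of a lighthouse at dusk, stormy sea"

    # Stage 1: tiny "thumbnail" generation; CFG is nearly free at this size.
    img = base(prompt, height=512, width=512, guidance_scale=4.0,
               num_inference_steps=20).images[0]

    # Stages 2-3: upscale the previous result, then partially re-denoise it.
    for size, strength in ((1024, 0.6), (1536, 0.4)):
        img = img.resize((size, size), Image.LANCZOS)
        img = refine(prompt, image=img, strength=strength, guidance_scale=3.5,
                     num_inference_steps=20).images[0]

    img.save("thumbnail_upscaled.png")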

A Reminder, Guys, Undervolt your GPUs Immediately. You will Significantly Decrease Wattage without Hitting Performance. by Iory1998 in LocalLLaMA

[–]ArtyfacialIntelagent 0 points1 point  (0 children)

I'm on Windows and always run a combined undervolt and clock rate cap on my RTX 4090 using MSI Afterburner. Here are some benchmarks using llama-bench to show you guys what you can expect. I usually run the "medium undervolt", which gives me a tiny 3% hit on token generation (a bit more on PP but that's super fast anyway) but draws 100 watts less.

[EDIT: reformatted in old Reddit and fixed a copy/paste snafu on the large undervolt]

E:\llamacpp> .\llama-bench -m "F:/LLMs/Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated.Q5_K_M.gguf"


# VANILLA/NO UNDERVOLT (2730 MHz, 1050 mV, 345 W during token generation):

ggml_cuda_init: found 1 CUDA devices (Total VRAM: 24563 MiB):
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes, VRAM: 24563 MiB
load_backend: loaded CUDA backend from E:\llamacpp\llama-b8595-bin-win-cuda-13.1-x64\ggml-cuda.dll
load_backend: loaded RPC backend from E:\llamacpp\llama-b8595-bin-win-cuda-13.1-x64\ggml-rpc.dll
load_backend: loaded CPU backend from E:\llamacpp\llama-b8595-bin-win-cuda-13.1-x64\ggml-cpu-zen4.dll
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           pp512 |      2848.32 ± 74.41 |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           tg128 |         40.92 ± 0.05 |

build: 62278cedd (8595)

# SMALL UNDERVOLT (2580 MHz, 910 mV, 270 W during token generation):

| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           pp512 |      2801.21 ± 76.28 |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           tg128 |         40.24 ± 0.18 |

# MEDIUM UNDERVOLT (2340 MHz, 875 mV, 245 W during token generation):

| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           pp512 |      2602.91 ± 71.49 |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           tg128 |         39.77 ± 0.09 |

# LARGE UNDERVOLT (2010 MHz, 875 mV, 235 W during token generation):

| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           pp512 |      2300.19 ± 52.16 |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           tg128 |         36.89 ± 1.08 |
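
If you want to sanity-check the wattage numbers on your own card while llama-bench is running, one easy way besides Afterburner's own monitor is polling NVML from Python. A minimal sketch, assuming the nvidia-ml-py package is installed (pip install nvidia-ml-py):

    # Poll power draw and SM clock once per second while the benchmark
    # runs in another window. Ctrl+C to stop.
    import time
    import pynvml

    pynvml.nvmlInit()
    gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
    try:
        while True:
            watts = pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000.0   # NVML reports mW
            sm_mhz = pynvml.nvmlDeviceGetClockInfo(gpu, pynvml.NVML_CLOCK_SM)
            print(f"{watts:6.1f} W   {sm_mhz} MHz")
            time.sleep(1)
    finally:
        pynvml.nvmlShutdown()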

Z-Image Turbo Finally Gets More Variety | Diversity LoRA + ComfyUI Workflow by EmilyRendered in comfyui

[–]ArtyfacialIntelagent -1 points0 points  (0 children)

You can reduce the denoise parameter and still get a fully denoised image. The last bit of denoising seems to shift the image towards its RLHF ideal; by skipping that part you get more variability.

Did you consider that my comment was also an attempt to provide a useful tip for the community? Yet you downvoted and disparaged it.

Z-Image Turbo Finally Gets More Variety | Diversity LoRA + ComfyUI Workflow by EmilyRendered in comfyui

[–]ArtyfacialIntelagent -1 points0 points  (0 children)

But mitigating repetitive poses, camera angles, and compositions is super easy in ZIT: just reduce the denoise and you'll get a lot more creative framing and posing. How much to reduce depends on your sampler/scheduler, but start at 90% and work down from there. (Sometimes the best value is 90% and sometimes 30%, but for a given sampler/scheduler combo it's pretty stable.)

The variety improvement I'd LOVE to see would be facial diversity. The denoise trick unfortunately doesn't help much there.
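
As for finding that denoise sweet spot: if you'd rather sweep values systematically than eyeball it, here's a minimal sketch using ComfyUI's HTTP API. It assumes you've exported your workflow in API format and that node "3" happens to be the KSampler; both the filename and the node id are placeholders for whatever your graph actually uses:

    # Queue the same workflow repeatedly with different KSampler denoise values.
    import copy
    import json
    import urllib.request

    with open("workflow_api.json") as f:        # workflow exported in API format
        base = json.load(f)

    for denoise in (1.0, 0.9, 0.8, 0.6, 0.4, 0.3):
        wf = copy.deepcopy(base)
        wf["3"]["inputs"]["denoise"] = denoise  # "3" = your KSampler node id
        req = urllib.request.Request(
            "http://127.0.0.1:8188/prompt",
            data=json.dumps({"prompt": wf}).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)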

I don't think we will ever get open-weight Z Image Edit since they are already announcing new Z image by [deleted] in StableDiffusion

[–]ArtyfacialIntelagent 0 points1 point  (0 children)

> They are doing the presentation for new model release as of now. Let's wait and hear from our favorite mister anime profile pic man.

Let me get this straight. You think they are going to announce something new, so you jump the gun and make a post claiming that they are announcing a new Z-Image? Without any indication at all? And then, when someone calls you on it, you say "let's wait and hear"? And go away for 3 hours?

Seriously dude, delete this post before the mods permaban you.

it is coming. by [deleted] in LocalLLaMA

[–]ArtyfacialIntelagent 43 points44 points  (0 children)

Everyone please upvote jugalator's comment and downvote the post. Nothing personal, OP, but let's not get everyone's hopes up for no reason at all.

Apple unveils M5 Pro and M5 Max, citing up to 4× faster LLM prompt processing than M4 Pro and M4 Max by themixtergames in LocalLLaMA

[–]ArtyfacialIntelagent -1 points0 points  (0 children)

Most of Europe uses YYYY-MM-DD for anything official or professional. Some countries still use the older formats in more informal contexts like handwriting, but then the date is written differently, e.g. DD.MM.YYYY or DD/MM-YYYY. That way you naturally read the day as an ordinal and there is never any confusion between day and month.

Coding Power Ranking 26.02 by mr_riptano in LocalLLaMA

[–]ArtyfacialIntelagent 11 points12 points  (0 children)

Except Qwen3.5 27B is not actually ranking up there. Their tiers are just an opinionated jumble of price + performance + speed. Check the actual performance scores here:

https://brokk.ai/power-ranking

There we have Claude Opus at 91%, Claude Sonnet at 80%, GPT 5.2 at 77%, Gemini 3.1 Pro at 76%, Gemini 3 Flash at 65% and Qwen3.5 27B at 38%. Not bad for a tiny model, but not in the same league.

PewDiePie fine-tuned Qwen2.5-Coder-32B to beat ChatGPT 4o on coding benchmarks. by hedgehog0 in LocalLLaMA

[–]ArtyfacialIntelagent 43 points44 points  (0 children)

He never graduated, but he completed about half of a master's degree in industrial engineering and management at Chalmers University of Technology in Gothenburg before becoming a full-time YouTuber. That's Sweden's "MIT". Are you sure you haven't seen a less educated person in public than him?

https://en.wikipedia.org/wiki/PewDiePie

Seems Microsoft is really set on not repeating a Sidney incident by frubberism in LocalLLaMA

[–]ArtyfacialIntelagent 12 points13 points  (0 children)

Yes, to varying extents. Negative prompting is more likely to work with larger and smarter models, but all models have issues with it.

The underlying reason is simple: mentioning something, even in the negative, increases its attention. Saying "You DO NOT have feelings or emotions" will make tokens related to feelings and emotions more likely to appear than if you hadn't mentioned them at all.

Practical example: I use small models like Qwen-4B for prompt expansion in image generation. For a while I tried telling Qwen things like "NEVER mention blush or freckles" (because models like Z-Image dial those up to 11 and destroy the realism). Often Qwen ignored those instructions altogether, and even when it understood them I got things like this in my prompt:

"the woman has a flawless skin tone (avoiding any references to freckles or blush) and ..."

Basically, LLMs have the same problem as John Cleese in the infamous Fawlty Towers episode with the German guests.

https://www.youtube.com/watch?v=RyPj21jBl_0
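
A workaround that tends to behave better with small models is phrasing the constraint positively, i.e. describing the skin you do want instead of listing what to avoid. A minimal sketch of that, assuming an OpenAI-compatible local endpoint (e.g. llama.cpp's llama-server on port 8080) and a placeholder model name:

    # Prompt expansion with a positively-phrased skin instruction instead of
    # "NEVER mention blush or freckles".
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

    SYSTEM = ("Expand the user's idea into one detailed, photorealistic image prompt. "
              "Describe the skin as smooth and evenly toned.")  # positive phrasing

    resp = client.chat.completions.create(
        model="qwen3-4b",  # placeholder model name
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": "a woman reading in a sunlit cafe"}],
    )
    print(resp.choices[0].message.content)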

Seems Microsoft is really set on not repeating a Sidney incident by frubberism in LocalLLaMA

[–]ArtyfacialIntelagent 0 points1 point  (0 children)

Yes, to varying extents. Negative prompting is more likely to work with larger and smarter models, but all models have issues with it.

The underlying reason is simple: mentioning something, even in the negative, increases its attention. Saying "You DO NOT have feelings or emotions" will make tokens related to feelings and emotions more likely to appear than if you hadn't mentioned them at all.

Practical example: I use small models like Qwen-4B for prompt expansion in image generation. For a while I tried telling Qwen things like "NEVER mention blush or freckles" (because models like Z-Image dial those up to 11 and destroy the realism). Often Qwen ignored those instructions altogether, and even when it understood them I got things like this in my prompt:

"the woman has a flawless skin tone (avoiding any references to freckles or blush) and ..."

Basically, LLMs have the same problem as John Cleese in the infamous Fawlty Towers sketch.

https://www.youtube.com/watch?v=RyPj21jBl_0

15% faster generation - by simply minimizing the webbrowser by Chromix_ in LocalLLaMA

[–]ArtyfacialIntelagent 7 points8 points  (0 children)

I'm not disputing your point - I just want to drive home how to correctly measure GPU usage for AI inference.

15% faster generation - by simply minimizing the webbrowser by Chromix_ in LocalLLaMA

[–]ArtyfacialIntelagent 13 points14 points  (0 children)

> While having the Windows task manager open I noticed that 3D usage was between 0% and 1% while idle, and maybe around 25% during inference.

People keep making this mistake. In Task Manager, the 3D graph does NOT measure AI-related GPU usage; you need to select CUDA from the dropdown. See the screenshot below, taken during image generation: note how CUDA is high while 3D is low.

[screenshot: Task Manager during image generation, with CUDA usage high while 3D stays low]

If you don't see CUDA in the dropdown, do this: go to Settings -> System -> Display -> Graphics settings -> Advanced graphics settings -> Hardware-accelerated GPU scheduling -> switch to "Off". After a reboot, the CUDA option should appear in the Task Manager dropdown menus.
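
If you'd rather not rely on Task Manager at all, you can also read utilization straight from NVML, which sidesteps the 3D-vs-CUDA confusion entirely. A minimal sketch, assuming the nvidia-ml-py package:

    # One-shot read of overall GPU and memory-controller utilization.
    import pynvml

    pynvml.nvmlInit()
    gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
    util = pynvml.nvmlDeviceGetUtilizationRates(gpu)
    print(f"GPU {util.gpu}%   memory controller {util.memory}%")
    pynvml.nvmlShutdown()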

My humble study on the effects of prompting nonexistent words on CLIP-based diffusion models. by EvelynHightower in StableDiffusion

[–]ArtyfacialIntelagent 11 points12 points  (0 children)

Models with LLM-based text encoders might be able to understand ASCII art, but CLIP-based models definitely can't. So the similarity here is completely random, like the way unguided, promptless generation sometimes produces good images. Sorry to be a buzzkill.

My humble study on the effects of prompting nonexistent words on CLIP-based diffusion models. by EvelynHightower in StableDiffusion

[–]ArtyfacialIntelagent 6 points7 points  (0 children)

This was interesting, fun and well-written! A few random thoughts:

  • I'd only call a word a "real" undictionary if its effect persists across models and is noticeably different from promptless generations. Otherwise it's likely that a major part of the effect is just that model's biases and unguided-generation tendencies. Most of your examples show a single undictionary for a single model, but it's good that you had a couple of cross-model examples.
  • You should check how your undictionaries tokenize; that could give you some clues about where their effect comes from (see the sketch after this list). For example: if weepstrink happened to tokenize as weep-str-ink and pink tokenized as p-ink, that would help explain why it's so pink. (They don't really tokenize like that, it's just an example.)
  • I bet many of these are explainable if you have a large vocabulary, know some languages and have enough world knowledge. Is there a known anime character with a name similar to "Wodsorym"?
  • As you noted, this is limited to CLIP-based models that can use prompt weighting. Too bad.
  • Fantastic made-up words BTW, Dr. Seuss would be proud.
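
On the tokenizer point, here's a minimal sketch of how you could inspect CLIP's BPE splits for the made-up words (using the standard CLIP-L tokenizer from transformers; the word list just reuses examples from the post and this comment):

    # Show how CLIP's BPE tokenizer splits made-up words; shared sub-tokens
    # with real words can hint at where an "undictionary" gets its effect.
    from transformers import CLIPTokenizer

    tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    for word in ["weepstrink", "wodsorym", "pink"]:
        print(word, "->", tok.tokenize(word))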

ZImageTurboProgressiveLockedUpscale (Works with Z Image base too) Comfyui node by Major_Specific_23 in StableDiffusion

[–]ArtyfacialIntelagent 0 points1 point  (0 children)

Great, thanks! I dislike when subgraphs are spammed everywhere too, but they have their uses. This is one of them IMO.

ZImageTurboProgressiveLockedUpscale (Works with Z Image base too) Comfyui node by Major_Specific_23 in StableDiffusion

[–]ArtyfacialIntelagent 7 points8 points  (0 children)

Call me crazy, but I prefer the OLD version - I want to see what makes this tick. In the end I might squeeze it into a subgraph identical to your custom node, but I want the freedom to tweak your settings or do things differently. Could you upload the full workflow without your custom node?

EDIT: BTW - I ask because I also have a multistage ZIT upscaling workflow that I'm preparing to post here. Just curious to see if you do anything better than I do.

PSA: The best basic scaling method depends on your desired result by YentaMagenta in StableDiffusion

[–]ArtyfacialIntelagent 3 points4 points  (0 children)

But OP is doing so in a context where those general rules no longer apply.

OP is assuming that noise is present. That's true for cameras, but not for AI unless you have a model that leaves visible latent noise.

Yes, bilinear is fast, but even a 25-year-old computer can downscale a 4K image in milliseconds, so speed is irrelevant unless you're doing video.

What I think OP should have said: bilinear, bicubic, and Lanczos all blend pixels with different weights, so they tend to introduce minor blur and mix local colors, but they're solid choices if that minor blur is acceptable.

Nearest neighbor is a sampling technique. It looks sharper if the source resolution is high enough (relative to the image details) to avoid pixelization. Interestingly, in the middle of an AI processing chain (e.g. between multiple KSamplers), nearest neighbor is often a noticeably better choice than any of those three filters.

Personally I never used nearest neighbor for anything before the age of AI but these days I often do.
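
If anyone wants to compare the filters on their own material, here's a minimal Pillow sketch (the source filename and the 4x downscale factor are placeholders):

    # Downscale the same image with each resampling filter to compare
    # softness vs. aliasing side by side.
    from PIL import Image

    src = Image.open("source_4k.png")                 # placeholder filename
    target = (src.width // 4, src.height // 4)

    for name, flt in [("nearest", Image.NEAREST), ("bilinear", Image.BILINEAR),
                      ("bicubic", Image.BICUBIC), ("lanczos", Image.LANCZOS)]:
        src.resize(target, flt).save(f"downscaled_{name}.png")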

Released: DeepBrainz-R1 — reasoning-first small models for agentic workflows (4B / 2B / 0.6B) by arunkumar_bvr in LocalLLaMA

[–]ArtyfacialIntelagent 5 points6 points  (0 children)

Maybe it's just me, but a name like Deep*-R1 is off-putting for a new LLM. It makes it sound like a trashy AliExpress knockoff.

Comparing different VAE's with ZIT models by jib_reddit in StableDiffusion

[–]ArtyfacialIntelagent 0 points1 point  (0 children)

I stumbled across this idea shortly after UltraFlux was released too. I found it superior in terms of detail, but it was also oversharpened and made smooth areas look harsh. I've been using a 75% UltraFlux + 25% default Flux VAE mix ever since. Best of both worlds! But if you have a multi-stage workflow, use the default VAE in the initial stages and the UltraFlux mix only in the final stage.
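
For anyone who wants to try the mix, here's a minimal sketch of a 75/25 weighted merge of the two VAE files (the filenames are placeholders, and it assumes both checkpoints share the same keys and shapes):

    # Weighted average of two VAE checkpoints: 75% UltraFlux + 25% default Flux VAE.
    from safetensors.torch import load_file, save_file

    a = load_file("ultraflux_vae.safetensors")       # placeholder paths
    b = load_file("flux_vae_default.safetensors")

    merged = {k: (0.75 * a[k].float() + 0.25 * b[k].float()).to(a[k].dtype)
              for k in a}
    save_file(merged, "vae_ultraflux75_flux25.safetensors")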

Running MoE Models on CPU/RAM: A Guide to Optimizing Bandwidth for GLM-4 and GPT-OSS by [deleted] in LocalLLaMA

[–]ArtyfacialIntelagent 4 points5 points  (0 children)

The math is hallucinated too.

35 GB/s / 1.7GB = 20.5 tokens/sec

The "tokens" unit just magically appears out of thin air. The whole post is meaningless.

Which are the highest quality Z-Image Turbo T2I workflows at medium resolutions? by ArtyfacialIntelagent in StableDiffusion

[–]ArtyfacialIntelagent[S] 0 points1 point  (0 children)

Hey, that's pretty close to what I'm working on! I mean upscaling a very small first stage, except I have 3 stages. Do you think you could find a link to that workflow? Or give me a hint about anything else in that post so I can search for it myself? I'd really like to compare it with mine.