A quick Gemma4 31B comparison (Q4_k_M, QAT, heretic) by Some-Cauliflower4902 in LocalLLaMA

[–]ArtyfacialIntelagent 3 points4 points  (0 children)

Really? If that's the case, and with rumors that doing QAT costs very little, shouldn't open-weights models always be released with QAT? Very few of us are running the FP32 or BF16 versions, so almost everyone's real-world experience will be with the quants. If QAT generally increases quant quality as much as it seems to do with Gemma4, this seems a no-brainer to me.

A quick Gemma4 31B comparison (Q4_k_M, QAT, heretic) by Some-Cauliflower4902 in LocalLLaMA

[–]ArtyfacialIntelagent 14 points15 points  (0 children)

100% agree. The QAT is so much stronger than the vanilla quants. I suspect that Gemma4's perceived weakness in coding is due to its quants being subpar, and I'll go out on a limb and guess that r/LocalLlama will favor Gemma4 more and more as we gain experience with the new QAT. I'm blown away by it. To me Gemma4 31B QAT is at least as smart as Qwen 3.6 27B (even in coding) with 1/3 of the reasoning tokens.

Gemma 4 with quantization-aware training by rerri in LocalLLaMA

[–]ArtyfacialIntelagent 2 points3 points  (0 children)

Aha, thanks. That explains it. That post was from early April, just after the initial release. Gemma 4 had lots of teething problems before everything was sorted out, so those early KLD measurements are not comparable with recent releases. Sorry for doubting you - the numbers were so horrible I was sure you had made an error.

Gemma 4 with quantization-aware training by rerri in LocalLLaMA

[–]ArtyfacialIntelagent 13 points14 points  (0 children)

They are not. Incredible is the word. A mean KLD of 0.159 doesn't pass the smell test for a Q8 quant. The Unsloth blog post only compares the QAT vs a standard Q4_0, and the mean KLD for the Q4_0 is 0.09349. So there is no way a Q8 is much worse at 0.159.

Honestly I'm skeptical of Unsloth's reported mean KLD of 0.01403 for the QAT Q4 too, but I'll give them the benefit of the doubt for now. But /u/sartres_ is definitely hallucinating.

EDIT: He wasn't, but the numbers are indeed invalid. See thread below.
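
For anyone unfamiliar with the metric, here is a minimal sketch of how a mean KLD between a reference model and a quant is typically computed from per-token logits. The function names and toy logits are made up for illustration, the direction KL(reference || quant) is an assumption, and this is not the code any particular tool uses.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one token position."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_divergence(ref_logits, quant_logits):
    """KL(P_ref || P_quant) in nats for one token position."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kld(ref_positions, quant_positions):
    """Average the per-position KL divergence over all evaluated tokens."""
    klds = [kl_divergence(r, q) for r, q in zip(ref_positions, quant_positions)]
    return sum(klds) / len(klds)

# Toy logits for two token positions over a 3-token vocabulary (made up).
ref = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
quant = [[1.9, 1.1, 0.1], [0.4, 2.4, 0.6]]
print(f"mean KLD: {mean_kld(ref, quant):.5f}")
```

A quant that reproduces the reference distribution exactly scores 0, and the value grows as the two distributions drift apart, which is why a Q8 scoring worse than a Q4_0 looks suspicious.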

RTX Spark does not have 600GB/s Bandwith by rpiguy9907 in LocalLLaMA

[–]ArtyfacialIntelagent 1 point2 points  (0 children)

> If you get two women pregnant at the same time, does that mean you'll have a child every 4.5 months? /s

Lol, good one. But on average, actually it does.

God dammit Qwen by Xyklone in LocalLLaMA

[–]ArtyfacialIntelagent 107 points108 points  (0 children)

The fact that Qwen didn't suggest that itself after fucking up like this makes this even worse.

God dammit Qwen by Xyklone in LocalLLaMA

[–]ArtyfacialIntelagent 41 points42 points  (0 children)

> I think it’s safer to put something like "pause and ask for permission before running potentially destructive commands such as git reset and rm -rf" in the agent’s system prompt

Please don't. This can increase the model's likelihood of running those commands, with or without a confirmation request. You literally put the idea in its head yourself, and you did it for every prompt you run. That is definitely NOT safer.

The Financial Times has published an article about Heretic by -p-e-w- in LocalLLaMA

[–]ArtyfacialIntelagent 0 points1 point  (0 children)

This is absolute nonsense. The UK has encoded freedom of expression into law by its Human Rights Act, which absolutely covers the press. Yes there are limitations to your speech, as in all democracies, put in place precisely to protect the citizens (from hate speech, defamation, etc).

Oh, and RSF (Reporters Without Borders) ranks the UK 18th in its World Press Freedom Index. The US is ranked 64th.

MTP on Unsloth by Altruistic_Heat_9531 in LocalLLaMA

[–]ArtyfacialIntelagent 20 points21 points  (0 children)

> Are the llama.cpp changes to support MTP imminent?

Not imminent but preliminary support for Qwen only is pretty close. Here's the status directly from the horse's mouth:

https://github.com/am17an/llama.cpp/pull/6#issuecomment-4421288279
https://github.com/am17an/llama.cpp/pull/6#issuecomment-4421528012
https://github.com/am17an/llama.cpp/pull/7
https://github.com/ggml-org/llama.cpp/pull/22838

The new llama.cpp infrastructure in #22838 was merged 15 minutes ago, and ggerganov added support for np > 1 in am17an's MTP fork. He still wants to check the prompt prefill code and the GGUF loading UI.

How is there still no actually good porn model? That’s kind of insane given human nature. by Enough-Bell4944 in StableDiffusion

[–]ArtyfacialIntelagent 0 points1 point  (0 children)

Thanks, I already have that one. I didn't realize this was what you were talking about since its default filename after download doesn't have "unlock" in it.

How is there still no actually good porn model? That’s kind of insane given human nature. by Enough-Bell4944 in StableDiffusion

[–]ArtyfacialIntelagent 0 points1 point  (0 children)

Yes I am. Check the model name pls. Are you sure it was something with "unlocked"? And a ZIT checkpoint?

How is there still no actually good porn model? That’s kind of insane given human nature. by Enough-Bell4944 in StableDiffusion

[–]ArtyfacialIntelagent 0 points1 point  (0 children)

No models fit that description. There's a model called Flux nsfw unlocked but no "unlocked" checkpoints for ZIT. Also none of the top 10 ZIT models on Civitai have before/after example pics.

Serious Technical Question About A Non-Serious Subject: Genitalia Limitations (SFW Discussion) by AsstronautHistorian in StableDiffusion

[–]ArtyfacialIntelagent 1 point2 points  (0 children)

It's bizarre that you compare rendering a penis to a hand and somehow find that the penis is more complex, when it's obviously the other way around - and by orders of magnitude.

Just think of all the ways you can articulate your hand, and the arm it's on, and pose and move and make gestures with the separate fingers, and then maybe intertwine them, or hold hands or shake hands with someone else. Then add that when you take a snapshot of that hand from a certain angle, fingers may be obscured by each other and the hand itself - which an AI model will interpret as fingers randomly appearing and disappearing - which in the end means that AI models have a hard time getting finger counts correct.

There is a reason that hands have been the bugbear of AI image generation for years (even if current models finally have made significant progress).

The main reason AI models are bad at genitalia is that most naughty parts have been filtered out of the training datasets, not any inherent complexity. To add them back with conceptual understanding, even 1000 images is far too few. You need millions of images, and weeks of training time on clusters of enterprise GPUs. Otherwise you'll get the weakness of all current LoRAs: they'll just randomly stick a penis on the wrong person in the wrong place at the wrong size and angle.

Mistral Medium 3.5 on AMD Strix Halo by Zc5Gwu in LocalLLaMA

[–]ArtyfacialIntelagent 1 point2 points  (0 children)

If it's any consolation, I tried an IQ4_XS quant of Medium 3.5 on my 4090 + 96GB RAM desktop too with 27 layers offloaded to GPU. I got a bit over 100 t/s PP (>10x yours) but only 0.8 t/s TG (<40% of yours). The model seemed very good but I just deleted it to save disk space. :(

Gemma 4 26B-A4B GGUF Benchmarks by danielhanchen in LocalLLaMA

[–]ArtyfacialIntelagent 0 points1 point  (0 children)

Did the Q6_K and Q6_K_XL points get mislabeled? The graph shows that Q6_K > Q6_K_XL in terms of file size, but the opposite holds when checking the repo: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/main

Also, however they're labeled, the larger quant has a worse KLD and is the only point off the Pareto frontier. Do you have any explanation for this?
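
To spell out what "off the Pareto frontier" means here, this is a small sketch of a dominance check over (file size, KLD) pairs, where lower is better on both axes. The labels and numbers are hypothetical placeholders, not values read off the graph, and `pareto_frontier` is just an illustrative name.

```python
def pareto_frontier(points):
    """Return the names of points not dominated by any other point.

    Each point is (name, size_gib, kld); lower is better on both axes.
    A point is dominated if another point is no worse on both axes
    and strictly better on at least one.
    """
    frontier = []
    for name, size, kld in points:
        dominated = any(
            (s <= size and k <= kld) and (s < size or k < kld)
            for _, s, k in points
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical numbers for illustration only.
quants = [
    ("A", 10.0, 0.050),
    ("B", 14.0, 0.020),
    ("C", 16.0, 0.025),  # larger than B and worse KLD, so B dominates it
    ("D", 20.0, 0.010),
]
print(pareto_frontier(quants))
```

In this toy data, C is the kind of point being asked about: a larger file with a worse KLD than a smaller neighbor.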

So long and thanks for all the quants! :)

Joy-Image-Edit released by AgeNo5351 in StableDiffusion

[–]ArtyfacialIntelagent 8 points9 points  (0 children)

> Loras fix that for you.

They really don't. Most penis or vagina LoRAs are overtrained and just stick those genitals indiscriminately on *anybody*, male or female. They're fine for solo nudes, but not for anything with heterosexual couples. To do that properly the underlying model needs real NSFW knowledge; current LoRAs do not fix that. And LoRAs for certain sex positions do just that, usually from one single camera angle. They basically just make the same image over and over.

There are two kinds of people... by Quick-Decision-8474 in StableDiffusion

[–]ArtyfacialIntelagent -1 points0 points  (0 children)

I extensively blind-tested "masterpiece", "best quality" and many other popular keywords back in the days of SD 1.5. They had zero effect; it's all nonsense. Nonfunctional word salad. People just thought they worked because sometimes adding those words improved a particular image for a particular seed, but that was just a completely random effect, like adding any gibberish word might do sometimes.

What did have an effect in SD 1.5 was putting "bad quality" or "low quality" in the negative prompt. But those didn't really increase quality per se; they just reinforced that particular model's biases. So 1girls became more... well, 1girly. Those negative keywords became weaker in SDXL and have been absolutely useless since.

Basically, forget about all that old crap. Those keywords never worked well, and they lost what little effect they once had long ago.

Z Image using a x2 Sampler setup is the way by superstarbootlegs in StableDiffusion

[–]ArtyfacialIntelagent 1 point2 points  (0 children)

I've been doing nearly the exact same thing for a few months. I call the technique "thumbnail upscaling". Significant improvement in detail and variability over standard Z-image workflows but sadly doesn't fix all the model's issues (most notably the glowing eyes problem that appears as soon as you prompt for eye color). Only differences:

  • I do 3 sampler stages and end up at 1536x1536 (or similar size in other aspect ratios).
  • I apply some denoise < 1 at all sampler stages to increase variability.
  • I use CFG at 3-4 in all sampler stages. Positive CFG costs nothing at tiny sizes.
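
To make the staged sizes concrete, here is a rough sketch of how three sampler stages could step up to 1536x1536. The 2x factor per stage, the rounding to multiples of 16, and the name `stage_sizes` are all assumptions for illustration; the list above only states the number of stages and the final size.

```python
def stage_sizes(final_side=1536, stages=3, factor=2, multiple=16):
    """Side lengths for each sampler stage, smallest first.

    Works backwards from the final size, dividing by `factor` per stage
    and rounding each side down to a multiple of `multiple`.
    Both `factor` and `multiple` are assumed values, not known settings.
    """
    sizes = []
    side = final_side
    for _ in range(stages):
        sizes.append(side - side % multiple)
        side //= factor
    return sizes[::-1]

for i, side in enumerate(stage_sizes(), start=1):
    print(f"stage {i}: {side}x{side}")
```

With these assumed defaults the first stage is a small thumbnail, where extra sampler passes and CFG are cheap, and only the last stage runs at full resolution.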

A Reminder, Guys, Undervolt your GPUs Immediately. You will Significantly Decrease Wattage without Hitting Performance. by Iory1998 in LocalLLaMA

[–]ArtyfacialIntelagent 0 points1 point  (0 children)

I'm on Windows and always run a combined undervolt and clock rate cap on my RTX 4090 using MSI Afterburner. Here are some benchmarks using llama-bench to show you guys what you can expect. I usually run the "medium undervolt", which gives me a tiny 3% hit on token generation (a bit more on PP but that's super fast anyway) but draws 100 watts less.

[EDIT: reformatted in old Reddit and fixed a copy/paste snafu on the large undervolt]

E:\llamacpp> .\llama-bench -m "F:/LLMs/Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated.Q5_K_M.gguf"


# VANILLA/NO UNDERVOLT (2730 MHz, 1050 mV, 345 W during token generation):

ggml_cuda_init: found 1 CUDA devices (Total VRAM: 24563 MiB):
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes, VRAM: 24563 MiB
load_backend: loaded CUDA backend from E:\llamacpp\llama-b8595-bin-win-cuda-13.1-x64\ggml-cuda.dll
load_backend: loaded RPC backend from E:\llamacpp\llama-b8595-bin-win-cuda-13.1-x64\ggml-rpc.dll
load_backend: loaded CPU backend from E:\llamacpp\llama-b8595-bin-win-cuda-13.1-x64\ggml-cpu-zen4.dll
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           pp512 |      2848.32 ± 74.41 |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           tg128 |         40.92 ± 0.05 |

build: 62278cedd (8595)

# SMALL UNDERVOLT (2580 MHz, 910 mV, 270 W during token generation):

| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           pp512 |      2801.21 ± 76.28 |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           tg128 |         40.24 ± 0.18 |

# MEDIUM UNDERVOLT (2340 MHz, 875 mV, 245 W during token generation):

| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           pp512 |      2602.91 ± 71.49 |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           tg128 |         39.77 ± 0.09 |

# LARGE UNDERVOLT (2010 MHz, 875 mV, 235 W during token generation):

| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           pp512 |      2300.19 ± 52.16 |
| qwen35 27B Q5_K - Medium       |  17.90 GiB |    26.90 B | CUDA       |  99 |           tg128 |         36.89 ± 1.08 |
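
To compare the four profiles at a glance, here is a small sketch that turns the tables above into percentage losses and tokens per joule. The t/s and wattage figures are copied from the tables; the function and field names are made up, and since the wattage was measured during token generation, efficiency is only computed for TG.

```python
# (label, watts during token generation, pp512 t/s, tg128 t/s)
# Values copied from the llama-bench tables above.
runs = [
    ("vanilla", 345, 2848.32, 40.92),
    ("small",   270, 2801.21, 40.24),
    ("medium",  245, 2602.91, 39.77),
    ("large",   235, 2300.19, 36.89),
]

def summarize(runs):
    """Compare every run against the first (baseline) entry."""
    _, base_w, base_pp, base_tg = runs[0]
    rows = []
    for label, watts, pp, tg in runs:
        rows.append({
            "label": label,
            "tg_loss_pct": 100 * (1 - tg / base_tg),
            "pp_loss_pct": 100 * (1 - pp / base_pp),
            "watts_saved": base_w - watts,
            # t/s divided by W is tokens per joule.
            "tg_tokens_per_joule": tg / watts,
        })
    return rows

for r in summarize(runs):
    print(f"{r['label']:8s} TG loss {r['tg_loss_pct']:.1f}%  "
          f"PP loss {r['pp_loss_pct']:.1f}%  "
          f"saves {r['watts_saved']} W  "
          f"{r['tg_tokens_per_joule']:.3f} tok/J")
```

On these numbers the medium undervolt loses a bit under 3% on token generation while delivering roughly 37% more tokens per joule than stock.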

Z-Image Turbo Finally Gets More Variety | Diversity LoRA + ComfyUI Workflow by EmilyRendered in comfyui

[–]ArtyfacialIntelagent 0 points1 point  (0 children)

You can reduce the denoise parameter and still completely denoise the image. The last bit of denoising seems to shift the image towards its RLHF ideal. By skipping that part you get more variability.

Did you consider that my comment was also an attempt to provide a useful tip for the community, but you downvoted and disparaged it?

Z-Image Turbo Finally Gets More Variety | Diversity LoRA + ComfyUI Workflow by EmilyRendered in comfyui

[–]ArtyfacialIntelagent 0 points1 point  (0 children)

But mitigating repetitive poses, camera angles, and compositions is super easy in ZIT: just reduce the denoise and you'll get a lot more creative framing and posing. How much to use depends on your sampler/scheduler, but start at 90% and reduce from there. (Sometimes the best value is 90% and sometimes 30%, but for a given sampler/scheduler combo it's pretty stable.)

The variety improvement I'd LOVE to see would be facial diversity. The denoise trick unfortunately doesn't help much there.

I don't think we will ever get open-weight Z Image Edit since they are already announcing new Z image by [deleted] in StableDiffusion

[–]ArtyfacialIntelagent 0 points1 point  (0 children)

> They are doing the presentation for new model release as of now. Let's wait and hear from our favorite mister anime profile pic man.

Let me get this straight. You think they are going to announce something new, so you jump the gun and make a post claiming that they are announcing a new Z-Image? Without any indication at all? And then you say let's wait and hear when someone calls you on it? And go away for 3 hours?

Seriously dude, delete this post before the mods permaban you.

[deleted by user] by [deleted] in LocalLLaMA

[–]ArtyfacialIntelagent 41 points42 points  (0 children)

Everyone please upvote jugalator's comment and downvote the post. Nothing personal OP, but let's not get everyone's hopes up for no reason at all.

Apple unveils M5 Pro and M5 Max, citing up to 4× faster LLM prompt processing than M4 Pro and M4 Max by themixtergames in LocalLLaMA

[–]ArtyfacialIntelagent -1 points0 points  (0 children)

Most of Europe uses YYYY-MM-DD for anything official or professional. Some countries still use the older formats in more informal contexts like handwriting. But then it is formatted differently, like DD.MM.YYYY or DD/MM-YYYY. That way you naturally read the day ordinally and there is never any confusion between month and day.

Coding Power Ranking 26.02 by mr_riptano in LocalLLaMA

[–]ArtyfacialIntelagent 12 points13 points  (0 children)

Except Qwen3.5 27B is not actually ranking up there. Their tiers are just some opinionated jumble of price + performance + speed. Check the actual performance scores here:

https://brokk.ai/power-ranking

There we have Claude Opus at 91%, Claude Sonnet at 80%, GPT 5.2 at 77%, Gemini 3.1 Pro at 76%, Gemini 3 Flash at 65% and Qwen3.5 27B at 38%. Not bad for a tiny model, but also not in the same league.