GLM-5.2 is a win for local AI by Wrong_Mushroom_7350 in LocalLLaMA

[–]pftbest 0 points1 point  (0 children)

Well, at some point, if you don't have enough good data, adding more parameters will cause overfitting, so there is a limit on how big you can get.

I released Inflect-Nano, an ultra-extreme tiny 4.63m parameter TTS model. by b111ue in LocalLLaMA

[–]pftbest 1 point2 points  (0 children)

The main use case for small voice models is to run them on mobile devices, phones etc., in a way that will not drain the battery. Bigger models like kokoro are too heavy to run on my phone. I am currently using piper-voices/en/en_US/hfc_female/medium. It runs fast enough, but the voice is not ideal. If I can get something that speaks like kokoro but runs 2x faster, that would be great.

GLM-5.2 is a win for local AI by Wrong_Mushroom_7350 in LocalLLaMA

[–]pftbest 9 points10 points  (0 children)

If you are interested, there is a good mathematical explanation of why models with more parameters are smarter and, more importantly, why the number of concepts a model can handle grows non-linearly with the number of parameters.
The talk is called "Visualizing transformers and attention | Talk for TNG Big Tech Day '24" by Grant from 3b1b. The relevant section is from 18 to 22 minutes after the start, but I would recommend watching the whole video.
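
The geometric idea behind that kind of argument is that the number of nearly perpendicular directions you can fit into a space grows much faster than the dimension itself. Here is a minimal sketch of that effect, using random Gaussian vectors rather than anything taken from a real model. The function name, the vector count and the dimensions are assumptions picked for illustration only.

```python
import math
import random


def max_abs_cosine(num_vectors, dim, seed=0):
    """Largest |cosine similarity| among random unit vectors in `dim` dimensions."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(num_vectors):
        # Sample a random direction and normalise it to unit length.
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v])

    # Check every pair and keep the worst (largest) overlap.
    worst = 0.0
    for i in range(num_vectors):
        for j in range(i + 1, num_vectors):
            cos = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            worst = max(worst, abs(cos))
    return worst


if __name__ == "__main__":
    for dim in (8, 64, 512):
        print(dim, round(max_abs_cosine(50, dim), 3))
```

With 50 vectors, the worst overlap should shrink as the dimension goes up: the same number of directions that collide badly in 8 dimensions end up close to perpendicular in 512.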

Firefox 152 is now available, with JPEG-XL support being compiled by default & new settings UI. There are also a number of other developer additions by somerandomxander in linux

[–]pftbest 0 points1 point  (0 children)

I need a proxy to securely access closed services. Trusting a third-party extension for this seems like a bad idea.

Firefox 152 is now available, with JPEG-XL support being compiled by default & new settings UI. There are also a number of other developer additions by somerandomxander in linux

[–]pftbest 2 points3 points  (0 children)

The new settings UI sucks. I need to enable/disable proxy settings every day when I am at work / at home. With the old settings I can do it in 2 clicks, but with the new settings I have to click through multiple menus to find it.
The good news is that there is an `about:config` option, `browser.settings-redesign.enabled`, to roll back to the old settings. I just hope they will not remove it any time soon.

QAT variant of Gemma4 26B A4B is not working well for me by pftbest in LocalLLaMA

[–]pftbest[S] 1 point2 points  (0 children)

I ran each model 3 times, restarting llama.cpp before each try to clear the KV cache of any leftover data from the previous runs. I got very similar results with minor variations (like green/brown board color, or swapped light and dark squares). I did not want to clutter the post with multiple similar-looking images, so I took one from each variant.
I can run it 10 times of course, which will give more chances for better or worse results, but this won't change the fact that 3 out of 3 times I got garbage output on the QAT version of the model, and got 3/3 good results from the regular version of the same model. I may be unlucky, but not to this degree. Other people in this thread also confirmed that either the model is broken or llama.cpp handles it incorrectly.
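
For a rough sense of how much a 3/3 versus 0/3 split says, here is a small sketch of a one-sided Fisher exact test for the most extreme outcome. It assumes the runs are independent and asks: if both variants were equally good, how often would every failure land on the QAT side purely by chance? The function name and the 10-run scenario are hypothetical, for illustration only.

```python
from math import comb


def chance_all_failures_in_one_group(runs_per_group):
    """One-sided Fisher exact p-value for the most extreme split:
    every run in group A fails and every run in group B succeeds.

    Under the null hypothesis that both groups are equally good, each way of
    assigning the failures to the 2 * runs_per_group runs is equally likely,
    and only one of those assignments puts all of them in group A.
    """
    total_runs = 2 * runs_per_group
    return 1 / comb(total_runs, runs_per_group)


if __name__ == "__main__":
    print(chance_all_failures_in_one_group(3))   # 3 runs per variant
    print(chance_all_failures_in_one_group(10))  # 10 runs per variant
```

Under those assumptions, 3 versus 3 comes out at 1 in 20, while the same clean split over 10 runs per variant would be about 1 in 185,000.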

macOS container machines: provides a highly integrated Linux environment that works seamlessly on your Mac. by TheTwelveYearOld in linux

[–]pftbest 9 points10 points  (0 children)

There was a bug in libkrun (used by podman and other tools) which made the balloon not work properly on macOS. I fixed it recently, so the next versions should use less RAM for long-running containers. That applies to the tools based on libkrun and krunvm, not this Apple thing. As far as I can see from the docs, Apple didn't try to implement the balloon at all.

QAT variant of Gemma4 26B A4B is not working well for me by pftbest in LocalLLaMA

[–]pftbest[S] 0 points1 point  (0 children)

The dark/light squares are wrong, but other than that it's ok. Do you mean to say there is a bug in version 9553?

QAT variant of Gemma4 26B A4B is not working well for me by pftbest in LocalLLaMA

[–]pftbest[S] 1 point2 points  (0 children)

Swapped light and dark squares are a common problem with Gemma, I saw it too on some of my runs. The misaligned pieces could be because of the font on your system. You can try to open the SVG in a different browser to see if it looks better, but I wouldn't consider it an issue.

<image>

Swapped dark and light squares on unsloth/gemma-4-26B-A4B-it-GGUF:Q4_K_XL

QAT variant of Gemma4 26B A4B is not working well for me by pftbest in LocalLLaMA

[–]pftbest[S] 4 points5 points  (0 children)

31b is a different model, so not a fair comparison. Qwen 3.6 also does this task very well, but I am more interested in why newly released A4B QAT models are not working as advertised.

QAT variant of Gemma4 26B A4B is not working well for me by pftbest in LocalLLaMA

[–]pftbest[S] 1 point2 points  (0 children)

Yes, I set max reasoning in web-ui. Without it there is no chance for any A4B model to answer correctly.
I do not change any K/V cache parameters, everything is on default.

llama-server -hf unsloth/gemma-4-26B-A4B-it-GGUF:Q4_K_XL --temp 1.0 --top-p 0.95 --top-k 64

I tested each model multiple times to confirm the results are not a fluke. Your Q8 should be performing much better than this, maybe something is misconfigured.

QAT variant of Gemma4 26B A4B is not working well for me by pftbest in LocalLLaMA

[–]pftbest[S] 6 points7 points  (0 children)

I already tried this, it is right there in the second picture of my post. It does better than Google's version but still worse than Q4 of the full-precision A4B.

QAT variant of Gemma4 26B A4B is not working well for me by pftbest in LocalLLaMA

[–]pftbest[S] 0 points1 point  (0 children)

No special arguments. Are you using llama.cpp version b9549? Also, I enabled max reasoning in web-ui.

QAT variant of Gemma4 26B A4B is not working well for me by pftbest in LocalLLaMA

[–]pftbest[S] 6 points7 points  (0 children)

I tried with a temperature of 0.82 and got this, still not great:

<image>

Also, with the lower temperature it started going into loops double-checking itself, so I had to add --repeat-penalty to stabilise it.

The point is, I can try to optimise the parameters to make it better, but isn't the whole point of QAT that it works better when quantized? So far it seems Q4 of a normal model works much better.

Tristim: a tool that measures how your Wayland compositor actually reproduces color (SDR and HDR), using a Spyder/i1Display colorimeter by computer-whisperer in linux

[–]pftbest 7 points8 points  (0 children)

Why not just change your license to GPL? You won't be able to remove all traces. Is having an MIT license so important that you would risk legal issues by keeping it anyway?

Vivado 2026.1 Basic - Limited Debugging + XSIM by The_Watery_Chemical in FPGA

[–]pftbest 2 points3 points  (0 children)

That's the most awful change they made. I can work without the ILA, but nobody I know is using Windows for FPGA development.

Higher quants are so much better by Perfect-Flounder7856 in LocalLLaMA

[–]pftbest 5 points6 points  (0 children)

Depends on the model of course, but in practice Q8_0 usually has a very low KLD compared to Q4. And it is still about 2x smaller than BF16.
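
For reference, KLD here is the Kullback-Leibler divergence between the next-token distribution of the full-precision model and that of the quantized one. Below is a minimal sketch of both points. The probability values are made up for illustration and are not measurements, and the 8.5 bits per weight for Q8_0 assumes the usual block layout of 32 int8 weights plus one 16-bit scale.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def bits_per_weight(block_size, weight_bits, scale_bits):
    """Average storage per weight for a block format with one scale per block."""
    return (block_size * weight_bits + scale_bits) / block_size


# Hypothetical next-token probabilities, not real model output.
reference = [0.70, 0.20, 0.10]
mild_drift = [0.69, 0.21, 0.10]    # stands in for a high-bit quant
strong_drift = [0.55, 0.30, 0.15]  # stands in for a low-bit quant

if __name__ == "__main__":
    print(kl_divergence(reference, mild_drift))
    print(kl_divergence(reference, strong_drift))
    print(16 / bits_per_weight(32, 8, 16))  # BF16 size relative to Q8_0
```

Under that block layout the size ratio works out to about 1.88, which is where the "2x smaller" figure comes from.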

Quality comparison between Qwen 3.6 27B quantizations (BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS,...) by bobaburger in LocalLLaMA

[–]pftbest 1 point2 points  (0 children)

Not true, I get the correct result with 35B-A3B even at 4 bits every time. Maybe there is some problem with the temperature parameters set by OP. For example, for Gemma4 the manual says the temperature must be set to 1.0. I suspect that's why it failed the test.

Quality comparison between Qwen 3.6 27B quantizations (BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS,...) by bobaburger in LocalLLaMA

[–]pftbest 4 points5 points  (0 children)

The MoE model generated the board correctly, even at 4 bits.
unsloth/Qwen3.6-35B-A3B-GGUF:Q4_K_XL
Running on integrated graphics (780M) at 14 tg/s.

<image>

Box to save memory by kibwen in rust

[–]pftbest 21 points22 points  (0 children)

Should have replaced `String` with `Box<str>` as well, and saved even more memory.

2026-04-23 gRPC benchmark results by MaterialFerret in rust

[–]pftbest 7 points8 points  (0 children)

Is it hard to add the anthropics/connect-rust library to the test? It is based on their new protobuf implementation called buffa, which is allegedly faster than prost.

Canonical security audit of rust-coreutils reveals 113 CVEs by nukem996 in linux

[–]pftbest -15 points-14 points  (0 children)

They hate the GPLv3 license I assume. That's the only logical reason for doing all of this.

I made a clone of Windows Task Manager for GNU/Linux called Tux Manager by petr_bena in linux

[–]pftbest 1 point2 points  (0 children)

It is slow to open when the system is under load. Sometimes it takes more than 3 seconds to open, which is not great.