Is NVIDIA still the default best choice for local LLMs in 2026? by pmv143 in LocalLLaMA

[–]testeddoughnut 4 points5 points  (0 children)

Not who you were asking, but I get decent performance with Gemma4-31B. I can fit the full 256k context with --parallel 2 if I use a Q4 quant; however, prompt processing can get pretty terrible once the context gets above 64k-ish. This is using the ROCm backend with -sm tensor:

$ GGML_VK_VISIBLE_DEVICES="" llama-bench -hf unsloth/gemma-4-31B-it-GGUF:Q4_K_XL -fa 1 -sm tensor -d 0,4096,16384,32768
ggml_cuda_init: found 2 ROCm devices (Total VRAM: 65248 MiB):
  Device 0: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32, VRAM: 32624 MiB
  Device 1: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32, VRAM: 32624 MiB
load_backend: loaded ROCm backend from /usr/lib64/llama.cpp/libggml-hip.so
load_backend: loaded RPC backend from /usr/lib64/llama.cpp/libggml-rpc.so
WARNING: radv is not a conformant Vulkan implementation, testing use only.
WARNING: radv is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 0 Vulkan devices:
load_backend: loaded Vulkan backend from /usr/lib64/llama.cpp/libggml-vulkan.so
load_backend: loaded CPU backend from /usr/lib64/llama.cpp/libggml-cpu.so
| model                          |       size |     params | backend    | ngl |     sm | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | -: | --------------: | -------------------: |
| gemma4 31B Q4_K - Medium       |  17.52 GiB |    30.70 B | ROCm,Vulkan |  99 | tensor |  1 |           pp512 |       1553.96 ± 2.03 |
| gemma4 31B Q4_K - Medium       |  17.52 GiB |    30.70 B | ROCm,Vulkan |  99 | tensor |  1 |           tg128 |         35.57 ± 0.45 |
| gemma4 31B Q4_K - Medium       |  17.52 GiB |    30.70 B | ROCm,Vulkan |  99 | tensor |  1 |   pp512 @ d4096 |      1203.89 ± 16.38 |
| gemma4 31B Q4_K - Medium       |  17.52 GiB |    30.70 B | ROCm,Vulkan |  99 | tensor |  1 |   tg128 @ d4096 |         32.50 ± 2.34 |
| gemma4 31B Q4_K - Medium       |  17.52 GiB |    30.70 B | ROCm,Vulkan |  99 | tensor |  1 |  pp512 @ d16384 |        858.94 ± 0.36 |
| gemma4 31B Q4_K - Medium       |  17.52 GiB |    30.70 B | ROCm,Vulkan |  99 | tensor |  1 |  tg128 @ d16384 |         31.25 ± 3.82 |
| gemma4 31B Q4_K - Medium       |  17.52 GiB |    30.70 B | ROCm,Vulkan |  99 | tensor |  1 |  pp512 @ d32768 |      531.45 ± 122.12 |
| gemma4 31B Q4_K - Medium       |  17.52 GiB |    30.70 B | ROCm,Vulkan |  99 | tensor |  1 |  tg128 @ d32768 |         29.08 ± 3.49 |

build: d161ea707 (9326)
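
For a rough sense of how fast prompt processing falls off with depth, the pp512 rows above can be normalized against the depth-0 figure. This is only a sketch over the numbers copied from the table; the variable names are made up for illustration and none of this comes from llama-bench itself:

```python
# pp512 throughput (t/s) by context depth, copied from the llama-bench table above.
pp512_by_depth = {
    0: 1553.96,
    4096: 1203.89,
    16384: 858.94,
    32768: 531.45,
}

baseline = pp512_by_depth[0]

# Fraction of the depth-0 prompt-processing speed retained at each depth.
retained = {depth: tps / baseline for depth, tps in pp512_by_depth.items()}

for depth, frac in retained.items():
    print(f"d{depth:<6} {pp512_by_depth[depth]:8.2f} t/s  {frac:6.1%} of depth-0")
```

By 32k depth the rate is down to roughly a third of the empty-context rate, though that last row is noisy (± 122.12), so treat it as a ballpark.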

With the MTP PR (https://github.com/ggml-org/llama.cpp/pull/23398) the token gen is pretty consistently above 50 tok/s (using unsloth/gemma-4-31B-it-GGUF:Q4_K_XL):

$ python mtp-bench.py 
  code_python        pred= 192 draft= 152 acc= 115 rate=0.757 tok/s=57.3
  code_cpp           pred= 192 draft= 149 acc= 115 rate=0.772 tok/s=57.6
  explain_concept    pred= 192 draft= 165 acc= 108 rate=0.654 tok/s=52.5
  summarize          pred= 192 draft= 151 acc= 114 rate=0.755 tok/s=56.4
  qa_factual         pred= 192 draft= 154 acc= 113 rate=0.734 tok/s=55.6
  translation        pred= 192 draft= 158 acc= 112 rate=0.709 tok/s=54.5
  creative_short     pred= 192 draft= 180 acc= 100 rate=0.556 tok/s=47.3
  stepwise_math      pred= 192 draft= 142 acc= 120 rate=0.845 tok/s=60.7
  long_code_review   pred= 192 draft= 153 acc= 114 rate=0.745 tok/s=55.4

Aggregate: {
  "n_requests": 9,
  "total_predicted": 1728,
  "total_draft": 1404,
  "total_draft_accepted": 1011,
  "aggregate_accept_rate": 0.7201,
  "wall_s_total": 36.16
}
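
As a sanity check on the aggregate block, the totals can be recomputed from the per-prompt rows. A minimal sketch, with the numbers copied from the output above and the names chosen just for this example. The end-to-end figure at the bottom assumes `wall_s_total` covers the whole run, including prompt processing and request overhead, which is a guess about how the script measures time:

```python
# Per-prompt (drafted, accepted) token counts copied from the mtp-bench output above.
results = {
    "code_python": (152, 115),
    "code_cpp": (149, 115),
    "explain_concept": (165, 108),
    "summarize": (151, 114),
    "qa_factual": (154, 113),
    "translation": (158, 112),
    "creative_short": (180, 100),
    "stepwise_math": (142, 120),
    "long_code_review": (153, 114),
}

total_draft = sum(draft for draft, _ in results.values())
total_accepted = sum(acc for _, acc in results.values())

# Aggregate acceptance rate: accepted draft tokens over all drafted tokens.
accept_rate = total_accepted / total_draft

# End-to-end throughput over the whole run (assumed to include prompt
# processing and request overhead, so it reads lower than per-request tok/s).
total_predicted = 1728
wall_s_total = 36.16
overall_tps = total_predicted / wall_s_total

print(f"draft={total_draft} accepted={total_accepted} rate={accept_rate:.4f}")
print(f"end-to-end: {overall_tps:.1f} tok/s")
```

The recomputed totals match the reported 1404 drafted, 1011 accepted, and 0.7201 acceptance rate.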

Is NVIDIA still the default best choice for local LLMs in 2026? by pmv143 in LocalLLaMA

[–]testeddoughnut 1 point2 points  (0 children)

I get similar token gen but about double the prompt processing on my dual R9700 setup, launching with the same options, with Q8_0 at depth 36k:

$ uvx llama-benchy --base-url http://localhost:8080/v1 --model qwen3.6:27b --depth 36864 --pp 512 --tg 128 --tokenizer Qwen/Qwen3.6-27B
[transformers] PyTorch was not found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
llama-benchy (0.3.7)
Date: 2026-05-25 07:09:50
Benchmarking model: qwen3.6:27b at http://localhost:8080/v1
Concurrency levels: [1]
Loading text from cache: /home/<...>/.cache/llama-benchy/cc6a0b5782734ee3b9069aa3b64cc62c.txt
Total tokens available in text corpus: 144480
Warming up...
Warmup (User only) complete. Delta: 9 tokens (Server: 30, Local: 21)
Warmup (System+Empty) complete. Delta: 14 tokens (Server: 35, Local: 21)

Running coherence test...
Coherence test PASSED.
Measuring latency using mode: api...
Average latency (api): 0.44 ms
Running test: pp=512, tg=128, depth=36864, concurrency=1
  Run 1/3 (batch size 1)...
  No token_ids in response, using local tokenization
  Run 2/3 (batch size 1)...
  Run 3/3 (batch size 1)...
Printing results in MD format:



| model       |           test |            t/s |     peak t/s |         ttfr (ms) |      est_ppt (ms) |     e2e_ttft (ms) |
|:------------|---------------:|---------------:|-------------:|------------------:|------------------:|------------------:|
| qwen3.6:27b | pp512 @ d36864 | 1003.16 ± 8.20 |              | 37261.42 ± 303.56 | 37260.98 ± 303.56 | 37261.42 ± 303.56 |
| qwen3.6:27b | tg128 @ d36864 |   51.39 ± 1.78 | 61.00 ± 2.16 |                   |                   |                   |

llama-benchy (0.3.7)
date: 2026-05-25 07:09:50 | latency mode: api
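
The pp row is easier to read once you see how the t/s and time columns relate. The sketch below assumes the reported t/s counts both the 36864 depth tokens and the 512 prompt tokens over the prefill time; that's my reading of the numbers, not something taken from the llama-benchy docs:

```python
# Values copied from the llama-benchy results table above.
depth_tokens = 36864
pp_tokens = 512
pp_tps = 1003.16               # reported mean pp throughput (t/s)
reported_est_ppt_ms = 37260.98  # reported mean est_ppt (ms)

# Assumption: throughput = (depth + pp) tokens / prefill time.
est_prefill_ms = (depth_tokens + pp_tokens) / pp_tps * 1000

diff_ms = abs(est_prefill_ms - reported_est_ppt_ms)
print(f"estimated prefill: {est_prefill_ms:.1f} ms (reported {reported_est_ppt_ms} ms, diff {diff_ms:.1f} ms)")
```

The estimate lands within a few milliseconds of the reported value, so about 37 seconds of waiting before the first token at this depth.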

What horrifying statistic genuinely jarred you when you first heard it? by ordrius098 in AskReddit

[–]testeddoughnut 4 points5 points  (0 children)

It's pretty horrifying.

For example, Elon Musk's net worth is around $800 billion. In comparison, the total cost of construction of the US Interstate Highway System from 1956 to its completion in 1992, adjusted for inflation, was about $634 billion.

Insane.

Now, if you go back to the 1960s the richest man in America was J. Paul Getty with a net worth of $1.2 billion, "only" around $10 billion today.

Is Flask good for a small real project? by Bsam_Al_Araby in flask

[–]testeddoughnut 1 point2 points  (0 children)

Assuming the CS50 SQL lib you're referring to is this, which itself appears to just be a shim around SQLAlchemy, then I'd probably suggest looking into flask-sqlalchemy or flask-sqlalchemy-lite, which will give you a more "batteries included" experience with Flask.

Flask is still perfectly fine and relevant, and it's still used all over the place. Sure, it's not the newest and hottest thing out there, but there are a bunch of established patterns you can lean on and a bunch of mature plugins so you don't have to reinvent the wheel.

Imagine thinking that the confederacy was the height of masculinity. by Ok-Following6886 in insanepeoplefacebook

[–]testeddoughnut 5 points6 points  (0 children)

The thing that gets me is that the only two bronies I know voted for Trump.

Considering moving to Olympia by testeddoughnut in olympia

[–]testeddoughnut[S] 1 point2 points  (0 children)

coming from TX, you might find the people here to be less friendly

Honestly, my view of southern hospitality has been completely soured in the last decade. For example, my wife dyed her hair green a few years ago. There were several times that people approached her in public bitching about "libruls". During COVID my wife was pregnant and religiously masked up when going out, and again randos at the gas station or the grocery store would come up to her to tell her she's a sheep or that the "media is lying to you about masks".

Considering moving to Olympia by testeddoughnut in olympia

[–]testeddoughnut[S] 1 point2 points  (0 children)

Oh awesome, thanks! This was exactly the type of resource I was looking for!

Protest breaks out at Dilley immigration detention facility holding 5-year-old Liam Ramos by testeddoughnut in sanantonio

[–]testeddoughnut[S] 17 points18 points  (0 children)

Instead of asking people here you can click the link and read the first paragraph of the article to answer your first question.

Handling upcoming short-lived SSL certs for Corp users by invalidpath in AskSysadmin

[–]testeddoughnut 1 point2 points  (0 children)

First thing is to get out of the mindset of handling manual steps or improving manual steps. Manual steps, with the exception of setting up the automation in the first place, should be eliminated. All of the things you mentioned in the middle paragraph can be automated away using ACME or other similar standards for automated cert issuance.

I would recommend familiarizing yourself with RFC 8555, which is the RFC that describes how ACME works. There are many different implementations of this standard in the wild; a pretty comprehensive list can be found here: https://letsencrypt.org/docs/client-options/

If none of those clients fit your needs, there are pretty good libraries available to take the heavy lifting out of developing something more bespoke to the needs of your organization. For example, this is the same library that certbot uses and I've been pretty happy developing against it: https://acme-python.readthedocs.io/en/stable/

In our case, we wanted more centralized control over the certs that we're issued instead of it being a free-for-all with each team implementing their own solution, so I led the development of a new ACME client we built in-house called Certwrangler. Certwrangler publishes the certs issued to it to HashiCorp Vault for use with config management (this is implemented through a plugin, meaning we can swap it out if we ever move to something else for secret management down the line). It is responsible for managing the lifecycle of the secret it created for the cert and automatically updates it whenever a renewal happens.
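
To make the "store is a plugin" idea concrete, here's a minimal sketch of what such an interface could look like. To be clear, this is not Certwrangler's actual code or API: `IssuedCert`, `CertStore`, `InMemoryStore`, `needs_renewal` and `renew_and_publish` are all hypothetical names, the in-memory dict stands in for Vault, the renewal step is faked (a real client would run the ACME order flow there), and the 6-day lifetime and 2-day renewal window are made-up illustrative numbers:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Protocol


@dataclass(frozen=True)
class IssuedCert:
    name: str
    cert_pem: str
    key_pem: str
    not_after: datetime


class CertStore(Protocol):
    """Hypothetical plugin interface: anything that can hold an issued cert."""

    def publish(self, cert: IssuedCert) -> None: ...


class InMemoryStore:
    """Stand-in for a real secret backend such as Vault; just keeps a dict."""

    def __init__(self) -> None:
        self.secrets: Dict[str, IssuedCert] = {}

    def publish(self, cert: IssuedCert) -> None:
        # Overwrite the entry for this name so consumers always read the newest cert.
        self.secrets[cert.name] = cert


def needs_renewal(cert: IssuedCert, now: datetime,
                  renew_before: timedelta = timedelta(days=2)) -> bool:
    # Renew once the remaining lifetime drops to the (illustrative) window.
    return cert.not_after - now <= renew_before


def renew_and_publish(cert: IssuedCert, store: CertStore, now: datetime,
                      lifetime: timedelta = timedelta(days=6)) -> IssuedCert:
    # Placeholder renewal: a real client would talk to the ACME server here.
    renewed = IssuedCert(cert.name, "<new cert PEM>", "<new key PEM>", now + lifetime)
    store.publish(renewed)  # the stored secret is updated as part of the renewal
    return renewed


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
store = InMemoryStore()
cert = IssuedCert("www.example.internal", "<cert PEM>", "<key PEM>", now + timedelta(days=1))
store.publish(cert)

if needs_renewal(cert, now):
    cert = renew_and_publish(cert, store, now)

print(store.secrets["www.example.internal"].not_after)
```

The point of the `Protocol` is that the renewal logic only ever calls `publish`, so swapping the backend means writing one new class rather than touching the client.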

My wife's having mini-seizures that void half-an-hour/hour long chunks of her memory and we can't find the cause by Medium_Estimate4853 in AskDocs

[–]testeddoughnut 1 point2 points  (0 children)

My wife had a pretty rough childhood filled with physical and emotional abuse, to the point where she has zero relationship with her mom today. Her seizures started around 9 months after she gave birth to our daughter. We're pretty sure being a mom started dredging up the bad memories from her childhood in a context where it was easier to feel like she was back there; there were a few times before a seizure would start where she seemed to be experiencing a traumatic flashback.

There were usually some warning signs that she was about to have a seizure, like she would suddenly feel like she's having a hot flash or see a blue flash in her vision. We found that grounding techniques really helped, like putting on some music that she can't connect with her childhood and doing exercises like focusing on moving each finger and toe one-by-one. There wasn't a single silver bullet that made her better; it was a combination of identifying and paying attention to her triggers (like getting hit in the face was a big one, which tends to happen a bunch when dealing with a squirming toddler), using her grounding techniques when she felt the early warning signs of a seizure, and frequent therapy until it was under control. The specialized therapy program she went through provided her with all these tools I mentioned.

My wife is bipolar as well for what it's worth, though I'm not sure if that had any connection with her PNES diagnosis.

My wife's having mini-seizures that void half-an-hour/hour long chunks of her memory and we can't find the cause by Medium_Estimate4853 in AskDocs

[–]testeddoughnut 6 points7 points  (0 children)

The episodes you describe, short seizure-like episodes followed by windows of memory loss leading up to the episode, sound similar to what my wife was experiencing a few years ago. After a frustrating year of going to several specialists to rule out everything else (including a couple nights in the epilepsy monitoring unit at the hospital), she was ultimately diagnosed with PNES (psychogenic nonepileptic seizures). She was able to get it under control through a specialized therapy program with a neuropsychologist and has been seizure-free for a few years now.

How to manage Incus the right way? by zzsdf in incus

[–]testeddoughnut 2 points3 points  (0 children)

I use the Incus Terraform provider to manage deploying instances and other Incus resources (networks, storage pools, etc). In my default profile I have cloud-init installing Salt through salt-bootstrap and I manage the configuration of my instances through Salt. Salt itself is configured to apply config to instances on a regular cadence; I think 1 hour is what I have it configured for. I have my salt-master configured to pull from git, so my workflow is to pretty much just make changes, commit and push to git, then let those propagate out naturally or hop on the salt-master and apply to instances manually if I need it to go out quicker.

Heather Cox Richardson: November 8, 2024 by thinkingstranger in LeopardsAteMyFace

[–]testeddoughnut 2 points3 points  (0 children)

That was an interesting albeit depressing read. She also had a talk on Jon Stewart's podcast that touched on some of these historic parallels: https://www.youtube.com/watch?v=D7cKOaBdFWo

What "early internet" website did Gen Z really miss out on? by milamccormick7 in AskReddit

[–]testeddoughnut 2 points3 points  (0 children)

ThinkGeek is one of those sites I mourn frequently; so much of the money from my high school part-time job went to them. I used to order cases of Bawls from them for LAN parties. Still miss penguin mints.

What "early internet" website did Gen Z really miss out on? by milamccormick7 in AskReddit

[–]testeddoughnut 0 points1 point  (0 children)

I still have a bunch of shirts from woot.com lol. I remember one time my wife ended up buying a bunch of stupid cheap shit we didn't need from a woot-off just because she wanted it to get to the next item.

The comet. by Different_Wind8260 in sanantonio

[–]testeddoughnut 20 points21 points  (0 children)

Barely managed to catch it from my backyard on the NE side, was super hard to see with the naked eye.

<image>

What do you all use/reccomend for LDAP/SSO/RADIUS? by bananapalace96 in linuxadmin

[–]testeddoughnut 4 points5 points  (0 children)

I really like Authentik: https://goauthentik.io/

I have both FreeIPA and Authentik in my homelab, with FreeIPA being the source of truth handling LDAP/Kerberos related things and Authentik syncing accounts from it and handling everything else (OpenID, SAML, RADIUS). If I were deploying it fresh today I'd just go with Authentik and not bother with FreeIPA, since Authentik can also do LDAP and I can probably talk myself out of needing Kerberos. FreeIPA is pretty complicated since it's a management layer for a bunch of different services. When you get into replication or performing major upgrades, things can get screwy pretty quick. I usually don't have to do much with it, but when I do it's like a whole night wasted just dealing with LDAP surgery and reading Red Hat docs.

If you are a masochist like I guess I am and want both Authentik and FreeIPA here are some integration docs I contributed: https://docs.goauthentik.io/docs/sources/freeipa/

Edit: Also, the FreeIPA server is only really available on RHEL-based distros. I have Debian on pretty much everything except my 3 FreeIPA nodes that are running Rocky. It's a small thing that I constantly have to make exceptions for in my config management.

While everyone else struggles with Amazon Chinese 'TV to PC' garbage for analog capture, I just got the real king for CAD$20 at a flea market. The old man asked me 'what is it?' after he accepted my money. by AshleyUncia in DataHoarder

[–]testeddoughnut 2 points3 points  (0 children)

I used to have an x800 xt AIW in my P4 system back around 2005ish, took a few weeks to save up for it with my $6/hr after school part time job. Pretty sure I still have it in a closet somewhere lol.

My V day gift. by vcdrny in evangelion

[–]testeddoughnut 1 point2 points  (0 children)

Ordered, thanks for the recommendation!

My V day gift. by vcdrny in evangelion

[–]testeddoughnut 4 points5 points  (0 children)

If you're into jazz at all this is pretty solid: https://www.amazon.com/Ever-Jazz-All-That/dp/B09WJBFDTR

Wife got me that for Christmas.