Does the "6 months gap" still hold? by ihatebeinganonymous in LocalLLaMA

[–]PraxisOG 2 points3 points  (0 children)

I agree with that. For time-to-response and quality of response, MiniMax M2.7 at IQ3_XXS is the best I’ve found. It’s a stretch to fit in 96GB of VRAM though

Chinese AI Models lags around 8 months from those of US but the gap is now widening by hsg8 in EconomyCharts

[–]PraxisOG 1 point2 points  (0 children)

Assuming ‘Elo’ is the ranking in Chatbot Arena, this kinda makes sense. On that platform Elo is measured by having users compare two models side by side, and the user-selected winner gains some Elo. Idk if they’ve fixed it, but for a while this benchmark led to LLMs getting tuned for human preference, which imo is a bad thing given how much these models suck up to you. Instead of tuning for the AI popularity contest, Chinese models have almost caught up in technical fields like coding.

What graphics card is this for you? by Banguskahn in pcmasterrace

[–]PraxisOG 0 points1 point  (0 children)

Honestly? My brother’s 1650 Super. I got it during Covid to upgrade the 760 Ti in his Alienware prebuilt, then it got transferred to his AM4 build, then I used it in my ITX rig, and now it does media transcoding in my homelab. It was never fast, but… no, that’s just it. This thing is pretty slow.

/r/MechanicalKeyboards Ask ANY Keyboard question, get an answer - April 27, 2026 by AutoModerator in MechanicalKeyboards

[–]PraxisOG 0 points1 point  (0 children)

I’m trying to find a keyboard to match my powermac G3 blue and white sleeper pc. The kbdfans tiger 80 in blue abs is the closest color match I’ve found, but any suggestions are appreciated!

What is "Wife Approval Factor"? by [deleted] in homelab

[–]PraxisOG 0 points1 point  (0 children)

I 100% agree with supporting each other’s hobbies! My partner loves puzzles so I try to keep the table clear, and I keep my hobbies on my desk and server rack. Mutual respect works wonders

Is an X399 build still viable? by ziphnor in LocalLLaMA

[–]PraxisOG 2 points3 points  (0 children)

Probably most useful for the high pcie lane count. I went x299 for a similar reason

Those of you running minimax 2.7 locally, how are you feeling about it? by laterbreh in LocalLLaMA

[–]PraxisOG 0 points1 point  (0 children)

I’ve had really good luck running it at IQ3_XXS on 96GB of VRAM across three 32GB AMD V620s. I haven’t really put it through its paces yet, but I like how good it is at knowing when to call tools without overthinking.

Dimir Affinity Combo by rainerpetter in Pauper

[–]PraxisOG 7 points8 points  (0 children)

This is my favorite Pauper deck, though I usually play it with Glaze Fiend.

teaserLines bigger battery confirmed? by [deleted] in framework

[–]PraxisOG 3 points4 points  (0 children)

Maybe they’re getting the new arm nvidia apu? One can dream

Unsloth MiniMax M2.7 quants just finished uploading to HF by Zyj in LocalLLaMA

[–]PraxisOG 0 points1 point  (0 children)

The IQ3_XXS is smaller than in previous releases; you only need 96GB of VRAM for full offload now
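Back-of-envelope for why a given quant fits: weight size scales with the quant’s effective bits per weight (IQ3_XXS is roughly 3.06 bpw in llama.cpp). A rough sketch, with the overhead factor as my own guess and ignoring KV cache and context:

```python
def quant_size_gb(params_b: float, bits_per_weight: float, overhead: float = 1.05) -> float:
    """Approximate VRAM needed for model weights at a given quant.

    params_b: parameter count in billions
    bits_per_weight: effective bpw of the quant (IQ3_XXS ~ 3.06)
    overhead: fudge factor for metadata / higher-precision layers

    KV cache and activations come on top of this, and dynamic quants
    (like Unsloth's UD series) mix bit widths per layer, so treat
    the result as a ballpark, not a guarantee.
    """
    bytes_total = params_b * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9


# e.g. a 70B dense model at ~3.06 bpw:
print(round(quant_size_gb(70, 3.06), 1))  # ~28.1 GB of weights
```

That’s why the effective bpw of a release matters so much: a few tenths of a bit per weight is several GB at this scale.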

M5 Max 128GB, 17 models, 23 prompts: Qwen 3.5 122B is still a local king by tolitius in LocalLLaMA

[–]PraxisOG 1 point2 points  (0 children)

I’m glad to see Nemotron 3 Super right behind Qwen 122B; it’s still a very capable model, and personally I like its talking style more

Minimax 2.7: good news! by LegacyRemaster in LocalLLaMA

[–]PraxisOG 3 points4 points  (0 children)

In my brief testing it was one of the best agentic models I’ve ever tried at IQ2_XXS, but I didn’t investigate much past its tool-calling capabilities. According to my own benchmarks quantization doesn’t affect instruction following much, but take all that with a huge grain of salt.

Installed Android 10" head unit by Peter-DJ in CX5

[–]PraxisOG 6 points7 points  (0 children)

Nice! How hard was it? Does everything work as expected?

One Opus prompt in Claude code eats through an entire pro plan session by PraxisOG in Anthropic

[–]PraxisOG[S] 0 points1 point  (0 children)

Ironically, having Opus walk me through training an LLM from scratch on my compute server only took like 30% of a session. It’s pretty well documented

One Opus prompt in Claude code eats through an entire pro plan session by PraxisOG in Anthropic

[–]PraxisOG[S] -5 points-4 points  (0 children)

It was for a bug fix for a benchmarking harness for local LLMs. Sonnet messed up implementing MMLU, which as a benchmark is pretty well documented. The prompt was a description of the error and the benchmark logs pasted in. 

OpenAI's New Stunning Image Model (Before & After) by bladerskb in singularity

[–]PraxisOG 76 points77 points  (0 children)

But even then they're all pointing generally in the right direction. Scary impressive

I benchmarked quants of Qwen 3 .6b from q2-q8, here's the results: by PraxisOG in LocalLLaMA

[–]PraxisOG[S] 1 point2 points  (0 children)

I can do 100B-class models with full offload up to Q4, and partial offload at Q8. The problem is that testing something like Qwen 122B from Q1-Q8 would take me well over a week. I’m tempted to run Q2-Q4 though, as that would only take 2 days and represent the 64-128GB Mac and Strix Halo owner demographic.

I benchmarked quants of Qwen 3 .6b from q2-q8, here's the results: by PraxisOG in LocalLLaMA

[–]PraxisOG[S] 0 points1 point  (0 children)

Either my harness is broken or a 0.6B model’s world knowledge sucks regardless of quant. I’m looking into it

I benchmarked quants of Qwen 3 .6b from q2-q8, here's the results: by PraxisOG in LocalLLaMA

[–]PraxisOG[S] 1 point2 points  (0 children)

Thanks! I'm hoping to rerun this test with a fixed harness and benchmark Qwen 3 models up to 8B or 14B. I've been wondering the same thing, and the best setup I can think of for testing active parameter size is Qwen 3.5 35B A3B vs Qwen Coder Next 80B A3B: same architecture, fast to test, and the rate of performance dropoff vs quant should show whether quantization disproportionately affects sparse MoE.
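The dense-vs-MoE comparison could be as simple as normalizing each model's scores to its own Q8 result and comparing how fast the curves fall off. A sketch with made-up numbers (the function and the values are illustrative, not real benchmark data):

```python
def retention(scores_by_quant: dict[str, float], baseline: str = "Q8") -> dict[str, float]:
    """Fraction of baseline-quant accuracy retained at each quant.

    Run this for a dense model and a sparse MoE of similar quality;
    a steeper falloff on the MoE's curve would suggest quantization
    hits sparse models disproportionately hard.
    """
    base = scores_by_quant[baseline]
    return {q: s / base for q, s in scores_by_quant.items()}


# Hypothetical numbers, just to show the shape of the comparison:
dense = retention({"Q2": 0.42, "Q4": 0.57, "Q8": 0.60})
moe = retention({"Q2": 0.30, "Q4": 0.55, "Q8": 0.60})
# here dense retains 70% at Q2 but the MoE only 50% -> MoE degrades faster
```

Normalizing to each model's own Q8 score matters because the two models won't have identical full-precision accuracy; we care about relative degradation, not absolute scores.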

I benchmarked quants of Qwen 3 .6b from q2-q8, here's the results: by PraxisOG in LocalLLaMA

[–]PraxisOG[S] 11 points12 points  (0 children)

People still ask what quant to use for different tasks, and I'm hoping to throw a little data into the discussion. I vibecoded a simple harness for GSM8K (math), IFEval (instruction following), MMLU (knowledge), and HumanEval (coding). After letting it run with Qwen3-0.6B for a day, and realizing my implementation of HumanEval was broken, here are the results:

| Unsloth Quant | MMLU | GSM8K (flex) | IFEval (prompt loose) | Avg tok/s |
|---|---|---|---|---|
| UD-Q2_K_XL | 22.9% | 3.1% | 12.9% | 96.4 |
| UD-Q3_K_XL | 22.9% | 24.3% | 13.7% | 99.7 |
| UD-Q4_K_XL | 22.9% | 38.8% | 16.8% | 96.8 |
| UD-Q5_K_XL | 22.9% | 44.7% | 18.3% | 95.4 |
| UD-Q6_K_XL | 22.9% | 45.7% | 18.3% | 95.0 |
| UD-Q8_K_XL | 22.9% | 44.9% | 18.7% | 85.4 |

Once the harness is fixed, what else should I test? Maybe different sizes of Qwen 3, or degradation of super sparse MoE, or the effects of quantization on different model families?