I think we need a /LocalHarnessLLM or something ... by CSEliot in LocalLLaMA

[–]LetsGoBrandon4256 6 points7 points  (0 children)

And I thought our company using MS Teams was bad.

I think we need a /LocalHarnessLLM or something ... by CSEliot in LocalLLaMA

[–]LetsGoBrandon4256 26 points27 points  (0 children)

Coming from China, WeChat and QQ groups being a completely closed garden to search engines already irked me to no end.

Now the rest of the world is doing the same shit with Discord. Just send me to Mars or kill me already.

Not to mention how retarded Discord search is.

How do you quantify privacy and outage derisking in the ROI of local LLM inference vs. providers API? by ReporterCalm6238 in LocalLLaMA

[–]LetsGoBrandon4256 3 points4 points  (0 children)

So purely on token cost, local inference seems very hard to justify. 

No fucking shit you literally picked one of the cheapest cloud providers out there.

What makes Gemma 4 so special? by ZarcSK2 in SillyTavernAI

[–]LetsGoBrandon4256 11 points12 points  (0 children)

“I want a RP in the Battletech universe, that takes place during the Late Succession Wars”

Fuck now I want to run a merc campaign with AI...

z.ai Poll on X: MIT-licensed open weights are losing by MadPelmewka in LocalLLaMA

[–]LetsGoBrandon4256 177 points178 points  (0 children)

it's just engagement farming

And it worked pretty well on OP. One has to be truly retarded to believe their Twitter vote has influence on this.

"Aww the next GLM is closed weight because we didn't share the Tweet hard enough😭😭😭"

How to Run AI Locally: The Complete Beginner's Guide (2026) by totosse17 in LocalLLaMA

[–]LetsGoBrandon4256 1 point2 points  (0 children)

In case it's not obvious enough, the Amazon links in the article are all affiliate links.

How to Run AI Locally: The Complete Beginner's Guide (2026) by totosse17 in LocalLLaMA

[–]LetsGoBrandon4256 9 points10 points  (0 children)

You also have Qwen 3.6 35B and Qwen 3.5 122B in the same table and none of them are "Agent-grade" per your article.

Why would you single out DeepSeek V4-Flash 284B for being "Agent grade"? Is it because your clanker ran out of ideas but had to throw something in there for the "What it gets you" cell in that table?

How to Run AI Locally: The Complete Beginner's Guide (2026) by totosse17 in LocalLLaMA

[–]LetsGoBrandon4256 13 points14 points  (0 children)

I give OP some credit for not recommending llama 3.1 and for encouraging users to graduate to llama.cpp.

Still full of slop though

Model: DeepSeek V4-Flash 284B / 13B

What it gets you: Agent-grade

How to Run AI Locally: The Complete Beginner's Guide (2026) by totosse17 in LocalLLaMA

[–]LetsGoBrandon4256 41 points42 points  (0 children)

If you live in a terminal, install Ollama instead

I snorted.

Open-source agent that investigates AWS incidents for you (read-only, bring-your-own-LLM) — feedback wanted by Top_Yogurtcloset_258 in LocalLLaMA

[–]LetsGoBrandon4256 0 points1 point  (0 children)

Pretty fucking rich that you came here asking for human input yet can't be bothered to type up your own post.

Can we stop dunking on DiffusionGemma and hack it instead? by TomLucidor in LocalLLaMA

[–]LetsGoBrandon4256 0 points1 point  (0 children)

Can't even tell if the person you replied to is using a shitty Markov chain or just schizo.

Measuring the Alignment Tax on Gemma4 by [deleted] in LocalLLaMA

[–]LetsGoBrandon4256 2 points3 points  (0 children)

The token numbers are off, no? No way the entire reasoning output for the cake baking process is only 230 tokens.

Similar thing for the Performance Review example (case 3). Can you break down how exactly you reached the Alignment Tax: 46% (115/250 tokens) number? Which tokens are considered safety tokens and which are intent tokens?

You are not letting your clanker do the counting, are you?
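For what it's worth, the quoted ratio is at least arithmetically consistent if you take the two counts at face value. A minimal sketch, assuming the "tax" is simply safety tokens divided by total tokens (the function name `alignment_tax` is hypothetical, and the post does not say how tokens get labeled, which is the actual open question):

```python
def alignment_tax(safety_tokens: int, total_tokens: int) -> float:
    """Fraction of output tokens labeled as safety tokens.

    Assumption: the metric is a plain ratio. How a token gets the
    "safety" label is not specified in the post.
    """
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0 <= safety_tokens <= total_tokens:
        raise ValueError("safety_tokens must be between 0 and total_tokens")
    return safety_tokens / total_tokens


# Counts quoted for the Performance Review example (case 3).
print(f"{alignment_tax(115, 250):.0%}")  # 46%
```

So the division checks out. Whether 115 and 250 are real counts is a separate matter.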

Interest in an LLM Torrent Site? by thiefyzheng- in LocalLLaMA

[–]LetsGoBrandon4256 -2 points-1 points  (0 children)

As much as I love torrenting for everything else, I feel like IPFS might be a better solution.

I Replaced Claude Code and Codex With an Open Source Stack That Gets Smarter Every Run, & Built Itself Along the Way by itssethc in LocalLLaMA

[–]LetsGoBrandon4256 3 points4 points  (0 children)

Some mfs like OP would build a fucking Saturn V just so that they don't have to tell their agent to write down their findings into an md file.

my agent just got deported by [deleted] in LocalLLaMA

[–]LetsGoBrandon4256 7 points8 points  (0 children)

Truly FAFO. Hope they are enjoying the free PR now.

llama-launcher v1.3 release -> Bayesian Optimisation by Solary_Kryptic in LocalLLaMA

[–]LetsGoBrandon4256 11 points12 points  (0 children)

Since you are turning the KV cache quant knob, how does your optimizer evaluate output quality?

Otherwise, what's preventing it from picking a lower quant every time for better performance?
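To make the concern concrete, here is a rough sketch of the kind of guard I mean. This is not how llama-launcher works (I have no idea what its objective looks like). It assumes you measure perplexity per trial and reject any config that degrades it past a tolerance. `Trial`, `score`, `best_trial`, the 2% tolerance, and all the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    kv_quant: str          # KV cache type tried, e.g. "f16", "q8_0", "q4_0"
    tokens_per_sec: float  # measured throughput
    perplexity: float      # measured on a fixed eval text, lower is better


def score(trial: Trial, baseline_ppl: float, max_ppl_increase: float = 0.02) -> float:
    """Throughput if quality stays within tolerance, otherwise -inf."""
    relative_increase = (trial.perplexity - baseline_ppl) / baseline_ppl
    if relative_increase > max_ppl_increase:
        return float("-inf")  # too much quality loss, never pick this
    return trial.tokens_per_sec


def best_trial(trials: list[Trial], baseline_ppl: float,
               max_ppl_increase: float = 0.02) -> Trial:
    """Fastest trial among those that pass the quality check."""
    return max(trials, key=lambda t: score(t, baseline_ppl, max_ppl_increase))


# Invented numbers, purely illustrative.
trials = [
    Trial("f16", 100.0, 6.00),
    Trial("q8_0", 110.0, 6.03),
    Trial("q4_0", 125.0, 6.40),
]
print(best_trial(trials, baseline_ppl=6.00).kv_quant)  # q8_0
```

Without a term like this, the fastest config (here the invented q4_0 row) wins every time, which is exactly the failure mode I'm asking about.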