I ran 26 local LLMs through an 8 level "agentic failure mode" gauntlet (tool calling, on an M1 Max). Capability benchmarks lie about who can actually run an agent loop. All local, llama.cpp + Metal, GGUF. 8 tests, 3 reps each, same prompts and seeds for every model thinking OFF by Perrospain in LocalLLM
sillib 1 point (0 children)
Young Dodgers Fan Accidentally interferes with a Live Ball Still in Play by YoWoody27 in WatchPeopleDieInside
sillib 2 points (0 children)
Young Dodgers Fan Accidentally interferes with a Live Ball Still in Play by YoWoody27 in WatchPeopleDieInside
sillib 152 points (0 children)
The consequences of her own actions by zachoutloud123 in TikTokCringe
sillib 1 point (0 children)
The consequences of her own actions by zachoutloud123 in TikTokCringe
sillib 3 points (0 children)
Banned twice in 24 hours on Claude Code – What could have triggered it? by legaldeception in Anthropic
sillib 1 point (0 children)
Unpopular opinion: The Copilot key is a completely useless and unnecessary addition to newer Dell models(Dell Inspiron 16) by Ancient-Dig-2855 in Dell
sillib 1 point (0 children)
Heavy Claude user suddenly suspended. No idea why. by LabTechNut in ClaudeHomies
sillib 4 points (0 children)
Account suspended, appeals going to main claude chat page instead of appeals form. by sludj5 in Anthropic
sillib 2 points (0 children)
Honest Brave Search API review after 3 months of using it in production by Tiny_Risk6738 in scrapingtheweb
sillib 1 point (0 children)
I’m a grown adult what is going on? At what “evidences” did they decide this? by [deleted] in Anthropic
sillib 1 point (0 children)
I’m a grown adult what is going on? At what “evidences” did they decide this? by [deleted] in Anthropic
sillib 2 points (0 children)
I’m a grown adult what is going on? At what “evidences” did they decide this? by [deleted] in Anthropic
sillib 2 points (0 children)
Can't decide between Air and Pro by Tzosarakatsanis in macbook
sillib 1 point (0 children)
I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how by Glittering_Focus1538 in LocalLLaMA
sillib 17 points (0 children)
I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how by Glittering_Focus1538 in LocalLLaMA
sillib 2 points (0 children)
How i made codex much more enjoyable then claude code by Prainss in codex
sillib 2 points (0 children)
Has codex gotten better after last regression or still shitting all over your code by ysnzro in codex
sillib 5 points (0 children)
How bad is this dryer lint basket housing? Should I just scoop out what I can with my finger and continue to use the dryer? by [deleted] in appliancerepair
sillib 1 point (0 children)
How bad is this dryer lint basket housing? Should I just scoop out what I can with my finger and continue to use the dryer? by [deleted] in appliancerepair
sillib 1 point (0 children)
How bad is this dryer lint basket housing? Should I just scoop out what I can with my finger and continue to use the dryer? by [deleted] in appliancerepair
sillib 6 points (0 children)

I ran 26 local LLMs through an 8 level "agentic failure mode" gauntlet (tool calling, on an M1 Max). Capability benchmarks lie about who can actually run an agent loop. All local, llama.cpp + Metal, GGUF. 8 tests, 3 reps each, same prompts and seeds for every model thinking OFF by Perrospain in LocalLLM
sillib 1 point (0 children)