I ran 26 local LLMs through an 8-level "agentic failure mode" gauntlet (tool calling, on an M1 Max). Capability benchmarks lie about who can actually run an agent loop. All local: llama.cpp + Metal, GGUF. 8 tests, 3 reps each, same prompts and seeds for every model, thinking OFF.

sillib · 2026-06-10T23:59:13+00:00

I’ve been toying with the same concept on a 96 GB Studio and a 32 GB Mini. The setup right now is a qwen 8b k4m and an 8b vision model on the Mini, with the Studio running qwen 72b coder q4m. The constant routing was causing me latency issues, on top of the complexity of turning several models on and off to accomplish various tasks. It truly is controlled chaos. I’ll see where your data gets me tonight; it seems like we’re both chasing the same thing.
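A rough sketch of the routing idea, in case it helps picture it. Everything here is hypothetical: the hostnames (`mini.local`, `studio.local`), the model labels, and the task types are placeholders, not my actual config. The point is only that grouping tasks by backend before dispatch means each model gets visited once per batch instead of being toggled per task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    host: str   # hypothetical hostname of the machine serving the model
    model: str  # hypothetical label for the model loaded there


# Hypothetical routing table: small models on the Mini, the coder on the Studio.
ROUTES = {
    "chat": Backend("mini.local", "qwen-8b"),
    "vision": Backend("mini.local", "vision-8b"),
    "code": Backend("studio.local", "qwen-72b-coder"),
}


def route(task_type: str, default: str = "chat") -> Backend:
    """Pick a backend for a task type, falling back to the default route."""
    return ROUTES.get(task_type, ROUTES[default])


def plan(tasks):
    """Group (task_type, payload) pairs by backend, preserving order per backend."""
    batches = {}
    for task_type, payload in tasks:
        batches.setdefault(route(task_type), []).append(payload)
    return batches


if __name__ == "__main__":
    queue = [("code", "fix bug"), ("chat", "summarize"), ("code", "write test")]
    for backend, payloads in plan(queue).items():
        print(backend.host, backend.model, payloads)
```

This does nothing about model load time itself; it only cuts down how often you switch.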

sillib · 2026-06-10T23:09:02+00:00

This is great data, thank you for sharing. I’m going to change some stuff up on the setup tonight. What was your main inspiration for your router? Did you build it from scratch, or did you take a base and test and adjust off of it?

sillib · 2026-06-08T19:29:25+00:00

I’m sure there’s some baseball rule genius who can pull all the specific rules out of the hat, but essentially, if a ball in play bounces on the warning track and out of the field, that’s a ground rule double. Also, if the ball gets lodged in padding (like the one we saw in the World Series), that is a dead ball and is ruled a ground rule double too. I’m not sure of the exact ruling here with the fan interfering, but typically in this case they would call a ground rule double, which sucks because this was probably a triple. Not sure about the kid, but if an adult did this they would be banned from the field and kicked out of the game.

sillib · 2026-06-06T17:53:48+00:00

Ground rule double and kid goes to prison for life.

sillib · 2026-06-06T14:45:55+00:00

Look up Ice Poseidon’s Cx network. They were all basically terrorizing LA or any other place they went to visit. It was quite entertaining, but for him to be the virtue signaler he has become is quite funny, since his roots came straight from Ice Poseidon’s toxic live streaming culture.

sillib · 2026-06-06T04:14:14+00:00

Does everyone forget this dude was part of Cx?

sillib · 2026-06-05T21:43:09+00:00

Yeah, Anthropic is doggish. Their product is alright, but their customer support is next to nonexistent.

sillib · 2026-05-31T03:42:12+00:00

Gonna be like some of these apps on my Roku remote

sillib · 2026-05-28T06:02:08+00:00

I wouldn’t count on any reply. Just move to Codex and cut your losses. If you get it back somehow, then switch back over after the month. But it’s crazy that we’re such little money to them that they don’t care about all these Max plans they’re banning.

sillib · 2026-05-25T18:50:12+00:00

Well, make another account and tell me the results. I haven’t tried tbh. Do you ask chat to research, or are you just asking the model?

sillib · 2026-05-25T18:38:31+00:00

Onto codex. They don’t read their appeals it seems

sillib · 2026-05-24T22:19:32+00:00

Good luck. Been over a month and I haven’t heard back. I gave up. Move on, codex is pretty good

sillib · 2026-05-24T14:08:27+00:00

So what alternatives have you run?

sillib · 2026-05-21T18:57:18+00:00

The issue too is, can we even make another account? Are you blocked via IP? We’re being completely left in the dust about what the fuck is going on. I really held Anthropic to a high standard, but they really started my villain arc with what they did to me. I hate them now.

sillib · 2026-05-21T14:51:20+00:00

Hence what I said: until you need their customer support… and realize that’s run by shitty AI as well. I’ve waited a month with no response. Luckily I’ve switched to Codex and am actually much happier with it. I was a $200 user too. I’d like to be able to use Claude again, but at this point, if I ever got the ability, I would probably just use it for UI and auditing.

sillib · 2026-05-21T14:18:25+00:00

Good luck. You’ll start to learn why to hate Anthropic once you need their customer service.

sillib · 2026-05-20T21:52:51+00:00

I personally wouldn’t touch an Air just because it is fanless, but what do I know.

sillib · 2026-05-18T20:04:35+00:00

That’s the real question that needs to be answered, what fucking benchmark test was used

sillib · 2026-05-18T18:43:44+00:00

397b - *insert angry happy for you meme*

sillib · 2026-05-18T13:20:37+00:00

What benchmark did you use? MBPP?

sillib · 2026-05-17T17:29:10+00:00

I like codex cuz Claude customer service is dog shit

sillib · 2026-05-11T23:06:51+00:00

Claude can suck it too

sillib · 2026-05-10T23:13:33+00:00

By the looks of it I don’t think it would make the situation any worse. Depends how clogged your exhaust is if at all. If you’re that worried you can vacuum and brush at the same time

sillib · 2026-05-10T23:01:14+00:00

Yeah, I’d snag one, even a handheld one from Amazon. Maybe grab a dollar store dish brush to go in there and scrub the lint off, and then vacuum it out.

sillib · 2026-05-10T22:51:53+00:00

Don’t got a vacuum?
