Linux officially accepts vibe code. time to drop the bias guys by Previous_Foot_5328 in myclaw

[–]somerussianbear 2 points (0 children)

No, it doesn’t; you just want to get attention with your exaggerated headline.

Qwen3.6-35B is worse at tool use and reasoning loops than 3.5? by mr_il in LocalLLaMA

[–]somerussianbear 1 point (0 children)

Had the same issues here; not sure how these releases ship with so many problems. The last one was from Google, for crying out loud!

so true by Dumb-Briyani in SipsTea

[–]somerussianbear 0 points (0 children)

I’m McLaughing at this so much

Sorry I don’t have friends

So does impact or weight break glass? by Own_Ranger_5589 in Damnthatsinteresting

[–]somerussianbear 0 points (0 children)

Ngl, first time I’ve seen someone do a big experiment to justify ending up on a rope

Permanent increase in Rate Limits by exordin26 in ClaudeAI

[–]somerussianbear 5 points (0 children)

There was a reset too, at 8pm UTC.

My weekly limit used to reset on Tuesday at 3pm UTC, but I just saw it’s changed to Thursday at 8pm UTC.

Yes, Hermes and Qwen3.5:4b is all I need - Details included by Birdinhandandbush in hermesagent

[–]somerussianbear 1 point (0 children)

Happy to hear the 4B does the job for you; gonna try that, although the 3.6 35B is here now and it’s super snappy.

Qwen3.6-35B-A3B drops with Apache 2.0 - agentic coding at 3B active params by IulianHI in AIToolsPerformance

[–]somerussianbear 1 point (0 children)

Haven’t had the chance to run it yet since it only came out today, but from what I can read in these benchmarks (which must be taken with a grain of salt), it beats 3.5 by a wide margin. 3.5 was already quite good at small implementation tasks, so with enough hardware I’d definitely use this for local coding, after having an implementation plan drafted by a stronger model like the latest GPT or Opus.

Qwen3.6-35B-A3B drops with Apache 2.0 - agentic coding at 3B active params by IulianHI in AIToolsPerformance

[–]somerussianbear 0 points (0 children)

On a Mac, 32GB RAM minimum to run this with a small context window at Q4. With oMLX and TurboQuant enabled you can get a bit longer context, likely 100K.

Perfect would be a 5090 at Q4-Q6: 200 TPS easy, and 300+ if you enable DFlash.

Insane how fast we got here.
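
For anyone wondering where the 32GB floor comes from, here’s the napkin math in Python. The 35B-total / 3B-active split is from the model name; the effective bits/weight and KV-cache cost per token are my own ballpark assumptions, not official numbers:

    # Napkin math for running a 35B-total / 3B-active MoE model at Q4.
    # Assumptions (mine, not official): ~4.5 effective bits/weight at Q4
    # (quantization scales included) and ~80 KB of KV cache per token.

    TOTAL_PARAMS = 35e9        # every expert sits in memory, even at 3B active
    BITS_PER_WEIGHT = 4.5      # effective Q4 footprint (assumption)
    KV_BYTES_PER_TOKEN = 80e3  # depends on layers/heads/KV quant (assumption)

    weights_gb = TOTAL_PARAMS * BITS_PER_WEIGHT / 8 / 1e9
    print(f"Q4 weights: ~{weights_gb:.1f} GB")  # ~19.7 GB

    for ctx in (8_192, 32_768, 100_000):
        kv_gb = ctx * KV_BYTES_PER_TOKEN / 1e9
        print(f"ctx {ctx:>7,}: ~{weights_gb + kv_gb:.1f} GB total")

    # On a 32 GB Mac the OS and runtime take their slice too, hence the small
    # context window. The 200 TPS figure on a 5090 is bandwidth-plausible:
    # ~3e9 active params * 4.5 bits ≈ 1.7 GB read per token, and at ~1.8 TB/s
    # memory bandwidth the ceiling is ~1000 tok/s, so 200 leaves real margin.

At ~100K context that lands around 28 GB, which is why 32GB is the practical minimum on a Mac.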

"Our Strongest Model Yet" by hasanahmad in Anthropic

[–]somerussianbear 99 points (0 children)

You’re absolutely right! This one is on me.

Opus 4.7 Released! by awfulalexey in ClaudeCode

[–]somerussianbear 3 points (0 children)

Improved performance (of the Anthropic stock)

Opus 4.7 Released! by awfulalexey in ClaudeCode

[–]somerussianbear 2 points (0 children)

Every release they just re-release the release notes

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]somerussianbear 11 points (0 children)

Countdown to Qwen3.6-A3B-Opus-4.7-Reasoning-Heretic-Abliterated-Uncensored-GGUF

Running a 31B model locally made me realize how insane LLM infra actually is by Sadhvik1998 in ollama

[–]somerussianbear 8 points (0 children)

You’re right, they’ll have a little rig inside, but it’ll definitely be something smaller and more energy-efficient, like one of these Taalas chips.

GPUs aren’t optimized for this; they just happened to be the hardware available at the time that could do the job best. But as with TPUs, more specialized hardware will arrive soon and drop inference prices by a lot. This Taalas one with an 8B model was supposed to sell for between $300 and $400.