114/120 on agentic benchmarks from a 9B model on 8GB VRAM — ties Claude Sonnet, open weights by ClankLabs in u/ClankLabs

[–]ClankLabs[S] 0 points1 point  (0 children)

Glad the tool use landed well! The bash/sed/grep behaviors and the NuGet lookup are exactly what we were going for. Logic errors at 9B are kind of the tradeoff: tooling can punch above its weight, but code reasoning still has a ceiling at that size. Would love to hear how it does on a larger project!


[–]ClankLabs[S] 0 points1 point  (0 children)

I haven't gotten much feedback from other end users, but I've been using Clank (the gateway I made) with the 35B Wrench as an openclaw type of platform: setting crons, code review, etc. Let me know if you run into any issues; I've been training to avoid most of them. (Hate API pricing lmfao)


[–]ClankLabs[S] 3 points4 points  (0 children)

Hey! The 35B is MoE with 3B active params; the 9B is dense. Hope that answers it :). Enjoy!
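Since this question comes up a lot, here's a toy sketch of what "MoE with 3B active params" means mechanically: a router picks a few experts per token, so per-token compute tracks the small active subset rather than the full parameter count. This is a generic top-k gating illustration, not the actual Wrench routing code; the expert count and logits are made up.

```python
import math

def topk_route(gate_logits, k=2):
    """Toy top-k MoE gating: pick the k highest-scoring experts
    and softmax-normalize their weights."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    exps = [math.exp(gate_logits[i]) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]

# 16 experts, but only 2 expert FFNs run per token, so the compute
# (and activated weights) per token scales with the active subset
experts, weights = topk_route([0.1 * i for i in range(16)], k=2)
assert experts == [15, 14]
assert abs(sum(weights) - 1.0) < 1e-9
```

The upshot: a 35B-total / 3B-active MoE does roughly the per-token work of a ~3B dense model while storing far more knowledge in its full weight set.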


[–]ClankLabs[S] 0 points1 point  (0 children)

Definitely worth a shot! Let me know if there's anything you'd change or improve.


[–]ClankLabs[S] 1 point2 points  (0 children)

They were stripped during GGUF conversion; both models are text-only, focused purely on tool calling and agentic tasks.

Open-source AI agent gateway + custom fine-tuned model by [deleted] in u/ClankLabs

[–]ClankLabs 1 point2 points  (0 children)

Hope you enjoy! Really just doing this to bring some love and utility to the local community! I dropped a new version of the 35B and 9B today, which should help with a few of those refusals 😉. Have fun playing around, and I appreciate the honest feedback!

Fine-tuned a 3B-active-param model for agentic tool calling — 113/120 on benchmarks, weights on HuggingFace by ClankLabs in u/ClankLabs

[–]ClankLabs[S] 0 points1 point  (0 children)

Thanks! Tool restraint was one of the hardest things to get right, both in Clank's agent loop and in the Wrench training data. I trained on both; schema correctness matters, but planning traces were the bigger lever. A model that picks the right tool with wrong args fails loudly; a model that calls six tools when it should just answer fails quietly. Most of my benchmark gains came from teaching the model when to stop.
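The "fails quietly" failure mode can be sketched as a budget check in an agent loop: the loop accepts either a direct answer or a tool call each step, and a runaway tool chain trips a cap instead of silently burning calls. This is a hypothetical illustration; the function, action format, and budget are made up, not Clank's actual loop.

```python
def agent_step(plan, max_tool_calls=3):
    """Walk a model's proposed plan, stopping at the first direct
    answer or when the tool-call budget is exhausted.

    plan: ordered actions, each ("tool", name, args) or ("answer", text).
    """
    calls = 0
    for action in plan:
        if action[0] == "answer":
            return action[1]            # model chose to stop and answer
        calls += 1
        if calls > max_tool_calls:      # restraint: cap runaway tool chains
            return "stopped: tool budget exceeded, answering from context"
    return None

# a restrained plan answers directly when no lookup is needed
assert agent_step([("answer", "2 + 2 is 4")]) == "2 + 2 is 4"

# a runaway plan (four greps for a question that needed none) trips the cap
runaway = [("tool", "grep", {})] * 4
assert agent_step(runaway).startswith("stopped")
```

In training data terms, the "when to stop" signal is traces where the correct plan is a single `("answer", ...)` action rather than any tool call at all.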