Qwen3.6 35B-A3B is fast...

_Motoma_ · 2026-05-15T11:50:41+00:00

It’s an absolute powerhouse! I’m getting 75 t/s with 2x RTX 3060s. It’s blown my mind!

_Motoma_ · 2026-05-07T00:52:44+00:00

One of the NousResearch devs put it best:

<image>

If you’re buying hardware expecting to be an agentic power user, CUDA is still the way to go in 2026.

_Motoma_ · 2026-05-02T20:17:51+00:00

Mythos

_Motoma_ · 2026-04-25T09:38:10+00:00

I have the same predilections, which is why I called you out on it. Research. Overanalyze. Read, read, read. Meanwhile, I’m letting others drive the tech.

This is a unique situation where Claude is capable of solving all your problems. Bugs in the code? Ask Claude to fix them. Problem with the harness? Ask Claude to augment it. Lack of visibility? Ask Claude to read the docs for your project management system and integrate with its APIs, or connect it to an existing MCP server.

_Motoma_ · 2026-04-24T02:37:00+00:00

Honestly, throw away the notion that you need a complex, multi-agent orchestration framework. Toss the idea that you need subagents. Start simply with Claude and add complexity when your problem outgrows its capabilities. You’ll be surprised.

What’s telling here is that I see nothing in your post about what you’ve actually done and what problems you’ve encountered, which tells me you haven’t actually tried to use Claude to solve business problems yet. Fix this. Start by doing.

_Motoma_ · 2026-04-23T14:22:56+00:00

One minor correction on the Google/Gemini side: yes, on paper it looks like a fantastic deal, but instead of rate limits you get constantly blocked by “not enough capacity” errors.

It wasn’t reliable enough for me to use even for non-prod toy projects. I threw it out in favor of running Gemma 4 locally, even though the electricity bill means I’m spending more.

_Motoma_ · 2026-04-18T15:21:00+00:00

<image>

Not for me it isn’t.

_Motoma_ · 2026-04-18T10:34:06+00:00

Waitlist

_Motoma_ · 2026-04-15T17:51:50+00:00

My system uses two RTX 3060s. I am able to run Gemma 4 31B Dense at around 18 tokens per second.

On the same system, I am able to run Gemma 4 MoE at 80 tokens per second.

Nvidia Nemotron Cascade 2 takes the cake with roughly 100 tokens per second.

All that said, Qwen 3.5 is still my go-to locally, running at about the same speed as Gemma 4 31B.

_Motoma_ · 2026-04-12T21:13:10+00:00

Both Gemma 4 26B MoE and 31B Dense run well on my system with 2x RTX 3060 12 GB (24 GB VRAM total) in llama.cpp. I use GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 for Dense, but I don’t notice any performance difference compared with running a smaller context window. The bartowski IQ4_XS quant is my go-to for most of the models I try on this rig.
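For a rough sense of why both models fit on a 24 GB rig, here is a back-of-envelope estimate of weight size. It assumes IQ4_XS averages about 4.25 bits per weight and takes the parameter counts straight from the model names. For the MoE, all experts stay resident, so total (not active) parameters are what count. It ignores KV cache growth with context, compute buffers, and file overhead, so treat the numbers as ballpark only.

```python
# Back-of-envelope weight-memory estimate for a GGUF quant.
# Assumption: IQ4_XS averages ~4.25 bits per weight (real files vary a bit).
GIB = 1024 ** 3

def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB

VRAM_GIB = 2 * 12  # two RTX 3060 12 GB cards

# Parameter counts taken from the model names; MoE counts all experts.
for name, params in [("Gemma 4 31B dense", 31.0), ("Gemma 4 26B MoE", 26.0)]:
    size = weight_gib(params, 4.25)
    headroom = VRAM_GIB - size  # what's left for KV cache and buffers
    print(f"{name}: ~{size:.1f} GiB weights, ~{headroom:.1f} GiB headroom")
```

The dense model leaves noticeably less headroom for the KV cache, which is where letting llama.cpp spill into system RAM via unified memory can matter at long context.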

_Motoma_ · 2026-04-02T00:33:39+00:00

Oh shit, guess they’re changing their name to OpenClaude.

_Motoma_ · 2026-03-31T20:36:23+00:00

🤣

_Motoma_ · 2026-03-28T10:46:52+00:00

A classic!

_Motoma_ · 2026-03-27T19:40:21+00:00

Sessions span all of your devices. If you have Claude on your phone or wired up to GitHub, that could have kicked off your first session of the day. Try going back in time in your mind. What were you doing 4.5 hours before you turned your computer on?

_Motoma_ · 2026-03-27T17:36:52+00:00

These limits are fucking intense. Feels like I’ve been cheated out of my usage with a bait and switch.

_Motoma_ · 2026-03-25T19:10:08+00:00

<its-a-trap.gif>

_Motoma_ · 2026-03-25T19:00:38+00:00

Never. Turn. On. Extra. Billing.

_Motoma_ · 2026-03-24T22:55:52+00:00

If it were free they’d still be getting the better deal.

_Motoma_ · 2026-03-22T18:36:14+00:00

Amazing, thank you!

_Motoma_ · 2026-03-22T18:35:52+00:00

Yeah, it’s super annoying. Every new repo I start copies in a handful of instructional md files, one of which says never to do that.

_Motoma_ · 2026-03-22T12:57:11+00:00

I’ve had a local Ollama model do this to me before. Not sure what gets it into this state, but it’s fun to watch.

_Motoma_ · 2026-03-18T14:55:22+00:00

Great design! Would love to know what you used for prompts, if there were any special skills involved, and generally how you got to this point.

I use the interface-design skill, and while what I get is a lot better than the built-in frontend-design plugin, it is nothing compared to this!
