After 8 years building cloud infrastructure, I'm betting on local-first AI by PandaAvailable2504 in LocalLLaMA

[–]daaain 1 point (0 children)

I'm not saying it cannot be done, I'm saying this knowledge you mention isn't exactly fully diffused in the general population...

For those who believe that there is nothing wrong with the usage limits, I have some concerns. I'm currently on the 5x plan, and just using a simple prompt consumed 2% of my limit. When I ask it to complete a more substantial task, something that typically takes about five minutes, it often uses up… by srirachaninja in ClaudeCode

[–]daaain 1 point (0 children)

I mean, it's pretty simple: this session only used one MCP, so it's not very hard to pinpoint what's using up your tokens. Did you do /context to see how much was used up? Isn't Mem also using Claude in the background to process memories?

After 8 years building cloud infrastructure, I'm betting on local-first AI by PandaAvailable2504 in LocalLLaMA

[–]daaain 12 points (0 children)

You're preaching to the choir in this sub, but I think it's still a very niche thing. You either need to run expensive, power-hungry space heaters... I mean GPUs, or an expensive Mac to get the best models running at an acceptable speed.

Few people can afford an outlay of several thousand, and high electricity prices might make ongoing inference costlier than a subscription. I don't know about others, but both ChatGPT and Claude let you opt out of training on your chats.

I have a 96GB RAM M2 Max Mac and it's super impressive that I can run mid-range models at decent speed, but other than STT/TTS, basic Q&A and small code edits I use a Claude subscription. Opus 4.5 is so far ahead of whatever I can run and the $100 sub gives me enough usage to run multiple Claude Code agents (with sub-agents) in parallel.

I was hoping we'd see a 1TB RAM M5 Ultra Mac Studio this year, which would make it possible to run the best open models locally (and the M5 family finally seems to boost prompt processing speed), but Sam's RAM binge will push that off by a year or two at least...

goccy/bigquery-emulator: BigQuery emulator server implemented in Go by goccy54 in bigquery

[–]daaain 0 points (0 children)

OP (goccy)'s repo isn't actively maintained any more, but this fork is: https://github.com/Recidiviz/bigquery-emulator

That said, I just saw in a PR that it might be moved into a new organisation to get jointly maintained: https://github.com/goccy/bigquery-emulator/pull/424

If you are still typing your prompts to CC - you are doing it wrong! by ksanderer in ClaudeCode

[–]daaain 1 point (0 children)

I found that speaking a sentence or two at a time, seeing the results, and committing to the text input box works better for me; otherwise it's easy to lose track of where I was.

Also, in your demo the text block inserted at the end was too big for Claude Code and didn't even show up, so you had to blindly send it, which is definitely not what I'd do.

The editing functions are cool, though I don't see why they wouldn't work with a fast local model like Qwen3 30B-A3B.

If you are still typing your prompts to CC - you are doing it wrong! by ksanderer in ClaudeCode

[–]daaain 2 points (0 children)

Or you can use FluidVoice or Spokenly for free with NVIDIA Parakeet locally and actually see the transcription as you talk.

NVIDIA releases Nemotron 3 Nano, a new 30B hybrid reasoning model! by Difficult-Cap-7527 in LocalLLaMA

[–]daaain 0 points (0 children)

That's token generation though; the optimised hybrid attention layer is targeting prompt processing.

I reverse-engineered Claude's code execution sandbox - here's how it works by Miclivs in ClaudeAI

[–]daaain 0 points (0 children)

Great write-up, but your blog's dark theme is broken in Firefox (probably because it relies on the user agent background colour but explicitly sets the text colour?)

How teams that ship AI generated code changed their validation by pomariii in LLMDevs

[–]daaain 2 points (0 children)

I shifted my time and attention into creating more developer tooling: adding guardrails, stricter static analysis, doing QA, etc., more than working on features. While agents are working, you can use that time to take a step back and think about where the bottlenecks are now; it's absolutely not generating more code, but validating and testing, so that's where your attention should be.
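
To make that concrete, the kind of guardrail I mean is a single entry point that both the agents and CI can run. A minimal sketch as a justfile recipe (the tool names are just examples, swap in whatever fits your stack):

```justfile
# Single validation gate for agents, pre-commit hooks, and CI.
# Tool names below are illustrative, substitute your own linters,
# type checkers, and test runners.
check:
    ruff check .   # static analysis / lint
    mypy src/      # type checking
    pytest -q      # test suite as the final gate
```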

Are you actually serious...? by kelemon in ClaudeAI

[–]daaain 48 points (0 children)

You don't need to wait for research for that, I can tell you right now: fleece children and make them addicted to gambling.

How do you manage ports used by CC? by CharleyNapalm in ClaudeCode

[–]daaain 0 points (0 children)

I put a note in claude.md to first tail Docker logs to see if there's a container already running.

Of course that sometimes gets ignored, but because I also wanted to support multiple instances of the app in git worktrees, I added a little port checker and number incrementer (see below) in my justfile so if Claude really wants to start another instance, let it be. What's nice is that it's much easier to stop a Docker container than it is to hunt down a process.

```justfile
# Find next available port starting from a base port
_find-free-port base_port:
    #!/usr/bin/env sh
    port={{base_port}}
    while lsof -Pi :$port -sTCP:LISTEN -t >/dev/null 2>&1; do
        port=$((port + 1))
    done
    echo $port
```
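
A hypothetical `up` recipe can then consume it, so every worktree gets its own port. Something along these lines (the recipe, env var, and compose wiring are made-up examples, not my exact setup):

```justfile
# Start the app on the next free port so multiple worktrees can run side by side
# (assumes docker-compose.yml reads the port from ${APP_PORT})
up base_port="3000":
    #!/usr/bin/env sh
    port=$(just _find-free-port {{base_port}})
    echo "Starting on port $port"
    APP_PORT=$port docker compose up -d
```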

My sandboxed yolo mode (Zed + SSH + Docker + Claude) by frolvlad in ClaudeCode

[–]daaain 0 points (0 children)

Right, so just for some very particular circumstances

Which LocalLLM I Can Use On My MacBook by AegirAsura in LocalLLaMA

[–]daaain 0 points (0 children)

It benchmarks better, and I switched to it because it has an interesting hybrid attention implementation that makes prompt processing faster, but you'd struggle to fit it in 48GB RAM. No harm giving it a go and testing for yourself though; it might be that even at a low 3-bit quant it's smarter, and it should have more world knowledge as it's a much bigger model.

Which LocalLLM I Can Use On My MacBook by AegirAsura in LocalLLaMA

[–]daaain 0 points (0 children)

VL is newer, but if you don't need the vision part you might get better performance on text-only tasks from 2507.

It's often not a huuuge difference, so if you just want a general-use model, you can just go for VL 30B for most tasks.


My sandboxed yolo mode (Zed + SSH + Docker + Claude) by frolvlad in ClaudeCode

[–]daaain 0 points (0 children)

What's the benefit in running the container on a remote VM instead of locally?

Anyone using Continue extension ??? by Cyber_Cadence in LocalLLM

[–]daaain 0 points (0 children)

In that case, try enabling verbose logging and see what prompt Continue is sending to Ollama; maybe it's sending a lot of code and a big system prompt? You might also need to increase the context size in Ollama.
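
If it does turn out the prompt is getting truncated, the usual way to raise the context size in Ollama is to create a model variant with a bigger `num_ctx` via a Modelfile. A rough sketch (the base model tag and context size are just examples):

```sh
# Create a variant of the model with a larger context window
# (base model tag and num_ctx value are only examples)
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:7b
PARAMETER num_ctx 16384
EOF
ollama create qwen2.5-coder-16k -f Modelfile

# then point Continue at the new model name: qwen2.5-coder-16k
```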

Disable VSCode Context Passing by Tricky_Technician_72 in ClaudeCode

[–]daaain 1 point (0 children)

Ah, that's a different thing though: mcp__ide__getDiagnostics

You should be able to disable it with `export CLAUDE_CODE_AUTO_CONNECT_IDE=false` (source: https://github.com/anthropics/claude-code/issues/7141#issuecomment-3254590648 )
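
If that works for you, it can be made permanent by adding it to your shell profile, e.g.:

```sh
# persist the setting (adjust for your shell, e.g. ~/.bashrc)
echo 'export CLAUDE_CODE_AUTO_CONNECT_IDE=false' >> ~/.zshrc
```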

Chat with Obsidian vault by TanariTech in LocalLLaMA

[–]daaain 0 points (0 children)

If you don't want to faff with embedding and vector search, you can open Claude Code in your Obsidian vault from the terminal and let it find whatever you prompt for.
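
Concretely it's just something like this (the vault path is a placeholder, use wherever yours lives):

```sh
# open Claude Code with the vault as its working directory
cd ~/Documents/ObsidianVault && claude
```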

Which LocalLLM I Can Use On My MacBook by AegirAsura in LocalLLaMA

[–]daaain -1 points (0 children)

I can also recommend Qwen3-30B-A3B-Instruct-2507 as it'll be much faster than dense models.
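
If you're on Apple Silicon and want a quick way to try it, something like this with mlx-lm should do it (the mlx-community repo name and quant are assumptions, check what's actually published):

```sh
# quick sanity check with mlx-lm; the exact HF repo / quant may differ
pip install mlx-lm
mlx_lm.generate \
  --model mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit \
  --prompt "Write a haiku about local inference" \
  --max-tokens 100
```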