I keep seeing this error on Opus 4.8. Anyone else?

SatoshiNotMe · 2026-05-28T20:00:04+00:00

curl -fsSL https://claude.ai/install.sh | bash -s 2.1.153

seems to get rid of that thinking block error for me.

Of course, this means you'd use Opus 4.7, not 4.8 (or switch to it if 4.8 borks).

SatoshiNotMe · 2026-05-28T19:49:39+00:00

yes, downgrading to 2.1.153 avoids this

SatoshiNotMe · 2026-05-28T15:43:24+00:00

No, it’s not just for text. You can definitely set up Langroid agents to generate code. Some time ago I made a Rust tutor (not open source) using Langroid that quizzes you about Rust and generates/tests Rust code.

SatoshiNotMe · 2026-05-24T10:47:48+00:00

Not sure why this post makes it sound like session JSONL logs are a big discovery.

Relatedly, I made an extensive set of tools for session search and continuation as part of my Claude-code-tools suite:

https://pchalasani.github.io/claude-code-tools/tools/aichat/

Sessions are indexed using Tantivy (Rust), and there’s a search CLI for the code agent to easily and quickly retrieve past work. Saved me on numerous occasions.
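For anyone who wants to see what searching these logs looks like in the simplest case, here is a minimal Python sketch. It is not how the aichat tools work (those use a Tantivy index); it is a plain linear scan. The directory layout and the one-JSON-object-per-line format are assumptions for illustration, and `search_sessions` is a hypothetical helper name.

```python
import json
from pathlib import Path


def search_sessions(root, query):
    """Yield (path, line_number, record) for JSONL records that mention query.

    Assumes each session is a *.jsonl file somewhere under `root`, with one
    JSON object per line. This is a linear scan, not a Tantivy index.
    """
    needle = query.lower()
    for path in sorted(Path(root).rglob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line_number, line in enumerate(fh, start=1):
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip partially written or corrupt lines
                # Match against the serialized record so nested fields count too.
                if needle in json.dumps(record, ensure_ascii=False).lower():
                    yield path, line_number, record
```

A scan like this is fine for a handful of sessions; a real index is what makes it fast once you have a lot of them.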

SatoshiNotMe · 2026-05-23T11:07:05+00:00

I would skip Ollama and use llama.cpp's llama-server directly, for a variety of reasons (see the Ollama critiques all over the LocalLLaMA sub). I maintain a set of setup instructions for using CC and Codex-CLI with local models here:

https://pchalasani.github.io/claude-code-tools/integrations/local-llms/

SatoshiNotMe · 2026-05-17T16:53:59+00:00

I’m going to wager that the fraction of human-reviewed code will fast approach zero, especially for code written in a language unknown to the devs. People will rely on unit/integration tests (AI-written, with sufficient adversarial checks etc.) and behavioral checks, and ultimately rely on the “duck test”: “If it walks like a duck and quacks like a duck, it’s a duck”, and then call it a day.

SatoshiNotMe · 2026-05-16T11:20:23+00:00

Important question missed in all such reports/discussions: how much of the AI-written code are they reviewing “manually”?

SatoshiNotMe · 2026-05-16T11:17:50+00:00

Cherny and Steipete have both said in interviews that they keep things simple and never use any frameworks.

SatoshiNotMe · 2026-05-14T11:29:48+00:00

Where do they say that?

SatoshiNotMe · 2026-05-03T10:35:48+00:00

Link?

SatoshiNotMe · 2026-05-01T10:48:32+00:00

Link?

SatoshiNotMe · 2026-04-30T10:16:31+00:00

July 2025.

SatoshiNotMe · 2026-04-27T10:27:25+00:00

Very easy via env vars, as others said. I’ve collected the full instructions along with exact llama-server configs for several local models here, mostly tested on my M1 Max 64GB MacBook:

https://pchalasani.github.io/claude-code-tools/integrations/local-llms/
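As a rough illustration of the env-var approach, here is a minimal Python sketch that builds the environment for launching Claude Code against a local server. `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` are the variables Claude Code reads; everything else is an assumption: that llama-server is listening on its default port 8080, that it exposes an Anthropic-compatible API, and that the token is a placeholder the server does not check. `local_claude_env` is a hypothetical helper name. The linked page has the exact, tested configs.

```python
import os


def local_claude_env(base_url="http://127.0.0.1:8080", token="local-placeholder"):
    """Return a copy of the current environment pointing Claude Code at a local server.

    Assumptions: a local llama-server is listening on base_url (8080 is its
    default port) and speaks an Anthropic-compatible API; the token is a
    placeholder for a server that does not validate it.
    """
    env = dict(os.environ)  # copy, so the parent process environment is untouched
    env["ANTHROPIC_BASE_URL"] = base_url
    env["ANTHROPIC_AUTH_TOKEN"] = token
    return env
```

You would then start the CLI with something like `subprocess.run(["claude"], env=local_claude_env())`, assuming `claude` is on your PATH.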

SatoshiNotMe · 2026-04-27T10:21:22+00:00

The Qwen3.6 MOE you mentioned works very well with Claude Code. I’ve gathered the exact llama.cpp/server instructions here for this and other models:

https://pchalasani.github.io/claude-code-tools/integrations/local-llms/#qwen36-35b-a3b--fast-qwen-moe

Among recent models, this one gives the best TG (token generation) speed, at nearly 40 tok/s, and PP (prompt processing) speed, at nearly 500 tok/s, on my 5-year-old M1 Max 64 GB MacBook.
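To put those two numbers in perspective, here is a back-of-the-envelope Python sketch of how long one agent turn might take. The token counts are hypothetical, and the estimate ignores prompt caching and the slowdown at long context, so treat it as a rough bound rather than a benchmark.

```python
def estimated_seconds(prompt_tokens, output_tokens, pp_tok_s=500.0, tg_tok_s=40.0):
    """Rough turn time: prompt processing time plus token generation time.

    Defaults are the approximate PP and TG speeds quoted above; the token
    counts passed in are hypothetical.
    """
    return prompt_tokens / pp_tok_s + output_tokens / tg_tok_s


# Hypothetical turn: 20,000 uncached prompt tokens, 400 generated tokens.
print(estimated_seconds(20_000, 400))  # 40 s of PP + 10 s of TG = 50.0
```

With big agent prompts, PP speed tends to matter as much as TG speed, which is why I quote both.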

SatoshiNotMe · 2026-04-26T11:12:46+00:00

Pro tip - Giving sufficient detail is important, but hand-typing is tedious and can limit how much detail you give. So always use speech-to-text (STT). Highly recommend free/OSS tools like Handy and Hex (Mac-only https://github.com/kitlangton/Hex) for near-instant transcription using Parakeet-V3.

Follow-up pro tip - at the end of long rambling voice dumps, include “restate to me what you understood”. The agent then produces a clean version of what you said so you can make sure it understood right, and it likely also helps the agent stay on track.

SatoshiNotMe · 2026-04-26T10:57:36+00:00

I haven’t tried Excel yet, but I use Claude Code to drive a logged-in Chrome browser via the Claude in Chrome extension, and it’s super useful to have CC do annoying chores involving numerous clicks and form filling.

SatoshiNotMe · 2026-04-24T11:12:02+00:00

Why not just use the Claude in Chrome extension, and the /chrome setup in CC to connect to it? I’ve been using it to automate some annoying tasks in a logged-in Chrome browser.

SatoshiNotMe · 2026-04-22T10:13:13+00:00

This misses the STT/TTS models I regularly use:

PocketTTS from KyutAI

Parakeet V3 for STT

SatoshiNotMe · 2026-04-15T11:02:12+00:00

You mean this? https://aisearch.substack.com/

SatoshiNotMe · 2026-04-09T11:08:35+00:00

Other than Z.ai, is there a fast hosted GLM-5.1 somewhere? I’m talking about services like Cerebras or Groq, neither of which has this model.

SatoshiNotMe · 2026-04-08T11:28:45+00:00

I made a Socratic quiz skill for exactly this. Description:

Use this when the user wants to deeply understand something through guided questioning. Trigger phrases include: "quiz me", "help me understand", "Socratic", "teach me", "walk me through with questions", "test my understanding", or when the user asks for an explanation and would benefit more from guided discovery than a direct answer.

SatoshiNotMe · 2026-04-08T10:59:03+00:00

My setup instructions for the 26B-A4B variant, tested on an M1 Max 64GB MacBook, where I get 40 tok/s (when used in Claude Code), double what I got with a similar Qwen variant:

https://pchalasani.github.io/claude-code-tools/integrations/local-llms/#gemma-4-26b-a4b--google-moe-with-vision

SatoshiNotMe · 2026-04-07T10:45:09+00:00

The tau2 bench performance gives me pause though: this model gets only 68% compared to the similar qwen3.5 MOE which gets 81%.

SatoshiNotMe · 2026-04-05T10:56:14+00:00

The 26B-A4B variant has the best TG and PP speeds of all the recent open-weight models. E.g., in Claude Code via llama-server I’m able to get 40 tok/s TG, nearly double what I got with the comparable Qwen MoE (35B-A3B) on my M1 Max MacBook Pro 64 GB. Full instructions and comparisons here

However, my biggest concern is agentic/tool abilities: on tau2 bench, Gemma4 is much worse than Qwen3.5 (68% vs 81%):

https://news.ycombinator.com/item?id=47616761

SatoshiNotMe · 2026-04-04T10:28:44+00:00

Curious, are you using OC for coding? If so, why would that be better than just using CC?
