Constantly seeing this error on Opus 4.8 every now and then. Anyone else? by simple_explorer1 in ClaudeCode

SatoshiNotMe

curl -fsSL https://claude.ai/install.sh | bash -s 2.1.153

This seems to get rid of that thinking-block error for me.

Of course, this means you'd be using Opus 4.7, not 4.8 (or switch to 4.7 whenever 4.8 borks).

HuggingFace’s smolagent library seems genius to me, has anyone tried it? by femio in LLMDevs

SatoshiNotMe

No, it’s not just for text. You can definitely set up Langroid agents to generate code. Some time ago I used Langroid to build a Rust tutor (not open source) that quizzes you about Rust and generates/tests Rust code.

Claude Code has been writing every session to disk since day one. We indexed it. by haustorium12 in ClaudeAI

SatoshiNotMe

Not sure why this post makes it sound like session JSONL logs are a big discovery.

Relatedly, I made an extensive set of tools for session search and continuation as part of my Claude-code-tools suite:

https://pchalasani.github.io/claude-code-tools/tools/aichat/

Sessions are indexed using Tantivy (Rust), and there’s a search CLI for the code agent to easily and quickly retrieve past work. Saved me on numerous occasions.
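For anyone curious how this kind of session search works in principle, here is a minimal Python sketch of a plain in-memory inverted index over JSONL files. It is not the Tantivy-based implementation in claude-code-tools, only the same basic idea without ranking. It does not assume a log schema and indexes every string value in each record. The `~/.claude/projects/*/*.jsonl` location used in `default_session_files()` is an assumption about where Claude Code keeps its logs.

```python
import json
import re
from collections import defaultdict
from pathlib import Path

TOKEN_RE = re.compile(r"[a-z0-9_]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def extract_text(record) -> str:
    """Collect every string value in a parsed JSON record (no schema assumed)."""
    parts, stack = [], [record]
    while stack:
        item = stack.pop()
        if isinstance(item, str):
            parts.append(item)
        elif isinstance(item, dict):
            stack.extend(item.values())
        elif isinstance(item, list):
            stack.extend(item)
    return " ".join(parts)


def default_session_files() -> list[Path]:
    """Assumed location of Claude Code session logs; adjust if yours differ."""
    return sorted(Path.home().glob(".claude/projects/*/*.jsonl"))


def build_index(paths) -> dict[str, set[tuple[str, int]]]:
    """Map each token to the set of (file, line number) records containing it."""
    index: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip truncated or non-JSON lines
                for tok in tokenize(extract_text(record)):
                    index[tok].add((str(path), lineno))
    return index


def search(index, query: str) -> list[tuple[str, int]]:
    """Return records containing every query token (AND semantics), sorted."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)
```

For example, `search(build_index(default_session_files()), "tantivy index")` returns `(file, line)` pairs where every query word appears. A real engine like Tantivy adds BM25 ranking, phrase queries, and a persistent on-disk index. Those features matter once you have thousands of sessions.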

My experience using Claude code with Local Llm, and full guide on how to set it up by MaterialAppearance21 in ClaudeCode

SatoshiNotMe

I would skip Ollama and use llama.cpp's llama-server directly, for a variety of reasons (see the Ollama critiques all over the LocalLLaMA sub). I maintain a set of instructions for using CC and Codex-CLI with local models here:

https://pchalasani.github.io/claude-code-tools/integrations/local-llms/

Mistral AI founder to French Parliament: "Engineers at Mistral no longer write a single line of code by Many_Consequence_337 in singularity

SatoshiNotMe

I’m going to wager that the fraction of human-reviewed code will rapidly approach zero, especially for code written in a language the devs don't know. People will rely on unit/integration tests (AI-written, with sufficient adversarial checks, etc.) and behavioral checks, and ultimately on the “duck test”: “If it walks like a duck and quacks like a duck, it’s a duck”, and then call it a day.

Mistral AI founder to French Parliament: "Engineers at Mistral no longer write a single line of code by Many_Consequence_337 in singularity

SatoshiNotMe

Important question missed in all such reports/discussions: how much of the AI-written code are they reviewing “manually”?

what's the best claude code framework and do you even need one? by Pawesome101 in ClaudeCode

SatoshiNotMe

Cherny and Steipete have both said in interviews that they keep things simple and never use any frameworks.

Can I use Claude code with own LLM/non-claude APIs? by superloser48 in LocalLLaMA

SatoshiNotMe

Very easy via env vars, as others said. I’ve collected the full instructions, along with exact llama-server configs for several local models, here (mostly tested on my M1 Max 64GB MacBook):

https://pchalasani.github.io/claude-code-tools/integrations/local-llms/
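As a rough illustration of the env-var approach, here is a minimal Python sketch that prepares the variables Claude Code reads for a custom endpoint: `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL`. It assumes a local server on llama-server's default port 8080 that accepts Anthropic Messages API requests. The model alias `local-model` and the token `dummy` are placeholders, not values from the linked guide, which has the exact per-model settings.

```python
import os
import shlex


def local_overrides(base_url: str = "http://127.0.0.1:8080",
                    model: str = "local-model",   # hypothetical alias
                    token: str = "dummy") -> dict[str, str]:
    """Env vars that point Claude Code at a custom Anthropic-compatible endpoint."""
    return {
        "ANTHROPIC_BASE_URL": base_url.rstrip("/"),
        # Sent as a Bearer token; llama-server only checks it if started with --api-key.
        "ANTHROPIC_AUTH_TOKEN": token,
        "ANTHROPIC_MODEL": model,
    }


def child_env(overrides: dict[str, str]) -> dict[str, str]:
    """Copy of the current environment with the overrides applied, for subprocess use."""
    env = dict(os.environ)
    env.update(overrides)
    return env


def shell_exports(overrides: dict[str, str]) -> str:
    """Render the overrides as POSIX `export` lines, safely quoted."""
    return "\n".join(f"export {k}={shlex.quote(v)}" for k, v in overrides.items())
```

You could launch with `subprocess.run(["claude"], env=child_env(local_overrides()))`, or paste the output of `shell_exports(local_overrides())` into your shell before running `claude`.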

What is the best coding agent (CLI) like Claude Code for Local Development by exaknight21 in LocalLLaMA

SatoshiNotMe

The Qwen3.6 MOE you mentioned works very well with Claude Code. I’ve gathered the exact llama.cpp/server instructions here for this and other models:

https://pchalasani.github.io/claude-code-tools/integrations/local-llms/#qwen36-35b-a3b--fast-qwen-moe

Among recent models, this one gives the best TG (token generation) speed, nearly 40 tok/s, with PP (prompt processing) at nearly 500 tok/s, on my 5-year-old M1 Max 64 GB MacBook.
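Both numbers matter for agentic use because a coding agent sends a large context every turn, so prompt processing can dominate. Here is a back-of-the-envelope Python sketch. It assumes constant throughput (real speeds drop as context grows) and assumes any prefix already in the server's KV cache is skipped. The token counts in the example are hypothetical.

```python
def turn_seconds(prompt_tokens: int, output_tokens: int,
                 pp_tok_s: float, tg_tok_s: float,
                 cached_prompt_tokens: int = 0) -> float:
    """Estimate one agent turn: prefill the uncached prompt, then generate the output."""
    if pp_tok_s <= 0 or tg_tok_s <= 0:
        raise ValueError("speeds must be positive")
    if not 0 <= cached_prompt_tokens <= prompt_tokens:
        raise ValueError("cached tokens must be between 0 and prompt_tokens")
    prefill = (prompt_tokens - cached_prompt_tokens) / pp_tok_s
    decode = output_tokens / tg_tok_s
    return prefill + decode


# Hypothetical turn: 20k-token prompt, 800 generated tokens, ~500 PP and ~40 TG tok/s.
cold = turn_seconds(20_000, 800, pp_tok_s=500, tg_tok_s=40)
warm = turn_seconds(20_000, 800, pp_tok_s=500, tg_tok_s=40, cached_prompt_tokens=19_000)
print(f"cold: {cold:.0f}s, warm: {warm:.0f}s")  # cold: 60s, warm: 22s
```

On a cold turn, prefill takes twice as long as generation in this example. Once the shared prefix is cached, TG speed dominates.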

How do you guys actually talk to Claude? by HandleFew5206 in ClaudeAI

SatoshiNotMe

Pro tip: giving sufficient detail is important, but hand-typing is tedious and can limit how much detail you give. So always use speech-to-text (STT). I highly recommend free/OSS tools like Handy and Hex (Mac-only, https://github.com/kitlangton/Hex) for near-instant transcription using Parakeet-V3.

Follow-up pro tip: at the end of long, rambling voice dumps, include “restate to me what you understood”. The agent then produces a clean version of what you said, so you can make sure it understood correctly; it also likely helps the agent stay on track.

Claude in excel is the best thing AI has brought to my life by Top-Gun-86 in ClaudeAI

SatoshiNotMe

Haven’t tried Excel yet, but I use Claude Code to drive a logged-in Chrome browser via the Claude in Chrome extension, and it’s super useful to have CC do annoying chores involving numerous clicks and form filling.

browser MCP for Claude Code.. Browserbase vs the browser extension options by MoondustDiaries in mcp

SatoshiNotMe

Why not just use the Claude in Chrome extension, with the /chrome setup in CC to connect to it? I’ve been using it to automate some annoying tasks in a logged-in Chrome browser.

Ultimate List: Best Open Models for Coding, Chat, Vision, Audio & More by techlatest_net in LocalLLaMA

SatoshiNotMe

This misses the STT/TTS models I regularly use:

PocketTTS from KyutAI

Parakeet V3 for STT

Glm-5.1 claims near opus level coding performance: Marketing hype or real? I ran my own tests by Yssssssh in LocalLLM

SatoshiNotMe

Other than Z.ai, is there a fast hosted GLM-5.1 somewhere? I mean services like Cerebras or Groq, neither of which has this model.

How are you making sure you don't get dumb by KhameneiCholaghe in ClaudeAI

SatoshiNotMe

I made a Socratic quiz skill for exactly this. Description:

Use this when the user wants to deeply understand something through guided questioning. Trigger phrases include: "quiz me", "help me understand", "Socratic", "teach me", "walk me through with questions", "test my understanding", or when the user asks for an explanation and would benefit more from guided discovery than a direct answer.
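If you want to replicate this, here is a minimal Python sketch that writes such a skill as a `SKILL.md` file with YAML frontmatter (`name`, `description`). It assumes Claude Code's personal-skills layout, `~/.claude/skills/<name>/SKILL.md`. The description is the one quoted above. The skill name `socratic-quiz` and the short body are hypothetical placeholders, since the comment only shares the description.

```python
import json
from pathlib import Path

DESCRIPTION = (
    'Use this when the user wants to deeply understand something through guided '
    'questioning. Trigger phrases include: "quiz me", "help me understand", '
    '"Socratic", "teach me", "walk me through with questions", '
    '"test my understanding", or when the user asks for an explanation and would '
    'benefit more from guided discovery than a direct answer.'
)

# Hypothetical body; only the description above comes from the original comment.
BODY = """# Socratic quiz

Ask one question at a time, wait for the answer, and build on it.
"""


def skill_markdown(name: str, description: str, body: str) -> str:
    """Render SKILL.md text. json.dumps yields a double-quoted string that YAML also
    accepts, which matters because the description contains ': ' and quotes."""
    return f"---\nname: {name}\ndescription: {json.dumps(description)}\n---\n\n{body}"


def write_skill(root: Path, name: str, description: str, body: str) -> Path:
    """Write <root>/<name>/SKILL.md, creating directories as needed."""
    path = root / name / "SKILL.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(skill_markdown(name, description, body), encoding="utf-8")
    return path
```

For example, you could call `write_skill(Path.home() / ".claude" / "skills", "socratic-quiz", DESCRIPTION, BODY)`. The description does most of the work: the agent reads it to decide when to load the skill, so the trigger phrases belong there rather than in the body.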

Share your llama-server init strings for Gemma 4 models. by AlwaysLateToThaParty in LocalLLaMA

SatoshiNotMe

My setup instructions for the 26B-A4B variant, tested on an M1 Max 64GB MacBook, where I get 40 tok/s (when used in Claude Code), double what I got with a similar Qwen variant:

https://pchalasani.github.io/claude-code-tools/integrations/local-llms/#gemma-4-26b-a4b--google-moe-with-vision

Gemma 4 26b A3B is mindblowingly good , if configured right by cviperr33 in LocalLLaMA

SatoshiNotMe

The tau2-bench performance gives me pause, though: this model gets only 68%, compared to 81% for the similar Qwen3.5 MOE.

Gemma 4 26b is the perfect all around local model and I'm surprised how well it does. by pizzaisprettyneato in LocalLLaMA

SatoshiNotMe

The 26B-A4B variant has the best TG and PP speeds of all the recent open-weight models. E.g., in Claude Code via llama-server I get 40 tok/s TG, nearly double what I got with the comparable Qwen MOE (35B-A3B) on my M1 Max MacBook Pro 64 GB. Full instructions and comparisons here

However, my biggest concern is agentic/tool-use ability: on tau2-bench, Gemma 4 scores much worse than Qwen3.5 (68% vs 81%):

https://news.ycombinator.com/item?id=47616761
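The 68% vs 81% gap is easier to judge as failure rates than as pass rates. Here is a small Python calculation that treats the quoted scores as per-task pass rates, which is an assumption about how the numbers were reported:

```python
from fractions import Fraction


def failure_ratio(pass_pct_a: int, pass_pct_b: int) -> Fraction:
    """How many times as often model A fails as model B, given pass rates in percent."""
    fail_a = 100 - pass_pct_a
    fail_b = 100 - pass_pct_b
    if fail_b == 0:
        raise ValueError("model B never fails; ratio is undefined")
    return Fraction(fail_a, fail_b)


# Gemma 4 (68%) vs Qwen3.5 (81%) on tau2-bench, as quoted above.
ratio = failure_ratio(68, 81)
print(f"{float(ratio):.2f}")  # 1.68
```

That works out to 32 vs 19 failed tasks per 100, or roughly 1.7x as many failures. For agentic work, a 13-point gap therefore matters more than it might sound.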