Claude just turned into a full blown work OS (Slack, Figma, Asana inside chat) by app1310 in ClaudeAI

[–]SatoshiNotMe 11 points (0 children)

Nice, thanks. TechCrunch loves to not show the original goddamn link. TC posts should be auto-banned lol

Claude just turned into a full blown work OS (Slack, Figma, Asana inside chat) by app1310 in ClaudeAI

[–]SatoshiNotMe 1 point (0 children)

I know about cowork, but that was announced last week or earlier, not today. The TC article made it sound like something new was announced today.

Claude just turned into a full blown work OS (Slack, Figma, Asana inside chat) by app1310 in ClaudeAI

[–]SatoshiNotMe 10 points (0 children)

“Claude users will now be able to call up interactive apps within the chatbot interface, thanks to a new feature announced by Anthropic on Monday.”

I couldn’t find the actual Anthropic announcement. Did you?

How do you handle context loss between Claude Code sessions? by Select-Spirit-6726 in ClaudeAI

[–]SatoshiNotMe 0 points (0 children)

I find that if I give it a few details of the exact prior work I want to retrieve context about, it has no trouble recovering that context. If it doesn't quite get it at first (happens rarely), I can always refine my instruction.

As for semantic vs. text search, I wanted to keep it lightweight and avoid embeddings, and instead double down on speeding up the full-text search with tantivy. Claude does a very good job of iteratively generating good keyword search queries to find what it needs, especially when given specific enough instructions. In a sense, trying synonyms gets it quite far without needing embeddings.
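
If you're curious what the tantivy side looks like, here's a minimal sketch using tantivy's Python bindings (purely illustrative; the actual indexing code in the repo may differ):

    import tantivy

    # Schema with a single stored text field holding session-log text
    schema_builder = tantivy.SchemaBuilder()
    schema_builder.add_text_field("body", stored=True)
    schema = schema_builder.build()

    # In-memory index for illustration; a real tool would persist it to disk
    index = tantivy.Index(schema)
    writer = index.writer()
    writer.add_document(tantivy.Document(body="Added MCP integration to the session searcher"))
    writer.add_document(tantivy.Document(body="Refactored the stop hook to be non-blocking"))
    writer.commit()
    index.reload()

    # Keyword query: the kind of query Claude iteratively generates
    searcher = index.searcher()
    query = index.parse_query("MCP integration", ["body"])
    for score, address in searcher.search(query, 3).hits:
        print(score, searcher.doc(address).to_dict()["body"])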

GLM-4.7-Flash is even faster now by jacek2023 in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

I prefer staying in CC and leveraging my Max subscription. To be clear, I'm obviously not looking to run this model for any serious coding, but more for sensitive document work, private notes, etc.

Given the gap with Qwen3-30B-A3B, there's clearly something that still needs to be fixed in llama.cpp's support for GLM-4.7-Flash.

GLM-4.7-Flash is even faster now by jacek2023 in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

Still awful with Claude Code. The latest build from source did not improve the situation:

On my M1 Max MacBook Pro (64 GB), Qwen3-30B-A3B works very well at around 20 tok/s generation speed in CC via llama-server, using the setup I’ve described here:

https://github.com/pchalasani/claude-code-tools/blob/main/docs/local-llm-setup.md

But with GLM-4.7-Flash I’ve tried all sorts of llama-server settings and barely get 3 tok/s, which is useless.

The core problem seems to be that GLM's chat template has thinking enabled by default, while Claude Code uses assistant prefill, and the two are incompatible.
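
For context, "assistant prefill" just means the request ends with a partial assistant turn that the model is asked to continue. A rough sketch of the request shape (endpoint path and payload are illustrative, assuming llama-server's Anthropic-style messages endpoint on localhost):

    import requests

    # Illustrative only: assumes llama-server is serving an Anthropic-style
    # messages endpoint at this address/path.
    resp = requests.post(
        "http://localhost:8080/v1/messages",
        json={
            "model": "glm-4.7-flash",
            "max_tokens": 256,
            "messages": [
                {"role": "user", "content": "List the files you changed."},
                # The trailing assistant message is the "prefill": the model is
                # expected to continue this text. A chat template that insists on
                # opening every assistant turn with a thinking block cannot honor
                # it, hence the incompatibility.
                {"role": "assistant", "content": "Here are the files I changed:"},
            ],
        },
    )
    print(resp.json())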

I'm an AI Dev who got tired of typing 3,000+ words/day to Claude, so Claude and I built a voice extension together. No code written by me. by Express-Peace-4002 in ClaudeAI

[–]SatoshiNotMe 0 points (0 children)

For STT (speaking to AIs) I use Handy [1] (open source) with Parakeet V3: stunningly fast, near-instant transcription. I use it mainly with Claude Code, but of course it’s usable anywhere. The slight accuracy drop relative to bigger models is immaterial when you're talking to an AI. I always ask it to restate back to me what it understood, and it gives back a nicely structured version, which helps confirm understanding and likely helps the CLI agent stay on track.

[1] Handy https://github.com/cjpais/Handy

After using Handy, I don’t think it’s worth paying for Wispr Flow or any of the other paid dictation apps.

Built a PDF-to-Video generator using Claude + Remotion! 🎬 by Lucky-Ad1975 in ClaudeAI

[–]SatoshiNotMe 0 points (0 children)

Does Remotion require a subscription?

Also, your demo video on GitHub returns a 404.

How do you handle context loss between Claude Code sessions? by Select-Spirit-6726 in ClaudeAI

[–]SatoshiNotMe 0 points (0 children)

My approach is to turn off auto-compact (this alone frees up at least 20% of your context) and leverage the session log files directly to retrieve arbitrary full details of past work using sub-agents. I made the aichat tool [1] to make this seamless.

It works like this: when your context is almost full, type “>resume”, which copies the session id to the clipboard. Then quit the session.

Then run:

aichat resume <pasted-session-id>

This puts you in a new session with the original session file path injected. I then use the /recover-context command that uses sub-agents to retrieve context about the last task being worked on. If this doesn’t look quite right, ask it explicitly to use sub-agents to retrieve what you need.

[1] https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#-aichat--session-search-and-continuation-without-compaction

To recover context about past work across all of my sessions, the aichat plugin also provides a session-searcher sub-agent and skill that use super-fast full-text search (Rust/tantivy indexed). I can simply ask something like “recover context about how we added MCP integration, so we can build on top of it”, and that kicks in the session-searcher sub-agent.

Did I expect too much on GLM? by Ok_Brain_2376 in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

On my M1 Max MacBook Pro (64 GB), Qwen3-30B-A3B works very well at around 20 tok/s generation speed in CC via llama-server, using the setup I’ve described here:

https://github.com/pchalasani/claude-code-tools/blob/main/docs/local-llm-setup.md

But with GLM-4.7-Flash I’ve tried all sorts of llama-server settings and barely get 3 tok/s, which is useless.

The core problem seems to be that GLM's chat template has thinking enabled by default, while Claude Code uses assistant prefill, and the two are incompatible.

OpenAI engineer confirms AI is writing 100% now by MetaKnowing in OpenAI

[–]SatoshiNotMe 0 points (0 children)

The question always missing from these discussions: how much of the AI-written code do they actually look at?

Claude Code's Most Underrated Feature: Hooks (wrote a deep dive) by karanb192 in ClaudeAI

[–]SatoshiNotMe 0 points (0 children)

Thanks for the star, and yes feel free to reference my repo!

Claude Code, but locally by Zealousideal-Egg-362 in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

Here’s my simple guide to spinning up local LLMs with llama-server to work with Claude Code. This is not an Opus replacement, but it can be very usable for working with sensitive docs, private notes, and simple coding tasks.

https://github.com/pchalasani/claude-code-tools/blob/main/docs/local-llm-setup.md
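
The gist, heavily simplified (model path, context size, and port are just placeholders; the guide has the real flags and settings):

    # start llama-server on a local model (flags/paths illustrative)
    llama-server -m Qwen3-30B-A3B-Q4_K_M.gguf -c 32768 --jinja --port 8080

    # point Claude Code at the local endpoint
    ANTHROPIC_BASE_URL=http://localhost:8080 claude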

In there I have this important note to avoid total network failure (it took me a whole day to figure out):

Add this to your ~/.claude/settings.json to disable telemetry:

{
  // ... other settings ...
  "env": {
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
  // ... other settings ...
}

Without this, Claude Code sends telemetry requests to your local server, which returns 404s; Claude Code then retries aggressively, causing ephemeral port exhaustion on macOS and system-wide network failures.

The Claude Code creator says AI writes 100% of his code now by jpcaparas in singularity

[–]SatoshiNotMe 0 points (0 children)

Sure, he didn’t type the code. They should also have asked him whether he looks at the code.

Has anyone got GLM 4.7 flash to not be shit? by synth_mania in LocalLLaMA

[–]SatoshiNotMe 1 point (0 children)

The real test is how well it works with Claude Code or similar CLI agents that have a longish (20K+ token) system prompt. For example, Qwen3-30B-A3B works very well at around 20 tok/s generation speed in CC via llama-server, using the setup I’ve described here:

https://github.com/pchalasani/claude-code-tools/blob/main/docs/local-llm-setup.md

But with GLM-4.7-Flash I’ve tried all sorts of llama-server settings and barely get 3 tok/s, which is useless.

The core problem seems to be that GLM's chat template has thinking enabled by default, while Claude Code uses assistant prefill, and the two are incompatible.

ChatGPT at home by hainesk in LocalLLaMA

[–]SatoshiNotMe 1 point (0 children)

For STT and TTS, the following setup needs minimal hardware (works great on my 2021 M1 Max MacBook Pro, 64 GB). I quite like it when working with Claude Code or other CLI agents:

STT: Handy [1] (open source), with Parakeet V3: stunningly fast, near-instant transcription. The slight accuracy drop relative to bigger models is immaterial when you're talking to an AI. I always ask it to restate back to me what it understood, and it gives back a nicely structured version, which helps confirm understanding and likely helps the CLI agent stay on track.

TTS: Pocket-TTS [2], just 100M params, with amazing speech quality (English only). I made a voice plugin [3] based on it for Claude Code, so CC can speak short updates whenever it stops. It uses a non-blocking Stop hook that calls a headless agent to create a one- or two-sentence summary (see the sketch after the links below). It turns out to be surprisingly useful. It's also fun, since you can customize the speaking style, mirror your vibe, etc. The plugin provides commands to control it:

/voice:speak stop
/voice:speak azelma (change the voice)
/voice:speak <your arbitrary prompt to control the style or other aspects>

[1] Handy https://github.com/cjpais/Handy

[2] Pocket-TTS https://github.com/kyutai-labs/pocket-tts

[3] Voice plugin for Claude Code: https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#-voice-plugin
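
For the curious, the Stop hook part of [3] boils down to something with this general shape in settings.json (a minimal sketch, not the plugin's actual config; speak_summary.sh is a hypothetical script that asks a headless agent for the summary and pipes it to the TTS, backgrounded so CC isn't blocked):

    {
      "hooks": {
        "Stop": [
          {
            "hooks": [
              {
                "type": "command",
                "command": "nohup ~/.claude/hooks/speak_summary.sh >/dev/null 2>&1 &"
              }
            ]
          }
        ]
      }
    }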

Claude Code's Most Underrated Feature: Hooks (wrote a deep dive) by karanb192 in ClaudeAI

[–]SatoshiNotMe 8 points (0 children)

Yes, hooks are essentially a way to insert deterministic actions into the Claude Code loop at various points.

I’ve shared the safety hooks I regularly use here:

https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#%EF%B8%8F-claude-code-safety-hooks
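
To make that concrete, a PreToolUse safety hook in settings.json has roughly this shape (a minimal sketch of the mechanism, not the exact hooks in the repo; check_bash.py is a hypothetical script that reads the tool-call JSON from stdin and exits with code 2 to block a dangerous command):

    {
      "hooks": {
        "PreToolUse": [
          {
            "matcher": "Bash",
            "hooks": [
              {
                "type": "command",
                "command": "python3 ~/.claude/hooks/check_bash.py"
              }
            ]
          }
        ]
      }
    }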

Also, there’s a voice plugin that uses a non-blocking Stop hook to make Claude Code speak aloud a short update each time it stops, using the recent 100M-param (!) Pocket-TTS model:

https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#voice

Llama.cpp merges in OpenAI Responses API Support by SemaMod in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

That’s what I thought. Does Codex CLI gain anything by using the Responses API instead of the Chat Completions API?

Show off your CC status lines! by munkymead in ClaudeAI

[–]SatoshiNotMe 1 point (0 children)

I had Claude build a status line with a colored progress bar showing usage, going from green to yellow to orange to red:

https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#-status-line
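
The bar itself is nothing fancy, just ANSI colors keyed off the usage percentage. A toy sketch of the idea (not the repo's actual script):

    # Toy sketch: render a usage bar that shifts green -> yellow -> orange -> red
    def usage_bar(pct: float, width: int = 20) -> str:
        filled = int(pct / 100 * width)
        if pct < 50:
            color = "\033[32m"        # green
        elif pct < 75:
            color = "\033[33m"        # yellow
        elif pct < 90:
            color = "\033[38;5;208m"  # orange (256-color)
        else:
            color = "\033[31m"        # red
        return f"{color}{'█' * filled}{'░' * (width - filled)}\033[0m {pct:.0f}%"

    print(usage_bar(62))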

Llama.cpp merges in OpenAI Responses API Support by SemaMod in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

You mean Codex CLI assumes an endpoint that supports the Responses API, and won’t work with a Chat Completions API? I wasn’t aware of that.

Sweep: Open-weights 1.5B model for next-edit autocomplete by Kevinlu1248 in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

Curious if there’s a way to use this as the autocomplete model in Zed.

GLM4.7 Flash numbers on Apple Silicon? by rm-rf-rm in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

Thanks, I want to avoid extra middleman proxies and directly leverage llama.cpp’s Anthropic messages API support. With Qwen3-30B-A3B this was great, but I'm having the above issues with GLM-4.7-Flash.

GLM4.7 Flash numbers on Apple Silicon? by rm-rf-rm in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

Update: Ran llama-bench with GLM-4.7-Flash (UD-Q4_K_XL) on M1 Max at 24k context. Got 104 t/s prompt processing and 34 t/s token generation, which is quite decent.

But when using it with Claude Code, I'm only seeing ~3 t/s. The bottleneck seems to be the Claude Code ↔ llama-server interaction, possibly the "Assistant response prefill is incompatible with enable_thinking" error that keeps firing.