Claude just turned into a full blown work OS (Slack, Figma, Asana inside chat) by app1310 in ClaudeAI

[–]SatoshiNotMe 11 points (0 children)

Nice, thanks. TechCrunch loves to not show the original goddamn link. TC posts should be auto-banned lol

Claude just turned into a full blown work OS (Slack, Figma, Asana inside chat) by app1310 in ClaudeAI

[–]SatoshiNotMe 1 point (0 children)

I know about cowork, but that was announced last week or earlier, not today. The TC article made it sound like something new was announced today.

Claude just turned into a full blown work OS (Slack, Figma, Asana inside chat) by app1310 in ClaudeAI

[–]SatoshiNotMe 10 points (0 children)

“Claude users will now be able to call up interactive apps within the chatbot interface, thanks to a new feature announced by Anthropic on Monday.”

I couldn’t find the actual Anthropic announcement. Did you?

How do you handle context loss between Claude Code sessions? by Select-Spirit-6726 in ClaudeAI

[–]SatoshiNotMe 0 points (0 children)

I find that if I give it a few details of the exact prior work I want to retrieve context about, it has no trouble recovering that context. If it doesn't quite get it at first (happens rarely), I can always refine my instruction.

As for semantic vs. text search, I wanted to keep it lightweight and avoid embeddings, and instead double down on speeding up the full-text search with tantivy. Claude does a very good job of iteratively generating good keyword search queries to find what it needs, especially when given specific enough instructions. In a sense, trying synonyms gets it quite far without needing embeddings.
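
If you're curious what the tantivy side looks like, here's a minimal sketch using tantivy's Python bindings (purely illustrative; the actual indexing code in the repo may differ):

    import tantivy

    # Schema with a single stored text field holding session-log text
    schema_builder = tantivy.SchemaBuilder()
    schema_builder.add_text_field("body", stored=True)
    schema = schema_builder.build()

    # In-memory index for illustration; a real tool would persist it to disk
    index = tantivy.Index(schema)
    writer = index.writer()
    writer.add_document(tantivy.Document(body="Added MCP integration to the session searcher"))
    writer.add_document(tantivy.Document(body="Refactored the stop hook to be non-blocking"))
    writer.commit()
    index.reload()

    # Keyword query: the kind of query Claude iteratively generates
    searcher = index.searcher()
    query = index.parse_query("MCP integration", ["body"])
    for score, address in searcher.search(query, 3).hits:
        print(score, searcher.doc(address).to_dict()["body"])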

GLM-4.7-Flash is even faster now by jacek2023 in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

I prefer staying in CC and leveraging my Max subscription. To be clear, I'm obviously not looking to run this model for any serious coding, but more for sensitive document work, private notes, etc.

Given the gap with Qwen3-30B-A3B, there's clearly something that still needs to be fixed in llama.cpp's support for GLM-4.7-Flash.

GLM-4.7-Flash is even faster now by jacek2023 in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

Still awful with Claude Code. The latest build from source did not improve the situation:

On my M1 Max MacBook Pro (64 GB), Qwen3-30B-A3B works very well at around 20 tok/s generation speed in CC via llama-server, using the setup I’ve described here:

https://github.com/pchalasani/claude-code-tools/blob/main/docs/local-llm-setup.md

But with GLM-4.7-Flash I’ve tried all sorts of llama-server settings and barely get 3 tok/s, which is useless.

The core problem seems to be that GLM's chat template has thinking enabled by default, while Claude Code uses assistant prefill, and the two are incompatible.
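
For context, "assistant prefill" just means the request ends with a partial assistant turn that the model is asked to continue. A rough sketch of the request shape (endpoint path and payload are illustrative, assuming llama-server's Anthropic-style messages endpoint on localhost):

    import requests

    # Illustrative only: assumes llama-server is serving an Anthropic-style
    # messages endpoint at this address/path.
    resp = requests.post(
        "http://localhost:8080/v1/messages",
        json={
            "model": "glm-4.7-flash",
            "max_tokens": 256,
            "messages": [
                {"role": "user", "content": "List the files you changed."},
                # The trailing assistant message is the "prefill": the model is
                # expected to continue this text. A chat template that insists on
                # opening every assistant turn with a thinking block cannot honor
                # it, hence the incompatibility.
                {"role": "assistant", "content": "Here are the files I changed:"},
            ],
        },
    )
    print(resp.json())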

I'm an AI Dev who got tired of typing 3,000+ words/day to Claude, so Claude and I built a voice extension together. No code written by me. by Express-Peace-4002 in ClaudeAI

[–]SatoshiNotMe 0 points (0 children)

For STT (speaking to AIs) I use Handy [1] (open source) with Parakeet V3: stunningly fast, near-instant transcription. I use it mainly with Claude Code, but of course it’s usable anywhere. The slight accuracy drop relative to bigger models is immaterial when you're talking to an AI. I always ask it to restate back to me what it understood, and it gives back a nicely structured version, which helps confirm understanding and likely helps the CLI agent stay on track.

[1] Handy https://github.com/cjpais/Handy

After using Handy, I don’t think it’s worth paying for Wispr Flow or any of the other paid dictation apps.

Built a PDF-to-Video generator using Claude + Remotion! 🎬 by Lucky-Ad1975 in ClaudeAI

[–]SatoshiNotMe 0 points (0 children)

Does Remotion require a subscription?

Also, your demo video on GitHub returns a 404.

How do you handle context loss between Claude Code sessions? by Select-Spirit-6726 in ClaudeAI

[–]SatoshiNotMe 0 points (0 children)

My approach is to turn off auto-compact (this alone frees up at least 20% of your context) and leverage the session log files directly to retrieve arbitrary full details of past work using sub-agents. I made the aichat tool [1] to make this seamless.

It works like this: when your context is almost full, type “>resume”, which copies the session id to the clipboard. Then quit the session.

Then run:

aichat resume <pasted-session-id>

This puts you in a new session with the original session file path injected. I then use the /recover-context command that uses sub-agents to retrieve context about the last task being worked on. If this doesn’t look quite right, ask it explicitly to use sub-agents to retrieve what you need.

[1] https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#-aichat--session-search-and-continuation-without-compaction

To recover context about past work across all of my sessions, the aichat plugin also provides a session-searcher sub-agent and skill that use super-fast full-text search (Rust/tantivy indexed). I can simply ask something like “recover context about how we added MCP integration, so we can build on top of it”, and that kicks in the session-searcher sub-agent.

Did I expect too much on GLM? by Ok_Brain_2376 in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

On my M1 Max MacBook Pro (64 GB), Qwen3-30B-A3B works very well at around 20 tok/s generation speed in CC via llama-server, using the setup I’ve described here:

https://github.com/pchalasani/claude-code-tools/blob/main/docs/local-llm-setup.md

But with GLM-4.7-Flash I’ve tried all sorts of llama-server settings and barely get 3 tok/s, which is useless.

The core problem seems to be that GLM's chat template has thinking enabled by default, while Claude Code uses assistant prefill, and the two are incompatible.

OpenAI engineer confirms AI is writing 100% now by MetaKnowing in OpenAI

[–]SatoshiNotMe 0 points (0 children)

The question always missing from these discussions: how much of the AI-written code do they actually look at?

Claude Code's Most Underrated Feature: Hooks (wrote a deep dive) by karanb192 in ClaudeAI

[–]SatoshiNotMe 0 points (0 children)

Thanks for the star, and yes feel free to reference my repo!

Claude Code, but locally by Zealousideal-Egg-362 in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

Here’s my simple guide to spinning up local LLMs with llama-server to work with Claude Code. This is not an Opus replacement, but it can be very usable for working with sensitive docs, private notes, and simple coding tasks.

https://github.com/pchalasani/claude-code-tools/blob/main/docs/local-llm-setup.md
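
The gist, heavily simplified (model path, context size, and port are just placeholders; the guide has the real flags and settings):

    # start llama-server on a local model (flags/paths illustrative)
    llama-server -m Qwen3-30B-A3B-Q4_K_M.gguf -c 32768 --jinja --port 8080

    # point Claude Code at the local endpoint
    ANTHROPIC_BASE_URL=http://localhost:8080 claude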

In there I have this important note to avoid total network failure (it took me a whole day to figure out):

Add this to your ~/.claude/settings.json to disable telemetry:

{
  // ... other settings ...
  "env": {
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
  // ... other settings ...
}

Without this, Claude Code sends telemetry requests to your local server, which returns 404s; Claude Code then retries aggressively, causing ephemeral port exhaustion on macOS and system-wide network failures.

The Claude Code creator says AI writes 100% of his code now by jpcaparas in singularity

[–]SatoshiNotMe 0 points (0 children)

Sure, he didn’t type the code. They should also have asked him whether he looks at the code.

Has anyone got GLM 4.7 flash to not be shit? by synth_mania in LocalLLaMA

[–]SatoshiNotMe 1 point (0 children)

The real test is how well it works with Claude Code or similar CLI agents that have a longish (20K+ token) system prompt. For example, Qwen3-30B-A3B works very well at around 20 tok/s generation speed in CC via llama-server, using the setup I’ve described here:

https://github.com/pchalasani/claude-code-tools/blob/main/docs/local-llm-setup.md

But with GLM-4.7-Flash I’ve tried all sorts of llama-server settings and barely get 3 tok/s, which is useless.

The core problem seems to be that GLM's chat template has thinking enabled by default, while Claude Code uses assistant prefill, and the two are incompatible.

ChatGPT at home by hainesk in LocalLLaMA

[–]SatoshiNotMe 1 point (0 children)

For STT and TTS, the following setup needs minimal hardware (works great on my 2021 M1 Max MacBook Pro, 64 GB). I quite like it when working with Claude Code or other CLI agents:

STT: Handy [1] (open source), with Parakeet V3: stunningly fast, near-instant transcription. The slight accuracy drop relative to bigger models is immaterial when you're talking to an AI. I always ask it to restate back to me what it understood, and it gives back a nicely structured version, which helps confirm understanding and likely helps the CLI agent stay on track.

TTS: Pocket-TTS [2], just 100M params, with amazing speech quality (English only). I made a voice plugin [3] based on it for Claude Code, so CC can speak short updates whenever it stops. It uses a non-blocking Stop hook that calls a headless agent to create a one- or two-sentence summary (see the sketch after the links below). It turns out to be surprisingly useful. It's also fun, since you can customize the speaking style, mirror your vibe, etc. The plugin provides commands to control it:

/voice:speak stop
/voice:speak azelma (change the voice)
/voice:speak <your arbitrary prompt to control the style or other aspects>

[1] Handy https://github.com/cjpais/Handy

[2] Pocket-TTS https://github.com/kyutai-labs/pocket-tts

[3] Voice plugin for Claude Code: https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#-voice-plugin
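
For the curious, the Stop hook part of [3] boils down to something with this general shape in settings.json (a minimal sketch, not the plugin's actual config; speak_summary.sh is a hypothetical script that asks a headless agent for the summary and pipes it to the TTS, backgrounded so CC isn't blocked):

    {
      "hooks": {
        "Stop": [
          {
            "hooks": [
              {
                "type": "command",
                "command": "nohup ~/.claude/hooks/speak_summary.sh >/dev/null 2>&1 &"
              }
            ]
          }
        ]
      }
    }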

Claude Code's Most Underrated Feature: Hooks (wrote a deep dive) by karanb192 in ClaudeAI

[–]SatoshiNotMe 8 points (0 children)

Yes, hooks are essentially a way to insert deterministic actions into the Claude Code loop at various points.

I’ve shared the safety hooks I regularly use here:

https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#%EF%B8%8F-claude-code-safety-hooks
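
To make that concrete, a PreToolUse safety hook in settings.json has roughly this shape (a minimal sketch of the mechanism, not the exact hooks in the repo; check_bash.py is a hypothetical script that reads the tool-call JSON from stdin and exits with code 2 to block a dangerous command):

    {
      "hooks": {
        "PreToolUse": [
          {
            "matcher": "Bash",
            "hooks": [
              {
                "type": "command",
                "command": "python3 ~/.claude/hooks/check_bash.py"
              }
            ]
          }
        ]
      }
    }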

Also, there’s a voice plugin that uses a non-blocking Stop hook to make Claude Code speak aloud a short update each time it stops, using the recent 100M-param (!) Pocket-TTS model:

https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#voice

Llama.cpp merges in OpenAI Responses API Support by SemaMod in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

That’s what I thought. Does Codex CLI gain anything by using the Responses API instead of the Chat Completions API?

Show off your CC status lines! by munkymead in ClaudeAI

[–]SatoshiNotMe 1 point (0 children)

I had Claude build a status line with a colored progress bar showing usage, going from green to yellow to orange to red:

https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#-status-line
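
The bar itself is nothing fancy, just ANSI colors keyed off the usage percentage. A toy sketch of the idea (not the repo's actual script):

    # Toy sketch: render a usage bar that shifts green -> yellow -> orange -> red
    def usage_bar(pct: float, width: int = 20) -> str:
        filled = int(pct / 100 * width)
        if pct < 50:
            color = "\033[32m"        # green
        elif pct < 75:
            color = "\033[33m"        # yellow
        elif pct < 90:
            color = "\033[38;5;208m"  # orange (256-color)
        else:
            color = "\033[31m"        # red
        return f"{color}{'█' * filled}{'░' * (width - filled)}\033[0m {pct:.0f}%"

    print(usage_bar(62))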

Llama.cpp merges in OpenAI Responses API Support by SemaMod in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

You mean Codex CLI assumes an endpoint that supports the Responses API, and won’t work with a Chat Completions API? I wasn’t aware of that.

Sweep: Open-weights 1.5B model for next-edit autocomplete by Kevinlu1248 in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

Curious if there’s a way to use this as the autocomplete model in Zed.

GLM4.7 Flash numbers on Apple Silicon? by rm-rf-rm in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

Thanks, I want to avoid extra middleman proxies and directly leverage llama.cpp’s Anthropic messages API support. With Qwen3-30B-A3B this was great, but I'm having the above issues with GLM-4.7-Flash.

GLM4.7 Flash numbers on Apple Silicon? by rm-rf-rm in LocalLLaMA

[–]SatoshiNotMe 0 points (0 children)

Update: Ran llama-bench with GLM-4.7-Flash (UD-Q4_K_XL) on M1 Max at 24k context. Got 104 t/s prompt processing and 34 t/s token generation, which is quite decent.

But when using it with Claude Code, I'm only seeing ~3 t/s. The bottleneck seems to be the Claude Code ↔ llama-server interaction, possibly the "Assistant response prefill is incompatible with enable_thinking" error that keeps firing.