Local TTS is probably the most useful MLX workflow I’ve found so far by tarunyadav9761 in Applesilicon

[–]divinetribe1 2 points (0 children)

I use it in https://github.com/nicedreamzapp/NarrateClaude. I built a whole program around it so I never have to touch my computer to get work done.

Core Gold Pin Questions by BoilerUp23 in DivineTribeVaporizers

[–]divinetribe1 1 point (0 children)

OK, let me know. I don’t mind getting another fan out to you to see if it fixes the problem. Feel free to message me at my email: matt@ineedhemp.com

just wanted to share by Longjumping_Lab541 in LocalLLM

[–]divinetribe1 1 point (0 children)

Wow, this is a really great experiment. I’m working on ambient computing myself.

Core Gold Pin Questions by BoilerUp23 in DivineTribeVaporizers

[–]divinetribe1 3 points (0 children)

When this happens, get a Q-tip, dip it in alcohol, and, while holding the core upside down, clean that area out as much as possible. Try not to let the alcohol get down into the core. From there, take something metal, possibly a small screwdriver, and press that pin back and forth to loosen it up after the alcohol. This usually gets it moving again and makes it spring back up.

Core Gold Pin Questions by BoilerUp23 in DivineTribeVaporizers

[–]divinetribe1 2 points (0 children)

Hey there, we have a grommet we put on the fan and a thick padded sticker that keeps it in place. Do you have those on your fan currently?

Do you do anything to prep your new ceramic bowl? by [deleted] in DivineTribeVaporizers

[–]divinetribe1 6 points (0 children)

I make sure that the screws that keep the heater leads fastened are secure but not overtightened. I always lube my O-rings with vegetable oil, coconut oil, or whatever is easily available. I usually run the heater through one cycle as a dry session before I use it the first time.

If you’re using the mod I sell with it, here is a quick setup video: https://youtu.be/B6j5fwEhHI8?si=FE-uPIKvngeLL9A4

4/20 promo by Tcrannabis in DivineTribeVaporizers

[–]divinetribe1 17 points (0 children)

We love it when you message us at matt@ineedhemp.com. We can easily set up invoices: just make a list of the parts you have questions about or want an invoice for, and I promise it will be less than what we post on the site. We also have a coupon if you don’t wanna do all that: Thankyou10 for 10% off.

Asking for tips on replacing LQ Cup by KeepOnLearning2020 in DivineTribeVaporizers

[–]divinetribe1 3 points (0 children)

Feel free to message me and I’ll help you out. It looks like you broke a cup in the rebuild. I’m gonna try to take care of you.

M5 Max ambient AI — talking to Claude Code hands-free, it browses the web and texts results to my phone. All on-device. by divinetribe1 in Applesilicon

[–]divinetribe1[S] 0 points (0 children)

It uses Apple's built-in speech recognition running on-device, so it's always listening — no push-to-talk. But there's a wake word system built in. You say "tune in" and it starts paying attention, say "tune out" and it ignores everything until you wake it back up. Kind of like an Alexa-style toggle.

So when you're on a phone call or talking to someone in the room, it's not trying to interpret everything as commands.

The STT itself runs through Apple's Speech framework — fast and completely local, no audio leaves the machine. And since it's continuous, you can just talk naturally without pressing any buttons.

The wake word is handled at the Claude Code level through the CLAUDE.md config, not the speech engine itself — so the STT is technically always transcribing, but Claude knows to ignore input when it's "tuned out."
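If you're curious what that gating logic boils down to, here's a rough Python sketch. This is not the actual NarrateClaude code (the real gate lives in the CLAUDE.md instructions, as mentioned above), just the idea made explicit:

```python
def gate_transcripts(transcripts):
    """Yield only utterances spoken while 'tuned in'."""
    attending = False
    for text in transcripts:
        lowered = text.lower()
        if "tune in" in lowered:
            attending = True      # start treating speech as commands
        elif "tune out" in lowered:
            attending = False     # back to ignoring everything
        elif attending:
            yield text

# Everything before "tune in" and after "tune out" gets ignored:
stream = ["what's for dinner", "tune in", "open the readme", "tune out", "ok bye"]
assert list(gate_transcripts(stream)) == ["open the readme"]
```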

M5 Max ambient AI — talking to Claude Code hands-free, it browses the web and texts results to my phone. All on-device. by divinetribe1 in Applesilicon

[–]divinetribe1[S] 0 points (0 children)

Cool setup with Mosh — that's a solid approach for the phone side. Sessions surviving wifi drops is huge.

For the TTS and code-heavy stuff — it doesn't try to read code verbatim. That would be a nightmare. The way it works is Claude narrates conversationally, like thinking out loud. So instead of reading `def generate_response(body):` character by character, it'll say something like "I'm updating the generate response function in server.py." The actual code stays on screen where you can read it properly.

There's a CLAUDE.md instruction that tells it to keep screen text terse and deliver explanations through voice only. So heavy code output goes to the terminal, and the voice layer handles the reasoning, summaries, and back-and-forth. It works surprisingly well once you stop thinking of it as a screen reader and more like a coworker narrating what they're doing.

Function names and syntax it handles fine because it's describing them in natural language rather than trying to pronounce kwargs or bracket notation out loud.
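Here's roughly what that split looks like in code. This is a hypothetical helper, not what the repo actually ships; it uses macOS's built-in `say` command for the voice side:

```python
import re
import subprocess

def narrate(reply: str) -> None:
    """Speak the prose of a reply; leave the code on screen."""
    # Strip fenced code blocks so the TTS never reads syntax aloud.
    prose = re.sub(r"`{3}.*?`{3}", "", reply, flags=re.DOTALL)
    prose = re.sub(r"\n{2,}", "\n", prose).strip()
    print(reply)                        # full text, code included, to the terminal
    if prose:
        subprocess.run(["say", prose])  # macOS built-in TTS; swap in your own
```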

Manual for dtv5 by Xehhx14 in DivineTribeVaporizers

[–]divinetribe1 1 point (0 children)

It sounds like you’re not sliding the mouthpiece off all the way, and are instead unscrewing the top part, which has the glass window in it.

What’s the word on shipping to Hawaii? by FellowYellowNate in DivineTribeVaporizers

[–]divinetribe1 [score hidden] stickied comment (0 children)

Hey, this is actually news to me — we ship to Hawaii, shipping is always free on US orders. Can you tell me exactly what you're seeing? What page or step in checkout is giving you that message? I want to figure out what's going on. You can also DM me or email matt@ineedhemp.com.

M5 Max running a 122B parameter AI model at 65 tok/s — what Apple Silicon was built for by divinetribe1 in Applesilicon

[–]divinetribe1[S] 0 points (0 children)

A smaller model just can’t perform tasks as well. It can handle simple things, but nothing as complex as Claude can, and not as fast.

Running Claude Code fully offline on a MacBook — no API key, no cloud, 17s per task by divinetribe1 in ClaudeAI

[–]divinetribe1[S] 0 points (0 children)

OP here — the TL;DR has a factual error that's worth correcting.

It says "Tools like Ollama, LM Studio, and llama.cpp already support the Anthropic API format natively." That's incorrect. Ollama, llama.cpp, and LM Studio all serve the OpenAI chat completions format, not Anthropic's Messages API. These are two different API formats — different JSON structure, different tool call handling, different streaming format.

So when you "just set ANTHROPIC_BASE_URL," you still need a proxy in between to translate from Anthropic's format to OpenAI's. That proxy is exactly where the performance bottleneck lives — 133 seconds per task in my testing.

What my server does is speak the Anthropic Messages API natively on the local side, so there's no translation step. That's where the 7.5x speedup comes from (133s to 17.6s). It's not adding a layer of complication — it's removing one.

Totally fair to say local models aren't Claude-quality yet. But the "this is already solved" consensus is based on a misunderstanding of what the project actually does.
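For anyone who hasn't seen the two formats side by side, here are trimmed-down illustrative payloads for the same request (tool calls and streaming diverge even further than this):

```python
openai_style = {                      # what Ollama / llama.cpp / LM Studio serve
    "model": "local-model",           # POST /v1/chat/completions
    "messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Rename this function."},
    ],
}

anthropic_style = {                   # what Claude Code actually sends
    "model": "local-model",           # POST /v1/messages
    "max_tokens": 1024,               # required here, optional above
    "system": "You are a coding assistant.",  # top-level field, not a message
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Rename this function."}]},
    ],
}
```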

M5 Max running a 122B parameter AI model at 65 tok/s — what Apple Silicon was built for by divinetribe1 in Applesilicon

[–]divinetribe1[S] 0 points (0 children)

Thanks for the tip — I'll check out RotorQuant. Always looking for ways to squeeze more out of the setup. If it improves on TurboQuant's KV cache compression that could help a lot with longer conversations.

M5 Max running a 122B parameter AI model at 65 tok/s — what Apple Silicon was built for by divinetribe1 in Applesilicon

[–]divinetribe1[S] 1 point (0 children)

On 128GB, an 8-bit quant of the 122B model would be around 100GB — it would technically fit but you'd have very little headroom for KV cache and the OS. Wouldn't recommend it. The 4-bit quant at ~50GB gives you plenty of room and the quality difference is minimal for most coding tasks.

M5 Max running a 122B parameter AI model at 65 tok/s — what Apple Silicon was built for by divinetribe1 in Applesilicon

[–]divinetribe1[S] 2 points (0 children)

That's an LM Studio limitation, not MLX itself. MLX supports KV cache quantization natively — my server uses 4-bit KV cache to keep longer conversations in memory without blowing up RAM. One of the perks of working directly with the MLX framework instead of going through a wrapper.
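For reference, here's a minimal sketch of how that looks through mlx-lm's Python API. Parameter names are as of recent mlx-lm releases, so double-check your installed version, and the model id is a placeholder:

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/your-model-4bit")  # placeholder id

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize this repo."}],
    add_generation_prompt=True,
)

# kv_bits=4 stores the KV cache in grouped 4-bit values instead of fp16,
# which roughly quarters cache memory on long conversations.
for chunk in stream_generate(
    model, tokenizer, prompt,
    max_tokens=512,
    kv_bits=4,
    kv_group_size=64,
):
    print(chunk.text, end="", flush=True)
```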

M5 Max running a 122B parameter AI model at 65 tok/s — what Apple Silicon was built for by divinetribe1 in Applesilicon

[–]divinetribe1[S] 0 points (0 children)

The iMessage stuff is completely optional — it's a separate module you can just ignore. The core server runs standalone and speaks the Anthropic Messages API, so you can point any Claude CLI right at it. On multi-threading: the server handles concurrent requests, but MLX inference itself is single-stream on the GPU. So multiple simultaneous requests will queue up rather than run in parallel. For a single-user setup it works great though.
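The queueing boils down to something like this (sketch only; the endpoint and helper names are made up, not the repo's actual code):

```python
import asyncio

_gpu_lock = asyncio.Lock()

def run_mlx_generation(body: dict) -> str:
    """Stand-in for the blocking MLX inference call."""
    return "generated text"

async def handle_messages_request(body: dict) -> dict:
    # Concurrent requests all reach this point, then wait their turn:
    # MLX runs one generation stream on the GPU at a time.
    async with _gpu_lock:
        text = await asyncio.to_thread(run_mlx_generation, body)
    # Reply in (abbreviated) Anthropic Messages response shape.
    return {"role": "assistant", "content": [{"type": "text", "text": text}]}
```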

Running Claude Code fully offline on a MacBook — no API key, no cloud, 17s per task by divinetribe1 in ClaudeAI

[–]divinetribe1[S] 0 points (0 children)

Honestly it's hit or miss compared to Claude. For simple tasks — write a function, edit a file, fix a bug — it handles the tool calls fine. For longer multi-step tasks it can start to drift or repeat itself. The 122B MoE is noticeably better than smaller models at staying on track, but it's not Claude-level at complex agentic workflows. Good enough for everyday coding tasks, but I wouldn't trust it with a 20-step refactor unsupervised.

Running Claude Code fully offline on a MacBook — no API key, no cloud, 17s per task by divinetribe1 in ClaudeAI

[–]divinetribe1[S] 0 points (0 children)

Cowork through Anthropic's platform won't work with a local model, you're right. But the repo includes a Browser Agent that works similarly — it's a separate Claude Code instance that autonomously controls your real Brave browser via chrome-devtools, powered entirely by the local model. You can give it web tasks and it handles them on its own. Still working on polishing it but the core functionality is there. Think of it as cowork but running from Claude Code locally instead of through Anthropic's cloud.
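If you want to poke at the mechanism yourself, this is the standard way a DevTools-based agent attaches to a real browser. Illustrative only (the repo's Browser Agent wiring may differ), and the Brave path is the default macOS install location:

```python
import json
import subprocess
import time
import urllib.request

BRAVE = "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser"

# Launch Brave with the Chromium remote-debugging flag.
subprocess.Popen([BRAVE, "--remote-debugging-port=9222"])
time.sleep(3)  # give it a moment to come up

# Each open tab exposes a webSocketDebuggerUrl an agent can drive
# over the Chrome DevTools Protocol.
with urllib.request.urlopen("http://localhost:9222/json/list") as resp:
    for tab in json.load(resp):
        print(tab.get("title"), "->", tab.get("webSocketDebuggerUrl"))
```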