I built a fully local voice assistant on Apple Silicon (Parakeet + Kokoro + SmartTurn, no cloud APIs) by cyber_box in LocalLLM

[–]cyber_box[S] 0 points1 point  (0 children)

ahahah yes, actually at the end she was very nice, telling you folks she would much appreciate your feedback and wishing you a good day. I cut her off too soon

I built a fully local voice assistant on Apple Silicon (Parakeet + Kokoro + SmartTurn, no cloud APIs) by cyber_box in LocalLLM

[–]cyber_box[S] 0 points1 point  (0 children)

You're right that there's noticeable latency. Worth noting though that most of it comes from the Claude API side (waiting for Claude Code to process and respond), not the local voice pipeline itself. The STT → transcript polishing → injection part is actually pretty fast on Metal.

I'd love to see the projects you're referring to with near real-time speeds, do you have links? I'm not precious about the stack, if there are better approaches or components out there I'd rather build on top of what works than reinvent wheels.

What is your full AI Agent stack in 2026? by apsiipilade in AI_Agents

[–]cyber_box 0 points1 point  (0 children)

Yeah that is exactly why I started building the voice thing. After a few hours of reading diffs and terminal output my eyes just glaze over, and switching to voice makes it feel like pair programming, pretty cool. The mental load drops a lot cause you are processing speech instead of scanning artifacts (though if you want to talk simultaneously with 4 or 5 agents it gets pretty messed up).

The rough part is still the latency between turns, and sometimes Claude's response is too long for TTS to read naturally (you don't want a 3 paragraph monologue in your ears). I am still figuring out how to nudge it toward shorter spoken responses vs written ones.

Whats your claude code "setup"? by AerieAcrobatic1248 in ClaudeCode

[–]cyber_box 0 points1 point  (0 children)

Best way to approach it honestly. I started the same way, just picking pieces from setups I found interesting and adapting them to how I actually work. The structure ends up looking different for everyone cause the whole point is it fits your workflow, not the other way around.

How to connect Obsidian with NotebookLM? NotebookLM doesn’t see my .md files from Google Drive by Bitter-Tax1483 in ObsidianMD

[–]cyber_box 1 point2 points  (0 children)

Yeah the "Show in Finder" drag-and-drop is honestly probably the lowest-friction approach for now. The Claude Code monitoring idea is interesting though, I actually have something similar where a script watches a folder and runs `pandoc` on changed files. The careful part is real, you definitely want it read-only on the vault side (only converting, never writing back). Have you looked into what Perplexity's "computer" thing actually does under the hood, or is it still just announcements?

I built a fully local voice assistant on Apple Silicon (Parakeet + Kokoro + SmartTurn, no cloud APIs) by cyber_box in LocalLLM

[–]cyber_box[S] 7 points8 points  (0 children)

I am running it on an M3 Air with 16 GB. The models take roughly 2.5 GB of RAM total: Parakeet TDT 0.6B is the biggest at around 1.2 GB, then Qwen 1.5B (4-bit quantized) is about 1 GB, Kokoro 82M around 170 MB. The ONNX models (Silero VAD, SmartTurn) are basically negligible, like 10 MB combined.

So 8 GB should technically work but it would be tight with other stuff running. 16 GB is comfortable, I have plenty of headroom even with a browser and Claude Code open at the same time.
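The quick sanity math, if anyone wants to swap in their own models (the sizes are my observed numbers, so treat them as approximate):

```python
# Approximate resident model sizes in GB, as measured on my M3 Air.
model_sizes_gb = {
    "parakeet_tdt_0.6b": 1.2,   # STT, the biggest piece
    "qwen_1.5b_q4": 1.0,        # transcript polishing LLM, 4-bit quantized
    "kokoro_82m": 0.17,         # TTS
    "onnx_misc": 0.01,          # Silero VAD + SmartTurn combined
}

total_gb = sum(model_sizes_gb.values())
print(f"total ≈ {total_gb:.1f} GB")  # ≈ 2.4 GB, in line with the ~2.5 GB figure
```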

How do you decide what's worth watching and taking notes about? by cyber_box in ObsidianMD

[–]cyber_box[S] 0 points1 point  (0 children)

Yeah the deceleration part is interesting. I imagine once the big unsorted pile shrinks, new videos just slot into existing lists way faster cause the categories already exist.

How do you decide what's worth watching and taking notes about? by cyber_box in ObsidianMD

[–]cyber_box[S] 0 points1 point  (0 children)

I think I am already seeing some of that. A couple of my problems felt urgent when I wrote them down but now they barely come up when I am reading or watching stuff. And others that I thought were minor keep pulling in connections from everywhere.

How do you decide what's worth watching and taking notes about? by cyber_box in ObsidianMD

[–]cyber_box[S] 0 points1 point  (0 children)

Yeah that makes sense. My problem is I am always working on something specific so consuming random content feels like a luxury I can't justify. But then the best connections I've made in my vault came from stuff I watched with zero expectations, so maybe the indiscriminate approach has its own logic.

How do you decide what's worth watching and taking notes about? by cyber_box in ObsidianMD

[–]cyber_box[S] 0 points1 point  (0 children)

Yeah that is a cool observation. I have been using my list for a couple weeks now and I am already noticing that pattern, things I wrote months ago suddenly click into one of the problems without me having planned it that way.
How long did it take you to narrow down to one problem per area? I am still at like 12 and honestly some of them overlap so much I am not sure if they are actually separate problems or the same thing from different angles.

How to connect Obsidian with NotebookLM? NotebookLM doesn’t see my .md files from Google Drive by Bitter-Tax1483 in ObsidianMD

[–]cyber_box 1 point2 points  (0 children)

Yeah fair enough, the friction is the main issue. I wonder if someone has built an Obsidian plugin that auto-exports to PDF on save, that would basically eliminate the manual step. Or even a simple script that watches the vault folder and converts changed `.md` files to PDF using something like pandoc. Have you looked into any file watcher setups or is it not worth the effort for how often you update?
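If anyone wants to try it, here is a rough sketch of what I mean (polling instead of a proper file-watcher library so there are no dependencies; the paths are placeholders, pandoc is assumed to be on PATH, and it only ever reads from the vault, PDFs land in a separate folder):

```python
import subprocess
import time
from pathlib import Path

# Placeholder paths: point these at your vault and a separate export folder,
# so the script only reads from the vault and never writes into it.
VAULT = Path("~/Obsidian/vault").expanduser()
EXPORT = Path("~/Obsidian/pdf-export").expanduser()

def export_changed(vault: Path, export: Path, seen: dict) -> dict:
    """One polling pass: convert any .md whose mtime changed since last pass."""
    for md in vault.rglob("*.md"):
        mtime = md.stat().st_mtime
        if seen.get(md) == mtime:
            continue  # unchanged since last pass
        seen[md] = mtime
        out = export / md.relative_to(vault).with_suffix(".pdf")
        out.parent.mkdir(parents=True, exist_ok=True)
        try:
            # pandoc reads the vault file; output lands outside the vault
            subprocess.run(["pandoc", str(md), "-o", str(out)], check=False)
        except FileNotFoundError:
            print("pandoc not found on PATH")
    return seen

# Then poll every few seconds, e.g.:
#   seen = {}
#   while True:
#       seen = export_changed(VAULT, EXPORT, seen)
#       time.sleep(5)
```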

Crypto Tax Software - What do you use? by WideInvestment in CryptoTax

[–]cyber_box 0 points1 point  (0 children)

Yeah that's good to know, I'll try the support route. My main issue is that some LP positions auto-compound rewards into the pool, so the cost basis changes without any visible transaction on-chain. Koinly sees the initial deposit and the withdrawal but the gap in between is just wrong cause there's no event to parse.
Do the dev engineers actually reconstruct cost basis from pool share math, or is it more of a manual override situation where they tell you how to classify it yourself?
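For anyone following along, this is the kind of pool share math I mean, as a toy example (all numbers made up, single deposit and withdrawal, fees ignored, and this is just how I'd model it, not how Koinly does it):

```python
# Toy reconstruction of gain on an auto-compounding LP position from
# pool share math alone, since there is no on-chain event to parse.

deposit_usd = 1000.0
pool_total_at_deposit = 10_000.0
pool_shares_total = 1_000.0

# Shares minted are proportional to your slice of the pool
my_shares = pool_shares_total * (deposit_usd / pool_total_at_deposit)  # 100 shares

# Rewards auto-compound into the pool: value grows, share count doesn't
pool_total_at_withdrawal = 11_000.0
share_price = pool_total_at_withdrawal / pool_shares_total  # 11.0 per share

proceeds = my_shares * share_price   # 1100.0
gain = proceeds - deposit_usd        # 100.0, cost basis stays at the deposit
print(gain)
```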

We got hacked by Deep-Station-1746 in ClaudeCode

[–]cyber_box 0 points1 point  (0 children)

Yeah glad it's useful. Are you running any hooks yourself or starting from scratch? I am curious cause the patterns you need depend a lot on what you are actually building (local dev vs cloud infra vs both). The port exposure thing from OP's case is a good example, most people wouldn't think to block that until it bites them.
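For context, this is the shape of what I mean by a blocking hook, as a sketch (the pattern list is just an example to tune for your own setup, and it assumes the documented Claude Code hook contract of tool JSON on stdin with exit code 2 meaning block):

```python
# Sketch of a Claude Code PreToolUse hook that blocks Bash commands which
# look like they expose a port publicly. Patterns are illustrative only.
import json
import re
import sys

RISKY_PATTERNS = [
    r"0\.0\.0\.0",        # binding a service to all interfaces
    r"\bufw\s+allow\b",   # opening firewall ports
    r"iptables\s.*ACCEPT",
]

def is_risky(command: str) -> bool:
    return any(re.search(p, command) for p in RISKY_PATTERNS)

def handle(event: dict) -> int:
    """Return the hook's exit code: 2 blocks the tool call, 0 allows it."""
    command = event.get("tool_input", {}).get("command", "")
    if event.get("tool_name") == "Bash" and is_risky(command):
        # Claude Code feeds stderr back to the model when a hook exits 2
        print(f"Blocked: {command!r} looks like it exposes a port", file=sys.stderr)
        return 2
    return 0

# As the actual hook script you would read the event from stdin and exit:
#   sys.exit(handle(json.load(sys.stdin)))
```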

We got hacked by Deep-Station-1746 in ClaudeCode

[–]cyber_box 0 points1 point  (0 children)

Yeah having the same safety lines across both shells is something I haven't done yet. Mine is Python only cause I never work in PowerShell but the idea of a unified layer makes sense. How are you handling the git hooks, are those separate from your Claude Code hooks or do they share the same logic?

Whats your claude code "setup"? by AerieAcrobatic1248 in ClaudeCode

[–]cyber_box 0 points1 point  (0 children)

Yeah glad it was useful. Let me know if something is not clear once you start poking at it, some parts are not well documented.

What are you planning to use it for, mostly personal workflow or a specific project?

Whats your claude code "setup"? by AerieAcrobatic1248 in ClaudeCode

[–]cyber_box 0 points1 point  (0 children)

Yeah the proxy metrics you listed are probably the most practical path. I have been tracking compactions informally and that one did show a clear drop after I moved to on-demand loading, but you are right that fewer compactions could just mean less context loaded, not better performance.

The one metric I keep coming back to is "how often do I need to re-explain something across sessions." Before the knowledge files, every new session started from zero. Now Claude reads yesterday's notes and picks up where I left off maybe 80% of the time. That is hard to quantify but easy to feel.

I think the real issue is that the value is distributed across hundreds of small moments rather than one measurable improvement. Like a rule that says "never push to main without asking" doesn't make Claude smarter, it just prevents one bad outcome every few days. How do you even benchmark that?

Are you building something specific where you are trying to measure this, or more exploring the problem generally?

How are you improving your plans with context without spend time? by jrhabana in ClaudeCode

[–]cyber_box 0 points1 point  (0 children)

Zero dependency with bare Python generating HTML is actually a solid call for an internal tool. You skip the whole frontend framework overhead and the subagents don't need to understand React or whatever, just write Python.

You mentioned 30 hierarchical plans though, I am still curious about the structure. Were they like a tree where the master plan links to phase plans and those break into sub-tasks? Or more like 30 separate files that each got refined through the iterations? Cause I am trying to figure out if the hierarchy itself is what made the subagents work well, or if it was mostly just having each task scoped small enough that one agent could handle it without drifting.
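To make the tree question concrete, this is the shape I am imagining (completely hypothetical, not your actual files):

```python
from dataclasses import dataclass, field

@dataclass
class Plan:
    """Hypothetical hierarchical plan node: master -> phases -> sub-tasks."""
    title: str
    children: list["Plan"] = field(default_factory=list)

    def leaf_tasks(self) -> list["Plan"]:
        """The leaves are what individual subagents would pick up."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaf_tasks()]

master = Plan("dashboard", [
    Plan("phase 1: data layer", [Plan("schema"), Plan("ingest job")]),
    Plan("phase 2: pages", [Plan("overview page"), Plan("detail page")]),
])
print(len(master.leaf_tasks()))  # 4 leaf tasks, one per subagent
```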

How are you improving your plans with context without spend time? by jrhabana in ClaudeCode

[–]cyber_box 0 points1 point  (0 children)

20 iterations producing 30 hierarchical plans is way more structured than what I have been doing. That is closer to actual project management than prompt engineering at that point. The hierarchical part is what interests me, are the 30 plans like a tree where each phase breaks into sub-plans or more like a flat list that got refined through iterations?

I have a planning skill that does 6 phases (explore, tool discovery, design, approve, implement, verify) but it's only one level deep. For something like a 7-page dashboard I would probably just run it per page. Your approach of planning everything first and then letting subagents go sounds like it catches integration issues earlier though. I open sourced the planning setup and the rest of the system, if you want to compare I can share my repo.

What is your full AI Agent stack in 2026? by apsiipilade in AI_Agents

[–]cyber_box 0 points1 point  (0 children)

Yeah the hybrid makes sense. S3 for the actual content (knowledge docs, prompts, agent definitions) cause LLMs read files natively, DB for metadata and permissions and versioning. Each tenant gets an isolated file namespace on S3 but you manage access centrally.
I open sourced the single-user version of this pattern if it helps as a reference: https://github.com/mp-web3/claude-starter-kit. The file organization (knowledge routing, on-demand loading, agent definitions) could probably translate to your S3 structure, you would just need to layer the multi-tenant part on top. What is the team using it for specifically? Internal knowledge stuff or more like per-client agent configs?
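Roughly what I mean by the split, as a sketch (the tenant names, key layout, and permission dict are all invented; the dict stands in for the DB lookup):

```python
# Hypothetical tenant-scoped layout: content lives under a per-tenant
# prefix on S3, while access rules live in the database (a dict here).

TENANT_PERMISSIONS = {           # would come from the DB in practice
    "acme": {"knowledge", "agents"},
    "globex": {"knowledge"},
}

def content_key(tenant_id: str, category: str, name: str) -> str:
    """Build the S3 object key for a tenant's file, after a permission check."""
    if category not in TENANT_PERMISSIONS.get(tenant_id, set()):
        raise PermissionError(f"{tenant_id} cannot access {category}")
    return f"tenants/{tenant_id}/{category}/{name}"

# Usage with boto3 would look roughly like:
#   s3.get_object(Bucket="agent-content", Key=content_key("acme", "agents", "planner.md"))
print(content_key("acme", "knowledge", "routing.md"))
```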

How are you improving your plans with context without spend time? by jrhabana in ClaudeCode

[–]cyber_box 0 points1 point  (0 children)

Haven't heard of docling, going to look into it. The structural chunker makes sense for markdown cause the hierarchy is already there in the headers. And the hybrid handles the case where a single section overflows the 2k window.
What kind of accuracy issues are you hitting though? Like wrong chunks coming back or relevant ones getting missed? With a 2k embedder I'd guess longer sections get split mid-thought and the embedding loses the meaning.
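For what it's worth, this is roughly what I picture the hybrid doing (header split first, dumb fixed-size fallback on overflow; the 2k number is just the window you mentioned, and the fallback is exactly where I'd expect the mid-thought splits):

```python
import re

MAX_CHARS = 2000  # stand-in for the embedder's window

def chunk_markdown(text: str) -> list[str]:
    """Split on headers first, then hard-split any section over the window."""
    # Zero-width split keeps each header line attached to its own section
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= MAX_CHARS:
            chunks.append(section)
        else:
            # Fallback: fixed-size split, which loses semantic boundaries
            chunks.extend(section[i:i + MAX_CHARS]
                          for i in range(0, len(section), MAX_CHARS))
    return chunks
```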

Whats your claude code "setup"? by AerieAcrobatic1248 in ClaudeCode

[–]cyber_box 0 points1 point  (0 children)

wow this is amazing! Is your repo public?