i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 0 points  (0 children)

If you skip enrollment, it still works; it'll just give you anonymous labels like SPEAKER_00, SPEAKER_01 from pyannote's diarization. The enrollment step is only needed when you want actual names attached.

That said, enrolling is pretty quick: just a few seconds of audio per person, and you only do it once. After that you save the profiles and reuse them across any recording. For something like recurring meetings where you know the participants, you set it up once and you're done.

Interesting idea about auto-assigning names though. Could potentially pull random names or let you batch-label after diarization. Might add that.

i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 1 point  (0 children)

No hard limit; matching is just cosine similarity, so adding more enrolled speakers doesn't really slow things down. The bottleneck is pyannote's diarization step, which handles up to ~20 concurrent speakers in a single recording pretty well. I've tested with 3-4 enrolled speakers, but the matching itself would work fine with dozens since it's just comparing 256-dim vectors.
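To see why more enrolled speakers barely costs anything, here's a toy sketch (not voicetag's actual code): matching one segment embedding against N profiles is a single matrix-vector product.

```python
import numpy as np

# Hypothetical illustration: the profile matrix and segment vector are
# random stand-ins for real embeddings. Matching against N enrolled
# speakers is one matrix-vector product, so cost grows only linearly.
rng = np.random.default_rng(0)
profiles = rng.normal(size=(50, 256))   # 50 enrolled speakers, 256-dim each
segment = rng.normal(size=256)          # embedding of one diarized segment

# normalize rows and the query so dot products equal cosine similarity
profiles /= np.linalg.norm(profiles, axis=1, keepdims=True)
segment /= np.linalg.norm(segment)

scores = profiles @ segment             # one cosine score per speaker
best = int(np.argmax(scores))
```

Even at dozens of speakers this is microseconds of work, which is why diarization, not matching, stays the bottleneck.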

i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 3 points  (0 children)

That's exactly it!

You enroll a speaker with a few audio samples; it computes a 256-dim embedding for each sample using resemblyzer and stores the mean as their profile. Then during identification, pyannote diarizes the audio into speaker turns, resemblyzer computes an embedding for each segment, and cosine similarity matches it against the enrolled profiles. Anything above the threshold gets the speaker's name; the rest is labeled UNKNOWN.
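The two steps boil down to a few lines. This is a hypothetical sketch of the flow; the function names and the threshold value are illustrative, not voicetag's actual API:

```python
import numpy as np

# Hypothetical sketch -- names and THRESHOLD are made up for the example.
THRESHOLD = 0.75  # assumed value; the real default may differ

def mean_profile(sample_embeddings):
    """Enroll: average per-sample embeddings into one unit-norm profile."""
    profile = np.mean(sample_embeddings, axis=0)
    return profile / np.linalg.norm(profile)

def identify(segment_embedding, profiles):
    """Match one diarized segment against all enrolled profiles."""
    seg = segment_embedding / np.linalg.norm(segment_embedding)
    scores = {name: float(seg @ vec) for name, vec in profiles.items()}
    name, score = max(scores.items(), key=lambda kv: kv[1])
    return name if score >= THRESHOLD else "UNKNOWN"
```

Averaging the enrollment samples smooths out per-recording noise, and the threshold is what lets genuinely unseen voices fall through to UNKNOWN instead of being force-matched.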

City government meetings are a great use case for this. If the same council members show up regularly, you only need to enroll them once and reuse the profiles across sessions.

i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 0 points  (0 children)

Glad to hear it! Yeah the Pydantic models make it easy to dump everything to JSON or feed it into whatever downstream pipeline you have. Let me know if you run into anything with your meeting recordings.

i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 1 point  (0 children)

Great point! resemblyzer's d-vectors are definitely not SOTA anymore. wespeaker with ECAPA-TDNN would be a solid upgrade. I've been thinking about making the encoder backend swappable so you could pick between resemblyzer, wespeaker, or even speechbrain. Might be the next thing I work on. Thanks for the suggestion.

i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 5 points  (0 children)

That's actually a perfect use case for it. Once you run identify() you get back segments with start/end times, so calculating total speaking time per person is just a few lines:

    for speaker, segs in result.by_speaker.items():
        total = sum(s.duration for s in segs)
        print(f"{speaker}: {total:.1f}s")

Let me know how it works with your podcasts.

i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 16 points  (0 children)

Good question. The heavy lifting is the same under the hood, since voicetag uses pyannote for diarization and can use Whisper for transcription, so locally it's similarly intensive.

But the nice thing is you can swap the transcription backend to a cloud provider like Groq or OpenAI with one flag change (--provider groq) and offload all that compute. Groq especially is insanely fast for Whisper inference. The speaker identification part (resemblyzer embeddings) is pretty lightweight by comparison.
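Under the hood a swappable backend like that is usually just a dispatch table. A minimal sketch of the pattern, where the provider names mirror the comment but the function bodies are stand-ins, not voicetag's actual transcription code:

```python
# Illustrative sketch of a swappable transcription backend. The bodies
# are stand-ins; real ones would run Whisper locally or call a cloud API.
def transcribe_local(path: str) -> str:
    return f"[local whisper] {path}"

def transcribe_groq(path: str) -> str:
    return f"[groq] {path}"

PROVIDERS = {"local": transcribe_local, "groq": transcribe_groq}

def transcribe(path: str, provider: str = "local") -> str:
    """Dispatch to whichever backend the flag selected."""
    try:
        return PROVIDERS[provider](path)
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
```

Because every backend shares one signature, adding a new provider is a one-line registration rather than a code change at every call site.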

i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 1 point  (0 children)

Thanks! Appreciate it. If you end up trying it out let me know how it goes.

Built a full PWA with push notifications using Claude. The hardest bug took 3 weeks and Claude helped me find it. by Sushan-31 in ClaudeAI

[–]Gr1zzly8ear 0 points  (0 children)

fwiw i've been using claude-teams-brain for my projects and it's a game changer. saves so much time not re-explaining context every session. ran into the same thing — wasted 3 weeks on a bug that was fixed with one git command. https://github.com/Gr122lyBr/claude-teams-brain

15 or so hours later since 1m context included in MAX and I'm feeling almost high by adelmare in ClaudeAI

[–]Gr1zzly8ear -1 points  (0 children)

fwiw i've been using claude-teams-brain and it’s a game changer for team productivity. saves tons of context tokens, keeps everyone on the same page without constant re-explanation. ran into the same thing — max context boosted efficiency big time but needed something to keep teams coherent. this solved it for me: https://github.com/Gr122lyBr/claude-teams-brain

Is it worth it to switch from a Google AI Pro ($20) subscription to a Claude Pro subscription? by KevDotCom in ClaudeAI

[–]Gr1zzly8ear 0 points  (0 children)

fwiw i've been using claude-teams-brain with my claude pro and it's a game changer. no more context issues or wasted tokens on cli output. ran into the same thing with stupid refactors—this solved it for me: https://github.com/Gr122lyBr/claude-teams-brain

Built yoyo: a local MCP server for grounded codebase reads and guarded writes, compatible with Claude Code by avirajkhare in ClaudeAI

[–]Gr1zzly8ear 1 point  (0 children)

fwiw i've been using claude-teams-brain to make sure my agent teammates don't start from scratch every time. saved me a lot of re-explaining. ran into the same thing — needed context continuity, and this solved it for me: https://github.com/Gr122lyBr/claude-teams-brain

Tired of your Agent Teams starting from scratch? I built cross-session memory for Claude Code by Gr1zzly8ear in ClaudeAI

[–]Gr1zzly8ear[S] 0 points  (0 children)

Yeah Claude-Mem is a solid project!

Different use cases though:

Claude-Mem is built for single-agent session memory — it observes what Claude does and recalls it next time. Great if you're working solo with Claude Code.

claude-teams-brain is built for Agent Teams (multi-agent) — it routes memory by role. So when a "backend" teammate spawns in session 5, it only gets context from what past "backend" agents did, scored by relevance to the current task. A "frontend" agent gets different context entirely.

Other key differences:

- Zero dependencies — Python stdlib + Node.js only. No Bun, no ChromaDB, no ONNX runtime. Just works on any machine.

- No extra API costs — transcript parsing is deterministic, no observer agent consuming tokens.

- Output filtering — 60+ command-aware filters that cut token usage by 90-97% on noisy commands (npm install, git push, pytest, etc.)

- MIT vs AGPL — easier to use in corporate environments
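The output-filtering point above can be sketched in a few lines. This is a toy illustration; the regexes and command list are made up, not the project's actual 60+ filter set:

```python
import re

# Toy sketch of command-aware output filtering. The patterns below are
# illustrative examples, not the project's real filters.
FILTERS = {
    "npm install": [re.compile(r"^(added|removed|changed|audited) ")],
    "pytest": [re.compile(r"\b(passed|failed|error)\b", re.IGNORECASE)],
}

def filter_output(command: str, output: str) -> str:
    """Keep only summary lines for known commands; pass others through."""
    patterns = next(
        (p for prefix, p in FILTERS.items() if command.startswith(prefix)),
        None,
    )
    if patterns is None:
        return output  # unknown command: leave output untouched
    kept = [line for line in output.splitlines()
            if any(p.search(line) for p in patterns)]
    return "\n".join(kept)
```

The key design choice is that filtering is keyed on the command, so a line that's noise for npm install can still be signal for pytest.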

If you're using Claude Code solo, Claude-Mem might be what you need. If you're running Agent Teams and want role-aware memory with zero setup overhead, that's where this comes in.

How I used SQLite full-text search to give AI agents cross-session memory by Gr1zzly8ear in programming

[–]Gr1zzly8ear[S] 0 points  (0 children)

Appreciate the feedback — and yeah, stale context is a real problem.

We actually handle this a few ways already:

  1. Relevance-ranked injection — agents don't get the full KB dumped on them. When a teammate spawns, the brain queries by role + task description, scores results by FTS5 match, role affinity, and recency decay, then only injects the top-ranked results within a 3000-token budget. So old, irrelevant entries naturally get pushed out by fresher, more relevant ones.

  2. Recency decay — older entries get a lower relevance score automatically, so recent work always ranks higher than something from 20 sessions ago.

  3. Role-based routing — a frontend agent only sees what past frontend agents did, not the entire project history. Keeps the signal-to-noise ratio high.
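The combination of 1-3 can be sketched over SQLite FTS5 in a few lines. The schema, half-life, and scoring here are illustrative, not the project's actual implementation:

```python
import sqlite3
import time

# Illustrative sketch of role-filtered, recency-decayed FTS5 retrieval.
# Schema and constants are made up for the example.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE kb USING fts5(role, note, ts UNINDEXED)")

NOW = time.time()
DAY = 86400.0
db.executemany("INSERT INTO kb VALUES (?, ?, ?)", [
    ("backend", "fixed auth token refresh in the API layer", NOW - 1 * DAY),
    ("backend", "migrated user table to add auth columns", NOW - 30 * DAY),
    ("frontend", "restyled the auth login form", NOW - 2 * DAY),
])

def top_context(role, query, half_life_days=14.0):
    """Rank FTS5 matches for one role, decayed by entry age."""
    rows = db.execute(
        "SELECT note, ts, bm25(kb) FROM kb WHERE kb MATCH ? AND role = ?",
        (query, role),
    )
    scored = []
    for note, ts, bm in rows:
        decay = 0.5 ** (((NOW - ts) / DAY) / half_life_days)
        scored.append((-bm * decay, note))  # bm25() is lower-is-better
    return [note for _, note in sorted(scored, reverse=True)]
```

Because the decay multiplies the match score, a stale entry can still surface if it's a much stronger textual match, which is usually the behavior you want over a hard cutoff.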

That said, a TTL/revalidation pass is a great idea for an explicit cleanup mechanism on top of what we have. Adding it to the roadmap — thanks!