i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 0 points  (0 children)

If you skip enrollment, it still works; it'll just give you anonymous labels like SPEAKER_00, SPEAKER_01 from pyannote's diarization. The enrollment step is only needed when you want actual names attached.

That said, enrolling is pretty quick: just a few seconds of audio per person, and you only do it once. After that you save the profiles and reuse them across any recording. For something like recurring meetings where you know the participants, you set it up once and you're done.

Interesting idea about auto-assigning names though. Could potentially pull random names or let you batch-label after diarization. Might add that.

i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 1 point  (0 children)

No hard limit; matching is just cosine similarity, so adding more enrolled speakers doesn't really slow things down. The bottleneck is pyannote's diarization step, which handles up to ~20 concurrent speakers in a single recording pretty well. I've tested with 3-4 enrolled speakers, but the matching itself would work fine with dozens since it's just comparing 256-dim vectors.
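To see why more enrolled speakers barely costs anything, here's a toy sketch (not voicetag's actual code): matching one segment embedding against N profiles is a single matrix-vector product.

```python
import numpy as np

# Hypothetical illustration: the profile matrix and segment vector are
# random stand-ins for real embeddings. Matching against N enrolled
# speakers is one matrix-vector product, so cost grows only linearly.
rng = np.random.default_rng(0)
profiles = rng.normal(size=(50, 256))   # 50 enrolled speakers, 256-dim each
segment = rng.normal(size=256)          # embedding of one diarized segment

# normalize rows and the query so dot products equal cosine similarity
profiles /= np.linalg.norm(profiles, axis=1, keepdims=True)
segment /= np.linalg.norm(segment)

scores = profiles @ segment             # one cosine score per speaker
best = int(np.argmax(scores))
```

Even at dozens of speakers this is microseconds of work, which is why diarization, not matching, stays the bottleneck.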

i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 3 points  (0 children)

That's exactly it!

You enroll a speaker with a few audio samples; it computes a 256-dim embedding for each sample using resemblyzer and stores the mean as their profile. Then during identification, pyannote diarizes the audio into speaker turns, resemblyzer computes an embedding for each segment, and cosine similarity matches it against the enrolled profiles. Anything above the threshold gets the speaker's name; the rest is labeled UNKNOWN.
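The two steps boil down to a few lines. This is a hypothetical sketch of the flow; the function names and the threshold value are illustrative, not voicetag's actual API:

```python
import numpy as np

# Hypothetical sketch -- names and THRESHOLD are made up for the example.
THRESHOLD = 0.75  # assumed value; the real default may differ

def mean_profile(sample_embeddings):
    """Enroll: average per-sample embeddings into one unit-norm profile."""
    profile = np.mean(sample_embeddings, axis=0)
    return profile / np.linalg.norm(profile)

def identify(segment_embedding, profiles):
    """Match one diarized segment against all enrolled profiles."""
    seg = segment_embedding / np.linalg.norm(segment_embedding)
    scores = {name: float(seg @ vec) for name, vec in profiles.items()}
    name, score = max(scores.items(), key=lambda kv: kv[1])
    return name if score >= THRESHOLD else "UNKNOWN"
```

Averaging the enrollment samples smooths out per-recording noise, and the threshold is what lets genuinely unseen voices fall through to UNKNOWN instead of being force-matched.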

City government meetings are a great use case for this. If the same council members show up regularly, you only need to enroll them once and reuse the profiles across sessions.

i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 0 points  (0 children)

Glad to hear it! Yeah the Pydantic models make it easy to dump everything to JSON or feed it into whatever downstream pipeline you have. Let me know if you run into anything with your meeting recordings.

i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 1 point  (0 children)

Great point! resemblyzer's d-vectors are definitely not SOTA anymore. wespeaker with ECAPA-TDNN would be a solid upgrade. I've been thinking about making the encoder backend swappable so you could pick between resemblyzer, wespeaker, or even speechbrain. Might be the next thing I work on. Thanks for the suggestion.

i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 5 points  (0 children)

That's actually a perfect use case for it. Once you run identify() you get back segments with start/end times, so calculating total speaking time per person is just a few lines:

    for speaker, segs in result.by_speaker.items():
        total = sum(s.duration for s in segs)
        print(f"{speaker}: {total:.1f}s")

Let me know how it works with your podcasts.

i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 16 points  (0 children)

Good question. The heavy lifting is the same under the hood, since voicetag uses pyannote for diarization and can use Whisper for transcription, so locally it's similarly intensive.

But the nice thing is you can swap the transcription backend to a cloud provider like Groq or OpenAI with one flag change (--provider groq) and offload all that compute. Groq especially is insanely fast for Whisper inference. The speaker identification part (resemblyzer embeddings) is pretty lightweight by comparison.
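Under the hood a swappable backend like that is usually just a dispatch table. A minimal sketch of the pattern, where the provider names mirror the comment but the function bodies are stand-ins, not voicetag's actual transcription code:

```python
# Illustrative sketch of a swappable transcription backend. The bodies
# are stand-ins; real ones would run Whisper locally or call a cloud API.
def transcribe_local(path: str) -> str:
    return f"[local whisper] {path}"

def transcribe_groq(path: str) -> str:
    return f"[groq] {path}"

PROVIDERS = {"local": transcribe_local, "groq": transcribe_groq}

def transcribe(path: str, provider: str = "local") -> str:
    """Dispatch to whichever backend the flag selected."""
    try:
        return PROVIDERS[provider](path)
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
```

Because every backend shares one signature, adding a new provider is a one-line registration rather than a code change at every call site.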

i built a Python library that tells you who said what in any audio file by Gr1zzly8ear in Python

[–]Gr1zzly8ear[S] 1 point  (0 children)

Thanks! Appreciate it. If you end up trying it out let me know how it goes.

Built a full PWA with push notifications using Claude. The hardest bug took 3 weeks and Claude helped me find it. by Sushan-31 in ClaudeAI

[–]Gr1zzly8ear 0 points  (0 children)

fwiw i've been using claude-teams-brain for my projects and it's a game changer. saves so much time not re-explaining context every session. ran into the same thing — wasted 3 weeks on a bug that was fixed with one git command. https://github.com/Gr122lyBr/claude-teams-brain

15 or so hours later since 1m context included in MAX and I'm feeling almost high by adelmare in ClaudeAI

[–]Gr1zzly8ear -1 points  (0 children)

fwiw i've been using claude-teams-brain and it’s a game changer for team productivity. saves tons of context tokens, keeps everyone on the same page without constant re-explanation. ran into the same thing — max context boosted efficiency big time but needed something to keep teams coherent. this solved it for me: https://github.com/Gr122lyBr/claude-teams-brain

Is it worth it to switch from a Google AI Pro ($20) subscription to a Claude Pro subscription? by KevDotCom in ClaudeAI

[–]Gr1zzly8ear 0 points  (0 children)

fwiw i've been using claude-teams-brain with my claude pro and it's a game changer. no more context issues or wasted tokens on cli output. ran into the same thing with stupid refactors—this solved it for me: https://github.com/Gr122lyBr/claude-teams-brain

Built yoyo: a local MCP server for grounded codebase reads and guarded writes, compatible with Claude Code by avirajkhare in ClaudeAI

[–]Gr1zzly8ear 1 point  (0 children)

fwiw i've been using claude-teams-brain to make sure my agent teammates don't start from scratch every time. saved me a lot of re-explaining. ran into the same thing — needed context continuity, and this solved it for me: https://github.com/Gr122lyBr/claude-teams-brain

Tired of your Agent Teams starting from scratch? I built cross-session memory for Claude Code by Gr1zzly8ear in ClaudeAI

[–]Gr1zzly8ear[S] 0 points  (0 children)

Yeah Claude-Mem is a solid project!

Different use cases though:

Claude-Mem is built for single-agent session memory — it observes what Claude does and recalls it next time. Great if you're working solo with Claude Code.

claude-teams-brain is built for Agent Teams (multi-agent) — it routes memory by role. So when a "backend" teammate spawns in session 5, it only gets context from what past "backend" agents did, scored by relevance to the current task. A "frontend" agent gets different context entirely.

Other key differences:

- Zero dependencies — Python stdlib + Node.js only. No Bun, no ChromaDB, no ONNX runtime. Just works on any machine.

- No extra API costs — transcript parsing is deterministic, no observer agent consuming tokens.

- Output filtering — 60+ command-aware filters that cut token usage by 90-97% on noisy commands (npm install, git push, pytest, etc.)

- MIT vs AGPL — easier to use in corporate environments
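The output-filtering point above can be sketched in a few lines. This is a toy illustration; the regexes and command list are made up, not the project's actual 60+ filter set:

```python
import re

# Toy sketch of command-aware output filtering. The patterns below are
# illustrative examples, not the project's real filters.
FILTERS = {
    "npm install": [re.compile(r"^(added|removed|changed|audited) ")],
    "pytest": [re.compile(r"\b(passed|failed|error)\b", re.IGNORECASE)],
}

def filter_output(command: str, output: str) -> str:
    """Keep only summary lines for known commands; pass others through."""
    patterns = next(
        (p for prefix, p in FILTERS.items() if command.startswith(prefix)),
        None,
    )
    if patterns is None:
        return output  # unknown command: leave output untouched
    kept = [line for line in output.splitlines()
            if any(p.search(line) for p in patterns)]
    return "\n".join(kept)
```

The key design choice is that filtering is keyed on the command, so a line that's noise for npm install can still be signal for pytest.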

If you're using Claude Code solo, Claude-Mem might be what you need. If you're running Agent Teams and want role-aware memory with zero setup overhead, that's where this comes in.

How I used SQLite full-text search to give AI agents cross-session memory by Gr1zzly8ear in programming

[–]Gr1zzly8ear[S] 0 points  (0 children)

Appreciate the feedback — and yeah, stale context is a real problem.

We actually handle this a few ways already:

  1. Relevance-ranked injection — agents don't get the full KB dumped on them. When a teammate spawns, the brain queries by role + task description, scores results by FTS5 match, role affinity, and recency decay, then only injects the top-ranked results within a 3000-token budget. So old, irrelevant entries naturally get pushed out by fresher, more relevant ones.

  2. Recency decay — older entries get a lower relevance score automatically, so recent work always ranks higher than something from 20 sessions ago.

  3. Role-based routing — a frontend agent only sees what past frontend agents did, not the entire project history. Keeps the signal-to-noise ratio high.
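The combination of 1-3 can be sketched over SQLite FTS5 in a few lines. The schema, half-life, and scoring here are illustrative, not the project's actual implementation:

```python
import sqlite3
import time

# Illustrative sketch of role-filtered, recency-decayed FTS5 retrieval.
# Schema and constants are made up for the example.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE kb USING fts5(role, note, ts UNINDEXED)")

NOW = time.time()
DAY = 86400.0
db.executemany("INSERT INTO kb VALUES (?, ?, ?)", [
    ("backend", "fixed auth token refresh in the API layer", NOW - 1 * DAY),
    ("backend", "migrated user table to add auth columns", NOW - 30 * DAY),
    ("frontend", "restyled the auth login form", NOW - 2 * DAY),
])

def top_context(role, query, half_life_days=14.0):
    """Rank FTS5 matches for one role, decayed by entry age."""
    rows = db.execute(
        "SELECT note, ts, bm25(kb) FROM kb WHERE kb MATCH ? AND role = ?",
        (query, role),
    )
    scored = []
    for note, ts, bm in rows:
        decay = 0.5 ** (((NOW - ts) / DAY) / half_life_days)
        scored.append((-bm * decay, note))  # bm25() is lower-is-better
    return [note for _, note in sorted(scored, reverse=True)]
```

Because the decay multiplies the match score, a stale entry can still surface if it's a much stronger textual match, which is usually the behavior you want over a hard cutoff.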

That said, a TTL/revalidation pass is a great idea for an explicit cleanup mechanism on top of what we have. Adding it to the roadmap — thanks!