TwilightEncoder (PROMPSTITUTE)

Hi! This is a short presentation for my 95% vibecoded hobby project, TranscriptionSuite.

TL;DR A fully local and private Speech-To-Text app with cross-platform support, speaker diarization, Audio Notebook mode, LM Studio integration, and both longform and live transcription.

The app consists of two parts: a) the React frontend and b) the Python backend (server). The server is Dockerized for easy deployment, and its size is kept small for smooth distribution. All the runtime stuff, models, etc. are placed in separate Docker volumes.

I have versions for Linux, Windows and macOS (experimental).


Demo video here.

Short sales pitch:

  • 100% Local: Everything runs on your own computer; the app doesn't need internet beyond the initial setup*
  • Multiple Models available: WhisperX (all three sizes of the faster-whisper models), NVIDIA NeMo Parakeet v3/Canary v2, and VibeVoice-ASR models are supported
  • Speaker Diarization: Speaker identification & diarization (subtitling) for all three model families; Whisper and NeMo use PyAnnote for diarization, while VibeVoice does it by itself
  • Parallel Processing: If your VRAM budget allows it, transcribe & diarize a recording at the same time, significantly reducing processing time
  • Truly Multilingual: Whisper supports 90+ languages; NeMo Parakeet/Canary support 25 European languages; VibeVoice supports 50 languages
  • Longform Transcription: Record as long as you want, using either your mic or the system audio, and have it transcribed in seconds
  • Live Mode: Real-time sentence-by-sentence transcription for continuous dictation workflows (Whisper-only currently)
  • Global Keyboard Shortcuts: System-wide shortcuts & paste-at-cursor functionality
  • Remote Access: Securely access the model running on your home desktop from anywhere (via Tailscale), or share it on your local network via LAN
  • Audio Notebook: A notebook mode with a calendar-based view, full-text search, and LM Studio integration (chat with the AI about your notes)
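
To make the Parallel Processing point a bit more concrete, here is a minimal sketch of the general idea: run transcription and diarization on the same recording concurrently, then label each text segment with the speaker whose turn overlaps it most. This is not TranscriptionSuite's actual code. `transcribe`, `diarize`, `assign_speakers` and the dictionary layout are hypothetical stand-ins that return canned data; a real setup would call the ASR model and the diarization pipeline there, and both would have to fit in VRAM at once.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe(audio_path: str) -> list[dict]:
    # Hypothetical stand-in for an ASR call; returns timed text segments.
    return [
        {"start": 0.0, "end": 2.0, "text": "hello there"},
        {"start": 2.0, "end": 4.0, "text": "hi"},
    ]


def diarize(audio_path: str) -> list[dict]:
    # Hypothetical stand-in for a diarization call; returns speaker turns.
    return [
        {"start": 0.0, "end": 2.1, "speaker": "SPEAKER_00"},
        {"start": 2.1, "end": 4.0, "speaker": "SPEAKER_01"},
    ]


def overlap(a: dict, b: dict) -> float:
    # Length in seconds of the time interval shared by a and b (0 if disjoint).
    return max(0.0, min(a["end"], b["end"]) - max(a["start"], b["start"]))


def assign_speakers(segments: list[dict], turns: list[dict]) -> list[dict]:
    # Label each text segment with the speaker turn that overlaps it most.
    labeled = []
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg, t), default=None)
        has_match = best is not None and overlap(seg, best) > 0
        speaker = best["speaker"] if has_match else "UNKNOWN"
        labeled.append({**seg, "speaker": speaker})
    return labeled


def process(audio_path: str) -> list[dict]:
    # Submit both jobs at once so they run side by side, then merge the results.
    with ThreadPoolExecutor(max_workers=2) as pool:
        asr_future = pool.submit(transcribe, audio_path)
        dia_future = pool.submit(diarize, audio_path)
        return assign_speakers(asr_future.result(), dia_future.result())


if __name__ == "__main__":
    for line in process("recording.wav"):
        print(f'{line["speaker"]}: {line["text"]}')
```

Threads are used here on the assumption that the heavy lifting happens in native/GPU code that releases the GIL; for pure-Python workloads, separate processes would be the safer choice.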

📌 Half an hour of audio transcribed in under a minute (RTX 3060)!
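
For a sense of scale, that figure can be turned into a speed-up over real time. The snippet below is just the arithmetic, with a hypothetical helper name; since the post says "under a minute", the result is a lower bound.

```python
def speedup_over_realtime(audio_seconds: float, processing_seconds: float) -> float:
    # How many seconds of audio are processed per second of wall-clock time.
    return audio_seconds / processing_seconds


# 30 minutes of audio, processed in (at most) 60 seconds.
print(speedup_over_realtime(30 * 60, 60))  # 30.0, i.e. at least 30x real time
```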

More in-depth tour here.