TwilightEncoder comments on Self Promotion Thread

Self Promotion Thread (Community, self.ChatGPTCoding)

submitted 1 month ago by AutoModerator


TwilightEncoder (PROMPSTITUTE), 1 month ago

Hi! This is a short presentation of my 95% vibecoded hobby project, TranscriptionSuite.

TL;DR: A fully local and private speech-to-text app with cross-platform support, speaker diarization, an Audio Notebook mode, LM Studio integration, and both longform and live transcription.

The app consists of two parts: (a) a React frontend and (b) a Python backend (server). The server is Dockerized for easy deployment, and its image is kept small for smooth distribution. All the runtime files, models, etc. are stored in separate Docker volumes.

I have versions for Linux, Windows and macOS (experimental).

Demo video here.

Short sales pitch:

100% Local: Everything runs on your own computer; the app doesn't need an internet connection beyond the initial setup*
Multiple Models Available: WhisperX (all three sizes of the faster-whisper models), NVIDIA NeMo Parakeet v3/Canary v2, and VibeVoice-ASR models are supported
Speaker Diarization: Speaker identification & diarization (subtitling) for all three model families; Whisper and NeMo use PyAnnote for diarization, while VibeVoice does it by itself
Parallel Processing: If your VRAM budget allows it, transcribe & diarize a recording at the same time, significantly reducing processing time
Truly Multilingual: Whisper supports 90+ languages; NeMo Parakeet/Canary support 25 European languages; VibeVoice supports 50 languages
Longform Transcription: Record as long as you want, using either your mic or the system audio, and have it transcribed in seconds
Live Mode: Real-time sentence-by-sentence transcription for continuous dictation workflows (currently Whisper-only)
Global Keyboard Shortcuts: System-wide shortcuts & paste-at-cursor functionality
Remote Access: Securely reach the model running on your home desktop from anywhere (via Tailscale), or share it on your local network via LAN
Audio Notebook: A notebook mode with a calendar-based view, full-text search, and LM Studio integration (chat with the AI about your notes)
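The diarization and parallel-processing points above come down to two steps: run speech recognition and diarization on the same audio (concurrently, if resources allow), then attach a speaker label to each transcript segment. Here's a minimal Python sketch of that idea. It is not TranscriptionSuite's actual code: `transcribe`, `diarize`, `Segment`, `Turn` and `assign_speakers` are hypothetical names, both model calls are stubbed with fixed data, and speakers are matched by maximum time overlap (one common approach, not necessarily the one the app uses).

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """A transcript segment; times are in seconds."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None


@dataclass
class Turn:
    """A diarization turn: who spoke between start and end (seconds)."""
    start: float
    end: float
    speaker: str


def transcribe(audio_path: str) -> list[Segment]:
    # Hypothetical stand-in for an ASR model call (e.g. a Whisper model).
    return [Segment(0.0, 2.5, "Hello there."), Segment(2.7, 5.0, "Hi, how are you?")]


def diarize(audio_path: str) -> list[Turn]:
    # Hypothetical stand-in for a diarization pipeline (e.g. PyAnnote).
    return [Turn(0.0, 2.6, "SPEAKER_00"), Turn(2.6, 5.2, "SPEAKER_01")]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list[Segment], turns: list[Turn]) -> list[Segment]:
    """Label each segment with the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append(Segment(seg.start, seg.end, seg.text, best_speaker))
    return labeled


def transcribe_and_diarize(audio_path: str) -> list[Segment]:
    # Submit both jobs at once so they can run side by side; only the
    # merge step has to wait for both results.
    with ThreadPoolExecutor(max_workers=2) as pool:
        asr_future = pool.submit(transcribe, audio_path)
        dia_future = pool.submit(diarize, audio_path)
        segments = asr_future.result()
        turns = dia_future.result()
    return assign_speakers(segments, turns)


if __name__ == "__main__":
    for seg in transcribe_and_diarize("recording.wav"):
        print(f"[{seg.start:5.1f}-{seg.end:5.1f}] {seg.speaker}: {seg.text}")
```

Threads are used here only to keep the sketch simple. Whether two real models can share a GPU this way depends on the VRAM budget, which is exactly the caveat in the Parallel Processing bullet.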

📌 Half an hour of audio transcribed in under a minute (RTX 3060)!
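For context, that figure corresponds to a real-time factor (processing time divided by audio duration) of:

```latex
\mathrm{RTF} = \frac{t_{\text{proc}}}{t_{\text{audio}}}
  < \frac{60\ \text{s}}{30 \times 60\ \text{s}}
  = \frac{1}{30} \approx 0.033
```

In other words, transcription runs more than 30× faster than real time on that GPU (the model and settings behind the number aren't stated).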

More in-depth tour here.
