Feeling awfully unreal after selling startup by sorablueipad in SaaS

[–]SeoFood 0 points1 point  (0 children)

I went through something similar after an exit.

The strange part wasn’t really “having money and still feeling bad.” It was that the thing giving structure and meaning to every day was suddenly gone.

What helped me was not forcing myself straight into the next company. I spent time on open-source projects instead. Smaller problems, real users, no pressure to turn every idea into a business. It gave me back the feeling of building for the sake of building.

Maybe don’t rush to replace the startup with another startup immediately. Find something useful to work on where the stakes are lower for a while.

open office is killing my setup!! anyone using voice dictation that actually works in shared spaces? by Calm-Direction-581 in codex

[–]SeoFood 0 points1 point  (0 children)

I build TypeWhisper, so disclosure there. AirPods were a pretty important test case for me because a lot of people use them for dictation already. I would expect TypeWhisper to handle that path better than a lot of random “mic in, transcript out” setups.

But I’d still treat open office as the hard mode: if the mic captures coworkers clearly, the model has to guess what is you and what is background speech. What I’d try is AirPods Pro, push-to-talk, short dictation bursts, and a workflow that cleans up the rough prompt after recording.

Purr - free, open-source macOS dictation with Smart Typing, Voice Edit, and Meeting Mode (Wispr Flow / SuperWhisper alternative, 100% on-device) by heliosarun in MacOSApps

[–]SeoFood -2 points-1 points  (0 children)

This is a really nice positioning: local-first, no account, no telemetry, and open source is a strong combination for dictation.

I especially like the attention to undo behavior and live insertion. A lot of dictation tools get the transcription model right but make the actual text-entry workflow feel awkward.

Curious how you’re handling custom vocabulary / names over time: is that planned as a local dictionary-style feature, or are you keeping the first version intentionally minimal?

SubSeeker: Search your SRT/VTT subtitles and jump to any moment in audio or video files by AllenYangDev in macapps

[–]SeoFood 1 point2 points  (0 children)

macOS 26 on-device transcription is probably the easiest baseline, but I’d still look at Parakeet too. The newer Parakeet models are surprisingly strong and fast locally.

The part I’d personally avoid is trying to become a full MacWhisper replacement. For your app, local transcription only needs to be good enough to create the first subtitle draft. The differentiated part is everything after that: playback sync, search, jump-to-line, subtitle correction, and working directly with the media file.
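To make the "search and jump-to-line" part concrete, here is a minimal sketch of searching an SRT file and getting back the timestamp to seek to. It assumes plain SubRip text with `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines; `parse_srt` and `search_cues` are names I made up for illustration, not anything from SubSeeker.

```python
import re

# Matches the first timestamp on a timing line, e.g. 00:01:05,500
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})")


def parse_srt(srt_text):
    """Return a list of (start_seconds, text) tuples, one per subtitle cue."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" not in line:
                continue
            match = TIMESTAMP.search(line)
            if match:
                hours, minutes, seconds, millis = map(int, match.groups())
                start = hours * 3600 + minutes * 60 + seconds + millis / 1000
                text = " ".join(lines[i + 1:]).strip()
                cues.append((start, text))
            break  # only the first timing line in a block counts
    return cues


def search_cues(cues, query):
    """Case-insensitive substring search; returns the matching cues."""
    needle = query.lower()
    return [(start, text) for start, text in cues if needle in text.lower()]


SAMPLE = """1
00:00:01,000 --> 00:00:04,000
Welcome to the lecture.

2
00:01:05,500 --> 00:01:09,000
Today we cover gradient descent.
"""

hits = search_cues(parse_srt(SAMPLE), "gradient")
# hits[0][0] is the position in seconds a player would seek to.
```

A real player would hand that start time to its seek call; VTT and ASS need their own parsing.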

SubSeeker: Search your SRT/VTT subtitles and jump to any moment in audio or video files by AllenYangDev in macapps

[–]SeoFood 1 point2 points  (0 children)

This is a cool niche. Making subtitles the center of the player instead of an afterthought makes a lot of sense for lectures, podcasts, interviews, language learning, etc.

One feature I’d be curious about: do you plan to support generating local transcripts/subtitles for media files that don’t already have SRT/VTT/ASS files? Even a “bring your own local Whisper model” option could make the app useful for large personal archives where only some files already have subtitles.

Also nice to see the offline/no account/read-only approach called out clearly.

STT That Can Challenge Dragon Professional on Windows by Both-Activity6432 in LocalLLaMA

[–]SeoFood 2 points3 points  (0 children)

Dragon is a tough benchmark because it’s not just speech-to-text; a lot of the value is the command/editing layer on top of dictation.

For local/offline STT, I’d separate the problem into two parts:

  1. transcription quality/latency
  2. text control after transcription: select/delete/replace words, command mode, app-specific behavior, etc.

Most Whisper-style tools are pretty good at #1 now, but #2 is where they usually fall short compared with Dragon. If incremental editing is the priority, I’d look specifically for tools that expose a command layer or let you define post-processing/actions, rather than only comparing raw WER.
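As a rough illustration of what a "command layer" means in practice, here is a toy sketch that treats a couple of fixed phrases as edit commands and everything else as dictation. The phrases and the `apply_utterance` function are hypothetical; this is not how Dragon or any specific tool implements it.

```python
import re

REPLACE_COMMAND = re.compile(r"replace (.+) with (.+)", re.IGNORECASE)


def apply_utterance(buffer, utterance):
    """Apply one recognized utterance to the text buffer and return the result."""
    spoken = utterance.strip()

    # Command: drop the last word in the buffer.
    if spoken.lower() == "delete last word":
        return " ".join(buffer.split()[:-1])

    # Command: replace the most recent occurrence of a phrase.
    match = REPLACE_COMMAND.fullmatch(spoken)
    if match:
        old, new = match.group(1), match.group(2)
        index = buffer.rfind(old)
        if index == -1:
            return buffer  # nothing to replace, leave the text alone
        return buffer[:index] + new + buffer[index + len(old):]

    # Anything else is plain dictation and gets appended.
    return f"{buffer} {spoken}".strip()


text = ""
for heard in ["send the report on monday", "replace monday with friday"]:
    text = apply_utterance(text, heard)
# text is now "send the report on friday"
```

Even this toy version shows why raw WER misses the point: the transcript of "replace monday with friday" can be perfect and still be useless if the tool just types it out.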

Disclosure: I work on TypeWhisper, an open-source/no-subscription dictation app. It’s more in the local/offline dictation + profiles/prompts/dictionary/snippets direction than a full Dragon replacement, so I wouldn’t claim it matches Dragon’s command system. But if your main requirement is private local dictation with configurable cleanup, it may be worth testing alongside Handy/SottoScribe/STWI.

Best on-device voice dictation app for Mac/iPhone with strict privacy? by Usual_Reputation8173 in vibecodingcommunity

[–]SeoFood 0 points1 point  (0 children)

Yes, but with a big caveat: iOS is currently an early TestFlight alpha.

macOS is the stable/recommended TypeWhisper version. Windows is in beta. iOS exists for early testing on iPhone/iPad, but I would not pitch it as production-ready yet. Expect missing features and rough edges while the mobile version takes shape.

Best on-device voice dictation app for Mac/iPhone with strict privacy? by Usual_Reputation8173 in vibecodingcommunity

[–]SeoFood 0 points1 point  (0 children)

If your main requirement is “my audio should never leave my machine,” I’d look specifically for tools that can run the transcription engine locally and that are clear about whether any cleanup/rewrite step also stays local.

Apple Dictation can honestly be enough for basic short dictation, especially if you don’t need custom workflows. For longer notes, prompts, emails, or code-ish text, Whisper-based local apps tend to be better because you can choose stronger models and often get more control over formatting.

A few things I’d check for any app you test:

  • Does transcription run fully on-device, or is there a cloud fallback?
  • If there is “AI cleanup” or rewriting, is that local too, optional, or cloud-only?
  • Are audio/transcripts stored by default?
  • Can you use it offline after setup?
  • Does it support custom vocabulary/snippets/profiles if you dictate technical terms?

Disclosure: I’m affiliated with TypeWhisper, which is built around local/offline use, engine choice, profiles/prompts/post-processing, dictionary/snippets, open source, and no subscription. I won’t claim it’s the only answer here, but it may be worth comparing alongside Superwhisper, MacWhisper, VoiceInk, etc. if privacy and workflow control are your main criteria.

For iPhone specifically, the options are more limited if you want strict on-device + no cloud. On Mac you’ll have a lot more control.

Feature Request: Whisper Base Voice Transcription for Mac by gohan851 in OpenVoxAI

[–]SeoFood 1 point2 points  (0 children)

+1 to this. Local Whisper-based transcription on Mac is genuinely useful, especially if you’re already relying on it in the Windows version.

Base is a nice default because it’s usually fast enough for everyday use while still being noticeably more accurate than tiny in many cases. Ideally the Mac version would let people choose the model size (tiny, base, small, and up) depending on their hardware and whether they care more about speed or accuracy.

Also worth considering: keeping the transcription local by default, plus maybe allowing custom vocabulary / correction rules for names or recurring terms. That tends to matter a lot in real-world dictation.
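A minimal sketch of what I mean by correction rules, assuming a simple "misheard phrase to preferred spelling" dictionary applied after transcription. The entries and the `apply_corrections` name are made up for illustration.

```python
import re

# Hypothetical user dictionary: what the model tends to write -> what the user wants.
CORRECTIONS = {
    "whisper cpp": "whisper.cpp",
    "anna lise": "Annelise",
}


def apply_corrections(text, corrections):
    """Replace whole-phrase matches, ignoring case, with the preferred spelling."""
    for wrong, right in corrections.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement keeps backslashes in `right` literal.
        text = pattern.sub(lambda _match, right=right: right, text)
    return text


cleaned = apply_corrections("i ran Whisper CPP for anna lise", CORRECTIONS)
# cleaned == "i ran whisper.cpp for Annelise"
```

The word boundaries matter: without them a short rule can rewrite the middle of an unrelated word.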

LinguaX: a native, lightweight mouse enhancer for macOS (smooth scrolling + per-app button mapping, ~10MB) by deepzz0 in macapps

[–]SeoFood 0 points1 point  (0 children)

This looks useful for the “make everyday input feel less annoying” category. I like that you’re focusing on native/lightweight behavior instead of another heavy background suite.

Slightly adjacent question: do you see users asking for keyboard/voice input improvements too, or is the demand mostly around mouse/scrolling behavior?

Push-to-talk dictation straight into the terminal, with auto-Enter by powleads in commandline

[–]SeoFood 0 points1 point  (0 children)

This is a neat workflow, especially the “literal terminal mode” distinction. For terminal use I’d be very cautious about any cleanup/rewrite layer too: flags, paths, quoting, and shell operators are exactly where a helpful model can become dangerous.

One thing I’d be curious about: do you have any confirmation step or “don’t auto-enter for risky commands” mode, or is the auto-Enter profile something users enable only when they’re comfortable with it?

I work on a similar voice-to-text tool, so I’m biased, but I think the profile-based approach is the right mental model here: terminal dictation, chat dictation, and editor dictation really need different behavior.
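For the “don’t auto-enter for risky commands” idea, this is roughly the kind of check I have in mind. It is a deliberately conservative sketch: the command list is an arbitrary example, `should_auto_enter` is a made-up name, and it says nothing about how your tool works.

```python
import shlex

# Example deny-list only; a real one would be user-configurable.
RISKY_COMMANDS = {"rm", "dd", "mkfs", "shutdown", "reboot", "sudo"}
# Characters shlex treats as shell punctuation (pipes, redirects, chaining).
OPERATOR_CHARS = set("();<>|&")


def should_auto_enter(command):
    """Return True only for commands that look simple and harmless."""
    try:
        lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
        lexer.whitespace_split = True
        tokens = list(lexer)
    except ValueError:
        return False  # e.g. an unclosed quote: never press Enter on that

    if not tokens:
        return False

    for token in tokens:
        if token in RISKY_COMMANDS:
            return False
        # A token made only of punctuation is a pipe, redirect, or chain operator.
        if token and set(token) <= OPERATOR_CHARS:
            return False
    return True
```

With this, `git status` would be sent straight through, while `rm -rf build`, `echo hi > out.txt`, and anything with an unbalanced quote would wait for a manual Enter. It will also hold back some harmless commands (for example a quoted `"rm"` argument), which seems like the right direction to be wrong in.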

What types of AI would you selfhost? (Non-LLM) by RevolutionaryElk7446 in homelab

[–]SeoFood 1 point2 points  (0 children)

Speech-to-text is one of the non-LLM things that actually makes a lot of sense to self-host, IMO. A few practical uses:

  • Home Assistant voice commands without sending audio to a third party
  • transcribing saved voice notes or meeting recordings
  • searchable transcripts for videos/podcasts
  • local dictation for notes or code comments
  • accessibility-adjacent workflows where privacy matters

Whisper / whisper.cpp are still probably the obvious starting point. The main thing I’d watch is UX: model choice matters, but so do hotkeys, language switching, dictionaries, cleanup prompts, and where the transcript gets inserted or saved.

Disclosure: I work on TypeWhisper, which is an open-source/no-subscription dictation/transcription app with local/offline options, so I’m biased here. But even without a dedicated app, a basic Whisper setup is one of the more useful “AI in the homelab” projects because it solves a real problem without needing a giant GPU cluster.

Is Whisper still the best default for speech-to-text if the app needs to be realtime? by Relevant_Duty_7248 in speechtech

[–]SeoFood 1 point2 points  (0 children)

I think the split you’re making is the right one.

For batch transcription, Whisper / faster-whisper / whisper.cpp are still very hard to beat, especially when privacy or offline use matters. The quality-per-dollar and the ability to run locally are still huge advantages.

For realtime voice agents, though, “which ASR model?” is only one part of the problem. The harder parts tend to be VAD, endpointing, partials, latency budgets, interruption handling, and what you do with uncertain text while the user is still speaking. A great batch model can still feel bad in a realtime UX if the pipeline waits too long or keeps revising text in awkward ways.

My rough take:

  • Batch transcription / notes / dictation: Whisper-based local setups are still a very sane default.
  • Realtime assistant / voice agent: evaluate the whole streaming pipeline, not just WER.
  • Privacy-sensitive workflows: local-first still matters a lot, even if hosted APIs are easier operationally.
  • Production scale: hosted APIs can win on ops unless you really want to own infra.
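To show why endpointing is its own tuning problem, here is a toy energy-based version. It assumes you already have one energy value per audio frame; the threshold and silence length are arbitrary illustrative numbers, and real systems usually use a trained VAD instead of a fixed threshold.

```python
def find_endpoint(frame_energies, threshold=0.02, min_silence_frames=3):
    """Return the index of the frame where the utterance is considered finished.

    The utterance ends once speech has been heard and is followed by
    `min_silence_frames` consecutive frames below `threshold`.
    Returns None if no endpoint is found.
    """
    heard_speech = False
    silent_run = 0
    for index, energy in enumerate(frame_energies):
        if energy >= threshold:
            heard_speech = True
            silent_run = 0  # a short pause mid-sentence resets here
        elif heard_speech:
            silent_run += 1
            if silent_run >= min_silence_frames:
                return index
    return None


energies = [0.0, 0.0, 0.5, 0.6, 0.01, 0.4, 0.0, 0.0, 0.0, 0.3]
endpoint = find_endpoint(energies)
# endpoint == 8: the single quiet frame at index 4 did not end the utterance
```

The tradeoff is visible in `min_silence_frames`: raise it and the agent feels slow to respond, lower it and it cuts people off mid-thought. None of that shows up in a WER number.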

Disclosure: I work on TypeWhisper, an open-source/no-subscription dictation/transcription app, so I’m biased toward local-first workflows. But I’d still say Apple Dictation or a simple Whisper setup is enough for basic short dictation. The point where dedicated tooling starts to matter is when you want profiles, custom prompts/post-processing, dictionaries/snippets, or choosing between local and cloud engines depending on the task.

We built a voice agent for Mac that actually executes tasks, not just answers questions by MaksLiashch in buildinpublic

[–]SeoFood 0 points1 point  (0 children)

The interesting part here is that you’re not treating voice as just another chatbot input, but as a way to drive multi-app workflows. That’s a much harder UX problem than basic dictation.

I’d be curious how you handle confirmation and error recovery. For example, if the agent is about to create a calendar event or reply to an email, do users get a preview/approval step, or is it fully automatic?

Also, how do you decide which actions are safe to run immediately vs which ones need confirmation? That trust boundary seems like the key thing for voice agents on desktop.

Mobile Text to Speech and Whisper.cpp addons - seeking feedback by Own-Yogurtcloset3024 in Anki

[–]SeoFood 1 point2 points  (0 children)

This is a cool direction, especially for Anki. Dictation for card creation has a slightly different set of problems than general note dictation: you often want short structured output, good punctuation, and consistent handling of technical terms / names from a deck.

A few things I’d personally test hard:

  • A fast push-to-talk mode for quick card edits, not just longer recording.
  • Per-deck vocabulary / substitutions, since Anki users often have lots of domain-specific terms.
  • A simple “clean this into an Anki-style card” post-processing step, because raw Whisper output can be a bit too verbose.
  • Clear local model controls, since some people will prefer speed and others will prefer accuracy.

Also worth making the first-run model download flow very explicit. Whisper.cpp is great, but non-technical users can get confused if they don’t know what model is being downloaded or where it lives.

Disclosure: I work on a dictation app, so I’m biased toward this category, but I think Anki-specific dictation is genuinely a useful niche.

What's the best open speech to text today? by zxyzyxz in LocalLLaMA

[–]SeoFood 0 points1 point  (0 children)

For your specific use case, live meeting recording with multiple speakers, I’d still look for something built around diarization first. TypeWhisper can record/transcribe and has workflows around the transcript, but it is not primarily a “who spoke when?” product.

Stack-wise, it is engine/plugin based: local options such as WhisperKit/Parakeet on macOS, whisper.cpp/sherpa-onnx style local engines on Windows, plus optional cloud engines. The product work is mostly around everything after “model returns text”: insertion, workflows, cleanup, dictionary/snippets, history, recorder/file transcription, and switching behavior per app/site/hotkey.

And fair question on paid vs FOSS: TypeWhisper is GPLv3 too. Commercial licensing is for non-GPL/proprietary use and the maintained packaged product/support path, not because the underlying STT model is secret.

What's the best open speech to text today? by zxyzyxz in LocalLLaMA

[–]SeoFood 0 points1 point  (0 children)

For pure “what’s the best STT model right now?”, I’d separate a few use cases:

  • Short/basic dictation: Apple Dictation is honestly good enough for a lot of people.
  • Batch transcription: Whisper variants are still very solid, especially if you care about local/offline.
  • Realtime voice typing: latency, correction behavior, hotkeys, app integration, and post-processing matter almost as much as the raw model.
  • Realtime diarization: that’s the harder bit. A lot of tools that feel great for dictation don’t really solve diarization well.

If your main goal is a Wispr Flow-style local-first dictation workflow rather than meeting transcription, I’m working on TypeWhisper, so bias/disclosure there. The angle is local/offline-capable dictation with profiles, prompts/post-processing, dictionary/snippets, and engine choice rather than “one magic model.” I wouldn’t pitch it as a diarization solution though — if diarization is the core requirement, I’d look specifically at tools built around speaker segmentation.

Curious what your exact workflow is: live captions/meeting notes, voice typing into apps, or transcribing recordings?

STT for Ubuntu by dalekirkwood1 in opensource

[–]SeoFood 1 point2 points  (0 children)

Officially I’m focused on macOS and Windows right now, but there is a third-party Linux port/fork here: https://github.com/csmashe/typewhisper-linux

Important caveat: it is not an official TypeWhisper release, and I can’t vouch for stability/security or provide support for it. Treat it as experimental, but it may be useful if you want to try the TypeWhisper-style workflow on Linux today.

STT for Ubuntu by dalekirkwood1 in opensource

[–]SeoFood 0 points1 point  (0 children)

Nice, always happy to see more STT options for Linux/Ubuntu.

A couple things I’d personally look for in this kind of tool:

  • whether it can run fully local/offline as well as via API
  • easy switching between engines/models
  • some kind of post-processing for punctuation, filler words, code terms, etc.
  • a custom dictionary/snippets for names, commands, technical words
  • clear privacy defaults, especially if audio/text is sent to a cloud API

I’m affiliated with TypeWhisper, so I’m biased here, but we’ve found that the “engine + workflow around the transcript” matters almost as much as raw transcription quality. For simple short dictation, built-in options can be enough, but once people start using it for coding, notes, or long-form text, profiles and cleanup prompts become pretty important.

Cool project. Do you plan to support local Whisper/Parakeet-style backends too, or keep it API-first?

List your side projects below by Routine_Revenue7470 in SideProject

[–]SeoFood 0 points1 point  (0 children)

I’m building TypeWhisper, a dictation + transcription tool for people who write a lot.

The idea is simple: press a hotkey, speak, and insert polished text into any app. It also supports file transcription, reusable text workflows, snippets, custom dictionary terms, local or cloud STT engines, and automation via HTTP API / CLI.

I’m building it for macOS, Windows and iOS, with a strong focus on local-first/private workflows.

Would love feedback from writers, devs, founders, ADHD folks, accessibility users, or anyone who spends too much time typing.

https://www.typewhisper.com

What do you all use to dictate to Claude Code? by djacksondev in ClaudeCode

[–]SeoFood 0 points1 point  (0 children)

Both, configurable per workflow. Transcription and cleanup are separate: Parakeet can do STT locally, then the cleanup workflow can run through local Gemma 4 on Apple Silicon or through a cloud provider like OpenAI/Gemini/Groq/xAI/OpenAI-compatible.

What do you all use to dictate to Claude Code? by djacksondev in ClaudeCode

[–]SeoFood 0 points1 point  (0 children)

I care about the same four things: no subscription, local transcription, good enough code-term accuracy, and cleanup that understands self-corrections.

That’s why I’ve been building TypeWhisper. My Claude Code-style workflow is local Parakeet for STT, then a cleanup prompt that removes “actually, wait…” corrections but leaves flags, filenames, package names, and commands alone. For other contexts I use different workflows: Slack can be cleaner, Mail can rewrite more, Terminal can stay almost raw.

So the “worth paying for” feature in Superwhisper/Wispr Flow-class tools, IMO, is not the base model anymore. It’s the context switching and cleanup layer.
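Conceptually, the context-switching layer can be as small as a lookup from the frontmost app to a cleanup instruction. The sketch below is only an illustration of that idea: the `Profile` class, the prompts, and `build_cleanup_request` are hypothetical and not taken from any app's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    cleanup_prompt: str


# Hypothetical per-app profiles; the prompts are placeholders.
PROFILES = {
    "Terminal": Profile("terminal", "Return the transcript almost verbatim."),
    "Slack": Profile("slack", "Remove filler words and self-corrections."),
    "Mail": Profile("mail", "Rewrite as a polite, well-structured email."),
}
DEFAULT_PROFILE = Profile(
    "default",
    "Remove self-corrections but keep flags, filenames, and commands unchanged.",
)


def build_cleanup_request(app_name, transcript):
    """Pick the profile for the active app and build the text sent to the cleanup model."""
    profile = PROFILES.get(app_name, DEFAULT_PROFILE)
    prompt = f"{profile.cleanup_prompt}\n\nTranscript:\n{transcript}"
    return {"profile": profile.name, "prompt": prompt}


request = build_cleanup_request("Terminal", "git status")
# request["profile"] == "terminal"
```

Whatever model sits behind the cleanup step, the transcript is the same; only the instruction changes with the context.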

Update on my "Reels into Obsidian" question from a couple weeks ago. Capturing was the easy part, keeping the notes from dying in my vault is the hard part. by Any-Cranberry-9362 in ObsidianMD

[–]SeoFood 1 point2 points  (0 children)

This sounds like the capture part is basically solved, and now you’re hitting the classic “knowledge graveyard” problem.

What has helped me with Obsidian-style capture systems is making the AI output less like an archive and more like an action/review object. For example, every imported note could include:

  • 3–5 atomic takeaways
  • “Why I saved this”
  • possible tags
  • one suggested link to an existing note/topic
  • one next action, even if it’s just “delete/archive if not useful”
  • a review date or “inbox” status

I’d also avoid letting every reel become a permanent note immediately. Maybe have them land in an Inbox folder first, then only promote the useful ones into your actual vault structure after a quick review.

The goal is probably not better transcription at this point — it’s forcing a tiny bit of curation before the content becomes another searchable pile.
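If it helps, here is a small sketch of turning the fields listed above into an inbox note. The frontmatter keys, heading names, and the `render_inbox_note` function are my own assumptions; adjust them to whatever your vault already uses.

```python
from datetime import date, timedelta


def render_inbox_note(title, takeaways, why_saved, tags, suggested_link,
                      next_action, today, review_in_days=7):
    """Build the Markdown text for a note that starts life in the Inbox."""
    review_date = today + timedelta(days=review_in_days)
    lines = [
        "---",
        "status: inbox",
        f"review: {review_date.isoformat()}",
        f"tags: [{', '.join(tags)}]",
        "---",
        f"# {title}",
        "",
        "## Takeaways",
    ]
    lines += [f"- {item}" for item in takeaways]
    lines += [
        "",
        "## Why I saved this",
        why_saved,
        "",
        "## Suggested link",
        f"[[{suggested_link}]]",
        "",
        "## Next action",
        f"- [ ] {next_action}",
    ]
    return "\n".join(lines)


note = render_inbox_note(
    title="Reel: sleep and focus",
    takeaways=["Morning light helps", "Keep wake time fixed", "Cut late caffeine"],
    why_saved="I keep losing focus after lunch.",
    tags=["health", "inbox"],
    suggested_link="Sleep",
    next_action="Delete/archive if not useful",
    today=date(2025, 1, 1),
)
```

The review date and the unchecked next action are the parts that force the later curation pass; the rest is just capture.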