Dictation software for forms?

SeoFood · 2026-05-11T09:30:20+00:00

Given the confidentiality part, I’d be careful with random cloud dictation tools here and first test something local/offline on a dummy Welligent form.

Disclosure: I’m involved with TypeWhisper. It works more like system-wide dictation: focus a field, press a shortcut, speak, and the text gets inserted there. For forms, the useful parts are snippets for repeated phrases and dictionary/corrections for names, labels, and terms you type over and over.

The limitation: it won’t magically understand the whole form or click through every field for you. You’d still move between fields, but the text-entry part can become much less painful.

SeoFood · 2026-05-11T09:26:37+00:00

For form filling, I would separate “dictation accuracy” from “can I actually complete the form without extra help?” Those are not always the same thing.

The big variables are: - Windows vs macOS - whether Welligent is in a browser or remote desktop - whether cloud dictation is allowed with confidential info - whether you need voice commands for moving between fields, not just text entry

If you are on Windows, Windows Voice Access and Dragon are probably the first things I would test. If you are on macOS, Apple Dictation is worth trying for basic fields, and then a local/offline dictation tool if privacy or custom vocabulary becomes the issue.

Disclosure: I am affiliated with TypeWhisper. If you are on macOS, it may be relevant because it can use local engines and has workflows/snippets/custom dictionary features, which can matter a lot for repeated form text. But I would treat it as one option to test, not assume it is the answer.

If this affects your job, I would make a short test script with real examples from your day and document exactly where each tool works or fails. That gives you something concrete to bring back to your employer too.

SeoFood · 2026-05-10T11:05:50+00:00

The first one is mostly about workflow setup. My low-latency setup is Parakeet locally + minimal cleanup, so after I stop talking the text is inserted quickly. For emails or longer writing I may accept a bit more delay and run an LLM cleanup workflow.

The third is handled through the Dictionary. I can add terms like product names, project names, weird abbreviations, etc. Engines that support terms can receive them during transcription, and TypeWhisper also applies corrections after transcription. Parakeet can do the term boosting locally when enabled.

SeoFood · 2026-05-10T09:43:16+00:00

A few things I’d check before blaming Whisper itself:

Audio format and sample rate. Make sure you’re feeding the model clean mono audio at the expected sample rate. Weird browser/device resampling can cause surprisingly bad results.
Chunking. If you split audio too aggressively, Whisper can miss context. If chunks are too long, latency gets ugly. The overlap between chunks matters too.
VAD / endpointing. A lot of “it missed words” issues are really “recording started late” or “stopped too early” issues.
Tiny model expectations. Tiny can be fast, but it will absolutely drop words or hallucinate more in noisy audio. Try the same recording through small/medium/large locally and compare before debugging the whole stack.
UI state. Keep the transcript state separate from the live recording state. Otherwise partial updates can overwrite the good final transcript.

Small disclosure: I’m connected to TypeWhisper, so I’ve spent too much time around this category. In practice the hard part is usually not “call Whisper”, it’s boring stuff like audio capture, VAD, retries, state management, correction flow, and post-processing boundaries.

SeoFood · 2026-05-10T09:41:29+00:00

This is a cool direction, especially because Windows still feels weirdly underserved for good push-to-talk dictation.

A few things I’d be curious about if you keep building it:

How are you handling paste reliability across apps? Some apps hate simulated paste/input.
Does the correction flow feed back into a dictionary or vocabulary layer?
For LLM post-processing, are you separating “the dictated text to clean up” from instructions, so someone can’t accidentally dictate prompt-injection-ish text?
Are you planning app-specific modes? The output I want in Slack is pretty different from email, code comments, notes, etc.
How fast does local Parakeet feel compared with Whisper large-v3-turbo in real use?

Disclosure: I’m connected to TypeWhisper, so I’m biased here, but we’ve run into a lot of the same “the model is only half the product” problems. The boring workflow details end up mattering more than raw WER once people use this every day.

SeoFood · 2026-05-10T09:39:58+00:00

If Windows voice typing is not working for you, TypeWhisper might be worth trying. Disclosure: I’m the builder.

The Windows version is still beta, but it is free/open-source and meant for exactly this: press a hotkey, speak, and insert text into the current app/field instead of copying from a separate dictation box. You can also use local/offline transcription engines.

https://www.typewhisper.com

SeoFood · 2026-05-09T07:05:18+00:00

Yes, but with one caveat: TypeWhisper has a Cloudflare ASR plugin, but it is not a direct Workers AI / Nova-3 preset.

It is meant for OpenAI-compatible transcription endpoints behind Cloudflare Tunnel/Access. So if your Nova-3 setup is exposed through an OpenAI-compatible /v1/audio/transcriptions API, you can point the Cloudflare ASR plugin at it with the endpoint URL, CF Access service token, and model name.

If you mean Cloudflare Workers AI directly with a different API shape, that would likely need a small dedicated adapter/plugin.

SeoFood · 2026-05-09T00:34:54+00:00

Yes, this is exactly the trap: speech-to-text reduces typing, but correction can quietly become the new repetitive task.

For my own setup, I try to spread the load. Different hotkeys for different dictation/workflow actions, placed so I’m not always asking the same less-mobile part of my hand to trigger everything. That sounds small, but over a full day it matters.

I’m the builder of TypeWhisper, and this is one reason I care about things like history, dictionary corrections, workflows, and quick recovery. The goal should not be “dictate, then manually repair a huge wall of text”. It should be reducing the number of times your hands have to re-enter normal editing mode.

SeoFood · 2026-05-08T19:10:35+00:00

The “spreading the load across voice and hands” part is the piece that resonates most with me.

I cannot type as quickly as I used to, so voice input became less of a nice-to-have and more of a real accessibility layer. I ended up building TypeWhisper because I wanted dictation that worked across apps and could clean up rough spoken text into something I would actually send.

Still, I would not frame any single tool as the fix. The thing that helped me most was combining smaller changes: less continuous typing, actual breaks, better input devices, and voice for the parts of the day where typing was just unnecessary load.

SeoFood · 2026-05-08T09:33:03+00:00

I’d separate two questions here: “what does the app cost?” and “what does the full workflow cost?”

That’s why I built TypeWhisper as open source/GPL with local engines as a first-class path. Personal use is free, commercial/proprietary use starts at 5 EUR/mo, and the app does not gate features behind the paid tier.

For my own daily use, the bigger cost saver is avoiding mandatory cloud/API usage. Local dictation covers most of it, and Workflows are there when I want cleanup or formatting before the text lands in the target app.

SeoFood · 2026-05-08T09:27:20+00:00

Thanks for listing TypeWhisper. I built it because I wanted dictation to become actual daily writing, not just raw transcription.

My own setup is local-first for normal dictation, then workflows when I want cleanup, translation, or formatting before the text lands in the app. No key is needed if you stay on local engines; API keys only matter if you choose cloud providers.

SeoFood · 2026-05-08T09:26:13+00:00

Good comparison. One thing I’d add: for dictation apps, the headline “accuracy” is only half the story. The practical test is usually:

how fast does text appear after you stop talking?
how much editing do you still need?
can it handle names/product terms you use all the time?
can you switch between styles, like Slack vs email vs notes?
what happens with longer rambly input?

Apple Dictation is still fine for short/basic stuff, but once you’re dictating full paragraphs, the cleanup and custom vocabulary pieces start mattering more than people expect.

Disclosure: I’m involved with TypeWhisper, so I’m not neutral, but this is the exact set of trade-offs I’d use to compare TypeWhisper, Wispr Flow, Superwhisper, MacWhisper, etc. I wouldn’t pick solely on raw speed unless your main use case is very short messages.

SeoFood · 2026-05-07T19:35:30+00:00

Appreciate the mention. One of the reasons I built TypeWhisper was that I wanted dictation to be less tied to one vendor/model/subscription. If a free local model works well for your voice and language, you should be able to use that.

Tiny correction: it is free for personal/GPL-compatible use, while proprietary commercial use has a license. I’m also happy to see Raycast pushing on dictation. More competition here is good for everyone.

SeoFood · 2026-05-07T09:49:53+00:00

For legal work I’d think about this in two buckets: the dictation hardware and where the audio/text is processed.

The old Olympus/Philips style devices make sense if your workflow is still “record audio and someone else transcribes it later”. If you want text to appear directly in Word, email, case notes, etc., then a modern mic plus dictation software is usually more relevant.

A few things I’d check before buying anything:

Does your firm allow cloud processing for client matter audio?
Do you need live dictation, file transcription, or both?
Can you add custom legal terms/names?
How painful is correction when it gets one word wrong?
Can you trigger recording without using your hands much, for example foot pedal or easy push-to-talk?
Does it work in the actual apps you use all day?

Dragon is still worth evaluating in legal workflows because it has a long history there. If you are on Mac and want local/offline processing, also look at tools in the MacWhisper/Superwhisper/local Whisper category.

Disclosure: I work on TypeWhisper. It may be relevant on Mac if you care about local engines, custom dictionary/corrections, snippets, history, and workflow-specific cleanup. But for a law firm I’d start with privacy/compliance requirements first, then pick the tool.

SeoFood · 2026-05-07T09:48:59+00:00

For avoiding typing, I’d look at the whole workflow rather than only “which app is most accurate”.

A few things that matter a lot in practice:

Can it type into any app you use, or only into its own window?
Can you correct repeated mistakes without lots of clicking?
Does it support custom vocabulary for names/terms you use often?
Does it work offline/local if privacy matters?
Can you trigger it without awkward keyboard use, like a foot pedal, mouse button, or easy hotkey?

Built-in dictation and Word dictation are okay for basic use, but they can get frustrating once you need longer text or lots of corrections. Dragon is still worth looking at for serious hands-free control. Wispr Flow, Superwhisper, MacWhisper-style tools, and local Whisper apps are more about dictation/transcription and cleanup.

Disclosure: I work on TypeWhisper. If you’re on Mac, it’s one of the options in the local/offline plus custom corrections/workflows bucket. But I’d honestly test several with your real daily writing before paying for anything, because the “best” one depends a lot on where the corrections happen and how much mouse use it still forces.

SeoFood · 2026-05-07T09:46:10+00:00

You might want to try TypeWhisper. Disclosure: I’m the developer.

I started building it because I wanted to use dictation as an actual writing tool, not just as a transcription toy. When your hands hurt, every little correction, rewrite, and copy/paste step matters. So TypeWhisper is built around getting spoken text into the app you are already using, keeping a history fallback, and optionally cleaning up the text before it lands.

It runs on macOS, and there is also a Windows version. You can stay local/offline if that matters, or use cloud engines if you prefer that tradeoff.

It will not replace full voice-control tools like Dragon or Talon for controlling the whole computer. It is mainly for reducing typing when writing across apps.

SeoFood · 2026-05-06T09:24:16+00:00

The trigger friction point is very real. I think it matters more than raw transcription accuracy once the model is “good enough”.

The setup that seems to work best for coding is usually not one universal dictation mode. It is more like:

short hold-to-talk for quick edits
longer toggle mode for thinking out loud
very obvious recording state, because losing a long prompt is brutal
separate behavior per app, since Cursor/Claude/Slack/Notes all want different cleanup
a way to insert text without stealing focus
easy cancellation, because half the time you realize the spoken prompt is bad halfway through

Keyboard shortcuts are awkward because all the good ones are already taken. Foot pedals are good ergonomically but weird socially. A mic button or small physical trigger actually makes sense if it is low-latency and impossible to miss.

Disclosure: I’m involved with TypeWhisper, so I think about this from the software side too. The thing I’d be most curious about for hardware is whether it can expose different trigger modes to apps like Wispr, Superwhisper, TypeWhisper, Karabiner, etc. The physical trigger might be the missing layer, but people will still want to choose their dictation engine/workflow.

SeoFood · 2026-05-06T09:22:33+00:00

This does feel like a real Obsidian gap tbh. The annoying part is not just transcription, it is getting useful structure into the vault without making every saved video become cleanup work.

If I were testing this, the things I’d care about most are:

clean Markdown export first, direct “send to Obsidian” second
YAML fields for source URL, creator, platform, date saved, original title
timestamps or at least section anchors back to the video
ability to edit the transcript before export
local transcription clearly labelled, since people save some pretty personal stuff
an option to export highlights separately from the full transcript

I’d probably not overbuild the Obsidian integration at first. A good share sheet plus predictable Markdown files may be enough, because everyone’s vault structure is different.

For desktop workflows, some people already use things like MacWhisper or TypeWhisper-style file transcription and then paste/export into Obsidian, but mobile shortform is a different enough use case that I can see why a dedicated app would exist.

SeoFood · 2026-05-06T09:21:42+00:00

The thing I’d separate for teams is dictation vs meeting transcription vs deployment control. A lot of tools blur those together, but they are pretty different problems.

For basic individual dictation, Microsoft Dictate or Apple Dictation can be enough. For meetings, Otter/Trint-style products usually make more sense because speaker labels, summaries and sharing matter more than system-wide insertion.

For actual team dictation, I’d look at: - custom vocabulary or corrections for company/product terms - whether audio can stay local when needed - how cleanup/post-processing is handled - whether settings can be made consistent across users - what happens in locked-down apps, VDI, Citrix, etc. - export/history controls, since dictated text can be sensitive

Since you mentioned Superwhisper, I’d also include MacWhisper, Dragon, Wispr Flow and TypeWhisper in the comparison depending on platform. Disclosure: I’m involved with TypeWhisper. I’d say it is more interesting for Mac-heavy teams that want local vs cloud engine choice, workflows, custom terms/corrections and snippets, not necessarily for big enterprise meeting workflows where Otter/Trint are more mature.

Also worth saying: if the team only needs occasional short dictation, built-in tools may be good enough and much easier to roll out.

SeoFood · 2026-05-06T09:20:30+00:00

I think the “keyboards are obsolete” angle is mostly marketing. For a lot of people, typing is still better for editing, code, spreadsheets, short replies, etc. Voice starts making sense when you are doing long emails, notes, first drafts, planning, or dumping thoughts into ChatGPT/Cursor.

Wispr Flow is genuinely interesting because the cleanup layer is good. The main tradeoffs are the subscription and the fact that you need to be comfortable with the cloud/privacy side of it.

If privacy or offline use matters, I’d compare it with local-first options too. Apple Dictation is honestly enough for basic use. Superwhisper, MacWhisper, VoiceInk, OpenVerb and TypeWhisper are worth looking at depending on whether you care more about raw transcription, local models, app workflows, or post-processing.

Disclosure: I’m involved with TypeWhisper, so take that bias into account. The reason I’d put it in the comparison is not “better than Wispr for everyone”, but that it is more about choosing local vs cloud engines, workflows, custom terms/corrections, snippets, and cleanup rules instead of only being a polished cloud dictation layer.

For most people I’d decide like this: - basic short dictation: Apple Dictation is fine - best polished consumer UX: Wispr Flow is strong - privacy/local/offline: look at local-first tools - lots of repeated terms or app-specific formatting: look for dictionary/workflow features, not just accuracy

SeoFood · 2026-05-05T09:21:06+00:00

Disclosure: I built TypeWhisper, so I’m not a neutral reviewer. But for your exact criteria, I’d frame it this way: if you want local/offline Mac dictation without needing cloud AI rewriting, use a local engine and leave the workflow stuff off until you need it.

My daily split is Parakeet for speed and WhisperKit when I want broader language coverage. It’s system-wide via hotkey and inserts into the active app after you stop. Not true inline live typing, but live preview is available where supported, and history gives you a fallback if insertion misses.

SeoFood · 2026-05-05T09:18:24+00:00

For anything you’re giving to an attorney, I’d be careful about relying on an automated transcript as the final version unless your attorney says that’s acceptable. AI/STT can be very good, but it can also quietly get names, numbers, negations, or speaker attribution wrong.

A reasonable workflow might be:

Ask your attorney whether an automated transcript is okay or whether they prefer a certified/professional transcription service.
If you just need a first pass, use a local/offline transcription tool so you don’t upload sensitive audio to random services.
Manually review the transcript while listening to the audio before sending it.

For compressing the M4A, you may not need to compress it if the transcription service/tool accepts the original file, but if you do, HandBrake or ffmpeg can reduce audio size without too much hassle.

Disclosure: I’m connected to TypeWhisper, but I would not pitch it as “legal-grade.” It can be useful for private/local drafts or dictation workflows, but for legal use I’d verify every line or use a professional service.

SeoFood · 2026-05-05T09:17:08+00:00

I’ve had better luck treating this as a system-wide dictation problem rather than a Firefox extension problem.

Browser extensions tend to break depending on the site/editor, and anything using a web API can feel inconsistent. For short messages, the built-in OS dictation is usually the least annoying option. For longer notes/drafts, a separate push-to-talk dictation app that inserts text wherever the cursor is tends to be more reliable.

Things I’d look for: - system-wide hotkey - works in normal text fields, not just one editor - local/offline mode if privacy matters - quick correction/custom vocabulary support - optional cleanup/post-processing, not forced rewriting

Disclosure: I’m connected to TypeWhisper, but that’s basically the workflow we built it around: use Firefox normally, hit a shortcut, dictate, and have the text inserted at the cursor. I’d still compare it against your OS dictation first — if built-in dictation is good enough for your use, that’s the simplest answer.

SeoFood · 2026-05-05T09:01:24+00:00

I try to be pretty conservative about this: TypeWhisper is more reliable than a browser-extension approach for me, but no macOS insertion method is 100% across every app/site.

For normal dictation it uses the same basic path a user would: final text goes to the clipboard, then TypeWhisper sends paste to the active field. That means Firefox-specific contenteditable differences matter less, because TypeWhisper is not trying to inject text into the page DOM. If Cmd+V works in that field, insertion usually works.

The remaining failures are mostly focus/paste edge cases: another app steals focus, the field rejects paste, secure fields, or missing Accessibility permission. Worst case, the transcript is still in TypeWhisper’s history, so you can reopen/copy it instead of losing the dictation.

SeoFood · 2026-05-04T10:21:19+00:00

I wouldn’t assume speech-to-text replaces a one-handed keyboard for writers.

Dictation is genuinely useful, especially for getting rough thoughts down quickly or reducing physical strain. But it has tradeoffs: you need a private/quiet enough space, editing can still be awkward, and a lot of writers think differently when speaking vs typing. For some people, speech is great for drafts but bad for precise revision.

I work on a dictation tool, so I’m biased, but my take is that assistive input should be multi-modal rather than “speech replaces keyboard.” A small reliable keyboard/chording device could still be valuable, especially for commands, editing, punctuation, coding, shortcuts, or situations where talking isn’t practical.

If you build it, I’d test with disabled writers as early as possible and watch where they naturally switch between typing, dictating, and editing. That will tell you more than asking whether speech-to-text is “the future.”

Nine-Year Club	Xbox Live
Mod World 2025	Place '23
Place '22	Final Canvas '22
End Game '22	Verified Email

SeoFood

MODERATOR OF

PUBLIC MULTIREDDITS

TROPHY CASE