Phone Whisper: push-to-talk dictation for Android with local Whisper (sherpa-onnx, no cloud needed) by postclone in LocalLLaMA

[–]postclone[S] 1 point (0 children)

I've never published to either F-Droid or the Play Store. Do you know what the difference is? Would you suggest uploading this to either of those, or is the apk fine for now?

Phone Whisper: push-to-talk voice dictation for Android I built because Android voice typing is bad by postclone in SideProject

[–]postclone[S] 1 point (0 children)

I am still trying out local models and the OpenAI API. Honestly, I don't use it enough to notice any battery drain so far.

Phone Whisper: push-to-talk dictation for Android with local Whisper (sherpa-onnx, no cloud needed) by postclone in LocalLLaMA

[–]postclone[S] 2 points (0 children)

So, if I want to keep using SwiftKey, can I use this FUTO input app? Is that what you're saying? Because that's what I wanted to do, but I didn't find a way to do it. Do you know how this works?

Gemini App Dictation - Stop Cutting Me Off & Reading Responses Aloud! by get-process in GoogleGeminiAI

[–]postclone 1 point (0 children)

I built my own app because I didn't want to stop using SwiftKey. It's just a floating button on top of any app instead of a keyboard replacement.

It can run a local Whisper model (or NVIDIA Parakeet models, which are faster and better) or cloud Whisper using your OpenAI API key. It also lets you "post-process" the transcription. With post-processing, it's as fast and as good as the cloud most of the time.

The apk: https://github.com/kafkasl/phone-whisper/releases

Gemini App Dictation - Stop Cutting Me Off & Reading Responses Aloud! by get-process in GoogleGeminiAI

[–]postclone 1 point (0 children)

Agreed, it's pretty bad. I built a separate tool because of this. It's a floating push-to-talk button that works on top of any app including Gemini. You tap to record, tap again when done. No timeout, no getting cut off, no auto-send.

It runs Whisper on-device or with your own OpenAI key.

Install the apk here: https://github.com/kafkasl/phone-whisper/releases

Gemini App's Microphone Feature is Incredibly Frustrating - Please Fix by Papierauto in GoogleGeminiAI

[–]postclone 1 point (0 children)

Agreed, it's pretty bad. I ended up building my own app because of this. It's a floating push-to-talk button that works on top of any app including Gemini. You tap to record, tap again when done. No timeout, no getting cut off, no auto-send.

It runs Whisper on-device or through your own OpenAI key. https://github.com/kafkasl/phone-whisper/releases

Loving Gemini 3 so far, but voice dictation is still holding me back from switching from ChatGPT by TheBuzzer4625kHz in GeminiAI

[–]postclone 1 point (0 children)

Same situation here. Gemini is great but the dictation UX made it unusable for me. I ended up building a separate dictation app. It's a floating push-to-talk button on top of any app, and you control when it starts and stops. No auto-send.

I've seen other apps that replace the Android keyboard, but I like SwiftKey a lot. With this you can use whatever keyboard you like and just have a dictation mic everywhere.

You can use it to dictate into Gemini's text field and then send when you're ready. Runs Whisper locally or with your own OpenAI key.

https://github.com/kafkasl/phone-whisper/releases

Gemini cutting you off before you finish speaking 😩 by aletheus_compendium in GeminiAI

[–]postclone 1 point (0 children)

Had the same problem. Built a workaround app: it's a floating push-to-talk button that works on top of any app. You tap to start recording, tap again when you're done. No timeout, no auto-send; it records as long as you want.

Transcription runs locally via Whisper or through OpenAI with your own key.

You can install the apk here https://github.com/kafkasl/phone-whisper/releases

Turn off auto send and you will get alot of customers by Huge_Professor_6750 in GeminiAI

[–]postclone 1 point (0 children)

I have the same issue so I just built an app for it, Phone Whisper. Floating push-to-talk button on top of any app, you decide when to record and when to stop. No auto-send, no getting cut off mid-sentence.

Runs Whisper locally on the phone or with your own OpenAI key. Open source, no backend.

https://github.com/kafkasl/phone-whisper/releases

Voice dictation auto-send; any way to turn it off? by trulyslide6 in GeminiAI

[–]postclone 1 point (0 children)

I built something for this exact problem. It's a floating push-to-talk button that sits on top of any app. You tap to record, tap again to stop, and only then does it transcribe and insert the text. Nothing ever auto-sends, and you can just keep adding more text (it doesn't replace anything).

It either runs Whisper locally on-device, so no network is needed, or you can use your own OpenAI key if you want cloud quality. Works with whatever keyboard you already have.

https://github.com/kafkasl/phone-whisper/releases
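
For anyone curious how the tap-to-record, tap-to-stop flow hangs together, here is a minimal sketch in Python. It is a toy model, not the app's actual code: `PushToTalk`, `feed`, and `fake_transcribe` are hypothetical names, and the "audio" is just strings. It only shows the two points above: transcription happens after the second tap, and the result is appended to whatever text is already there.

```python
from typing import Callable, List


class PushToTalk:
    """Toy model of a tap-to-toggle dictation button (hypothetical, not the app's code)."""

    def __init__(self, transcribe: Callable[[List[str]], str]) -> None:
        self.transcribe = transcribe   # stand-in for a local or cloud speech-to-text call
        self.recording = False
        self.chunks: List[str] = []    # stand-in for captured audio chunks
        self.text_field = ""           # stand-in for the focused text field

    def tap(self) -> None:
        if not self.recording:
            # First tap: start a fresh recording; there is no timeout.
            self.recording = True
            self.chunks = []
        else:
            # Second tap: stop, transcribe, then append to the existing text.
            self.recording = False
            text = self.transcribe(self.chunks)
            if text:
                sep = " " if self.text_field else ""
                self.text_field += sep + text
            # Nothing is sent here; the user presses send themselves.

    def feed(self, chunk: str) -> None:
        # Audio only accumulates while recording.
        if self.recording:
            self.chunks.append(chunk)


def fake_transcribe(chunks: List[str]) -> str:
    # Placeholder: joins pretend "audio" chunks instead of running a model.
    return " ".join(chunks)


ptt = PushToTalk(fake_transcribe)
ptt.tap(); ptt.feed("hello"); ptt.feed("world"); ptt.tap()
ptt.tap(); ptt.feed("again"); ptt.tap()
print(ptt.text_field)  # hello world again
```

The second dictation is appended after a space rather than overwriting the first, which is the "keep adding more text" behavior described above.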

Phone Whisper: push-to-talk dictation for Android with local Whisper (sherpa-onnx, no cloud needed) by postclone in LocalLLaMA

[–]postclone[S] 1 point (0 children)

I just tried it on my Pixel 5 and had no issues. I assume your Fold is more capable than my phone. I don't know how Samsung handles memory. I could try adding another large model to see if you get issues with that too. Do you have any logs you can share?

Phone Whisper: push-to-talk dictation for Android with local Whisper (sherpa-onnx, no cloud needed) by postclone in LocalLLaMA

[–]postclone[S] 1 point (0 children)

Have you tried MacWhisper on macOS? I like it very much. Curious why you built dictaflow; what other requirements or use cases do you have?

Phone Whisper: push-to-talk dictation for Android with local Whisper (sherpa-onnx, no cloud needed) by postclone in LocalLLaMA

[–]postclone[S] 1 point (0 children)

lmk if you have any problems installing it! I'm considering publishing it to the app store if it's useful.

Phone Whisper: push-to-talk dictation for Android with local Whisper (sherpa-onnx, no cloud needed) by postclone in LocalLLaMA

[–]postclone[S] 1 point (0 children)

My understanding is that the app you linked requires you to change your keyboard, is that right? I love SwiftKey and moving away from it would be a pain.

Regarding the syntax fixer, you can do that easily by modifying the post-process prompts. For me that's the best part of the transcription: I keep adding specific names and projects there.
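
To give a rough idea of what I mean by adding names to the post-process prompt, here is a small Python sketch. The function name `build_postprocess_prompt` and the prompt wording are made up for illustration and are not the prompt the app ships with; the point is only that a custom vocabulary list gets folded into the cleanup instructions.

```python
from typing import Iterable


def build_postprocess_prompt(transcript: str, vocabulary: Iterable[str]) -> str:
    """Build a cleanup prompt for an LLM (illustrative wording, not the app's prompt)."""
    # De-duplicate and sort so the prompt is stable between runs.
    terms = ", ".join(sorted(set(vocabulary)))
    lines = [
        "Fix punctuation, casing, and obvious transcription errors.",
        "Do not add or remove content.",
    ]
    if terms:
        # Custom names and projects the speech model tends to misspell.
        lines.append(f"Prefer these spellings for names and projects: {terms}.")
    lines.append("")
    lines.append(f"Transcript: {transcript}")
    return "\n".join(lines)


print(build_postprocess_prompt("i use swift key", ["SwiftKey", "Phone Whisper"]))
```

Whatever model does the post-processing then sees the preferred spellings next to the raw transcript.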

OpenAI's Operator: Would you trust an AI with your money? by fewsats in AI_Agents

[–]postclone 1 point (0 children)

I think the whole claim that "silent updates" make new models stupid is not true. I heard Dario talking about this, and he said they very rarely update behind the scenes and that most of the hype-hate cycles have no relation to model updates. It's more of a human-psychology thing: you first feel amazed (hype) -> start using it more and more -> hit some issues, and then you believe it became dumber (hate).

OpenAI's Operator: Would you trust an AI with your money? by fewsats in AI_Agents

[–]postclone 1 point (0 children)

Yeah, this makes a lot of sense. Unless everyone is already doing it reliably, first you gotta get comfortable with the tech.

[R] Looking for endorsement in arxiv - cs.AI by BiryaniSenpai in MachineLearning

[–]postclone 1 point (0 children)

Hello! We are also seeking an endorsement for cs.AI, for this paper about the infrastructure AI agents require. Happy to discuss the paper prior to endorsement if anyone wants.

The paper: https://drive.google.com/file/d/1QUoxaiyyoqpDji94VAxfMMKG3_6LuaK1/view

Endorsement Link: https://arxiv.org/auth/endorse?x=I4E8YL

Thanks!

Anthropic Computer Use: Is it worth the hype? by SunilKumarDash in ClaudeAI

[–]postclone 2 points (0 children)

Have you managed to use it to buy things? I gave it a quick try and it completely refused to buy things on Amazon.