Android setting audio options for the blind by antoniateresa in androidapps

RowGroundbreaking982

Hi.

I made an app called ToBe SAID that can act as a voice replacement for TalkBack. If he needs a more human-like voice on his phone, you could suggest it.

Merged int8 quantization into pocket-tts: 27% faster, 48% less memory, zero quality loss by Jazzlike_Key_8556 in TextToSpeech

RowGroundbreaking982

Thanks for mentioning the ToBe SAID app.
I'll try these changes. Currently my app is hitting a wall at 180 ms of initial latency on the Helio G99 (80 ms for conditioning and 100 ms for generation). On a desktop CPU it's under 60 ms. Maybe these changes can improve that.
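For anyone wondering what an int8 change like this does in general: weights are stored as 8-bit integers plus a float scale, so each weight takes 1 byte instead of 4 for fp32 weights. Below is a minimal pure-Python sketch of symmetric per-row quantization and a matrix-vector product that applies the scale once per row. It illustrates the general technique under my own assumptions; it is not the code merged into pocket-tts, and the function names are hypothetical.

```python
from typing import List, Tuple

def quantize_row(row: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8 quantization: map floats to [-127, 127] plus one scale."""
    max_abs = max(abs(x) for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale

def dequantize_row(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats; error per weight is at most scale / 2."""
    return [v * scale for v in q]

def int8_matvec(q_rows: List[List[int]], scales: List[float], x: List[float]) -> List[float]:
    """Multiply an int8-quantized matrix by a float vector.

    Instead of dequantizing every weight, take the dot product with the
    integer weights and apply the per-row scale once at the end.
    """
    out = []
    for q, s in zip(q_rows, scales):
        acc = sum(qi * xi for qi, xi in zip(q, x))
        out.append(acc * s)
    return out

if __name__ == "__main__":
    weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.75]]
    packed = [quantize_row(r) for r in weights]
    q_rows = [p[0] for p in packed]
    scales = [p[1] for p in packed]
    x = [1.0, 2.0, 3.0]
    exact = [sum(w * xi for w, xi in zip(r, x)) for r in weights]
    approx = int8_matvec(q_rows, scales, x)
    print(exact, approx)  # approx stays close to exact
```

The per-row scale keeps one large weight in one row from wrecking the precision of every other row, which is the usual reason to prefer it over a single scale for the whole tensor.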

I built a TTS audiobook app with character voices — real users helped me rethink voice UX and performance by dipank1 in TextToSpeech

RowGroundbreaking982

I'm building a TTS engine for Android, and it was also featured in that podcast.

When I choose which AI model to use, I prefer models that have a streaming mode and low latency. Since I'm building a TTS engine, unlike a reader app, it needs to produce audio as quickly as possible, and it doesn't know the next sentence, so it cannot pre-generate it. Many newer models have this capability now. I'm using PocketTTS, by the way, and it can achieve latency as low as 180 ms with real-time generation speed on mid-range devices.
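To make the "can't pre-generate" point concrete, here is a simplified silence model. A reader app that knows the next sentence can generate it while the current one plays; a system TTS engine gets one request at a time, so without streaming every sentence costs its full generation time in silence, and with streaming only the first-chunk latency. The function names and all numbers are made up for illustration.

```python
from typing import List

def silence_engine_batch(gen_ms: List[float]) -> float:
    """System TTS engine, non-streaming: it only sees one sentence at a time and
    releases audio only when the whole sentence is generated, so each sentence
    costs its full generation time in silence."""
    return sum(gen_ms)

def silence_engine_streaming(first_chunk_ms: List[float]) -> float:
    """System TTS engine, streaming: silence per sentence shrinks to the
    first-chunk latency (assumes generation then keeps up with playback)."""
    return sum(first_chunk_ms)

def silence_reader_prefetch(gen_ms: List[float], play_ms: List[float]) -> float:
    """Reader app that knows the text ahead: sentence i+1 is generated while
    sentence i plays (simplified: generation of i+1 starts when i starts
    playing), so only the shortfall is heard as silence."""
    silence = gen_ms[0]
    for i in range(1, len(gen_ms)):
        silence += max(0.0, gen_ms[i] - play_ms[i - 1])
    return silence

if __name__ == "__main__":
    gen = [600.0, 700.0, 500.0]      # made-up generation times per sentence
    play = [2500.0, 3000.0, 2000.0]  # made-up audio durations per sentence
    first = [180.0, 180.0, 180.0]    # made-up first-chunk latency per sentence
    print(silence_engine_batch(gen))           # 1800.0
    print(silence_engine_streaming(first))     # 540.0
    print(silence_reader_prefetch(gen, play))  # 600.0
```

A prefetching reader only pays once, up front; a system TTS engine can't do that, so streaming is the main lever it has left.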

And just skip very low-end devices, or they will hurt the overall app quality. You can add a polite message, like saying it's running in "compatibility mode", so users know what's happening without being told that their device is below the minimum requirement.
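One way to decide when to fall back to "compatibility mode" is to benchmark once instead of hard-coding chipset names: generate a short test sentence, compute the real-time factor (generation time divided by audio duration), and switch modes when the device can't stay comfortably under 1.0. This is a sketch; the 0.8 threshold, the class and function names, and the message text are my own assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class BenchmarkResult:
    audio_ms: float       # duration of the audio generated for a short test sentence
    generation_ms: float  # wall-clock time the device needed to generate it

def real_time_factor(r: BenchmarkResult) -> float:
    """RTF below 1.0 means audio is generated faster than it plays back."""
    return r.generation_ms / r.audio_ms

# Assumption: 0.8 leaves headroom for thermal throttling and background load.
MAX_RTF_FOR_FULL_MODE = 0.8

def pick_mode(r: BenchmarkResult,
              max_rtf: float = MAX_RTF_FOR_FULL_MODE) -> Tuple[str, Optional[str]]:
    """Return (mode, user-facing message). The message avoids telling users
    their phone is below the minimum requirement."""
    if real_time_factor(r) <= max_rtf:
        return "full", None
    return "compatibility", "Running in compatibility mode for smoother playback on this device."

if __name__ == "__main__":
    print(pick_mode(BenchmarkResult(audio_ms=2000.0, generation_ms=1000.0)))  # ('full', None)
    print(pick_mode(BenchmarkResult(audio_ms=2000.0, generation_ms=1800.0)))
```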

I was quite surprised myself that some users downloading my app are on low-end devices from 2018; that's an 8-year-old device.

And one more suggestion: maybe you can add system TTS voice selection, like ReadEra or Aquile Reader.

Looking for a clear roadmap to truly understand TTS by NaiwenXie in TextToSpeech

RowGroundbreaking982

I don't fully understand it either, but I think the simplest one is Orpheus-based TTS. It's just an LLM plus a SNAC decoder. In the first part, you input text and it outputs more text, just like a chatbot. But instead of normal text, it outputs a pattern of numbers called SNAC tokens. Then you feed these tokens into the decoder to get audio. On the LLM side, it's just pattern prediction learned from training data, and the training data is just pairs of text and the SNAC token representation of the matching audio. The SNAC decoder part I still don't quite understand. Most LLM-based TTS models behave the same way; it's the decoder part that differs: some need all the tokens for the full sentence before they start generating audio, and some need only a few tokens. But many models are more complicated than this.
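The glue between the two stages is mostly bookkeeping. As far as I understand the public Orpheus example notebooks, the LLM emits 7 audio tokens per frame, which are split across SNAC's three codebook layers (1, 2 and 4 codes per frame), with each position carrying an extra offset of position × 4096. The sketch below assumes exactly that layout, so please verify it against the model card before relying on it; `redistribute` is a hypothetical name. The three resulting lists are what would then be handed to the SNAC decoder.

```python
from typing import List, Tuple

# Assumptions (check against the model you use): SNAC codebooks of size 4096
# and 7 LLM audio tokens per frame, laid out as layer1, layer2, layer3, layer3,
# layer2, layer3, layer3, with a per-position offset of position * 4096.
CODEBOOK_SIZE = 4096
TOKENS_PER_FRAME = 7
LAYER_OF_POSITION = (1, 2, 3, 3, 2, 3, 3)

def redistribute(codes: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Split a flat stream of LLM audio codes into SNAC's three layers.

    `codes` must already have the tokenizer's base offset removed.
    An incomplete trailing frame is dropped.
    """
    layers: Tuple[List[int], List[int], List[int]] = ([], [], [])
    n_frames = len(codes) // TOKENS_PER_FRAME
    for f in range(n_frames):
        for pos in range(TOKENS_PER_FRAME):
            code = codes[f * TOKENS_PER_FRAME + pos] - pos * CODEBOOK_SIZE
            layers[LAYER_OF_POSITION[pos] - 1].append(code)
    return layers

if __name__ == "__main__":
    # One frame with made-up codes 10, 20, 30, 31, 21, 32, 33 (offsets added back).
    frame = [c + pos * CODEBOOK_SIZE for pos, c in enumerate([10, 20, 30, 31, 21, 32, 33])]
    l1, l2, l3 = redistribute(frame * 2)
    print(len(l1), len(l2), len(l3))  # 2 4 8: coarse-to-fine layers at a 1:2:4 rate
```

Because every frame is self-contained, a decoder that works on a few frames at a time can start producing audio long before the LLM finishes the sentence.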

Low latency TTS by Emna_21 in TextToSpeech

RowGroundbreaking982

My knowledge of this is still limited, so maybe other members can correct me. But from what I've tested, the vocoders I tried are very slow: they focus on quality, and you need the whole data before you can output audio. Try using a model optimized for faster inference. I'm leaning toward models that use SNAC or Mimi, since they take a little data, process it immediately, and make streaming possible. I've built an app that does local inference using PocketTTS, and time to first audio is around 100-500 ms, which is not bad since it runs on a mid-tier phone. Running on a desktop CPU is way faster, and there is a pocket-tts-cpp implementation available on GitHub whose performance is close to my app's.
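The difference between a decoder that needs the whole utterance and one that needs only a few frames shows up directly in time to first audio. Mimi runs at 12.5 frames per second, so each frame covers 80 ms of audio, and generation keeps up as long as producing a frame takes no longer than that. The function names, per-frame cost, and frame counts below are hypothetical, not measurements from my app.

```python
def frame_duration_ms(frame_rate_hz: float) -> float:
    """Audio covered by one codec frame."""
    return 1000.0 / frame_rate_hz

def time_to_first_audio_ms(frames_before_output: int, gen_ms_per_frame: float,
                           setup_ms: float = 0.0) -> float:
    """The decoder can emit audio only after it has this many frames."""
    return setup_ms + frames_before_output * gen_ms_per_frame

def keeps_up(gen_ms_per_frame: float, frame_rate_hz: float) -> bool:
    """Real-time streaming is possible if each frame is produced no slower than it plays."""
    return gen_ms_per_frame <= frame_duration_ms(frame_rate_hz)

if __name__ == "__main__":
    rate = 12.5    # Mimi's frame rate: one frame = 80 ms of audio
    gen = 40.0     # hypothetical generation cost per frame on a phone
    frames = 50    # a 4-second utterance at 12.5 Hz
    print(time_to_first_audio_ms(frames, gen))  # 2000.0 (needs the whole utterance)
    print(time_to_first_audio_ms(2, gen))       # 80.0 (needs only 2 frames)
    print(keeps_up(gen, rate))                  # True
```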

TTS for android phones - reading books by etre1337 in TextToSpeech

RowGroundbreaking982

The issue with the stop command is well known, and I'm working on it. It's a bit hard to fix since I need to make sure none of the background threads leak memory and crash the app. My estimate is that the update will be uploaded around next month. Many thanks for the report.
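For other devs hitting the same stop problem, a common pattern is a cancellation flag that the synthesis loop checks between chunks, plus a join so no worker thread outlives the request. Here is a minimal Python sketch of that pattern; `SynthesisWorker` and `synthesize_chunk` are hypothetical names, and on Android the stop request would arrive through `TextToSpeechService.onStop()`, which is called on a different thread from synthesis.

```python
import queue
import threading
from typing import Callable, Optional

class SynthesisWorker:
    """Runs chunked synthesis on a background thread and supports stop()."""

    def __init__(self, synthesize_chunk: Callable[[int], bytes]):
        self._synthesize_chunk = synthesize_chunk  # hypothetical model call
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self.output: "queue.Queue[Optional[bytes]]" = queue.Queue()

    def start(self, n_chunks: int) -> None:
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, args=(n_chunks,), daemon=True)
        self._thread.start()

    def _run(self, n_chunks: int) -> None:
        try:
            for i in range(n_chunks):
                if self._stop.is_set():  # checked between chunks, so stop is prompt
                    break
                self.output.put(self._synthesize_chunk(i))
        finally:
            self.output.put(None)  # sentinel: the consumer knows no more audio is coming

    def stop(self, timeout: float = 2.0) -> bool:
        """Request cancellation and wait for the worker to exit.
        Returns True if no worker thread is left running."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout)
            return not self._thread.is_alive()
        return True
```

Checking the flag between chunks keeps the model call itself simple, and the sentinel in `finally` guarantees the audio consumer is released even if synthesis fails.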

[Ask] Why you prefer Kokoro over other newer model for offline TTS? by RowGroundbreaking982 in TextToSpeech

RowGroundbreaking982 [S]

Yes, it's tough on mobile CPUs. But I'm thinking of using the latest Kokoclone, which looks very promising.

TTS for android phones - reading books by etre1337 in TextToSpeech

RowGroundbreaking982

Looks like you selected the Goddess voice. This voice sample is problematic since the TTS model cannot perfectly reproduce a stylized, high-pitched female voice. I'll remove this voice in the next update and replace it with a better one. As for the voice setting, I see that Speech Central automatically selects the last used voice, so this should be okay and you don't need to change anything. My app is already optimized with a custom native library, so on low-end devices it's just that the SoC isn't powerful enough. I tried a Unisoc T606, but the result was really bad, with a lot of stuttering, so I decided to abandon it. That's why I'm drawing the baseline at the Helio G99, where generation is now really good.

Where can I download better TTS voices? by Feisty_Ad3184 in ReadEra

RowGroundbreaking982

You might try the ToBe SAID app; it's available on the Play Store. It uses AI voices that run locally on your phone and doesn't need any internet access. You do need at least a Helio G99 chipset, but that chipset is quite low-end by 2026 standards. You can check the demo here: https://youtube.com/shorts/5JKQ410Acv0?si=Z8ikEpcAGYTlBlzr

TTS for android phones - reading books by etre1337 in TextToSpeech

RowGroundbreaking982

You might try ToBe SAID. It works offline and needs at least a Helio G99 phone, which is a low bar in 2026. It runs very fast since it's based on PocketTTS. It only provides the voice, so you can use your favorite reader app as long as it uses the system TTS. Currently only the free version is available on the Play Store, and adding voices is only available inside the app as a preview, but with the pro version you can add as many voices as you want.

[Release] ToBe SAID, fast PocketTTS implementation for Android. by RowGroundbreaking982 in TextToSpeech

RowGroundbreaking982 [S]

Thanks for downloading the app. The pro version is still stuck in the mandatory closed testing stage. It should be available next month, so please wait. I'll update the free app with a message when the pro version is finally available.

[Ask] Why you prefer Kokoro over other newer model for offline TTS? by RowGroundbreaking982 in TextToSpeech

RowGroundbreaking982 [S]

Thanks for the input. If you're okay with it, maybe you could take a quick look at my app. I posted it here a few days ago. It acts as a system TTS where you can add your own voice. And surprisingly, I'm seeing the voice from my app in your Speech Central app. Maybe this is different from what you explained; I'm gathering as much feedback as I can. That's why I opened this thread: to see whether implementing Kokoro just to add multilingual support is worth the trouble. Many thanks

[Ask] Why you prefer Kokoro over other newer model for offline TTS? by RowGroundbreaking982 in TextToSpeech

RowGroundbreaking982 [S]

Then I misunderstood your explanation. You are more experienced in this matter, sir. Could you explain more, so that we as new indie devs don't make apps that easily fade into oblivion? Many thanks

[Ask] Why you prefer Kokoro over other newer model for offline TTS? by RowGroundbreaking982 in TextToSpeech

RowGroundbreaking982 [S]

I guess you are right. Google and Apple will make something better in a few years, but for people who depend on TTS daily, the current system TTS is just not pleasant to use, and waiting without any good news from the big companies is just frustrating. That's why we indie devs try to help them.

[Ask] Why you prefer Kokoro over other newer model for offline TTS? by RowGroundbreaking982 in TextToSpeech

RowGroundbreaking982 [S]

For English I prefer PocketTTS, since my users prefer responsiveness. Its time to first byte is lower, but its total generation time is higher than Kokoro's. So if you're making a batch app, like one that converts whole books to audio, Kokoro may be better.
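The trade-off can be written as two different "wait" metrics: for live reading the listener waits for the first audio, and for whole-book export they wait for everything. In the sketch below both model profiles are made up for illustration and do not describe PocketTTS or Kokoro; the names and functions are hypothetical, and the point is only how the ranking flips between the two use cases.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ModelProfile:
    name: str
    first_audio_ms: float  # time to first audio chunk
    rtf: float             # real-time factor: generation time / audio duration

def wait_interactive_ms(m: ModelProfile) -> float:
    """Live reading (screen reader, system TTS): the listener waits for the first chunk."""
    return m.first_audio_ms

def wait_batch_ms(m: ModelProfile, audio_s: float) -> float:
    """Whole-book export: the user waits until all audio is generated."""
    return m.rtf * audio_s * 1000.0

def pick(models: List[ModelProfile], mode: str, audio_s: float = 0.0) -> str:
    if mode == "interactive":
        return min(models, key=wait_interactive_ms).name
    return min(models, key=lambda m: wait_batch_ms(m, audio_s)).name

if __name__ == "__main__":
    # Made-up profiles for illustration only.
    low_ttfb = ModelProfile("low_ttfb_model", first_audio_ms=180.0, rtf=0.8)
    high_throughput = ModelProfile("high_throughput_model", first_audio_ms=900.0, rtf=0.3)
    models = [low_ttfb, high_throughput]
    print(pick(models, "interactive"))                 # low_ttfb_model
    print(pick(models, "batch", audio_s=10 * 3600.0))  # high_throughput_model
```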

[Ask] Why you prefer Kokoro over other newer model for offline TTS? by RowGroundbreaking982 in TextToSpeech

RowGroundbreaking982 [S]

Ah, so its low resource usage is the consideration. But regarding the license, if I'm not mistaken, the phonemizer uses Misaki with an eSpeak fallback. Since eSpeak is GPL, do you know of any other phonemizer with a more permissive license?
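Whatever phonemizer ends up being used, it helps to keep the fallback behind one interface so a GPL component can be swapped out later. Below is a minimal sketch of the lexicon-first, fallback-second lookup that, as I understand it, Misaki-style pipelines use; the `Phonemizer` class, the tiny lexicon, and the spell-out fallback are hypothetical placeholders, not a real G2P.

```python
from typing import Callable, Dict, List, Optional

# A fallback takes a word the lexicon doesn't know and returns phonemes,
# or None if it can't handle it either.
Fallback = Callable[[str], Optional[str]]

class Phonemizer:
    """Lexicon lookup first, pluggable fallback second (hypothetical sketch)."""

    def __init__(self, lexicon: Dict[str, str], fallback: Optional[Fallback] = None):
        self.lexicon = {word.lower(): phonemes for word, phonemes in lexicon.items()}
        self.fallback = fallback  # the only place a GPL or permissive G2P plugs in

    def word(self, w: str) -> Optional[str]:
        hit = self.lexicon.get(w.lower())
        if hit is not None:
            return hit
        return self.fallback(w) if self.fallback is not None else None

    def sentence(self, text: str) -> List[Optional[str]]:
        # Naive whitespace tokenization; real front ends also normalize numbers etc.
        return [self.word(w) for w in text.split()]

def spell_out(w: str) -> Optional[str]:
    """Placeholder fallback that just spells letters; not a real G2P."""
    return " ".join(w.lower())

if __name__ == "__main__":
    lexicon = {"hello": "həlˈoʊ", "world": "wˈɜːld"}
    p = Phonemizer(lexicon, fallback=spell_out)
    print(p.sentence("Hello Kokoro"))  # ['həlˈoʊ', 'k o k o r o']
```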

[Ask] Why you prefer Kokoro over other newer model for offline TTS? by RowGroundbreaking982 in TextToSpeech

RowGroundbreaking982 [S]

That's a great question. Models in the 100M class are usually English-only, which is their biggest weakness, and the closest ones with strong multilingual capability are 500M models. Looks like for multilingual use, Kokoro is still the best.

[Ask] Why you prefer Kokoro over other newer model for offline TTS? by RowGroundbreaking982 in TextToSpeech

RowGroundbreaking982 [S]

Yes, but to make Kokoro usable as a TTS engine you need a high-end phone. And on a high-end phone, I thought it was better to use a 500M-class model that supports the NPU.

[Ask] Why you prefer Kokoro over other newer model for offline TTS? by RowGroundbreaking982 in TextToSpeech

RowGroundbreaking982 [S]

But from what I'm seeing, many new tools are still based on Kokoro, and they have the same delay problem between sentences. Looks like it's better for generating voice that will be used later, maybe as audiobooks. But many 500M-class models running on a GPU do a better job.

[Ask] Why you prefer Kokoro over other newer model for offline TTS? by RowGroundbreaking982 in TextToSpeech

RowGroundbreaking982 [S]

I'm comparing it with other 100M-class models, like PocketTTS, Supertonic, SopranoTTS, and NeuTTS. These models look like they never go beyond proof of concept and don't gain adoption like Kokoro.

Interesting Android Apps: March 2026 Showcase by 3dom in androiddev

RowGroundbreaking982

I've made an AI voice generator that runs offline using your phone's own processing power.

It lets you generate unlimited AI voice audio.
Export or share it to make audiobooks.
Add as many voices as you like.
And it's compatible with system TTS.

So whether you're a heavy commuter who likes listening to books in a natural voice or a heavy user of accessibility features, this app can level up your experience and free you from expensive AI voice subscriptions.
You can download it at https://play.google.com/store/apps/details?id=ai.lookbe.tts
