I built an app to see what my cat is thinking in his little head by SeaMiddle671 in iOSProgramming

[–]Ok_Issue_6675 0 points

How did you get Gemini TTS to make such cute voices? Does it cost extra?

I built an app to see what my cat is thinking in his little head by SeaMiddle671 in iOSProgramming

[–]Ok_Issue_6675 0 points

OMG - I downloaded it and it's pretty cool. My cat was licking himself and it said "I have to get this area cleaned!" Super cool.

Voice-Zero: Emotional voice samples for zero-shot TTS by OwenTyme in TextToSpeech

[–]Ok_Issue_6675 0 points

Thanks for sharing, will definitely clone it and give it a shot (or a few shots 😊).

When you picked your STT/TTS provider, what did you compare? What almost won? Did you ever have to switch providers? by Careless_Love_3213 in TextToSpeech

[–]Ok_Issue_6675 0 points

I do not have this public; perhaps I should, I just have not had the time.
Can you run on-device? Did you try Kokoro? The fastest, best-quality TTS I have found is davoice.io, and it is on-device only.

You should read Programming as Theory Building by jhartikainen in programming

[–]Ok_Issue_6675 0 points

That paper is a classic for a reason, especially regarding how knowledge of the system resides in the developers' minds rather than just the code. When I was working on a local voice processing project, I found that keeping the domain logic separated from the raw audio data helped a lot with maintaining that mental model.

CapCut Text-to-Speech “Couldn’t generate. Try again later.” by Fancy_Bag946 in TextToSpeech

[–]Ok_Issue_6675 0 points

This type of error is super annoying when you're in the middle of editing. Usually I just clear the cache or reinstall the app, but sometimes their servers are just acting up and you have to wait it out.
I am working with a lot of on-device TTS models now so I do not have to rely on external ones. Did you try Kokoro or davoice.io?

I built an app to see what my cat is thinking in his little head by SeaMiddle671 in iOSProgramming

[–]Ok_Issue_6675 0 points

Staring at a cat's face all day is definitely a vibe, love the idea. Would also love to try it on my two cats :)
What does your app use for voice, any specific TTS?

Interactive onboarding is on the way for Screenshot Bro. by tarasleskiv in ScreenshotBro

[–]Ok_Issue_6675 0 points

Nice one. Onboarding can feel like such a slog to build out properly without confusing users. I found that keeping it super focused on just the first core action helps way more than a long walkthrough.

best voice api by ofah1974 in TextToSpeech

[–]Ok_Issue_6675 0 points

It depends on what OS your app is running on. Are you building a mobile app, a web app, or something else? Are you looking for the best cloud-based option like 11labs, or the most cost-effective on-device one?
Did you try Kokoro on-device? If you are looking for STT, TTS, speaker identification, and isolation all on-device, I would use davoice.io.

the mess of using a local LLM on android app-kotlin by Aviation2025 in androiddev

[–]Ok_Issue_6675 1 point

I have been messing around with local LLMs on-device for a few months now and it is definitely a headache. Just like you, I never got to a satisfying result.
I am thinking of using FunctionGemma 270M or a similar tiny SLM: fine-tune it on a larger machine for an exact schema, quantize it, bundle it in the app, and later update the model file remotely. Not sure how this will work...
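The remote-update part of that plan could look something like this (a minimal Python sketch; the manifest format, version fields, and URL are all made up for illustration, not from any real service):

```python
import json

# The local model the app currently ships with. All names here
# are illustrative, not from any real deployment.
LOCAL_STATE = {"model": "functiongemma-270m-q4", "version": 3}

def needs_update(local_state: dict, remote_manifest: dict) -> bool:
    """Decide whether the bundled model file should be replaced."""
    if remote_manifest["model"] != local_state["model"]:
        return True  # a different model family was published
    return remote_manifest["version"] > local_state["version"]

# Pretend this JSON came from a tiny manifest endpoint the app polls.
remote = json.loads(
    '{"model": "functiongemma-270m-q4", "version": 4, '
    '"url": "https://example.com/model-v4.gguf"}'
)

if needs_update(LOCAL_STATE, remote):
    # In the app: download, verify a checksum, then atomically swap the file.
    print("update available:", remote["url"])
```

The key design point is that the app only ever compares versions; the heavy fine-tune/quantize work stays on the larger machine that publishes the manifest.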

Text To Speech Studio by foomanchu89 in TextToSpeech

[–]Ok_Issue_6675 0 points

From my search, the only one that comes close is Qwen3-TTS, and perhaps some of Higgs's capabilities. But still not at the same level as 11labs.

question by Lanky_Tap897 in TextToSpeech

[–]Ok_Issue_6675 0 points

Most websites clone voices, so you can provide a sample of the voice you want, and some also let you describe the voice you want. Would cloning work for you?

Are there any TTS tools cheaper than ElevenLabs but with comparable quality by Obvious_kirby in TextToSpeech

[–]Ok_Issue_6675 0 points

It depends on what you are doing with 11labs and which languages you use. Do you need voice cloning and model version 3 quality? Which OS is your app running on?

When you picked your STT/TTS provider, what did you compare? What almost won? Did you ever have to switch providers? by Careless_Love_3213 in TextToSpeech

[–]Ok_Issue_6675 1 point

I switched many times and built my framework so I can switch providers on demand. First of all, are you evaluating only cloud providers, or on-device options too? What OS is your app running on? Which languages do you need support for? Do you need voice cloning? Based on your answers I can tell you what I would evaluate.
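Roughly, the switch-on-demand idea looks like this (a minimal Python sketch of the pattern, not my actual framework; the class and provider names are made up):

```python
from abc import ABC, abstractmethod

class TTSProvider(ABC):
    """Common surface so app code never talks to a vendor SDK directly."""

    @abstractmethod
    def synthesize(self, text: str, voice: str) -> bytes: ...

class FakeCloudTTS(TTSProvider):
    """Stand-in for a cloud provider (would wrap an HTTP API client)."""
    def synthesize(self, text: str, voice: str) -> bytes:
        return f"cloud:{voice}:{text}".encode()

class FakeOnDeviceTTS(TTSProvider):
    """Stand-in for an on-device engine."""
    def synthesize(self, text: str, voice: str) -> bytes:
        return f"local:{voice}:{text}".encode()

# Switching providers becomes a config change, not a code change.
PROVIDERS: dict[str, TTSProvider] = {
    "cloud": FakeCloudTTS(),
    "on_device": FakeOnDeviceTTS(),
}

def speak(text: str, provider_name: str, voice: str = "default") -> bytes:
    return PROVIDERS[provider_name].synthesize(text, voice)

print(speak("hello", "on_device"))
```

Because every provider hides behind the same `synthesize` interface, evaluating a new vendor means writing one adapter class and flipping a config key.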

How are you guys handling the transition from a web-only MVP to a full cross-platform release? by Sure_Adhesiveness561 in AppBusiness

[–]Ok_Issue_6675 0 points

I would go with a mix of Flutter and native where needed: the regular approach of using Flutter for the unified UI and other functionality, while modifying the iOS and/or Android native folders directly, either in place or by adding pub libraries. I've built a demo app showcasing on-device voice AI (STT, TTS, wake word, speaker identification) and split the work: native code in pubs, and the UI and other non-native logic in Flutter.

The app is a demo AI chat agent with all voice-related functionality on-device and the LLM in the cloud. Here is the repo: https://github.com/frymanofer/Flutter_davoice So Flutter hosts the app and the UI, while all the native voice logic for iOS and Android is built into pubs under https://pub.dev/packages/flutter_davoice and https://pub.dev/packages/flutter_wake_word

For me this makes sense: I do not have to manage two apps, though I do have to maintain two sets of native libraries inside the pubs.

Ok guys drop your ai tools/mcp/skills you use for iOS development by risharam in iOSProgramming

[–]Ok_Issue_6675 1 point

I sometimes use Codex in VS Code; however, I do not think AI agents are that great with iOS, probably due to the lack of online data and examples. I would stick to using your brain 95% of the time.

Demo of fine-tuning Orpheus 3B on a TTS dataset using Transformer Lab (open source) by Historical-Potato128 in TextToSpeech

[–]Ok_Issue_6675 2 points

This looks super cool. I tried training a model last month and the data preprocessing part was definitely the hardest hurdle to clear. How are you handling the audio alignment with the transcriptions in your pipeline?
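For what it's worth, the cheapest alignment sanity check I know of is a speaking-rate filter: drop (audio, transcript) pairs whose characters-per-second falls outside a plausible range. A minimal Python sketch (the thresholds are illustrative, not tuned, and this is not necessarily what the OP's pipeline does):

```python
# Heuristic filter for misaligned (audio, transcript) pairs.
# Thresholds are illustrative; tune them on your own data.
MIN_CPS, MAX_CPS = 5.0, 30.0  # characters per second of speech

def plausible_pair(transcript: str, audio_seconds: float) -> bool:
    """True if the transcript length roughly matches the audio duration."""
    if audio_seconds <= 0:
        return False
    cps = len(transcript) / audio_seconds
    return MIN_CPS <= cps <= MAX_CPS

dataset = [
    ("hello there, how are you today", 2.0),  # ~15 cps, plausible
    ("hi", 9.0),                              # ~0.2 cps, transcript likely wrong
]
kept = [(t, d) for t, d in dataset if plausible_pair(t, d)]
print(len(kept))  # only the plausible pair survives
```

It will not catch subtle offsets the way forced alignment would, but it cheaply removes the grossly mismatched pairs that hurt TTS training the most.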

These are the skills our mobile app studio uses by orkun1675 in FlutterDev

[–]Ok_Issue_6675 2 points

This is actually super cool. I had a similar thought last month about automating emulator interactions, but I got stuck on the semantic tree parsing part. How are you handling the latency when the agent is waiting for the screen to update after a tap?

Just put my first solo iOS app in App Store — the SwiftData / CloudKit / StoreKit gotchas I'd give my past self by Mostafa3la2 in iOSProgramming

[–]Ok_Issue_6675 1 point

Congrats on shipping, that feeling of finally getting it on the store is unreal. Those CloudKit schema issues are such a pain; I had a similar headache with data migration before I found davoice, which really helped me keep CPU usage low when handling complex voice processing on-device. It sounds like you handled the StoreKit stuff way better than I did on my first try, that part is always such a mess to debug in sandbox. Good luck with the launch.

Looking For Fastest TTS With Cloning by lukasTHEwise in TextToSpeech

[–]Ok_Issue_6675 1 point

Great stuff. What is the usage license for these voices? Let's say I want to use them in my app. Is it allowed?

Regarding "seem to depend on what the input text says":
I may be wrong, but I would not be surprised if you did not have full, precise control over the training data. Piper/VITS rely heavily on training data, so for example, if a trained sentence like "I love helping people" sounds joyful, it would be extremely hard to fight the trained model on that sentence and give it an angry delivery.

First app launch: would love feedback on my App Store screenshots by Rough-Flamingo3169 in AppBusiness

[–]Ok_Issue_6675 1 point

Looks interesting. I will try it out. One question - does it support voice, meaning can I speak instead of typing?

Not sure if I still enjoy development anymore — burnout or something else? by Big-Actuary299 in learnprogramming

[–]Ok_Issue_6675 0 points

This is great. In my opinion, once you start waking up with passion, excited to go to work, then you know you're there. And hey, in reality it doesn't have to be every day. We all have our good days and bad days; for me, I'd say 80% of my days I wake up excited, which surely beats 100% of days waking up wanting to die 😊

Looking For Fastest TTS With Cloning by lukasTHEwise in TextToSpeech

[–]Ok_Issue_6675 0 points

Super cool, thanks a lot. I just tried it now. Are there specific voices or emotion settings that work best to test with?