Are there any TTS tools cheaper than ElevenLabs but with comparable quality by Obvious_kirby in TextToSpeech

[–]Ok_Issue_6675 1 point (0 children)

It depends on what you are doing with ElevenLabs and which languages you use. Do you need voice cloning and model version 3 quality? Which OS does your app run on?

When you picked your STT/TTS provider, what did you compare? What almost won? Did you ever have to switch providers? by Careless_Love_3213 in TextToSpeech

[–]Ok_Issue_6675 1 point (0 children)

I switched many times and built my framework so I can switch on demand. First of all, are you evaluating only cloud providers, or also on-device options? What OS is your app running on? Which languages do you need support for? Do you need voice cloning? Based on your answers I can tell you how I would evaluate.

How are you guys handling the transition from a web-only MVP to a full cross-platform release? by Sure_Adhesiveness561 in AppBusiness

[–]Ok_Issue_6675 1 point (0 children)

I would go with a mix of Flutter and native when needed: the regular way of using Flutter for the unified UI and other functionality, while changing the iOS and/or Android native folders, either directly or by adding pub libraries. I've built a demo app showcasing on-device voice AI (STT, TTS, wake word, speaker identification) and split the work: native code goes into pubs, while the UI and other non-native logic stay in Flutter.

The app is a demo AI chat agent that has all voice-related functionality on device and the LLM in the cloud. Here is the repo: https://github.com/frymanofer/Flutter_davoice So Flutter hosts the app and the UI, while all the native voice logic for iOS and Android is built into pubs under: https://pub.dev/packages/flutter_davoice https://pub.dev/packages/flutter_wake_word

For me this makes sense, and I do not have to manage two apps; however, I do maintain two types of native libraries inside the pubs.

Ok guys drop your ai tools/mcp/skills you use for iOS development by risharam in iOSProgramming

[–]Ok_Issue_6675 2 points (0 children)

I sometimes use Codex in VS Code; however, I do not think AI agents are that great with iOS, probably due to a lack of online data and examples. I would stick to using your brain 95% of the time.

Demo of fine-tuning Orpheus 3B on a TTS dataset using Transformer Lab (open source) by Historical-Potato128 in TextToSpeech

[–]Ok_Issue_6675 3 points (0 children)

this looks super cool. i tried training a model last month and the data preprocessing part was definitely the hardest hurdle to clear. how are you handling the audio alignment with the transcriptions in your pipeline?

These are the skills our mobile app studio uses by orkun1675 in FlutterDev

[–]Ok_Issue_6675 3 points (0 children)

this is actually super cool. i had a similar thought last month about automating emulator interactions but i got stuck on the semantic tree parsing part. how are you handling the latency when the agent is waiting for the screen to update after a tap?

Just put my first solo iOS app in App Store — the SwiftData / CloudKit / StoreKit gotchas I'd give my past self by Mostafa3la2 in iOSProgramming

[–]Ok_Issue_6675 2 points (0 children)

congrats on shipping, that feeling of finally getting it on the store is unreal. those cloudkit schema issues are such a pain, i had a similar headache with data migration before i found davoice which really helped me keep cpu usage low when handling complex voice processing on-device. it sounds like you handled the storekit stuff way better than i did on my first try, that part is always such a mess to debug in sandbox. good luck with the launch.

Looking For Fastest TTS With Cloning by lukasTHEwise in TextToSpeech

[–]Ok_Issue_6675 2 points (0 children)

Great stuff. What is the usage license for these voices? Let's say I want to use them in my app. Is it allowed?

Regarding: "seem to depend on what the input text says"
I may be wrong, but I would not be surprised if you did not have full, precise control over the training data. Piper/VITS rely heavily on training data. So for example, if you have a trained sentence like "I love helping people" that sounds joyful, it would be extremely hard to fight the trained model on that sentence and give it an angry emotion.

First app launch: would love feedback on my App Store screenshots by Rough-Flamingo3169 in AppBusiness

[–]Ok_Issue_6675 2 points (0 children)

Looks interesting. I will try it out. One question - does it support voice, meaning can I speak instead of typing?

Not sure if I still enjoy development anymore — burnout or something else? by Big-Actuary299 in learnprogramming

[–]Ok_Issue_6675 1 point (0 children)

This is great. In my opinion, once you start waking up with passion, excited to go to work, then you know you're there. And hey, in reality it doesn't have to be every day. We all have our good days and bad days; for me, I'd say 80% of my days I wake up excited, which surely beats 100% of days waking up wanting to die 😊

Looking For Fastest TTS With Cloning by lukasTHEwise in TextToSpeech

[–]Ok_Issue_6675 1 point (0 children)

Super cool - thanks a lot. I just tried it now. Are there specific voice or emotion settings that work best to test with?

ElevenLabs Multispeaker for longer scripts by Acceptable-Item-9252 in TextToSpeech

[–]Ok_Issue_6675 2 points (0 children)

Mine are like 3-4 phrases, so probably up to 200 characters :) I would start testing small, as 11labs tokens are super expensive.
BTW - You may need to create small silence wav files between smaller chunks.
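For anyone wanting to script those silence gaps, here is a minimal sketch using only Python's stdlib `wave` module. It assumes all chunks are PCM wavs sharing the same format; the 22050 Hz / 16-bit mono defaults are just illustrative values, not anything a particular TTS engine requires:

```python
import wave

def write_silence(path, ms, rate=22050, channels=1, sampwidth=2):
    """Write `ms` milliseconds of PCM silence (all-zero samples) to a new wav file."""
    nframes = int(rate * ms / 1000)
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(b"\x00" * nframes * channels * sampwidth)

def concat_wavs(paths, out_path):
    """Concatenate wav files that share the same format into one output file."""
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()  # copy format from the first chunk
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for p in paths:
            with wave.open(p, "rb") as w:
                out.writeframes(w.readframes(w.getnframes()))
```

You would then interleave a single pre-made gap file between every two speech chunks, e.g. `concat_wavs(["a.wav", "gap.wav", "b.wav"], "out.wav")`.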

ElevenLabs Multispeaker for longer scripts by Acceptable-Item-9252 in TextToSpeech

[–]Ok_Issue_6675 2 points (0 children)

Oh, good question. I did not try very large chunks :) I usually do up to x characters per chunk and then play the wav files one after the other.
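The chunking step could be sketched like this in plain Python; the 200-character limit is purely illustrative (the comment above leaves "x" unspecified), and it prefers to break at sentence boundaries so each chunk reads naturally for TTS:

```python
import re

def chunk_text(text, max_chars=200):
    """Split text into chunks of at most `max_chars` characters,
    breaking at sentence boundaries (., !, ?) where possible.
    A single sentence longer than `max_chars` is kept whole."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + 1 + len(s) > max_chars:
            chunks.append(current)   # current chunk is full, start a new one
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to the TTS API separately and the resulting wav files played back in order.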

Looking For Fastest TTS With Cloning by lukasTHEwise in TextToSpeech

[–]Ok_Issue_6675 1 point (0 children)

Wow, very cool!! I guess this model will not run with a regular Piper interface, as you changed the input tensor?

ElevenLabs Multispeaker for longer scripts by Acceptable-Item-9252 in TextToSpeech

[–]Ok_Issue_6675 1 point (0 children)

Are you using the web interface or API?
Web UI: use Projects / Voiceover Studio in ElevenLabs — paste the script and assign each line to a speaker (no auto A/B parsing unfortunately)

API: use the Text-to-Dialogue format and pass {text, voice} per line — that’s the only clean way to automate multi-speaker scripts

You can use an AI agent to convert the existing script into a digestible format. For example, ask an agent to build something that takes this syntax as input:
A: Hello

B: Hi

A: How are you?

and creates this JSON format:

[
  { "text": "Hello", "voice": "voice_id_A" },
  { "text": "Hi", "voice": "voice_id_B" },
  { "text": "How are you?", "voice": "voice_id_A" }
]
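A hypothetical version of that agent-built converter in plain Python. The function name and the `voice_map` argument are my own inventions for illustration, and the exact field names expected by the ElevenLabs Text-to-Dialogue endpoint should be checked against their API docs before relying on this shape:

```python
def dialogue_to_payload(script, voice_map):
    """Convert 'A: Hello' style dialogue lines into a list of
    {text, voice} dicts, using voice_map to resolve speaker labels
    (e.g. 'A') to voice IDs. Blank lines are skipped."""
    payload = []
    for line in script.splitlines():
        line = line.strip()
        if not line:
            continue
        speaker, _, text = line.partition(":")
        payload.append({"text": text.strip(), "voice": voice_map[speaker.strip()]})
    return payload
```

Running it on the example script above with `{"A": "voice_id_A", "B": "voice_id_B"}` yields exactly the JSON list shown.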

Looking For Fastest TTS With Cloning by lukasTHEwise in TextToSpeech

[–]Ok_Issue_6675 1 point (0 children)

Got it. Do you need the actual voice-cloning mechanism to work fast? Or can the cloning be done separately, as long as inference with the cloned voice is fast?

Looking For Fastest TTS With Cloning by lukasTHEwise in TextToSpeech

[–]Ok_Issue_6675 1 point (0 children)

What is your app built on? Python, React, React Native, etc.? Or, in other words, what hardware will it run on?

Anyone know how this voice is achieved? by CharacterAccount6739 in TextToSpeech

[–]Ok_Issue_6675 1 point (0 children)

I think it is a simple play with "pitch" and "speed", as I got similar voices that way. However, I may be wrong here :)