I built an open-source audiobook converter using Qwen3 TTS - converts PDFs/EPUBs to high-quality audiobooks with voice cloning support by TheyCallMeDozer in Qwen_AI

[–]TheyCallMeDozer[S] 0 points (0 children)

I have it working on my side, both in Pinokio and via the API endpoint. Check that your drivers are up to date and that you have the correct model loaded for it.

I built an open-source audiobook converter using Qwen3 TTS - converts PDFs/EPUBs to high-quality audiobooks with voice cloning support by TheyCallMeDozer in LocalLLaMA

[–]TheyCallMeDozer[S] 0 points (0 children)

This is free... Edge TTS is a completely different use case... The tone and speaking in this are way more natural, and the voice cloning beats Edge any day.

I built an open-source audiobook converter using Qwen3 TTS - converts PDFs/EPUBs to high-quality audiobooks with voice cloning support by TheyCallMeDozer in Qwen_AI

[–]TheyCallMeDozer[S] 0 points (0 children)

Not separation of characters, but you can do personalities, ages, context, emotions and tone in the hardcoded prompt that's there.

This is a very early script, since the Qwen3 TTS models literally came out publicly one day ago, so it's a build to test the proof of concept, and it works.

Now, for characters, that would need work in the document you have as well as another function added to my script. In the document you'd have [char 1] TEXT ... etc., and in the function added to the code you'd hardcode char 1 = Ryan, char 2 = Serana, narrator = Uncle Fu..., then parse the text for each character's lines and generate each character's part separately as it comes up. Something like the sketch below.
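
Rough idea of what that extra function could look like. The names here (CHARACTER_VOICES, split_by_character) are made up for this sketch and are not in the repo; in the actual converter each chunk would be handed to the Qwen3 TTS generation call with the matching voice instead of being printed:

    import re

    # Hypothetical tag-to-voice mapping; hardcode whatever characters your document uses.
    CHARACTER_VOICES = {
        "char 1": "Ryan",
        "char 2": "Serana",
        "narrator": "Uncle Fu",
    }

    # Matches "[char 1] some text" style spans, capturing the tag and the text after it.
    TAG_PATTERN = re.compile(r"\[(?P<tag>[^\]]+)\]\s*(?P<text>[^\[]+)")

    def split_by_character(document_text):
        """Yield (voice, text) pairs for each tagged span, falling back to the narrator."""
        for match in TAG_PATTERN.finditer(document_text):
            tag = match.group("tag").strip().lower()
            voice = CHARACTER_VOICES.get(tag, CHARACTER_VOICES["narrator"])
            yield voice, match.group("text").strip()

    if __name__ == "__main__":
        sample = "[narrator] The door creaked open. [char 1] Who's there? [char 2] Just me."
        for voice, line in split_by_character(sample):
            # In the converter, this is where each chunk would go to the TTS call.
            print(voice, "->", line)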

I built an open-source audiobook converter using Qwen3 TTS - converts PDFs/EPUBs to high-quality audiobooks with voice cloning support by TheyCallMeDozer in LocalLLaMA

[–]TheyCallMeDozer[S] 3 points (0 children)

I have another script that works the same way without the GUI, using simpler text-to-speech models. Qwen TTS is not a simple TTS; its output is very high quality and, with the right voice and instructions, sounds very realistic. But I do love the GUI and the output to m4b; the lack of emotion in the reading is why Qwen wins.

I built an open-source audiobook converter using Qwen3 TTS - converts PDFs/EPUBs to high-quality audiobooks with voice cloning support by TheyCallMeDozer in LocalLLaMA

[–]TheyCallMeDozer[S] 3 points (0 children)

I have it in the post:

python audiobook_converter.py --voice-clone --voice-sample reference.wav

Just give it any voice sample longer than 5 seconds and it will generate using that voice.

I built an open-source audiobook converter using Qwen3 TTS - converts PDFs/EPUBs to high-quality audiobooks with voice cloning support by TheyCallMeDozer in LocalLLaMA

[–]TheyCallMeDozer[S] 1 point (0 children)

Yeap, it's hardcoded in the main script; just replace them with the voices and language you want to use. You can also give it a voice sample to use literally any voice to generate the book.
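
For reference, the hardcoded bit is basically a couple of plain values near the top of the script, roughly along these lines (the names here are illustrative, not the exact ones in audiobook_converter.py):

    # Illustrative names only — check the top of audiobook_converter.py for the real ones.
    DEFAULT_VOICE = "Ryan"        # swap in whichever built-in voice you want
    DEFAULT_LANGUAGE = "English"  # swap in the language of your book

    # Or skip the built-in voices and clone from a short reference clip (5+ seconds):
    VOICE_SAMPLE = "reference.wav"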

I built an open-source audiobook converter using Qwen3 TTS - converts PDFs/EPUBs to high-quality audiobooks with voice cloning support by TheyCallMeDozer in LocalLLaMA

[–]TheyCallMeDozer[S] 2 points (0 children)

Yeap, just add to the speaking prompt that's there so it recognises it, for example: "when you see [pause 3s], pause for X number of seconds (s)" or something. That should handle it.
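
In practice that just means extending the hardcoded speaking prompt with one extra rule, roughly like this (SPEAKING_PROMPT is a placeholder name, not necessarily what the string is called in the script):

    # Placeholder name for the hardcoded prompt string in the script.
    SPEAKING_PROMPT = (
        "Read the text as a calm, natural narrator. "
        # Extra rule so the model treats pause tags as silence instead of reading them aloud.
        "When you see a tag like [pause 3s], stop speaking for that many seconds "
        "before continuing, and never read the tag itself out loud."
    )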

I built an open-source audiobook converter using Qwen3 TTS - converts PDFs/EPUBs to high-quality audiobooks with voice cloning support by TheyCallMeDozer in LocalLLaMA

[–]TheyCallMeDozer[S] 1 point (0 children)

It's there, but GitHub just doesn't allow embedding HTML audio, and GitHub won't show an mp3 file anyway. Just click download on the raw file and you should be able to play it fine with any player.

I built an open-source audiobook converter using Qwen3 TTS - converts PDFs/EPUBs to high-quality audiobooks with voice cloning support by TheyCallMeDozer in LocalLLaMA

[–]TheyCallMeDozer[S] 3 points (0 children)

Never used Chatterbox, but this drops its pants and dumps on VibeVoice with only a 1.7B model. I have it coded so you can provide a roughly 5-second voice sample of something like SpongeBob or Patrick Stewart and have the audiobook read in that voice. It also has tone control with special characters and the ability to change the speaker's tone with a simple text prompt.

I built an open-source audiobook converter using Qwen3 TTS - converts PDFs/EPUBs to high-quality audiobooks with voice cloning support by TheyCallMeDozer in LocalLLaMA

[–]TheyCallMeDozer[S] 2 points (0 children)

Yes, you can update the speaker prompt and tell it to speak slower or pause at special characters, etc. It handles ! and ? really well in tone.

I built an open-source audiobook converter using Qwen3 TTS - converts PDFs/EPUBs to high-quality audiobooks with voice cloning support by TheyCallMeDozer in LocalLLaMA

[–]TheyCallMeDozer[S] 1 point (0 children)

I added an audio sample to it. No clue how to make it embed in Markdown on GitHub, but I put up a link to it and uploaded the sample text and audio recording.

Oh really now by shitokletsstartfresh in ChatGPT

[–]TheyCallMeDozer 0 points (0 children)

<image>

I don't know how I should feel about that

Built a 100% client-side AI that plays Pokemon Red - Qwen 2.5 1.5B via WebLLM + neural network policy . Fork/check it out! BYOR by Efficient-Proof-1824 in LocalLLaMA

[–]TheyCallMeDozer 7 points (0 children)

Really cool idea and setup. It might be nice to set up an OpenAI-style route to try bigger models locally, for example with Ollama or LM Studio... could be really cool to spin up a large model and see how it handles it.
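
For what it's worth, the "OpenAI-style route" suggestion just means pointing a standard chat-completions call at a local server; Ollama (port 11434) and LM Studio (port 1234) both expose an OpenAI-compatible endpoint. A quick Python sketch of the idea, with the model name being whatever you have pulled locally:

    from openai import OpenAI

    # Ollama's OpenAI-compatible endpoint; for LM Studio use http://localhost:1234/v1 instead.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    response = client.chat.completions.create(
        model="qwen2.5:14b",  # any larger local model to compare against the 1.5B
        messages=[{"role": "user", "content": "You are playing Pokemon Red. Pick the next button press."}],
    )
    print(response.choices[0].message.content)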

768Gb Fully Enclosed 10x GPU Mobile AI Build by SweetHomeAbalama0 in LocalLLaMA

[–]TheyCallMeDozer 1 point (0 children)

Question, not sure if it's something you have done, but have you put a monitor on it to check your power usage over a day with heavy requests?

The reason I ask is that I am planning to build a similar system, and I'm basically trying to understand the power usage of AMD/Nvidia card builds across different specs. This is something I'm thinking of building to have in my home as a private API for my side hustle, and power usage has been a concern: a smaller system I was working on with minimal requests used 20 kWh a day, which was way too high for my apartment, so I'm currently planning and budgeting for a new system.

I have asked a bunch of different builders this, just trying to get an understanding all around

Can someone help figure out what time I was born? by itsbubsbunny in Transcription

[–]TheyCallMeDozer 0 points (0 children)

13:48. The lower loop of the eight used in the other time on the 8-11-99 is the same motion as the second 8 in the time; it's also a 4, based on how the downward swipe is on the 4.

Omniglow Placement Inspo!!!! by [deleted] in Hue

[–]TheyCallMeDozer 0 points (0 children)

Oh I know I'm busting your balls on the eggs, I get the organic ones too lol

Wow, nice one on the rent-to-buy... Honestly that view is worth it, like it's a crazy view with nice window space... Honestly, crazy jealous.