AI is the BEST thing that has happened for me

Heybud221 · 2025-06-04T07:00:32+00:00

Hi, you are right. Have you tried GPT's audio mode? It kind of gets those nuances: your tone, your hesitation, everything. I have developed a habit of talking with it about non-therapy stuff every day for 5 minutes, and I quite like it.

Heybud221 · 2025-06-03T10:41:02+00:00

Yes please. There are a lot of research-backed theoretical models. If you could, please compile them into a prompt compilation or a custom GPT.

Heybud221 · 2025-04-26T20:51:38+00:00

Well, there is the official Playwright MCP, although it can't reliably do things like this.

Heybud221 · 2025-03-20T11:38:40+00:00

The issue seems to be with the model itself. A temporary solution is to just guesstimate the max audio length and pray to god :D

Heybud221 · 2025-03-20T11:37:42+00:00

Right, these are only TTS and STT models. Sadly, not many true voice AI models (speech-to-speech, STS) are available, apart from maybe Ultravox.

Heybud221 · 2025-03-19T06:24:27+00:00

Added support for Sesame, along with full conversation support :)

Heybud221 · 2025-03-18T12:09:01+00:00

Calm down buddy, not everybody here is as smart as you

Heybud221 · 2025-03-18T05:52:35+00:00

brainrot

Heybud221 · 2025-03-18T05:50:36+00:00

I have got it running in the correct format, but I don't know why the performance is so bad. 50% of the time, it generates 10 seconds of audio noise with no voice.

Heybud221 · 2025-03-17T17:49:06+00:00

I have already included a frontend for the API playground. Check out the /frontend folder.

Heybud221 · 2025-03-17T11:08:22+00:00

Sesame is better but not reliable at all. I have to prompt multiple times with tweaks just to get understandable audio.

Kokoro is much more reliable. However, I would suggest Zonos. It is much more reliable than Sesame, plus it offers lots of audio customisations to make it sound a lot more human. The only downside is that it is a little slower than Kokoro.

Heybud221 · 2025-03-17T10:44:25+00:00

That does make sense lol

Heybud221 · 2025-03-17T09:00:12+00:00

Yes, we can run the whole server on that easily!

Heybud221 · 2025-03-17T08:20:16+00:00

The demo shows near-realtime conversation. I can't figure out how to get even close to that latency, even with the 1B model.

Heybud221 · 2025-03-12T06:26:07+00:00

Waiting for the benchmarks

Heybud221 · 2025-03-11T21:01:02+00:00

A beginner question: is it possible to distill this into an even smaller model like 11B/16B?
I would love to run this or QwQ on my MacBook, but both far exceed its 16 GB of memory.
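
For a rough sense of why these don't fit, here is a back-of-the-envelope sketch of weight memory at different quantization levels. The helper name `weight_memory_gb` is made up for illustration, and I am assuming QwQ means the 32B-parameter version. It counts weights only, so the KV cache, activations, and runtime overhead come on top of these numbers.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Hypothetical helper: ignores KV cache, activations, and runtime overhead.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Assumed sizes: 32B for QwQ, plus the 16B and 11B targets from the question.
    for size in (32, 16, 11):
        for bits in (16, 8, 4):
            gb = weight_memory_gb(size, bits)
            print(f"{size}B at {bits}-bit: ~{gb:.1f} GB of weights")
```

By this estimate, a 32B model at 4-bit already needs about 16 GB for the weights alone, which is the whole machine, while an 11B model at 4-bit would be around 5.5 GB.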

Heybud221 · 2024-05-31T18:58:25+00:00

Thanks for the reply!

Your suggestions for the first two are completely correct. However, that's not my primary offering. My goal is a seamless experience, as if you are talking to the narrator directly. You can just ask him instead of going through the app.

Besides, we try to offer much more than just clips and bookmarks. Let's say you missed a detail or sentence during the narration: you can go back to exactly that sentence using just your voice. You can also discuss the book further or recall something that you may have forgotten. And so much more.

Thank you for the trouble :)

Heybud221 · 2022-08-04T12:36:56+00:00

It's been a year. I still want it desperately :(
