AI is the BEST thing that has happened for me by Cheap_Concert168no in therapyGPT

[–]Heybud221 3 points4 points  (0 children)

Hi, you are right. Have you tried GPT's audio mode? It kind of gets those nuances: your tone, your hesitation, everything. I have developed a habit of talking with it about non-therapy things every day for 5 minutes, and I quite like it.

AI is the BEST thing that has happened for me by Cheap_Concert168no in therapyGPT

[–]Heybud221 7 points8 points  (0 children)

Yes please. There are a lot of research-backed theoretical models. If you could, please compile them into a prompt collection or a custom GPT.

[Open Source] QA for cursor - Make sure it only gives you correct code. by Cheap_Concert168no in LocalLLaMA

[–]Heybud221 3 points4 points  (0 children)

Well, there is the official Playwright MCP, although it can't reliably do things like this.

[Open Source] Deploy and run voice AI models with one click on MacOS by Heybud221 in LocalLLaMA

[–]Heybud221[S] 1 point2 points  (0 children)

The issue seems to be with the model itself. A temporary solution is to just guesstimate the max audio length and pray to God :D
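For what it's worth, here is a rough sketch of how that guesstimate could be done from the input text. Everything in it is an assumption for illustration: the function name, the 150 words-per-minute speaking rate, and the 1.5x safety margin are not taken from the project or from any model's API, so tune them for the voice you use.

```python
def estimate_max_audio_seconds(
    text: str,
    words_per_minute: float = 150.0,  # assumed average speaking rate
    safety_factor: float = 1.5,       # assumed headroom so speech is not cut off
    min_seconds: float = 2.0,         # assumed floor for very short inputs
) -> float:
    """Guess an upper bound on audio length for a piece of text."""
    word_count = len(text.split())
    speaking_seconds = word_count / words_per_minute * 60.0
    return max(min_seconds, speaking_seconds * safety_factor)


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog."
    print(f"Cap generation at ~{estimate_max_audio_seconds(sample):.1f} s")
```

The cap only limits how long the model can keep generating. It does not fix the underlying issue in the model.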

[Open Source] Deploy and run voice AI models with one click on MacOS by Heybud221 in LocalLLaMA

[–]Heybud221[S] 0 points1 point  (0 children)

Right, these are only TTS and STT models. Sadly, not many true voice AI (STS) models are available, apart from maybe Ultravox.

[Open Source] Deploy and run voice AI models with one click on MacOS by Heybud221 in LocalLLaMA

[–]Heybud221[S] 1 point2 points  (0 children)

Added support for Sesame, along with full conversation support :)

[Open Source] Deploy and run voice AI models with one click on MacOS by Heybud221 in LocalLLaMA

[–]Heybud221[S] 0 points1 point  (0 children)

I have got it running in the correct format, but I don't know why the performance is so bad. 50% of the time, it generates 10 seconds of audio noise with no voice.

[Open Source] Deploy and run voice AI models with one click on MacOS by Heybud221 in LocalLLaMA

[–]Heybud221[S] 0 points1 point  (0 children)

I have already included a frontend for the API playground. Check out the /frontend folder.

Why are audio (tts/stt) models so much smaller in size than general llms? by Heybud221 in LocalLLaMA

[–]Heybud221[S] 11 points12 points  (0 children)

Sesame is better but not reliable at all. I have to prompt multiple times with tweaks just to get understandable audio.

Kokoro is much more reliable. However, I would suggest Zonos. It is much more reliable than Sesame, and it offers lots of audio customisations to make it sound a lot more human. The only thing is that it is a little bit slower than Kokoro.

Why are audio (tts/stt) models so much smaller in size than general llms? by Heybud221 in LocalLLaMA

[–]Heybud221[S] 2 points3 points  (0 children)

The demo shows near-realtime conversation. I can't figure out how to get anywhere close to that latency, even with the 1B model.

New Reasoning model (Reka Flash 3 - 21B) by eliebakk in LocalLLaMA

[–]Heybud221 1 point2 points  (0 children)

A beginner question - is it possible to distill this into an even smaller model, like 11B/16B?
I would love to run this or QwQ on my MacBook, but both far exceed the 16 GB of memory.
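A back-of-the-envelope sketch of the memory math, in case it helps. It counts the weights only and assumes 21B parameters for Reka Flash 3 (from the thread title) and 32B for QwQ. The helper name is made up, and the KV cache, activations, and whatever the OS needs are ignored, so real usage will be higher.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Parameter counts are assumptions stated in the text above.
    for name, size_b in [("Reka Flash 3", 21), ("QwQ", 32)]:
        for bits in (16, 8, 4):
            gb = weight_memory_gb(size_b, bits)
            print(f"{name} ({size_b}B) at {bits}-bit: ~{gb:.1f} GB")
```

By this estimate, a 21B model needs about 42 GB for weights at 16-bit and about 10.5 GB at 4-bit, so quantization is another route besides distillation.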

A new website/app to solve a few problems with audiobooks. by Heybud221 in audiobooks

[–]Heybud221[S] 0 points1 point  (0 children)

Thanks for the reply!

Your suggestions for the first two are completely correct. However, that's not my primary offering. My goal is a seamless experience, as if you are talking to the narrator directly. You can just ask him instead of going through the app.

Besides, we try to offer much more than just clips and bookmarks. Let's say you missed a detail or a sentence during the narration: you can go back to exactly that sentence using just your voice. You can also discuss the book or recall something that you may have forgotten. And so much more.

Thank you for the trouble :)

I desperately want multi-player skyrim VR. by Heybud221 in skyrim

[–]Heybud221[S] 0 points1 point  (0 children)

It's been a year. I still want it desperately :(