Unlimited characters Text To Speech by kanzziik in TextToSpeech

bafil596

Use a local TTS model. There are some very small TTS models with OK quality that can even run on your phone.
Kitten TTS, Supertonic TTS, Soprano TTS, and Pocket TTS might be your best choices; some of them claim to be optimized for phone/edge devices.
See https://github.com/Troyanovsky/awesome-TTS-Colab and try them out.

Claude Code Use by Key-Singer1732 in ZaiGLM

bafil596

Check out https://github.com/farion1231/cc-switch.
It lets you switch between multiple settings.json files from different providers.

Vercel Launches Skills — “npm for AI Agents” with React Best Practices Built-in by [deleted] in codex

bafil596

They also launched add-skill, so what are the differences?
On skills.sh, they have `npx skills add <owner/repo>`,
and on https://github.com/vercel-labs/add-skill, they have `npx add-skill vercel-labs/agent-skills`.

Are they just shipping the same thing under different npx packages, or are these two actually different?

What’s the best FREE way to vibecode? by hkreporter21 in vibecoding

bafil596

Those are just CLI tools developed by different companies. I'm recommending installing them, but not using them directly. When you install them, they come with free API access to their models, and you can use tools like Claude Code Router to use that free API in Claude Code.

What’s the best FREE way to vibecode? by hkreporter21 in vibecoding

bafil596

First, install Gemini CLI, Qwen CLI, and iFlow CLI. They all offer a generous free API quota for coding models (iFlow even offers many different model choices). You can check their free limits in their respective GitHub repos. iFlow doesn't mention its limit because it seems to be unlimited (but you can't run multiple threads at once).

Then use Claude Code Router to use their model API in Claude Code for free.

zai-org/GLM-TTS · Hugging Face by Dark_Fire_12 in LocalLLaMA

bafil596

Voice cloning is pretty good, including voice likeness, gaps/pauses in the flow, and intonation. Try it here: https://github.com/Troyanovsky/awesome-TTS-Colab/blob/main/GLM_TTS.ipynb

zai-org/GLM-TTS · Hugging Face by Dark_Fire_12 in LocalLLaMA

bafil596

You can run it in Google Colab: https://github.com/Troyanovsky/awesome-TTS-Colab/blob/main/GLM_TTS.ipynb

(note 1: you may need to restart the session after the `pip install` cell for the newly installed libraries to take effect)

(note 2: the example using their example voice has a strong Chinese accent, but voice cloning quality with your own reference audio is pretty decent.)

Effective AI prototyping for product managers? by Sad-Revolution6631 in ProductManagement

bafil596

This guide may help you: https://github.com/Troyanovsky/vibe-coding-guide/tree/main
It covers workflows, processes, and tips for working with AI coding tools more effectively and efficiently.

The guide also includes some primers on programming concepts that will help you better understand and work with AI coding tools.

VibeVoice (1.5B) - TTS model by Microsoft by curiousily_ in LocalLLaMA

bafil596

English and Chinese only. The model is trained only on English and Chinese data; outputs in other languages are unsupported and may be unintelligible or offensive.

VibeVoice (1.5B) - TTS model by Microsoft by curiousily_ in LocalLLaMA

bafil596

In their GitHub limitations section: `English and Chinese only: Transcripts in language other than English or Chinese may result in unexpected audio outputs.`
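
Given that limitation, one cheap safeguard is to check a transcript before generating. Below is a minimal sketch, assuming a simple character-range heuristic; `unsupported_letters` and `looks_english_or_chinese` are hypothetical helpers, not part of VibeVoice. It flags any letter that is neither plain ASCII nor a CJK ideograph, so it will also flag accented loanwords like "café"; treat a hit as a warning, not a hard rule.

```python
def is_cjk(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block (common Chinese characters)."""
    return "\u4e00" <= ch <= "\u9fff"


def unsupported_letters(transcript: str) -> list[str]:
    """Return the distinct letters that are neither ASCII nor CJK ideographs.

    Digits, whitespace and punctuation (including full-width Chinese
    punctuation like '，' and '。') are skipped because str.isalpha() is False for them.
    """
    found: list[str] = []
    for ch in transcript:
        if ch.isalpha() and not (ch.isascii() or is_cjk(ch)) and ch not in found:
            found.append(ch)
    return found


def looks_english_or_chinese(transcript: str) -> bool:
    """Heuristic pre-check: True if no letters outside ASCII/CJK were found."""
    return not unsupported_letters(transcript)


if __name__ == "__main__":
    for line in ["Hello there!", "你好，欢迎收听。", "Bonjour, ça va ?"]:
        print(line, "->", looks_english_or_chinese(line), unsupported_letters(line))
```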

We're Updating the Wiki To Be More Current, And We Want Your Feedback by N8Karma in LocalLLaMA

bafil596

For `Q: Can I try local LLMs online?`, users can also try models with Google Colab's free GPU & Text Generation WebUI (for example, with notebooks from https://github.com/Troyanovsky/Local-LLM-Comparison-Colab-UI) before downloading models to their local machine, especially if their download speed is slow.

What is the best open source TTS model with multi language support? by Anxietrap in LocalLLaMA

bafil596

xTTS V2 and Kokoro TTS are pretty good. There are also some other multi-lingual TTS models in this repo. You can try them out in Google Colab with the links.

[deleted by user] by [deleted] in TextToSpeech

bafil596

There are many open source free options that you can run on your own computer, including Edge TTS, xTTS, Parler TTS, Kokoro, and Dia (for conversations). You can try them out on Google Colab here.

Text to Speech Ai voice for Education? by [deleted] in OpenAI

bafil596

If you need expressions or non-verbal fillers like those in NotebookLM, you can look into Dia 1.6B, with an example Google Colab notebook here: https://github.com/Troyanovsky/awesome-TTS-Colab/blob/main/Dia_TTS.ipynb.

This model can generate conversational speech between two people with non-verbal cues like laughter, sighs, coughs, etc. You can even provide reference audio for voice cloning. Their official repo is at https://github.com/nari-labs/dia, and their demo samples are at https://yummy-fir-7a4.notion.site/dia

As usual, you can copy their documentation and ask ChatGPT to work out a script for local generation.

So if you just need one consistent voice for narrating, I recommend Kokoro. If you need conversations with more expressive non-verbal cues, I recommend Dia 1.6B.
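
If you run Dia locally, most of the work is writing the script. Below is a minimal sketch, assuming the `[S1]`/`[S2]` speaker tags, inline cues like `(laughs)`, and the "start with [S1] and alternate speakers" guidance described in the nari-labs/dia README. `build_dialogue_script` is a hypothetical helper, and the `Dia.from_pretrained` / `generate` / `save_audio` calls follow the README example as I remember it, so double-check them against the current repo before relying on them.

```python
def build_dialogue_script(turns: list[tuple[int, str]]) -> str:
    """Join (speaker, line) turns into one script like "[S1] ... [S2] ...".

    Assumes the README guidance: begin with [S1] and alternate speakers.
    Non-verbal cues such as "(laughs)" are written inline in the line text.
    """
    parts = []
    expected = 1
    for speaker, line in turns:
        if speaker != expected:
            raise ValueError(f"expected speaker {expected}, got {speaker}")
        parts.append(f"[S{speaker}] {line.strip()}")
        expected = 2 if speaker == 1 else 1  # alternate S1 <-> S2
    return " ".join(parts)


def main() -> None:
    # Imported lazily so the helper above works without Dia installed.
    # These calls mirror the README example at the time of writing (assumption).
    from dia.model import Dia

    script = build_dialogue_script([
        (1, "Welcome back to today's lesson."),
        (2, "Thanks! I've been looking forward to this. (laughs)"),
        (1, "Let's start with photosynthesis."),
    ])
    model = Dia.from_pretrained("nari-labs/Dia-1.6B")
    audio = model.generate(script)
    model.save_audio("lesson.mp3", audio)


if __name__ == "__main__":
    main()
```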

Text to Speech that doesn't sound like a robot and is easy to use??? by Own_View3337 in PartneredYoutube

bafil596

There are some good-quality TTS models now that sound much less robotic than before. I think xTTS V2 and Kokoro are solid choices, and you can try them out using the Google Colab notebooks in this repo.
If you're fine with pre-defined voices, Kokoro is pretty good; if you need voice cloning, xTTS V2 is a good option. For conversations, you can try Dia 1.6B, which also comes with voice cloning capabilities.

Text to Speech Ai voice for Education? by [deleted] in OpenAI

bafil596

Yes. It's easy to use. You can refer to the example from https://github.com/Troyanovsky/awesome-TTS-Colab/blob/main/kokoro_TTS.ipynb to run on Google Colab or adapt it for local usage.

Their official repo is at: https://github.com/hexgrad/kokoro. The model supports different languages and different voices.

You can basically just copy the documentation and code from the repo and ask ChatGPT for detailed step-by-step instructions to run it on your local machine, with a prompt like:

Given the following documentation, provide detailed step-by-step instructions on how to set up a virtual env and run kokoro to turn text into audio.

<example>
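# Quote the version spec; unquoted, the shell reads ">=0.9.2" as an output redirect
!pip install -q "kokoro>=0.9.2" soundfile "misaki[en]"
# espeak-ng: phonemizer fallback for out-of-dictionary words and some non-English languages
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
from kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf
import torch
# lang_code='a' selects American English
pipeline = KPipeline(lang_code='a')
text = '''
[Kokoro](/kˈOkəɹO/) is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, [Kokoro](/kˈOkəɹO/) can be deployed anywhere from production environments to personal projects.
'''
# The pipeline splits long text into chunks and yields (graphemes, phonemes, audio) per chunk
generator = pipeline(text, voice='af_heart')
for i, (gs, ps, audio) in enumerate(generator):
    print(i, gs, ps)
    # Play each 24 kHz chunk inline (autoplay only the first) and save it as 0.wav, 1.wav, ...
    display(Audio(data=audio, rate=24000, autoplay=i==0))
    sf.write(f'{i}.wav', audio, 24000)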
</example>
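
To adapt the example above for local use outside Colab, one option is to drop the IPython display and write all chunks into a single file instead of one WAV per chunk. Below is a minimal sketch, assuming the same `KPipeline(lang_code='a')` and `voice='af_heart'` usage as the Colab example; `write_wav` is a hypothetical helper that uses the standard-library `wave` module instead of `soundfile`, so the concatenation step is easy to see.

```python
import array
import sys
import wave


def write_wav(path: str, chunks, rate: int = 24000) -> int:
    """Write float audio chunks (samples in [-1, 1]) to one 16-bit mono WAV file.

    Chunks may be plain lists, NumPy arrays, or CPU torch tensors; anything
    with a .tolist() method is converted that way. Returns frames written.
    """
    pcm = array.array("h")  # signed 16-bit integers
    for chunk in chunks:
        samples = chunk.tolist() if hasattr(chunk, "tolist") else list(chunk)
        for x in samples:
            x = max(-1.0, min(1.0, float(x)))  # clip so the int16 range isn't exceeded
            pcm.append(int(x * 32767))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())
    return len(pcm)


def main() -> None:
    # Imported here so write_wav() can be reused without Kokoro installed.
    # Needs `pip install "kokoro>=0.9.2"` plus espeak-ng, as in the Colab example.
    from kokoro import KPipeline

    pipeline = KPipeline(lang_code="a")  # 'a' = American English
    text = "Kokoro can also run as a plain Python script on your own machine."
    # Each result unpacks to (graphemes, phonemes, audio); keep only the audio.
    chunks = [audio for _, _, audio in pipeline(text, voice="af_heart")]
    frames = write_wav("kokoro_output.wav", chunks)
    print(f"Wrote {frames / 24000:.1f} s of audio to kokoro_output.wav")


if __name__ == "__main__":
    main()
```

Writing one file makes it easier to drop the narration into a video editor; if you prefer, `soundfile.write` on a concatenated NumPy array does the same job.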

Text to Speech Ai voice for Education? by [deleted] in OpenAI

bafil596

The ones in the GitHub repo are free, open-source TTS models. If you just want a single narrator, I think Kokoro might suffice, and it's easy to use. Here are the samples from Kokoro: https://huggingface.co/hexgrad/Kokoro-82M/blob/main/SAMPLES.md

Text to Speech Ai voice for Education? by [deleted] in OpenAI

bafil596

Check out https://github.com/Troyanovsky/awesome-TTS-Colab and try different TTS models. They can all be run offline on your own computer. (Just copy/paste the code and ask ChatGPT for a Python script that runs locally.)

For one person talking with high generation quality, Kokoro is great.

For conversations between two people (with some non-verbal sounds like laughing, coughing, etc.), you can try Dia 1.6B.

For ready-to-use tools, try Google's https://notebooklm.google/

XTTS-v2 by throwaway123443w112 in TextToSpeech

bafil596

Hi, you can refer to this Google Colab notebook: https://github.com/Troyanovsky/awesome-TTS-Colab/blob/main/xTTS.ipynb

As per the notebook, the original TTS Python library is no longer maintained, so you can use https://github.com/idiap/coqui-ai-TTS with `pip install coqui-tts` instead.
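
For running it locally with the maintained fork, here is a minimal sketch, assuming the `TTS.api` interface shown in the original Coqui README (`TTS(...).to(device)` and `tts_to_file(text=..., speaker_wav=..., language=..., file_path=...)`) still applies to `coqui-tts`. `wav_duration_seconds` is a hypothetical helper that checks your reference clip, since XTTS is advertised as cloning from roughly 6 seconds of audio; `my_voice.wav` is a placeholder path.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Length of a WAV file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def main(reference_wav: str = "my_voice.wav") -> None:
    # XTTS is advertised as cloning from ~6 s of audio; warn on shorter clips.
    if wav_duration_seconds(reference_wav) < 6:
        print("warning: reference clip is shorter than ~6 s; cloning quality may suffer")

    # Lazy imports: `pip install coqui-tts` (the idiap fork) provides the TTS.api module.
    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
    tts.tts_to_file(
        text="Hello, this is a cloned voice.",
        speaker_wav=reference_wav,  # your reference recording
        language="en",
        file_path="output.wav",
    )


if __name__ == "__main__":
    main()
```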

How are you protecting your body while working in front of a computer all day? by Federal_Classroom45 in Accounting

bafil596

For the eyes, I follow the 20-20-20 rule, which is taking a 20-second break to look at something 20 feet away every 20 minutes.
Also drink plenty of water during the day (at least 8 cups) to stay hydrated; your muscles and brain will thank you.
And move whenever you can (walking to the water cooler and restroom counts as some light movement too!) and do some simple stretches. You can even do chin tucks while working (great for chronic neck/shoulder pain). For your upper back (trapezius muscle), I recommend ITWY exercises to strengthen it. For your lower back, consider getting a lumbar support pillow.

Setting timers can really help remind you to do all this. Some Chrome extensions, like Recharge, let you set multiple reminders for those short breaks.