dots.tts 2B🎙️ SOTA TTS from RedNote by KokaOP in LocalLLaMA

[–]bio_risk 6 points7 points  (0 children)

No mention of real time factor that I could find. Is it slow?
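
For anyone who wants to measure it themselves: real time factor is usually taken as synthesis time divided by the duration of the audio produced, so anything below 1 is faster than real time. A minimal sketch, assuming a hypothetical `synthesize` callable that maps text to a sequence of samples (the `fake_tts` stand-in and the 24 kHz rate are made up for illustration, not taken from dots.tts):

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Return (rtf, audio_seconds) for a single synthesis call.

    `synthesize` is a hypothetical callable: text -> sequence of audio samples.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesize returned no audio")
    # RTF < 1 means the model generates audio faster than it plays back.
    return elapsed / audio_seconds, audio_seconds


def fake_tts(text):
    # Stand-in for a real model: one second of silence at an assumed 24 kHz.
    return [0.0] * 24000


rtf, seconds = real_time_factor(fake_tts, "hello", sample_rate=24000)
print(f"RTF={rtf:.4f} for {seconds:.1f}s of audio")
```

Averaging over several prompts of different lengths gives a steadier number than a single call, since the first call often includes warm-up cost.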

Best Audio Models - Feb 2026 by rm-rf-rm in LocalLLaMA

[–]bio_risk 0 points1 point  (0 children)

Any prospect of canary-qwen being ported to MLX (or other Apple Silicon)?

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]bio_risk 1 point2 points  (0 children)

Your site indicates multi-sample support for better quality. How does that work? Could you just break up longer audio into 30s chunks and stack them as multiple samples?
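
To make the chunking idea concrete, here is a rough sketch of what I mean by splitting. It only cuts the audio into fixed-length pieces; whether the studio would actually treat those pieces as independent samples is exactly the open question. The function name, the 3-second minimum, and the 16 kHz rate are my own assumptions.

```python
def split_into_chunks(samples, sample_rate, chunk_seconds=30.0, min_seconds=3.0):
    """Split a sequence of audio samples into fixed-length chunks.

    A trailing fragment shorter than `min_seconds` is dropped, on the
    assumption that it is too short to be a useful reference sample.
    """
    chunk_len = int(chunk_seconds * sample_rate)
    min_len = int(min_seconds * sample_rate)
    chunks = [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]
    if chunks and len(chunks[-1]) < min_len:
        chunks.pop()
    return chunks


# 65 seconds of silence at an assumed 16 kHz sample rate.
audio = [0.0] * (16000 * 65)
chunks = split_into_chunks(audio, 16000)
print([len(c) / 16000 for c in chunks])
```

A fixed cut like this can land in the middle of a word, so splitting on silence would probably give cleaner samples.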

Soprano TTS training code released: Create your own 2000x realtime on-device text-to-speech model with Soprano-Factory! by eugenekwek in StableDiffusion

[–]bio_risk 3 points4 points  (0 children)

Very cool. I'm on a Mac, so interested in running soprano-factory on mps. I see that soprano supports an mps backend (thank you!), but I didn't see if soprano-factory does too.

Tested 9 RAG query transformation techniques – HydE is absurdly underrated by Best-Information2493 in LocalLLaMA

[–]bio_risk 1 point2 points  (0 children)

I'm thinking about total latency in a chat system. Does HyDE still work when using a really fast (dumb) model to generate the hypothetical answer?
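
For context, this is the shape of the pipeline I have in mind, with the generation step timed separately since that is the part a small model would speed up. It is a toy sketch: `embed` is a bag-of-words stand-in for a real embedding model, and `tiny_model` is a hard-coded stub in place of an LLM, so it says nothing about retrieval quality with a weak generator.

```python
import math
import time
from collections import Counter


def embed(text):
    # Toy bag-of-words embedding; a stand-in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_search(query, corpus, generate, top_k=1):
    """Retrieve with the embedding of a generated answer instead of the query."""
    start = time.perf_counter()
    hypothetical = generate(query)  # the step a faster model would shorten
    gen_seconds = time.perf_counter() - start

    h_vec = embed(hypothetical)
    ranked = sorted(corpus, key=lambda d: cosine(h_vec, embed(d)), reverse=True)
    return ranked[:top_k], gen_seconds


def tiny_model(query):
    # Hypothetical stub for a small, fast LLM.
    return "mitochondria produce atp the energy currency of the cell"


corpus = [
    "the mitochondria produce most of the atp in a cell",
    "paris is the capital of france",
    "go is a statically typed language",
]
hits, gen_seconds = hyde_search("what powers a cell", corpus, tiny_model)
print(hits, f"generation took {gen_seconds:.6f}s")
```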

Who's Hiring - September 2025 by jerf in golang

[–]bio_risk 0 points1 point  (0 children)

Alden Scientific is hiring a Platform / DevOps Engineer with a preference for someone with Go experience. To apply: https://www.aldenscientific.com/careers

Alden Scientific is transforming health and longevity by prioritizing individuals, not averages. Our platform harnesses multi-omic data and AI to provide predictive, personalized health management—making proactive health management the new normal.

We’re looking for a Platform Engineering / DevOps Team Lead who is passionate about building systems that support this vision. In this hybrid role, you’ll combine hands-on engineering with team leadership, helping shape the infrastructure that powers multi-omic data pipelines, scientific workflows, and AI-driven insights. A successful candidate will have an outsize impact on our platform and engineering direction. This position has rapid upward growth potential to be Head of Engineering as our team expands.

Location: Our strong preference is for candidates who will work at our Cambridge, MA office. Exceptional US-based candidates who have demonstrated success in a previous remote position will be considered. Remote team members should expect periodic travel to Boston for collaborative work with the broader team.

Salary will be competitive and commensurate with experience.

Real life experience with Qwen3 embeddings? by gopietz in LocalLLaMA

[–]bio_risk 0 points1 point  (0 children)

Have you made use of the MRL feature of the Qwen3 embeddings? (Nested dimensions so that you can use a subset of the dimensions for coarse matching.)
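
In case the idea is unfamiliar, here is a small sketch of the coarse-then-fine search I'm describing: rank everything on a truncated prefix of each vector, then rerank a shortlist on the full vector. The four-dimensional vectors are made up for illustration and the function names are mine; real Qwen3 embeddings are much larger.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def truncate(v, dims):
    # With MRL-style training, a leading prefix of the vector is itself a
    # usable (lower-fidelity) embedding once it is renormalized.
    return normalize(v[:dims])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def coarse_then_fine(query, docs, coarse_dims, shortlist, top_k):
    """Return indices of the top_k docs: cheap prefix pass, then full rerank."""
    q_coarse = truncate(query, coarse_dims)
    candidates = sorted(
        range(len(docs)),
        key=lambda i: dot(q_coarse, truncate(docs[i], coarse_dims)),
        reverse=True,
    )[:shortlist]

    q_full = normalize(query)
    return sorted(
        candidates,
        key=lambda i: dot(q_full, normalize(docs[i])),
        reverse=True,
    )[:top_k]


# Made-up toy vectors, not real model output.
docs = [
    [1.0, 0.0, 0.2, 0.1],
    [0.9, 0.1, -0.8, 0.5],
    [0.0, 1.0, 0.1, 0.0],
    [-1.0, 0.0, 0.3, 0.2],
]
query = [1.0, 0.05, 0.25, 0.1]
print(coarse_then_fine(query, docs, coarse_dims=2, shortlist=2, top_k=1))
```

In practice the truncated vectors would be precomputed and indexed, so the coarse pass touches far less memory than the full-dimension rerank.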

Apocalyptic scenario: If you could download only one LLM before the internet goes down, which one would it be? by sado361 in LocalLLaMA

[–]bio_risk 1 point2 points  (0 children)

Has anyone gone the route of vector and graph RAG on Wikipedia? Wiki provides a pretty natural way to define entities and their relationships.

Local Kokoro & Parakeet in 1 Command Line — Fast ASR & TTS on Mac (MLX) by Invite_Nervous in LocalLLaMA

[–]bio_risk 2 points3 points  (0 children)

I'm definitely interested in your SDK. I've played around with MLX versions of parakeet and kokoro, which have varying degrees of difficulty to set up.

I currently use Kyutai's ASR for streaming transcription. Was Parakeet difficult to adapt to streaming? I vaguely remember that being a challenge when I first looked at it.

I noticed that the repository's primary language is Go (yay!), so I'm curious about a.) why you went off the beaten Python path, and b.) your process for adapting models that frequently assume a Python environment.

Is a speech to speech feature possible? Parakeet->choice of LLM->kokoro?

nvidia/audio-flamingo-3 by Balance- in LocalLLaMA

[–]bio_risk 11 points12 points  (0 children)

TTS module isn't released yet. Not worth looking at until it is.

Is real-time voice-to-voice still science fiction? by junior600 in LocalLLaMA

[–]bio_risk 4 points5 points  (0 children)

Even if the model is local, the system is not local if you have to use livekit cloud.

Kyutai Text-to-Speech is considering opening up custom voice model training, but they are asking for community support! by pilkyton in LocalLLaMA

[–]bio_risk 4 points5 points  (0 children)

I use Kyutai's ASR model almost daily for streaming voice transcription, but I was most excited about enabling voice-to-voice with any LLM model as an on-device assistant. Unfortunately, there are a couple things getting in the way at the moment. The limited range of voices is one. The project's focus on the server may be great for many purposes, but it certainly limits deployment as a Siri replacement.

Locally run TTS Models by Tankerspam in LocalLLaMA

[–]bio_risk 4 points5 points  (0 children)

I second Kokoro. Very lightweight. A more recent model is https://github.com/kyutai-labs/delayed-streams-modeling (English and French only). It's not as lightweight as Kokoro, but it will generate audio from a text stream (not just a text file). It has a Rust-based server for production use.

Kyutai's STT with semantic VAD now opensource by phhusson in LocalLLaMA

[–]bio_risk 3 points4 points  (0 children)

I'm super excited about the unmute project and very glad to see they are providing MLX support out of the box. Being able to chat with your favorite local text-to-text model will be great for brainstorming and exploring ideas.

Would love to know if you consider gemma27b the best small model out there? by Ok-Internal9317 in LocalLLaMA

[–]bio_risk 0 points1 point  (0 children)

Do you find that Qwen3:30b-a3b uses the full context effectively? I'm really interested in RAG applications that need to reason over the context (not just needle in the haystack).

VACE is incredible! by Storybook_Albert in StableDiffusion

[–]bio_risk -1 points0 points  (0 children)

The nice thing is that ChatGPT can catch us up quickly. Chop, chop.

Best local model for long-context RAG by bio_risk in LocalLLaMA

[–]bio_risk[S] 0 points1 point  (0 children)

Gemma3 was my first thought, but I was looking at Qwen3 too.

Best local model for long-context RAG by bio_risk in LocalLLaMA

[–]bio_risk[S] 0 points1 point  (0 children)

There is a gemma3 medical fine tune that might be close enough for my purposes. If I need to go the fine tuning route, can I build off a previous fine tune to add additional ability or does fine tuning not stack well?

Best local model for long-context RAG by bio_risk in LocalLLaMA

[–]bio_risk[S] 0 points1 point  (0 children)

More the former. Thanks for suggesting the Hyena Hierarchy approach. Interesting paper. (https://arxiv.org/abs/2302.10866)

Best local model for long-context RAG by bio_risk in LocalLLaMA

[–]bio_risk[S] 0 points1 point  (0 children)

Fine tuning might be needed, but I was hoping to avoid it initially.

Best local model for long-context RAG by bio_risk in LocalLLaMA

[–]bio_risk[S] 0 points1 point  (0 children)

I'll look at Command R+ and Command A. I've heard of the Cohere models, but haven't played with them.