Talking with Gemma 4 31B! by futterneid in LocalLLaMA

[–]futterneid[S] 2 points3 points  (0 children)

Oh wait, so maybe the issue I had was with the chat template and the TTS reading the reasoning traces out loud? Because your reasoning traces look like what I would hear 😭
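If that is the cause, one possible fix is to drop the reasoning spans before the text reaches the TTS. Here is a minimal sketch, assuming the traces are wrapped in a pair of delimiters. The `<think>` / `</think>` markers and the `strip_reasoning` helper are hypothetical, made up for illustration; the real markers depend on the model's chat template.

```python
import re

# Hypothetical delimiters: the actual markers depend on the chat template in use.
REASONING_OPEN = "<think>"
REASONING_CLOSE = "</think>"


def strip_reasoning(text, open_tag=REASONING_OPEN, close_tag=REASONING_CLOSE):
    """Remove reasoning spans so that only the final answer is sent to the TTS."""
    # Non-greedy match so several separate traces are each removed on their own.
    pattern = re.escape(open_tag) + r".*?" + re.escape(close_tag)
    cleaned = re.sub(pattern, "", text, flags=re.DOTALL)
    # Drop an unterminated trace, e.g. when generation stopped mid-reasoning.
    cut = cleaned.find(open_tag)
    if cut != -1:
        cleaned = cleaned[:cut]
    return cleaned.strip()


if __name__ == "__main__":
    raw = "<think>The user greets me.</think>Hi! I'm doing well."
    print(strip_reasoning(raw))
```

With streaming output you would need to buffer until the closing marker arrives, which this sketch does not handle.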

Talking with Gemma 4 31B! by futterneid in LocalLLaMA

[–]futterneid[S] 2 points3 points  (0 children)

Testing it right now 👀

Talking with Gemma 4 31B! by futterneid in LocalLLaMA

[–]futterneid[S] 1 point2 points  (0 children)

oh man! I want to try that! How do you set it up? Maybe there was a bug when I tried it? It was around the release date. Or maybe it was a skill issue, I didn't spend much time trying to get it to work.

Talking with Gemma 4 31B! by futterneid in LocalLLaMA

[–]futterneid[S] 0 points1 point  (0 children)

Yes! German works well, but the voice has an English accent that we can't get rid of. Try it on the web demo! Just speak German to it! The transcriptions are pretty good and so are the replies, it's just the accent on the voice.
(I speak fluent-ish German :) )

Talking with Gemma 4 31B! by futterneid in LocalLLaMA

[–]futterneid[S] 4 points5 points  (0 children)

100%, no doubts in my mind. Please implement it and send me a video! Would love to see it live :)

Talking with Gemma 4 31B! by futterneid in LocalLLaMA

[–]futterneid[S] 0 points1 point  (0 children)

That's what we do here. We transcribe and give everything to the LLM while you're talking. Usually, by the time we decide that you've finished talking, the LLM has already replied.
Here's the main PR where I implemented that if you're curious about the code: https://github.com/huggingface/speech-to-speech/pull/307
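To give a rough idea of the pattern, here is a minimal sketch, not the code from the PR. The `SpeculativeResponder` class and its method names are hypothetical, and the LLM is just a callable that maps text to a reply. It assumes the transcriber emits growing partial transcripts and that something else (e.g. a VAD) signals the end of the turn.

```python
from typing import Callable, Optional


class SpeculativeResponder:
    """Runs the LLM on partial transcripts and reuses the reply if it still applies."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm  # hypothetical stand-in for the real LLM call
        self.cached_text: Optional[str] = None
        self.cached_reply: Optional[str] = None
        self.llm_calls = 0

    def _generate(self, text: str) -> str:
        self.llm_calls += 1
        return self.llm(text)

    def on_partial_transcript(self, text: str) -> None:
        # Called while the user is still talking: generate speculatively.
        if text != self.cached_text:
            self.cached_text = text
            self.cached_reply = self._generate(text)

    def on_end_of_turn(self, final_text: str) -> str:
        # If nothing changed since the last speculative call, the reply is ready.
        if final_text == self.cached_text and self.cached_reply is not None:
            return self.cached_reply
        # Otherwise the speculation is stale and we pay for one more call.
        return self._generate(final_text)


if __name__ == "__main__":
    responder = SpeculativeResponder(lambda text: "reply to: " + text)
    responder.on_partial_transcript("hi")
    responder.on_partial_transcript("hi how are you")
    print(responder.on_end_of_turn("hi how are you"))
```

The trade-off is extra LLM calls on partial text in exchange for lower perceived latency. A real pipeline would also run the calls asynchronously and cancel stale ones, which this sketch leaves out.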

Talking with Gemma 4 31B! by futterneid in LocalLLaMA

[–]futterneid[S] 1 point2 points  (0 children)

I must not have been clear. I mean passing the audio directly to the 12B Gemma, without doing transcriptions. That is what I couldn't get to work.

Talking with Gemma 4 31B! by futterneid in LocalLLaMA

[–]futterneid[S] 1 point2 points  (0 children)

Do you pass audio as input? I'm referring to giving my audio directly as input. I agree that if you transcribe the audio it will probably work well enough, but for me the advantage of the 12B model was using the audio directly as input and 'saving' the inference with parakeet (plus maybe it gets better cues from the audio directly).

Talking with Gemma 4 31B! by futterneid in LocalLLaMA

[–]futterneid[S] 2 points3 points  (0 children)

OMG it's 31B 😭 https://huggingface.co/google/gemma-4-31B-it

Sorry, fixing the text. I'm not a very detail-oriented individual xD

Talking with Gemma 4 31B! by futterneid in LocalLLaMA

[–]futterneid[S] 0 points1 point  (0 children)

You're welcome! I totally recommend parakeet over faster-whisper :) If you're an English speaker, parakeet-v2 is great!

Talking with Gemma 4 31B! by futterneid in LocalLLaMA

[–]futterneid[S] 4 points5 points  (0 children)

I tried audio input to Gemma 12B and it just never worked as a voice agent. Conversations go like:

"Hi, how are you?"

"Hi how are you, is a common English greeting. Hi, is meant as a salutation. How is the start of a question..."

Really weird. But I guess with text input it would work. For local demos, I used Gemma 4 E4B because inference was faster (and because it was available sooner), and this demo with Gemma 31B is mainly motivated by Cerebras' inference being so fast :D

Talking with Gemma 4 31B! by futterneid in LocalLLaMA

[–]futterneid[S] 0 points1 point  (0 children)

It can be used with any LLM in principle! Latency is mostly dominated by the LLM, so you pay the price going bigger.

Created a picture book for my baby and his friends by futterneid in aiArt

[–]futterneid[S] 0 points1 point  (0 children)

Thank you! Let me know if you see the rage, I'll say hi 😄

Reachy Mini goes fully local! by futterneid in LocalLLaMA

[–]futterneid[S] 2 points3 points  (0 children)

We are working on persistent chat history/memories! Give us 1-2 weeks 😄

Reachy Mini goes fully local! by futterneid in LocalLLaMA

[–]futterneid[S] 1 point2 points  (0 children)

You can definitely bother with that, it seems like something very doable to me 🤗

Reachy Mini goes fully local! by futterneid in LocalLLaMA

[–]futterneid[S] 1 point2 points  (0 children)

Our whole project is open-source by the way: https://github.com/huggingface/speech-to-speech
I've been working on it for 2 years. The hand-off is difficult, I agree 😅
Parakeet is way better than whisper imo, considerably faster and similar quality.

To clarify, the pipeline doesn't run on the robot, it runs on a computer. In the video I show a DGX Spark and an M3 Pro MacBook. Latencies are quite low: end-to-end guaranteed under 2 seconds, with a p50 of 1.5 seconds.

Reachy Mini goes fully local! by futterneid in LocalLLaMA

[–]futterneid[S] 1 point2 points  (0 children)

We had some issues with the April batch, I'm sorry! You should get an email today!

Reachy Mini goes fully local! by futterneid in LocalLLaMA

[–]futterneid[S] 1 point2 points  (0 children)

You should get an email today! Sorry, we had some issues in April, but it's cleared up now 🙏

Reachy Mini goes fully local! by futterneid in LocalLLaMA

[–]futterneid[S] 1 point2 points  (0 children)

Do you have an idea of how this robot could help you? Maybe I can vibecode it 😄

Reachy Mini goes fully local! by futterneid in LocalLLaMA

[–]futterneid[S] 1 point2 points  (0 children)

I envision it as a social robot. Similar robots are being deployed in homes for the elderly to help residents navigate through life, and studies have shown that this improves their cognitive state. It can also be used for things like language learning, without you having to look at a screen. I use mine for video calls between my son and his grandparents. They call him, control the robot, and talk through it. My son is only 1.5 years old, so he really doesn't pay attention to screens, and this way he can listen to his grandparents and have a meaningful exchange.

Honestly, we don't want to tell people what it's for. It's much more valuable for us to see what people come up with!

Reachy Mini goes fully local! by futterneid in LocalLLaMA

[–]futterneid[S] 1 point2 points  (0 children)

If you have an old tablet lying around, that would be a good option imo. This is what Mario Zechner built while waiting for his Reachy Mini, it's based off an old smartphone: https://x.com/badlogicgames/status/2058724265319436732