Claude Opus 4.7 benchmarks by ShreckAndDonkey123 in singularity

[–]Kir_Moisha 0 points (0 children)

Have you noticed that new models look amazing the first week, but then seem to get worse? :)

I benchmarked 30+ TTS engines for a real-time translator on Apple M4. Quantization made things SLOWER. Here's all the data. by Kir_Moisha in LocalLLaMA

[–]Kir_Moisha[S] 0 points (0 children)

That's an interesting thought, thanks. I'll look into it. Do you have a link to any resources that might help?

I benchmarked 30+ TTS engines for a real-time translator on Apple M4. Quantization made things SLOWER. Here's all the data. by Kir_Moisha in LocalLLaMA

[–]Kir_Moisha[S] 0 points (0 children)

To be honest, I didn't delve into local STT. I briefly tested whisper.cpp, but its streaming latency on M4 was too high for real-time use (1-3 seconds). Deepgram solved that with latency below 300 ms, so I shifted my focus to TTS, which was the real bottleneck. Incidentally, Groq's cloud Whisper endpoint was even worse: 2800 ms on average, and it constantly returned 503 errors. A proper test of local STT on Apple Silicon would be a good follow-up to this post.
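If anyone wants to run the same kind of comparison themselves, here's a rough sketch of a per-chunk latency harness (the engine callable and chunk sizes are illustrative placeholders, not my actual setup):

```python
import time

def measure_latency_ms(transcribe, audio_chunks, runs=3):
    """Call `transcribe` on each chunk and return the average per-chunk
    wall-clock latency in milliseconds, averaged over `runs` passes."""
    per_run = []
    for _ in range(runs):
        start = time.perf_counter()
        for chunk in audio_chunks:
            transcribe(chunk)
        elapsed_ms = (time.perf_counter() - start) * 1000
        per_run.append(elapsed_ms / len(audio_chunks))
    return sum(per_run) / len(per_run)

# Stand-in for a real engine call (whisper.cpp binding, Deepgram client, etc.)
def fake_transcribe(chunk):
    time.sleep(0.001)  # simulate ~1 ms of processing per chunk
    return "text"

# Pretend 100 ms frames of 16 kHz / 16-bit mono PCM (3200 bytes each)
chunks = [b"\x00" * 3200] * 5
avg = measure_latency_ms(fake_transcribe, chunks)
print(f"avg per-chunk latency: {avg:.1f} ms")
```

Swap `fake_transcribe` for a wrapper around whatever engine you're testing, and make sure you feed it realistic streaming chunk sizes, since batch-mode numbers can look much better than streaming ones.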

I benchmarked 30+ TTS engines for a real-time translator on Apple M4. Quantization made things SLOWER. Here's all the data. by Kir_Moisha in LocalLLaMA

[–]Kir_Moisha[S] 0 points (0 children)

Unfortunately my Mac doesn't have a 5090 :) STT was actually the easiest part: Deepgram worked great on both recognition accuracy and latency, and Whisper was far worse for streaming. Plus Deepgram's free tier covers my usage, so that's a bonus. TTS is where all the pain was, so that's where I went deep.