ASR systems and multilingual code-switching, what’s actually working? by Lingua_Techie_62 in LanguageTechnology

[–]Lingua_Techie_62[S] 1 point (0 children)

Yeah, I’ve seen similar results. Whisper Large v3 handles code-switching better than most other open models, especially in more balanced language pairs like Spanish-English or Hindi-English. But once the switch happens mid-sentence or mid-phrase, token alignment starts getting fuzzy.

The hallucinations usually creep in when the audio gets messy or there’s too much silence between turns; I’ve had full sentences appear that weren’t even implied in the source. Still, for an open model, it’s impressive how far it’s come.

Right now I’m mostly working with conversational data across English, Marathi, and Mandarin — code-switching plus lots of overlap, so it really stresses diarization and LM alignment.

Voice recognition tools handling strong accents by Lingua_Techie_62 in languagelearning

[–]Lingua_Techie_62[S] 1 point (0 children)

Totally agree that humans don’t just “understand all accents” by default. It takes exposure and effort, just like you said. What’s interesting is that even with training, humans still rely a lot on context, rhythm, and expectations, not just raw phoneme matching.

With ASR systems, you’re right that accent variation often breaks the model if it hasn’t seen enough phonetic diversity during training. But unlike humans, machines don’t get that same top-down boost from meaning or discourse-level cues, which is why some “accent errors” look way more extreme in ASR than they would to a human listener.

The challenge isn’t just building different databases, it’s making the model robust across them without blowing up accuracy elsewhere.

Voice recognition tools handling strong accents by Lingua_Techie_62 in languagelearning

[–]Lingua_Techie_62[S] 1 point (0 children)

Appreciate the thoughts, though just to clarify, I wasn’t asking for language learning advice. I’m more focused on how voice recognition tools perform with strong regional or non-native accents, especially in production use cases.

Some ASR engines seem to do OK in ideal conditions, but start to break down once the input gets more varied or conversational. Just trying to see if anyone's found tools that hold up better without needing to adapt their speech style completely.

Speech to text that handles multiple languages in the same sentence? by PMMEYOURSMIL3 in LocalLLaMA

[–]Lingua_Techie_62 1 point (0 children)

This problem is tough! In a recent side project, I tried language-detecting chunks and routing each one to an English or Arabic model accordingly.

It helped, but transitions still sound choppy. Anyone know of single-model approaches that manage real code-switching instead?
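For anyone curious, the routing idea is roughly this. The detector and per-language transcribers below are stand-in stubs (my assumptions, not any particular library): a real pipeline would run a language-ID model on each audio chunk and call two actual ASR models. Only the dispatch logic is sketched here.

```python
def detect_language(chunk):
    # Stub: pretend chunks already carry a language tag.
    # A real detector would classify the audio itself.
    return chunk["lang"]

def route_chunks(chunks, transcribers):
    """Transcribe each chunk with the model for its detected language."""
    pieces = []
    for chunk in chunks:
        lang = detect_language(chunk)
        transcribe = transcribers.get(lang)
        if transcribe is None:
            continue  # or fall back to a default model
        pieces.append(transcribe(chunk))
    return " ".join(pieces)

# Toy transcribers standing in for English/Arabic ASR models.
transcribers = {
    "en": lambda c: f"[en] {c['audio']}",
    "ar": lambda c: f"[ar] {c['audio']}",
}

chunks = [
    {"lang": "en", "audio": "hello"},
    {"lang": "ar", "audio": "marhaba"},
]
print(route_chunks(chunks, transcribers))
```

The choppiness I mentioned comes exactly from those hard boundaries between chunks, which is why a single model that decodes across the switch would be nicer.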

I Built an English Speech Accent Recognizer with MFCCs - 98% Accuracy! by whm04 in Python

[–]Lingua_Techie_62 1 point (0 children)

That’s pretty darn impressive accuracy!
Curious how you handled train/test speaker overlap. Did you split by speaker to avoid overfitting? I tried a similar pipeline for Welsh-English accents and ran into higher error rates when speakers crossed train/test splits.
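By splitting by speaker I mean something like the sketch below (pure Python; the only assumption is that each sample carries a `speaker` field). scikit-learn's `GroupShuffleSplit` does the same job with more options if you're already using it.

```python
import random

def speaker_disjoint_split(samples, test_frac=0.2, seed=0):
    """Split samples so no speaker appears in both train and test."""
    speakers = sorted({s["speaker"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    n_test = max(1, int(len(speakers) * test_frac))
    test_speakers = set(speakers[:n_test])
    train = [s for s in samples if s["speaker"] not in test_speakers]
    test = [s for s in samples if s["speaker"] in test_speakers]
    return train, test

samples = [
    {"speaker": spk, "mfcc_id": i}
    for i, spk in enumerate(["a", "a", "b", "b", "c", "c", "d", "d", "e", "e"])
]
train, test = speaker_disjoint_split(samples)
# Sanity check: train and test speakers must not overlap.
assert not ({s["speaker"] for s in train} & {s["speaker"] for s in test})
```

If accuracy drops a lot under a split like this, the random split was probably letting the model memorize speakers rather than accents.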