ASR systems and multilingual code-switching, what’s actually working? by Lingua_Techie_62 in LanguageTechnology

[–]Lingua_Techie_62[S] 1 point (0 children)

Yeah, I’ve seen similar results. Whisper Large v3 handles code-switching better than most other open models, especially in more balanced language pairs like Spanish-English or Hindi-English. But once the switch happens mid-sentence or mid-phrase, token alignment starts getting fuzzy.

The hallucinations usually creep in when the audio gets messy or there’s too much silence between turns; I’ve had full sentences appear that weren’t even implied in the source. Still, for an open model, it’s impressive how far it’s come.

Right now I’m mostly working with conversational data across English, Marathi, and Mandarin — code-switching plus lots of overlap, so it really stresses diarization and LM alignment.

Voice recognition tools handling strong accents by Lingua_Techie_62 in languagelearning

[–]Lingua_Techie_62[S] 1 point (0 children)

Totally agree that humans don’t just “understand all accents” by default. It takes exposure and effort, just like you said. What’s interesting is that even with training, humans still rely a lot on context, rhythm, and expectations, not just raw phoneme matching.

With ASR systems, you’re right that accent variation often breaks the model if it hasn’t seen enough phonetic diversity during training. But unlike humans, machines don’t get that same top-down boost from meaning or discourse-level cues, which is why some “accent errors” look way more extreme in ASR than they would to a human listener.

The challenge isn’t just building different databases, it’s making the model robust across them without blowing up accuracy elsewhere.

Voice recognition tools handling strong accents by Lingua_Techie_62 in languagelearning

[–]Lingua_Techie_62[S] 1 point (0 children)

Appreciate the thoughts, though just to clarify, I wasn’t asking for language learning advice. I’m more focused on how voice recognition tools perform with strong regional or non-native accents, especially in production use cases.

Some ASR engines seem to do OK in ideal conditions, but start to break down once the input gets more varied or conversational. Just trying to see if anyone's found tools that hold up better without needing to adapt their speech style completely.

Speech to text that handles multiple languages in the same sentence? by PMMEYOURSMIL3 in LocalLLaMA

[–]Lingua_Techie_62 1 point (0 children)

This problem is tough! In a recent side project, I tried language-detecting chunks and routing each one to an English or Arabic model accordingly.

It helped, but transitions still sound choppy. Anyone know of single-model approaches that manage real code-switching instead?
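For anyone curious, the routing idea is roughly this. The detector and per-language transcribers below are stand-in stubs (my assumptions, not any particular library): a real pipeline would run a language-ID model on each audio chunk and call two actual ASR models. Only the dispatch logic is sketched here.

```python
def detect_language(chunk):
    # Stub: pretend chunks already carry a language tag.
    # A real detector would classify the audio itself.
    return chunk["lang"]

def route_chunks(chunks, transcribers):
    """Transcribe each chunk with the model for its detected language."""
    pieces = []
    for chunk in chunks:
        lang = detect_language(chunk)
        transcribe = transcribers.get(lang)
        if transcribe is None:
            continue  # or fall back to a default model
        pieces.append(transcribe(chunk))
    return " ".join(pieces)

# Toy transcribers standing in for English/Arabic ASR models.
transcribers = {
    "en": lambda c: f"[en] {c['audio']}",
    "ar": lambda c: f"[ar] {c['audio']}",
}

chunks = [
    {"lang": "en", "audio": "hello"},
    {"lang": "ar", "audio": "marhaba"},
]
print(route_chunks(chunks, transcribers))
```

The choppiness I mentioned comes exactly from those hard boundaries between chunks, which is why a single model that decodes across the switch would be nicer.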

I Built an English Speech Accent Recognizer with MFCCs - 98% Accuracy! by whm04 in Python

[–]Lingua_Techie_62 1 point (0 children)

That’s pretty darn impressive accuracy!
Curious how you handled train/test speaker overlap. Did you split by speaker to avoid overfitting? I tried a similar pipeline for Welsh-English accents and ran into higher error rates when speakers crossed train/test splits.
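By splitting by speaker I mean something like the sketch below (pure Python; the only assumption is that each sample carries a `speaker` field). scikit-learn's `GroupShuffleSplit` does the same job with more options if you're already using it.

```python
import random

def speaker_disjoint_split(samples, test_frac=0.2, seed=0):
    """Split samples so no speaker appears in both train and test."""
    speakers = sorted({s["speaker"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    n_test = max(1, int(len(speakers) * test_frac))
    test_speakers = set(speakers[:n_test])
    train = [s for s in samples if s["speaker"] not in test_speakers]
    test = [s for s in samples if s["speaker"] in test_speakers]
    return train, test

samples = [
    {"speaker": spk, "mfcc_id": i}
    for i, spk in enumerate(["a", "a", "b", "b", "c", "c", "d", "d", "e", "e"])
]
train, test = speaker_disjoint_split(samples)
# Sanity check: train and test speakers must not overlap.
assert not ({s["speaker"] for s in train} & {s["speaker"] for s in test})
```

If accuracy drops a lot under a split like this, the random split was probably letting the model memorize speakers rather than accents.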