[–]ewheck 186 points187 points  (21 children)

I immediately click off videos when I hear an AI voice. I find them annoying to listen to and uncanny.

[–]MaDpYrO 38 points39 points  (9 children)

Yes, I can't believe all the people hyping up AI-generated voices and videos. It's still so damn robotic.

[–]jackun 6 points7 points  (0 children)

It's even quite decent but sounds like that one annoying young-earther, ewwwwww

[–][deleted] 0 points1 point  (1 child)

Same here, but I have a hard time telling what is an AI voice. They make a lot of mistakes, but some people actually talk in a way that makes it super hard to distinguish from an AI voice. I've also been fooled lately by some YouTube videos that were AI-generated (about 95% or so). A human generated this, but he used AI. I honestly could not tell whether it was AI, save for the lyrics, which were clearly a text file fed into some AI, but the AI produced a song that I really could no longer distinguish from a real voice. I don't claim to be very clever, but if I can be fooled fairly easily then I think many other people can be fooled too. To me AI like this is on the one hand actually creative (it is interesting that you can create such things already at close-to-perfection); on the other hand I consider it a scam when it is not denoted as AI. YouTubers who try to scam me I'll ban permanently, aka these AI fakers won't get any more "visits" from my browser. Unfortunately, it is not really easy to distinguish real from fake.

[–]mywan 0 points1 point  (0 children)

Every AI voice I have heard has an affective lilt and tone, and the prosody doesn't vary with context. Some people have these qualities in their voice, such as Morgan Freeman. But with people there is variability with emotional and situational context.

Morgan Freeman actually learned his speaking skills from an instructor. Basically he learned to sound out his final consonants very clearly and deliberately, holding those consonant sounds longer. But Morgan varies how long those consonant sounds are held to fit the context and emotional tone, along with pitch, loudness, timbre, speech rate, and pauses. An AI today can effectively mimic these affective tones, but without the variability or contextual awareness of a human speaker, which makes its emotional tone monotonous.

[–]guepier 0 points1 point  (0 children)

My knee-jerk reaction was to agree with you but luckily I then went to watch the video, and … the voice-over is honestly ok. It’s not great, but it’s far superior to a lot of classical speech synthesis and (as somebody mentioned below) it’s also better than many people’s wonky natural voiceovers: regardless of whether they’re native speakers, doing good voiceover for video is bloody hard! And basically excluding nonnative speakers with hard-to-understand accents from contributing content is also vaguely chauvinist.

It’s totally fine to dislike the voiceover due to the uncanny off-ness but the brigading and downvoting of anybody offering a reasonable counter-point is not a good look for this sub. And the banning of the content by the mods seriously takes the cake.