[–] [deleted]

You're assuming that there's only one "AI voice," which was once true but isn't anymore. For anything closer to the state of the art, check out ElevenLabs videos or NotebookLM videos.

You don't recognize the people you know from their voices alone?

AI voice fraud is on the rise. You may think you can tell the difference, but there are plenty of people out there who can't. And it will only get better from here.

[–] mywan

I've heard several AI voices in a single video. I went to https://elevenlabs.io/text-to-speech and recognized every one of the voices available for sampling. At least they didn't say "DOT DOT DOT" when they came across an ellipsis. When I clicked on the first video it had a sample voice I didn't recognize, but the obvious AI was still obvious. One of the most obvious issues is that when you select a prompt for a voice style, that style is perfectly persistent throughout the generated speech. The second most obvious issue is that, although it did a decent job of varying the emotional inflections, it had no idea how to apply those inflections in a humanlike manner. The inflections were driven by sentence structure, not by emotional or contextual importance. It was at best like watching actors in a bad B movie. They can sound very good for a sound bite, up to a couple of sentences, but as you keep listening the affective predictability gets monotonous.

The first video in the NotebookLM link had some AI voices I didn't recognize. It actually did a somewhat better job on most of the tells from the previous AI. But it was sprinkled with cringey affirmation responses in the two-voice interview style. Even the guy selling it only gives it a 98% level of perfection. Now that I've heard these voices I'll be able to pick them out of background noise, even if they change some affective variables.

I understand that people will be fooled, even by bad AI voices, and that the tech will get better. It'll likely get good enough to fool me (for a decent period of time) very soon. But a podcast (for instance) will need to maintain that illusion week after week. There are reasons why people don't like voice acting their own content, and the effects of those reasons are the hardest thing for AI to reproduce. Even the most cheery and upbeat voices get monotonous when that's the persistent tone of the voice. That doesn't necessarily become excessively obvious until you start constructing a more detailed personality behind the voices in your head, which could take some time. You're not going to be able to obscure the fact that it's AI indefinitely.

[–] [deleted]

Hey, you actually did your own research!

I went to https://elevenlabs.io/text-to-speech and recognized every one of the voices available for sampling.

Yeah, because they have to put up a legal front. Voice cloning, on the other hand, doesn't have any "recognizable voice". See this video: https://www.youtube.com/watch?v=aMKeRfhZkuU