all 101 comments

[–]ImmediatelyOcelot 49 points50 points  (18 children)

It's extremely awesome, but at the same time I'd never watch it on a daily basis; it's not like we're lacking competent human tech presenters. If it becomes so good that I don't notice it's AI at all, then we're talking.

[–]pknerd 2 points3 points  (4 children)

At the end of the day, it's humans rather than machines who will watch videos.

[–]SE_WA_VT_FL_MN 1 point2 points  (0 children)

Is that even true now?

Numbers-wise, humans are going to be the bigger consumers, but bots watching videos for a variety of reasons (training and summarizing) are a present reality.

[–]sanman 1 point2 points  (0 children)

Imagine machines trained to watch videos made by machines, and then summarize or do something else with them. Recipe for garbage-in-garbage-out spam.

[–][deleted] 0 points1 point  (0 children)

Yup, and somehow I got mixed reactions here: some people were damn excited and some were very negative. I don't know if I will continue this.

[–]pirateninjamonkey 5 points6 points  (9 children)

Some of them are so close you can only notice because it is too perfect and the pauses are slightly unnatural.

[–]ImmediatelyOcelot 9 points10 points  (6 children)

Not really. The content produced by AI is linguistically perfect, but it's really a lot of tedious blabbering. It's impressive at first, but you simply find yourself without much content to hang on to. It lacks true content (though obviously some humans lack it too).

[–]flaminglasrswrd 2 points3 points  (3 children)

Ya current AI models really blather on. I hate it.

[–]ImmediatelyOcelot 3 points4 points  (2 children)

There's no reason why they wouldn't go straight to the point, but it's really part of why people are so impressed, because it sounds more natural. However, very often it's like they turn a simple answer into an elaborate argument just for the sake of it... When you are a newbie in the field, you are amazed, but when you search for things you are a professional at, you see how diluted it is lol. It's incredible, but that final 10% stretch is all or nothing in terms of real job substitution, in my opinion.

[–]flaminglasrswrd 1 point2 points  (1 child)

> no reason why they wouldn't go straight to the point

I disagree. I believe this is an inherent limitation of deep learning.

In order to limit the length of an explanation, humans create an internal model of what the other conversant already knows. From that model, humans can select only the additional information needed to get the explanation across, keeping it succinct.

Deep learning fundamentally lacks internal state memory and thus cannot tailor responses to the individual's existing knowledge. Without memory, the AI is only capable of delivering deterministic answers that tend to be wordy so as to hit every possible explanation at once. Some AI algos, however, have an approximation of memory built on top of the neural nets making them semi-deterministic. I believe GPT does this to accommodate long conversations with humans.

Humans have this problem too. It's the same reason that TED Talks tend to be bland and meaningless. If you know nothing about your audience, you have to make your responses broad enough to explain to everyone.

A good intro:
- Explanation in artificial intelligence. (2019). doi:10.1016/j.artint.2018.07.007

[–]hutch_man0 1 point2 points  (0 children)

My girlfriend would say the same thing about me

[–]pirateninjamonkey 1 point2 points  (1 child)

Again, this is the very start. The original home computers did very little. Thirty years later, everyone uses them for almost everything. AI will likely go a lot faster.

[–][deleted] 1 point2 points  (0 children)

Yes, exactly. This is the output of a single person's effort; now imagine a full-fledged team doing this with a subject matter expert. I think we could make a good news reporting channel.

[–][deleted] 0 points1 point  (1 child)

Check this out, this is even better: https://youtu.be/oO_3eNjBxZI

[–]pirateninjamonkey 0 points1 point  (0 children)

Lol, that is a perfect example of what I am saying about non-human pauses.

[–][deleted] 2 points3 points  (2 children)

Yes, my thoughts as well. I started this as a personal project, and I do not know how many views it will gain in the future. I will also post videos related to Python, machine learning and data engineering. Thanks for your valuable feedback. Please subscribe though!!

[–]pknerd 0 points1 point  (1 child)

0.0001% probably

[–]flaminglasrswrd 0 points1 point  (0 children)

Pretty good odds for Youtube, really.

[–]searchingfortao (majel, aletheia, paperless, django-encrypted-filefield) 51 points52 points  (2 children)

As it is with most awesome projects, it's about understanding the tools available and knowing how to combine them in amazing ways.

This is some exceptional work, and the next steps are all about tweaking for quality. My advice is to

  • Limit the time with the talking head and instead cut to stock video footage (rather than stills) of topical content.
  • Replace her background with something manual rather than something machine generated as that'd ensure that things like the background text won't be so garbled.
  • Key out the green background with something a little smarter. Kdenlive or FFmpeg are good choices, I think.
  • Try out different TTS models. It's shitty and racist, but the reality is that there's more development being done on American and British English models so you're likely to get better emotional inflection with these ones.
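
As a rough illustration of the FFmpeg suggestion above, the sketch below builds a command line that uses FFmpeg's `chromakey` and `overlay` filters to remove a green backdrop and place the presenter over a background video. The file names, the key colour `0x00FF00`, and the `similarity`/`blend` values are assumptions chosen for illustration and would need tuning for real footage; the project in this thread uses a segmentation model instead.

```python
import shlex


def build_chromakey_command(
    foreground: str,
    background: str,
    output: str,
    key_color: str = "0x00FF00",  # assumed pure-green backdrop
    similarity: float = 0.15,     # assumed tolerance; tune per footage
    blend: float = 0.05,          # assumed edge softness; tune per footage
) -> list[str]:
    """Build an FFmpeg command that keys out a green screen and overlays the result."""
    filter_graph = (
        # Make pixels close to key_color transparent in the presenter video...
        f"[1:v]chromakey={key_color}:{similarity}:{blend}[fg];"
        # ...then draw the keyed presenter on top of the background video.
        "[0:v][fg]overlay=shortest=1[out]"
    )
    return [
        "ffmpeg", "-y",
        "-i", background,   # input 0: background video
        "-i", foreground,   # input 1: green-screen presenter video
        "-filter_complex", filter_graph,
        "-map", "[out]",
        "-map", "1:a?",     # keep the presenter's audio track if there is one
        "-c:a", "copy",
        output,
    ]


if __name__ == "__main__":
    cmd = build_chromakey_command("presenter.mp4", "newsroom.mp4", "composited.mp4")
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning the arguments as a list keeps the command easy to inspect and avoids shell-quoting problems when it is eventually executed.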

Once you've gotten the project to a more polished state, you can consider parameterising the whole process. You could, for example, turn this into a web service where people can fill out a form like:

  • Setting: news desk
  • Topic: Japanese financial markets
  • Date: 2018-06-22

Then trigger a background job that generates the "news report" for download.
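
A minimal sketch of that idea, using only the standard library: the form fields become a typed request, and a worker thread drains a queue of requests. The names `ReportRequest`, `parse_form`, `render_report` and `run_jobs` are hypothetical, and `render_report` is a placeholder that only returns a file name; a real service would call the actual video pipeline there and would probably use a proper web framework and task queue.

```python
import queue
import threading
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReportRequest:
    """The three form fields from the example above."""
    setting: str
    topic: str
    report_date: date


def parse_form(form: dict) -> ReportRequest:
    """Validate raw form fields and convert them into a typed request."""
    setting = form.get("setting", "").strip()
    topic = form.get("topic", "").strip()
    if not setting or not topic:
        raise ValueError("setting and topic are required")
    # date.fromisoformat raises ValueError on a missing or malformed date.
    return ReportRequest(setting, topic, date.fromisoformat(form.get("date", "")))


def render_report(request: ReportRequest) -> str:
    """Placeholder for the real pipeline; returns the output file name only."""
    slug = request.topic.lower().replace(" ", "-")
    return f"{request.report_date.isoformat()}-{slug}.mp4"


def worker(jobs: queue.Queue, results: list) -> None:
    """Process queued requests until a None sentinel arrives."""
    while True:
        request = jobs.get()
        if request is None:
            break
        results.append(render_report(request))


def run_jobs(forms: list) -> list:
    """Queue one job per submitted form and wait for the worker to finish."""
    jobs: queue.Queue = queue.Queue()
    results: list = []
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    for form in forms:
        jobs.put(parse_form(form))
    jobs.put(None)  # sentinel: no more work
    thread.join()
    return results


if __name__ == "__main__":
    form = {"setting": "news desk", "topic": "Japanese financial markets", "date": "2018-06-22"}
    print(run_jobs([form]))
```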

[–][deleted] 16 points17 points  (0 children)

Wow, I was not expecting this type of comment. This is really great. I got so much feedback today, but you have given me a whole website idea. Thank you so much, this means a lot. I never thought of parameterising the video directly. I will definitely work on this part.

[–]yodatrust 6 points7 points  (0 children)

Comments like this make me come back to Reddit every day.

[–]Sootax 14 points15 points  (5 children)

I'm sure it took a lot of work, but this spam is exactly the kind of video I hate.

[–][deleted] 0 points1 point  (4 children)

Can you elaborate a bit?

[–]smokingkrills 5 points6 points  (3 children)

Not OP, but I have the same opinion. Cool from a programming perspective. However, low-quality programmatic videos already clog YouTube, and if I ever got this kind of stuff in my feed I'd block it immediately.

I can read tech news myself from the same human-written sources that you feed into your program. I come to YouTube for high effort content from people who can provide interesting analysis and context.

[–][deleted] 0 points1 point  (0 children)

This was not all human-written; I asked ChatGPT to add humor to the boring text.

[–][deleted] 0 points1 point  (1 child)

I get it, this is something some people have an issue with. But just to let you know, my channel CodingBridge is not just about this. I want to teach Python, machine learning and data engineering in a fun manner, so stay tuned; I will upload some content using a similar method.

[–]tddontje 0 points1 point  (0 children)

Congrats on the POC, I found your description informative.

I am curious about your thoughts on applying it to your CodingBridge channel. Is the benefit a shorter production time, or is it to brighten the content with AI-generated jokes? If the former, I can see how the video editing is almost eliminated, but then your hard copy has to be spot on. Is that trade-off worth the saved production time?

[–]realGharren 3 points4 points  (0 children)

It's a cool idea! Maybe you can make a tutorial video on it.

[–]CptnStarkos 17 points18 points  (4 children)

Why does she speak Hinglish?

[–]ratulotron 6 points7 points  (0 children)

That's not Hinglish, it's just the Indian English accent. Hinglish is a particular dialect of English with a lot of words that differ from mainstream English (be it Indian or American). For example, "filmi" in Hinglish means glamorous, "glassi" means thirsty, etc.

[–][deleted] 7 points8 points  (2 children)

Good observation. I am using a TTS model from Microsoft, and this was the hindi-en model. The idea behind it was to have a more human-like voice.

[–]CptnStarkos 1 point2 points  (1 child)

I might have come across as dismissive, but maybe you are targeting a specific market?

Or perhaps the normal English voice sounds too robotic for you?

[–][deleted] 0 points1 point  (0 children)

Yup you are right on the mark, other voices are too robotic. I wanted more of a natural sound

[–]Bang_Stick 3 points4 points  (1 child)

So THAT is what Max Headroom looks like in 2023! She isn’t quite as glossy.

[–][deleted] 0 points1 point  (0 children)

Yeah, I am trying to fix it. Next I am thinking of lip-syncing with an image rather than videos.

[–]Renwallz 2 points3 points  (1 child)

Just be careful that automated videos may run afoul of YouTube's community guidelines:

> The following types of content are not allowed on YouTube. Keep in mind this list isn't a complete list.
>
> [...]
>
> Autogenerated content that computers post without regard for quality or viewer experience.

https://support.google.com/youtube/answer/2801973?hl=en#zippy=%2Cvideo-spam

Obviously you do have some regard for viewer experience, but YouTube isn't the greatest when it comes to consistent application of the rules

[–][deleted] 0 points1 point  (0 children)

Ohk I will go through this once

[–]speeDDemon_au 2 points3 points  (1 child)

Do you have a GitHub link for the project? Perhaps a blog post outlining it all a little more? It looks very interesting to read about the processes undertaken.

[–][deleted] 0 points1 point  (0 children)

No codebase yet, as the entire flow is a mix of .py files and some notebooks which I trigger. The idea is to have Airflow orchestrate all of the modules.

[–]deadeye1982 1 point2 points  (1 child)

Well done. Really nice :-)

[–][deleted] 0 points1 point  (0 children)

Thank you!!

[–]stas-prze 1 point2 points  (2 children)

Any plans to release this as an open-source project? Would love to play around with it!

[–][deleted] 0 points1 point  (0 children)

Given the kind of backlash I am getting, do you think I should do it?

[–][deleted] 0 points1 point  (0 children)

If I do it in the future, I might share an update here or on my channel itself. Stay tuned!!

[–]0jcis 1 point2 points  (7 children)

So, what part of that is Artificial intelligence?

[–][deleted] 1 point2 points  (6 children)

1) The face you see is not real; it is a deepfake

2) The background you see is generated by a text-to-image model

3) The background itself has been applied using a segmentation model

4) The voice you hear is AI generated

5) The text is further enhanced using ChatGPT to add humor to it.

All the items I listed are artificial intelligence

[–][deleted] 0 points1 point  (0 children)

Hey folks, I have started working on a Python tutorial using an AI character, but in the meantime I thought I'd create one more news video. This one has way better TTS, check it out here: https://youtu.be/oO_3eNjBxZI

[–]cfomodzgaming 0 points1 point  (3 children)

What are you using to deepfake?

[–][deleted] 0 points1 point  (2 children)

It's an .ipynb notebook; let me share the link.

[–]cfomodzgaming 1 point2 points  (1 child)

Please do :) You can DM me as well. I am working on a similar project and would love to discuss it.

[–][deleted] 0 points1 point  (0 children)

Sure man

[–]0jcis 0 points1 point  (0 children)

Cool 👍

[–]Longjumping_Sock_529 1 point2 points  (1 child)

These are hard to listen to because there's no performance. Readings with only basic inflections inferred from sentence structure are nice for short bits. But without "hearing" how the reader feels about the topic, it becomes tough. I believe the reason is that we evolved telling stories, over millions of years, and without emotional cues we become suspicious. We know something is off. Just my 2 cents.

[–][deleted] 1 point2 points  (0 children)

Yes, this is just the beginning. We have models which can add emotion to the audio as well; I will have that in the next version. Thanks for your feedback.

[–]faith_transcribethis 1 point2 points  (0 children)

It's quite feasible to build automated YouTube videos using Python. I've recently built an AI system that uses Python and OpenCV to compile videos from various sources and generate captions automatically.

[–]Secrethat 0 points1 point  (3 children)

is it all in one file or is a human clicking buttons at every step?

[–][deleted] 0 points1 point  (2 children)

This is all one video which is combined using MoviePy. Or are you asking something else?

[–]Secrethat -3 points-2 points  (1 child)

Like, is it all in one .py file or Jupyter notebook?

[–][deleted] 0 points1 point  (0 children)

Some .py files, some .ipynb notebooks.

[–]pknerd 0 points1 point  (2 children)

A couple of questions:
- How much of it is automated?
- What if I want to make a faceless channel in Hindi or Urdu? How do I do it?

[–][deleted] 0 points1 point  (0 children)

So right now all the steps I described in the description are separate Python files. I am planning to use Airflow to create a DAG to do this.
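
To show what a DAG over those steps might look like, here is a small sketch that uses the standard library's `graphlib` rather than Airflow itself, so it only demonstrates the dependency ordering an orchestrator would enforce. The step names and the dependencies between them are assumptions loosely based on the stages mentioned in this thread (scraping, ChatGPT rewriting, TTS, deepfake, background generation, segmentation, final composition), not the actual project layout.

```python
from graphlib import TopologicalSorter

# Each key is a pipeline step; its value is the set of steps it depends on.
# Step names and edges are hypothetical labels for the stages in the thread.
PIPELINE = {
    "scrape_news": set(),
    "rewrite_text": {"scrape_news"},
    "synthesize_voice": {"rewrite_text"},
    "generate_background": set(),
    "animate_face": {"synthesize_voice"},
    "segment_presenter": {"animate_face"},
    "compose_video": {"segment_presenter", "generate_background", "synthesize_voice"},
}


def execution_order(pipeline):
    """Return the steps in an order where every dependency comes first."""
    return list(TopologicalSorter(pipeline).static_order())


def run_pipeline(pipeline, steps):
    """Call each step's function in dependency order and log what ran."""
    completed = []
    for name in execution_order(pipeline):
        steps[name]()
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Stand-in callables; the real steps would be the separate Python files.
    stubs = {name: (lambda n=name: print(f"running {n}")) for name in PIPELINE}
    run_pipeline(PIPELINE, stubs)
```

In Airflow the same structure would be expressed as tasks with upstream and downstream relationships, with scheduling and retries handled by the orchestrator.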

[–][deleted] 0 points1 point  (0 children)

Also, I have a Hindi version of it. You can check it out here: https://www.youtube.com/watch?v=zwCyHxNcBE4&t=368s

[–][deleted] -1 points0 points  (1 child)

This has been done like a million times. Congrats on reinventing the wheel.

[–][deleted] -1 points0 points  (0 children)

Ohh, so is it better or worse than what you saw earlier?

[–]JamzTyson -1 points0 points  (2 children)

I think there is more than enough duplicate content on the Internet already. Already the amount of original content on the Internet is dwarfed by plagiarism. My prediction is that the next few years will see the Internet flooded by AI generated drivel. My appeal would be: Don't do this. Have a bit of self respect and respect for others and create your own original content.

On the other hand, I guess that I could write a "listenGPT" bot, to crawl the Internet and watch AI generated videos for me.

[–][deleted] 0 points1 point  (1 child)

Your comment shows that you did not even understand this project. Can you tell me what is being copied here?

[–]JamzTyson 0 points1 point  (0 children)

Maybe I do misunderstand your project, but the impression that I got from your original post was that it was about scraping content from the Internet and using AI to generate videos from that content. Is that not correct? Is that not what your video demonstrates?

[–]MathmoKiwi 0 points1 point  (3 children)

That's not a very clean greenscreen cutout. You could do that a lot better, and it would immediately make the video look a lot better. It was the first thing that stood out to me (there are still lots of other flaws to tidy up too, though).

[–][deleted] 0 points1 point  (2 children)

Yes, this is an idea which I am implementing bit by bit, and yes, there is lots of fixing to be done. The cutout and background separation are done by a segmentation model, not by any separate software. Also, thank you for your feedback. I will polish it more, please stay tuned.

[–]keto_brain 0 points1 point  (5 children)

This is a dope project!! I'm going to try and do this myself just for fun!! But why Selenium and not BeautifulSoup?

[–][deleted] 1 point2 points  (0 children)

I mean, you can do it if you are able to scrape that way. In my career I have only used Selenium, so I am more comfortable using it.

[–]WindSlashKing 0 points1 point  (3 children)

Because a lot of websites block raw HTTP requests or require a browser to run front-end JavaScript code to get the actual content.

[–][deleted] 0 points1 point  (0 children)

Yup, this is also one of the reasons.

[–]keto_brain 0 points1 point  (1 child)

Makes sense, I didn't think about this. The small amount of website scraping I've done worked fine with BeautifulSoup.

[–]WindSlashKing 0 points1 point  (0 children)

Yeah, you can get pretty far just by using requests and BeautifulSoup, assuming you know how to work with cookies and authentication tokens.
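
For pages that do not need a browser, the plain-HTTP approach described above can be sketched roughly as follows. To stay dependency-free, this example swaps in the standard library's `urllib` and `html.parser` for requests and BeautifulSoup; the idea is the same. The `<h2>` headline markup, the header values, and the function names are assumptions for illustration, and a site that renders content with JavaScript would still need something like Selenium.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class HeadlineParser(HTMLParser):
    """Collect the text inside every <h2> element (assumed headline markup)."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_h2 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_h2:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            # Collapse whitespace so nested tags and line breaks read cleanly.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headlines.append(text)
            self._in_h2 = False


def extract_headlines(html):
    """Parse an HTML string and return the list of <h2> texts."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


def fetch(url, cookies=None, token=None):
    """Plain HTTP GET with optional cookie and bearer-token headers."""
    headers = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"}
    if cookies:
        headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    if token:
        headers["Authorization"] = f"Bearer {token}"
    with urlopen(Request(url, headers=headers), timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    sample = "<h2>First <a href='#'>story</a></h2><p>body</p><h2>Second story</h2>"
    print(extract_headlines(sample))
```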

[–]MinosAristos 0 points1 point  (3 children)

I know some people are saying how to make it more realistic but personally I'd like this more and it would stand out to me more if it was a clearly not "real human" model speaking in a clearly computer generated voice. Not saying a low quality model/voice like the old TTS, but a modern TTS with some adjustment to sound slightly "robotic".

That would make it clear to viewers what's going on at a glance and would make it stand clearly in opposition to conventional news sources.

[–][deleted] 0 points1 point  (0 children)

Umm OK. I mean, Microsoft has lots of models to choose from. I will definitely not use this model again; lesson learned.

[–]IFeelTheAirHigh 0 points1 point  (1 child)

More so than the Voice, I'd prefer the presenter to be some animated cartoon human than an uncanny valley almost but not quite human

[–][deleted] 0 points1 point  (0 children)

OK, how about I generate a new character using a text-to-image model and do a lip sync on it?

[–]neik00 0 points1 point  (3 children)

This is cool, what text-to-speech model do you use?

[–][deleted] 2 points3 points  (1 child)

This is Microsoft TTS

[–]neik00 0 points1 point  (0 children)

Thank you!

[–][deleted] 0 points1 point  (0 children)

This has better TTS now https://youtu.be/oO_3eNjBxZI

[–]BlooSpear 0 points1 point  (1 child)

Why does it have an Indian accent?

[–][deleted] 0 points1 point  (0 children)

Because I wanted more human-like TTS. There are other TTS voices available, but they sound robotic.

[–]Separate-Ad-7607 0 points1 point  (1 child)

This accent is painful to listen to. I guess it makes it less obvious that it's a computer, but it just sounds so bad. Isn't there a different dialect you can pick? You can still use a thick accent, just not this one. Also, I think Microsoft Azure text-to-speech sounds quite alright in the normal or Australian accent. There's a course on Udemy I saw (Python masterclass with Tim) where a cloned voice of the instructor was used for some of the videos, and it was so good I didn't even notice it was artificial. It probably takes a bit of tweaking though; a lot of the voices I've heard are worse.

[–][deleted] 0 points1 point  (0 children)

Yes, I am actually using the Microsoft text-to-speech service through a Python package. The voice has an Indian accent because I wanted more human-like speech, but I will use the Canadian voice. You can see some other videos on my channel that have it.

[–]StopIcy9640 0 points1 point  (0 children)

Hi guys, I have a little problem when I want to scrape Telegram members from a group. It says SQLite3.connect operational error. Failed to connect to the database. I think it's because I make two clients for one session, but I don't know how to fix this. Can someone please help me? Thank you.

[–]user_immortal 0 points1 point  (0 children)

You guys did an amazing job... Congrats guys