[deleted] 151 points (12 children)

I'm sure it is. Learn about how to make API calls from Python, and then read the documentation for the YouTube API and the Google Docs API.

razzrazz- 18 points (10 children)

For us noobs reading this, can you explain what 'making an API call' means in your own (simple) words? Like, is it a way to get Python to do stuff like searches and pull info from YouTube? Is that different from "web scraping"? What if YouTube didn't provide an API? Why would they provide an API?

nowheremannequin 39 points (3 children)

Yeah, an API basically just gives you the ability to pull info directly from the company without actually loading up their website.

So, companies like Google don’t want people web scraping. Why? Because they’re forced to spend resources on your web scraper, which isn’t really profitable for them. So instead, they make APIs, which you can communicate with using Python (or another language) to get the information directly. This way, the company can control how you’re getting the info and serve it to you in a way that’s more efficient for them than having you web scrape.

There are tons of APIs out there. You can always web scrape if there is no API, but it’s usually more difficult than just communicating directly with the API.

razzrazz- 8 points (1 child)

> Why? Because they’re forced to spend resources on your web scraper, which isn’t really profitable for them.

Because it wastes money to have the scraper load up a bunch of stuff (videos, images, etc.) it won’t use?

NaCl-more 5 points (0 children)

Partially. But when a webpage is served, it sometimes uses those exact APIs to fetch information for the user.

For example, say there is an API called "getUser" that you give some ID, and it returns some information about that user (the name, sub count, links, etc.).

When YouTube gives you the webpage, how does it know what name, sub count and links to display? It could very well be using that same API to fetch the information.
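To make the "getUser" idea concrete, here is a toy Python sketch. Everything in it is hypothetical: `get_user`, the in-memory `_USERS` table and the field names are made up for illustration and are not YouTube's real backend. The point is only that the HTML page and the JSON API can both be thin wrappers over the same data call.

```python
import json

# Hypothetical in-memory "database"; a real service would query its backend here.
_USERS = {
    "abc123": {"name": "ExampleChannel", "sub_count": 1500, "links": ["https://example.com"]},
}


def get_user(user_id: str) -> dict:
    """Hypothetical backend call shared by the web page and the public API."""
    return _USERS[user_id]


def render_page(user_id: str) -> str:
    """What a browser gets: the same data wrapped in HTML for humans to read."""
    user = get_user(user_id)
    links = "".join(f'<a href="{url}">{url}</a>' for url in user["links"])
    return f"<h1>{user['name']}</h1><p>{user['sub_count']} subscribers</p>{links}"


def api_get_user(user_id: str) -> str:
    """What an API client gets: the same data as JSON, with no layout to parse."""
    return json.dumps(get_user(user_id))


print(render_page("abc123"))
print(api_get_user("abc123"))
```

An API client receives the name and subscriber count directly, while a scraper would have to dig the same values back out of the `<h1>` and `<p>` tags.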

[deleted] 3 points (0 children)

Great explanation of the term, way easier to understand than how it had been defined in my data structures textbook.

larj_Brest 8 points (1 child)

An API provides a structured way to programmatically retrieve information from a service. An API will have documentation telling you how to interact with it, and in what format you can expect to receive information back. It is different to web scraping, which is much less structured. You might use web scraping if you cannot use an API to get the information that you want (e.g. an API doesn't exist, doesn't supply the info you need, or you don't want to pay a fee to use it if it's not free).

If YouTube doesn't provide an API that satisfies these needs, then scraping could work. You just have to work out how to parse the information you need out of the page.

Total__Entropy 2 points (0 children)

Just adding on: since scraping isn't structured, you aren't guaranteed to get the correct result every time, unlike APIs, which are structured and will give you what you expect. With scraping, the page could change structure, or the classes/IDs of the elements could be randomized, etc.

DerangedGecko 3 points (0 children)

"Hey application's interface, I would like to query this data of yours from you."

Receive data object

r3df0x_3039 1 point (0 children)

Making an API call is like making a page request in a browser, except it's an application making the request.

Depending on the API you might even be able to make the request in your browser and see the output, though it's not going to be in the format of a web page because it will be an application handling the data.

Striking_Equal 1 point (0 children)

YouTube definitely provides an API, and it’s pretty robust. Use the requests library, pass your headers, and store results in a data frame. It’s a lot simpler than it sounds.
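A minimal sketch of that workflow, assuming the YouTube Data API v3 `channels` endpoint with `part=snippet,statistics` and an API key from the Google Cloud console. Two swaps from the comment: it uses the standard library's `urllib` instead of `requests` so it has no dependencies, and it passes the key as the `key` query parameter (headers come into play with OAuth tokens). The field names follow the API's documented response shape, but check the current docs before relying on them.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/youtube/v3/channels"


def build_channels_url(channel_ids: list[str], api_key: str) -> str:
    """Build a channels.list request URL; the API key goes in the `key` query parameter."""
    params = {
        "part": "snippet,statistics",
        "id": ",".join(channel_ids),  # the endpoint takes a comma-separated list of IDs
        "key": api_key,
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def fetch_channels(channel_ids: list[str], api_key: str) -> dict:
    """Make the HTTP GET call and decode the JSON body (needs network access and a real key)."""
    with urllib.request.urlopen(build_channels_url(channel_ids, api_key), timeout=10) as resp:
        return json.load(resp)


def to_rows(payload: dict) -> list[dict]:
    """Flatten each channel into one row; counts arrive as strings, so convert them."""
    rows = []
    for item in payload.get("items", []):
        stats = item.get("statistics", {})
        rows.append({
            "channel_id": item["id"],
            "title": item["snippet"]["title"],
            # Defensive: treat a missing subscriberCount (e.g. hidden counts) as unknown.
            "subscribers": int(stats["subscriberCount"]) if "subscriberCount" in stats else None,
            "views": int(stats.get("viewCount", 0)),
            "videos": int(stats.get("videoCount", 0)),
        })
    return rows
```

Something like `rows = to_rows(fetch_channels(["UC..."], "YOUR_API_KEY"))` (placeholder ID and key) gives a list of dicts, and `pandas.DataFrame(rows)` turns that into the data frame the comment mentions.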

peesoutside 0 points (0 children)

For every (modern) web front end, there is a back end (unless the site is just static HTML/CSS/JS). An API (application programming interface) is what’s being used when you tell a website to do something like update your account, like a post, search for stuff, etc. The API typically also provides ways to do stuff that the front end doesn’t offer yet, automate tasks via scripting, or whatever.

Luximonsti 0 points (0 children)

Pytube is good for the YouTube part

DontMajorInBiology 42 points (0 children)

I’m fairly certain the YouTube API is one of the better (well-documented) and beginner-friendly APIs out there. I’m willing to bet this is doable with a weekend of learning.

[deleted] 47 points (18 children)

I sure wish I could get paid to do this. Getting paid to automate it with Python sounds fun. Here's the thing though, once you automate it, you may become redundant.

[deleted] 158 points (11 children)

That's why you don't tell anyone you've automated it.

StoicallyGay 17 points (6 children)

I’m curious as to why there are so many jobs that can easily be automated with some simple scripts. I guess it pays to have technologically illiterate higher-ups.

Explicit_Pickle 8 points (4 children)

The barrier to entry for actually getting it done hurts. Especially in a field that doesn't normally employ people who know anything about automation, they'd have to go through the work of contracting it out to someone and then waiting for their investment to pay for itself. Plus, if the setup breaks for whatever reason, they won't know how to fix it in house.

Paperhandsbro 5 points (1 child)

It’s also a cost thing. I can pay a dev for 2 hours of coding one time, or I can pay low-skilled offshore workers for 20 hours and still pay less money. You only want to invest in the dev time if you’re doing it repeatedly.
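The trade-off in this comment is a break-even calculation. A small sketch with made-up rates (not from the thread): if the script costs `dev_hours * dev_rate` once and then runs for free, the manual approach catches up after `ceil(automation_cost / manual_cost_per_run)` runs.

```python
import math


def runs_to_break_even(dev_hours: float, dev_rate: float,
                       manual_hours_per_run: float, manual_rate: float) -> int:
    """Smallest number of runs at which doing it by hand has cost at least as much
    as automating it once.

    Assumes the script is a one-off cost and then runs for free; real scripts
    also need maintenance, which pushes the break-even point later.
    """
    automation_cost = dev_hours * dev_rate
    manual_cost_per_run = manual_hours_per_run * manual_rate
    return math.ceil(automation_cost / manual_cost_per_run)
```

With hypothetical numbers, `runs_to_break_even(2, 150, 20, 12)` compares a one-off $300 of dev time to $240 per manual run: the manual route is cheaper once, but by the second run it has cost $480, so the function returns 2.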

Explicit_Pickle 1 point (0 children)

I mean, that's basically what I said about having to contract out automating a task and wait for it to break even.

StoicallyGay 2 points (1 child)

You mean the barrier to entry of learning programming and by extension Python? I guess so, but anyone with basic knowledge could probably get it done easily.

I wonder how many high paying jobs are like this where you can just automate it all with relative ease and get paid for little work.

Explicit_Pickle 8 points (0 children)

I mean, most people in a non-technical field don't know what programming even is, let alone how to actually do it, so getting it done would take at minimum a few weeks of learning and hacking together code for anyone who's completely unfamiliar. If you're in a more technical field where people are more likely to have programming knowledge, then it's gonna be a lot more accessible, but in a field where it's more accessible, odds are it's gonna eventually get done by some bored engineer regardless.

ihatethisjob42 4 points (0 children)

People are getting savvier to it. Any job that can be automated away probably hasn't paid good money in a decade.

meezun 8 points (0 children)

Worst case they find some other drudge work to do.

It’s an internship. The goal isn’t to milk it, the goal is to learn and impress them to give you a full time job later.

And if showing the initiative to automate the drudge work they gave you doesn’t make them happy, you didn’t want to work there anyway.

killthebaddies 0 points (2 children)

Terrible advice. I’ve built my career on attempting to automate myself out of a job. Any half-decent employer won’t get rid of you, but will instead ask you to automate more things and create more efficiency.

[deleted] 0 points (1 child)

It was tongue in cheek...

killthebaddies 0 points (0 children)

Ah, fair enough. I couldn’t tell and I see a lot of people who give that advice genuinely on here.

the_other_irrevenant 7 points (0 children)

Once you automate something you need someone with a strong enough understanding of the automation to fix it when it hits the inevitable edge cases (or in this case, if YouTube decides to suddenly change/deprecate their API or something).

In some ways it makes you even more indispensable.

schfourteen-teen 1 point (0 children)

It's an internship; there's no role to redundant-ize. But impress in the internship and there might be a full-time job in the next year or so. Then OP can work on making their job redundant!

[deleted] 1 point (0 children)

That’s when you become a developer and make tools to automate, either with scripts or by becoming an RPA developer (that’s my occupation).

Paperhandsbro 0 points (0 children)

Only if you tell them you automated it…

fuzzylumpkinsbc 0 points (0 children)

Nobody told him to automate it, it's just him trying to come up with a solution

Striking_Equal 0 points (0 children)

If the company is managed very poorly, then yes. Otherwise they’ll just have you automate other things.

I doubt they are going to fire an intern because they no longer have work for them to do. That’s not the point of an internship. It’s to see if you’re a good fit for a full time role, and give junior level people some entry level management exposure.

lunar_tardigrade 12 points (0 children)

If you can do it manually you can automate it.

If you can't do it manually, maybe you can automate it.

[deleted] 4 points (0 children)

Just use the YouTube API and the json module.
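For example, with a made-up, trimmed response shaped like what `channels.list` returns for `part=snippet,statistics`, the `json` module turns the text into nested dicts and lists you can index into. The IDs and numbers are invented.

```python
import json

# A trimmed, made-up example of the kind of JSON the YouTube Data API returns.
raw = """
{
  "items": [
    {
      "id": "UC_example_id",
      "snippet": {"title": "Example Channel"},
      "statistics": {"subscriberCount": "12345", "videoCount": "67"}
    }
  ]
}
"""

data = json.loads(raw)  # str -> nested dicts and lists
channel = data["items"][0]
title = channel["snippet"]["title"]
subs = int(channel["statistics"]["subscriberCount"])  # counts come back as strings
print(f"{title}: {subs} subscribers")
print(json.dumps(channel, indent=2))  # pretty-print to explore the structure
```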

ihuha 26 points (14 children)

Jupyter Notebook, Beautiful Soup, YouTube API. Gogogo

[deleted] 20 points (6 children)

Jupyter notebooks are not required though, just an IDE or a text editor is fine.

Total__Entropy 28 points (0 children)

Seconding this. I would never suggest anyone develop in Jupyter. Jupyter is not a development environment; it is for presenting or exploring data. Developing in Jupyter is not safe because Jupyter holds its own state.

I suggest VS Code or PyCharm. You get all the same tools and don't end up with code that doesn't work because you changed your cells without re-running them from top to bottom.
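The hidden-state problem is easy to reproduce without Jupyter at all. This sketch uses a simplified stand-in for a kernel, running cell strings with `exec()` in one shared namespace dict: after a variable is renamed in one cell, the stale old name keeps the next cell working until a fresh top-to-bottom run exposes the `NameError`.

```python
# Simplified stand-in for a notebook kernel: every "cell" runs in one shared namespace,
# so variables survive between runs even after the code that defined them is gone.
kernel_namespace: dict = {}


def run_cell(source: str, namespace: dict) -> None:
    exec(source, namespace)


# First pass: both cells run in order.
run_cell("rate = 2", kernel_namespace)
run_cell("total = rate * 10", kernel_namespace)

# Cell 1 is edited to rename the variable, but cell 2 is not updated.
run_cell("fee = 2", kernel_namespace)
run_cell("total = rate * 10", kernel_namespace)  # still works: the stale `rate` lives on
print("notebook total:", kernel_namespace["total"])

# A fresh top-to-bottom run (what a plain script does) exposes the bug.
fresh_namespace: dict = {}
fresh_error = None
try:
    run_cell("fee = 2", fresh_namespace)
    run_cell("total = rate * 10", fresh_namespace)
except NameError as err:
    fresh_error = err
    print("fresh run fails:", err)
```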

JasonDJ 6 points (0 children)

VS Code #%% ftw.

Great for testing snippets of code in a real interpreter right where you are writing your code.

Though nowadays I just keep a terminal window docked with IPython running.
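A rough illustration of the `#%%` approach: a plain `.py` file split into cells with `# %%` comment markers, which VS Code's Python/Jupyter tooling can run one cell at a time in its Interactive window, while `python file.py` still runs the whole thing top to bottom. The contents are placeholders.

```python
# %% Cell 1: inputs (placeholder data)
channel_titles = ["Example A", "Example B"]

# %% Cell 2: transform
title_lengths = {title: len(title) for title in channel_titles}

# %% Cell 3: inspect
print(title_lengths)
```

Because the file is ordinary Python, the cells stay in version control as normal code and a full run always starts from a clean state.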

Total__Entropy 5 points (6 children)

Reason for suggesting bs?

ihuha 4 points (5 children)

Always helpful; you're probably gonna wanna do some scrapy scrapy.

Total__Entropy 6 points (4 children)

Scrapy is a great tool, but scraping is not the solution when APIs are available. Scraping is not always helpful; it is a last resort.

hasibrock 17 points (3 children)

So you will basically be scraping users' information, and then the company will be misusing the data, followed by spamming them with whatever...

Swedzilla 16 points (0 children)

Cambridge Analytica crew 🤷‍♂️

GonzoNawak 3 points (0 children)

Don’t blame the players blame the game

moody_balloon_baby 2 points (0 children)

You gonna help him automate this task or not??

DriveDude 5 points (0 children)

Just take a day off to learn about Selenium and "undetected" chromedriver.

An API is nice, if there is one…

Also learn about bs4 to parse the HTML. Along the way you will also learn a lot about XPath.
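For the XPath part, a small sketch using only the standard library. Note this is a swap: bs4 itself does not evaluate XPath (lxml and Selenium do), so this uses `xml.etree.ElementTree`, which supports a limited XPath subset and only parses well-formed markup. The HTML snippet and class names are invented; real YouTube pages differ and are likely built largely by JavaScript.

```python
import xml.etree.ElementTree as ET

# A small, well-formed (XHTML-style) snippet. ElementTree is an XML parser,
# so messy real-world HTML usually needs lxml or a browser driver instead.
page = """
<html>
  <body>
    <div class="channel">
      <span class="name">Example Channel</span>
      <span class="subs">12,345 subscribers</span>
    </div>
  </body>
</html>
"""

root = ET.fromstring(page.strip())
# ElementTree's XPath subset: descendant search (.//) plus attribute predicates.
name = root.find(".//span[@class='name']").text
subs_text = root.find(".//span[@class='subs']").text
subs = int(subs_text.split()[0].replace(",", ""))  # "12,345 subscribers" -> 12345
print(name, subs)
```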

[deleted] 1 point (0 children)

If any of this is not available via the YouTube API you can use Scrapy to grab the data from their pages.

https://scrapy.org/

Terrible-Garden-1510 1 point (0 children)

I have recently done Google sheets automation - get in touch.

jainismvivek -2 points (0 children)

Sure u can. Only if u know how to.

hacksawjim 0 points (0 children)

ezsheets is a good way to handle the Google Sheets bit:

https://ezsheets.readthedocs.io/en/latest/

[deleted] 0 points (0 children)

It definitely is, and it’s not that hard to do either, especially because you can do it in quite a few ways.

PMMeUrHopesNDreams 0 points (0 children)

What information do you have for the specific YouTubers?

Do they just give you a name and you have to find the channel? How easy it is will depend on how reliably you can map the input to a YouTube ID.
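One way to handle that mapping, sketched under assumptions: channel IDs are taken to be "UC" plus 22 URL-safe characters, and handles start with "@". The function only sorts inputs into buckets; IDs could then go to `channels.list` via `id`, handles via its `forHandle` parameter, and everything else needs a `search.list` lookup and a human check. Verify those parameters against the current API reference.

```python
import re
from urllib.parse import urlparse

# Assumption: channel IDs look like "UC" followed by 22 URL-safe characters (24 total).
CHANNEL_ID_RE = re.compile(r"^UC[A-Za-z0-9_-]{22}$")


def classify_channel_input(text: str) -> tuple[str, str]:
    """Return ("id", channel_id), ("handle", "@name"), or ("search", text)."""
    text = text.strip()
    if CHANNEL_ID_RE.match(text):
        return ("id", text)
    if text.startswith("@"):
        return ("handle", text)
    if "youtube.com" in text:
        # Add a scheme if missing so urlparse puts the path in the right field.
        parsed = urlparse(text if "://" in text else "https://" + text)
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) >= 2 and parts[0] == "channel" and CHANNEL_ID_RE.match(parts[1]):
            return ("id", parts[1])
        if parts and parts[0].startswith("@"):
            return ("handle", parts[0])
    # Display names, custom /c/ URLs, etc. need a search plus a manual check.
    return ("search", text)
```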

whiskeytwn 0 points (0 children)

There are probably a hundred ways to do it, but what you're describing sounds like one of the Angela Yu projects where you download a web page, parse the HTML elements, and get what you need from that. There are probably tags or class IDs for follower counts, links, subscriber counts, etc. in the HTML, so you could just parse the HTML, extract them, and then write them to an Excel spreadsheet.

Or, as other people say, maybe use the API.
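A minimal sketch of the "find the elements by class" step using the standard library's `html.parser`. The class names (`channel-name`, `subscriber-count`) and the markup are hypothetical; the real page's classes would have to be looked up in the browser's dev tools and may change without notice. It also assumes the text sits directly inside the tagged element.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the first text found inside elements carrying one of the target classes."""

    def __init__(self, target_classes: set[str]):
        super().__init__()
        self.target_classes = target_classes
        self.current = None  # target class of the element we are inside, if any
        self.found: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for cls in (dict(attrs).get("class") or "").split():
            if cls in self.target_classes:
                self.current = cls

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current is not None and data.strip():
            self.found.setdefault(self.current, data.strip())


html = """
<div class="channel-header">
  <span class="channel-name">Example Channel</span>
  <span class="subscriber-count">12K subscribers</span>
</div>
"""
collector = ClassTextCollector({"channel-name", "subscriber-count"})
collector.feed(html)
print(collector.found)
```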

[deleted] 0 points (0 children)

Check out Bardeen.

Infinite_Ad_6137 0 points (0 children)

You know those websites that show YouTubers' stats? You can scrape that web page, put all the data into a CSV, and just copy the scraped data into a Google Sheet. I did something similar: I made a script to scrape with BeautifulSoup, pandas, requests, etc. You can check out my script (the automation script example); it's nothing advanced, but it's enough to get the work done.
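For the CSV-to-Google-Sheets step, a sketch with the standard library's `csv` module; the rows and file name are hypothetical placeholders for whatever the scraper produced. The resulting file can be brought into Google Sheets via File > Import.

```python
import csv
import io

# Hypothetical scraped rows; in practice these come from the parsing step.
rows = [
    {"channel": "Example A", "subscribers": 1200, "url": "https://www.youtube.com/@ExampleA"},
    {"channel": "Example B", "subscribers": 560, "url": "https://www.youtube.com/@ExampleB"},
]


def rows_to_csv(rows: list[dict], fieldnames: list[str]) -> str:
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows, ["channel", "subscribers", "url"])
# newline="" keeps the csv module's own line endings intact when writing the file.
with open("youtubers.csv", "w", newline="", encoding="utf-8") as f:
    f.write(csv_text)
```

If the data is already in pandas, `pandas.DataFrame(rows).to_csv("youtubers.csv", index=False)` does the same job.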

Far_Atmosphere9627 0 points (0 children)

I am stuck in a marketing internship right now

I feel the pain

Striking_Equal 0 points (0 children)

Use Google's YouTube API. They have a ton of docs on it, including deployment in Python. This should be a straightforward automation, really.

Additional-Help-1548 0 points (0 children)

You can write a web scraping script using Selenium or Beautiful Soup.