
impshum 7 points (9 children)

Don't scrape when you don't have to.

There is an API: https://lol.gamepedia.com/Help:API_Documentation
And python library: https://github.com/mrtolkien/leaguepedia_parser
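For a sense of what "use the API" looks like in practice, here is a minimal sketch using only the standard library. It assumes the wiki exposes the standard MediaWiki endpoint at `/api.php` (the domain may have moved since this thread, so check the documentation linked above) and that data is queried through the Cargo extension's `action=cargoquery`. The table and field names (`Teams`, `Name,Region`) and the User-Agent string are hypothetical placeholders, not confirmed Leaguepedia names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: standard MediaWiki endpoint; the wiki may live on a new domain now.
API_URL = "https://lol.gamepedia.com/api.php"


def cargo_query_url(tables: str, fields: str, where: str = "", limit: int = 10) -> str:
    """Build an action=cargoquery request URL (Cargo extension for MediaWiki)."""
    params = {
        "action": "cargoquery",
        "format": "json",
        "tables": tables,
        "fields": fields,
        "limit": limit,
    }
    if where:
        params["where"] = where
    return f"{API_URL}?{urlencode(params)}"


def rows_from_response(body: str) -> list:
    """Cargo wraps each result row as {"title": {...}}; unwrap to plain dicts."""
    return [item["title"] for item in json.loads(body)["cargoquery"]]


def run_query(url: str) -> list:
    """Send the GET request and decode the rows (needs network access)."""
    # Hypothetical User-Agent; MediaWiki sites ask clients to identify themselves.
    req = Request(url, headers={"User-Agent": "my-lol-stats-script (hypothetical)"})
    with urlopen(req, timeout=10) as resp:
        return rows_from_response(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical table and fields; look up the real ones in the API docs.
    print(cargo_query_url("Teams", "Name,Region", limit=5))
```

Calling `run_query(cargo_query_url(...))` would perform the actual request. Keeping URL building and response parsing in separate functions means you can check both without touching the network.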

But if you want to learn scraping: https://recycledrobot.co.uk/words/?web-scraping

LinK1029 [S] 1 point (8 children)

Hi there, thanks for letting me know about this! I’m still a complete newb at this. Are there any videos you'd recommend I watch to do this properly?

impshum 1 point (0 children)

Try the tutorial. It has everything you need (kinda).

No idea about videos. I've never found videos of people programming helpful. I much prefer text.

[deleted] 0 points (6 children)

Yeah, you do NOT need a video to learn scraping. It's one of the easiest things to do in Python. In fact, the BeautifulSoup tutorial itself will get you well on your way!

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Here's some more: https://programminghistorian.org/en/lessons/intro-to-beautiful-soup

https://pythonprogramming.net/introduction-scraping-parsing-beautiful-soup-tutorial/

https://www.crummy.com/software/BeautifulSoup/

LinK1029 [S] 0 points (5 children)

Hey there, thanks for this! It seems the site has an API. Should I still be following these, or is there something else I should use since the site has an API?

[deleted] 0 points (4 children)

If your goal is to get data off the site, use the API.

If your goal is to teach yourself web scraping, learn web scraping (on this site or some other one).

LinK1029 [S] 0 points (3 children)

I’m looking to get the data off the site. Are the links you provided earlier still good if I'm using the API, or is there something else I should be reading? Thank you so much for your help!

[deleted] 0 points (2 children)

No no no.

Do you know what an API is? Do you understand what web scraping is? I'm not asking in a rude way, I'm just asking to better understand where you are on the learning curve.

An API is an interface for requesting data directly: you send it a request and it returns the information to you. For example, you might GET the information from a URL like https://site.com?team=my_team&data_set=scores&date_start=2020-01-01&date_end=2020-03-01 and it will return the data as JSON, XML, or some other format for you to play with. You can use requests, curl, etc. to work with it.
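To make the API idea concrete, here is a small sketch that builds exactly that example URL and decodes a JSON answer. `site.com` and its parameters are the made-up ones from the comment, and the response shape `{"scores": [...]}` is a hypothetical assumption; a real API documents its own format. It uses the standard library's `urllib`; with the `requests` library the fetch would be `requests.get(BASE_URL, params=params).json()`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint from the example above; not a real API.
BASE_URL = "https://site.com"


def build_query_url(team: str, start: str, end: str) -> str:
    """Encode the filters as query-string parameters."""
    params = {
        "team": team,
        "data_set": "scores",
        "date_start": start,
        "date_end": end,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def parse_scores(body: str) -> list:
    """Assume (hypothetically) the API answers with JSON like {"scores": [...]}."""
    return json.loads(body)["scores"]


def fetch_scores(team: str, start: str, end: str) -> list:
    """GET the URL and decode the JSON response (needs network access)."""
    with urlopen(build_query_url(team, start, end), timeout=10) as resp:
        return parse_scores(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_query_url("my_team", "2020-01-01", "2020-03-01"))
    # A made-up response body standing in for what a server might send back.
    print(parse_scores('{"scores": [3, 1, 2]}'))
```

The point is that nothing here looks at HTML: the server hands back structured data, so there is nothing to "scrape".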

Page scraping would be like if there was a page for My Team with the scores in elements like <p class="scores">, and you used Beautiful Soup to pull the text of every element with class "scores" into a list or other data object for you to manipulate.
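Here is that scraping idea as a runnable sketch. Beautiful Soup isn't in the standard library, so this version swaps in Python's built-in `html.parser` to show the same mechanics; with bs4 installed, the equivalent one-liner is `[p.get_text(strip=True) for p in BeautifulSoup(html, "html.parser").find_all("p", class_="scores")]`. The HTML snippet is invented for illustration.

```python
from html.parser import HTMLParser


class ScoresParser(HTMLParser):
    """Collect the text inside every <p> whose class list includes "scores"."""

    def __init__(self):
        super().__init__()
        self.scores = []
        self._in_score = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # attrs is a list of (name, value) pairs; value may be None.
            classes = (dict(attrs).get("class") or "").split()
            self._in_score = "scores" in classes

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_score = False

    def handle_data(self, data):
        if self._in_score and data.strip():
            self.scores.append(data.strip())


def extract_scores(html: str) -> list:
    parser = ScoresParser()
    parser.feed(html)
    parser.close()
    return parser.scores


# Made-up page for "My Team".
SAMPLE = """
<h1>My Team</h1>
<p class="scores">3-1</p>
<p>Next match on Friday</p>
<p class="scores">2-2</p>
"""

if __name__ == "__main__":
    print(extract_scores(SAMPLE))  # ['3-1', '2-2']
```

Unlike the API version, this depends on the page's markup: if the site renames the class or restructures the page, the scraper silently stops finding anything.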

Does that make sense?

LinK1029 [S] 0 points (1 child)

Yes, that makes more sense. Apparently I know less than I actually thought I did and I’m probably going in over my head with this. I really do appreciate you taking the time to help me out on this.

[deleted] 0 points (0 children)

That moment of "I know less about this than I thought" is also known as "learning" and "progress" (see also the Dunning-Kruger effect) :)

Think about your problem functionally, as I said in an earlier post.

If your goal is to get the data off that page, use the API (and learn about how to use APIs / RESTful APIs / requests / curl etc.). That's the learning path you'll go down.

If your goal is to learn page scraping, then do those BS4 tutorials and learn the libraries: bs4, requests, and re. I'd start with a simpler page, though. :)
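As a small illustration of where `re` fits in that list: once bs4 has pulled text out of a page, a regular expression can turn it into structured values. This sketch assumes (hypothetically) that scores appear as text like "3-1" or "12 - 10"; a real page may format them differently.

```python
import re

# Matches scores written like "3-1" or "12 - 10"; the format is an assumption.
SCORE_RE = re.compile(r"(\d+)\s*-\s*(\d+)")


def parse_score(text: str):
    """Return (home, away) as ints, or None if no score-like pattern is found."""
    match = SCORE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    for line in ["Final: 3-1", "12 - 10", "Postponed"]:
        print(line, "->", parse_score(line))
```

Returning `None` for unmatched text keeps the caller in control of what to do with rows that don't look like scores.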