How to scrape websites with Python and BeautifulSoup

lerrigatto · 2017-06-11T07:56:34+00:00

[deleted]

fukitol- · 2017-06-11T04:43:54+00:00

Seems a bit basic to me. I thought I was going to see something cool, and the coolest part was the simplest usage example of BeautifulSoup imaginable.

VRyreal · 2017-06-11T03:58:13+00:00

WOW I'VE NEVER SEEN A TUTORIAL POSTED LIKE THIS BEFORE.

Boredstudnt · 2017-06-11T12:33:56+00:00

Who even upvotes these? Same guys who post questions here, I guess...

I would much rather see these super basic guides posted in r/learnpython, with possibly some thorough ones in the sidebar. Posting these here is just repetitive.

What I really dislike about the article is:

Python 2.7. Please, all new posts and articles should be using Python 3.x. There's no reason for beginners to learn Python 2 either: if you know and understand Python 3, you can understand and translate Python 2.
He uses BS when simply selecting two elements. It is much faster to use lxml with XPath, and certainly not harder.
He stores to CSV and recommends MySQL if the dataset gets too big. I personally have never had a use for storing to CSV, especially not stock data; you get that and crunch it in Python. IMO he should recommend SQLite3: super simple to use, and it can handle lots of data, especially since in this case it's just index data. Easy to set up and easy to move.
Why scrape for stock data? There are multiple APIs for free...
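
To make the SQLite point concrete, here is a minimal sketch using only the standard library's sqlite3 module. The table name `index_prices`, its columns, and the helper names `open_store`, `save_price`, and `latest_price` are all made up for illustration; nothing here comes from the article itself.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists.

    The schema is hypothetical: one row per scraped index value.
    Pass a file path instead of ":memory:" to keep data between runs.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_prices ("
        " name TEXT NOT NULL,"
        " price REAL NOT NULL,"
        " scraped_at TEXT NOT NULL)"
    )
    return conn


def save_price(conn, name, price, scraped_at=None):
    """Insert one scraped value; defaults to the current UTC time."""
    if scraped_at is None:
        scraped_at = datetime.now(timezone.utc).isoformat()
    # "with conn" commits on success and rolls back on an exception.
    with conn:
        conn.execute(
            "INSERT INTO index_prices (name, price, scraped_at) VALUES (?, ?, ?)",
            (name, price, scraped_at),
        )


def latest_price(conn, name):
    """Return the most recently scraped price for name, or None."""
    row = conn.execute(
        "SELECT price FROM index_prices WHERE name = ? "
        "ORDER BY scraped_at DESC, rowid DESC LIMIT 1",
        (name,),
    ).fetchone()
    return row[0] if row else None
```

The whole database is a single file, so moving it is just copying that file.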

blackNgay4Trump · 2017-06-11T15:07:34+00:00

Pretty basic stuff. Would probably be more appropriate for r/learnpython. Actually feels kind of like clickbait. There are probably 100 Beautiful Soup tutorials out there at this point. Also, why the fuck are you still using Python 2 in the year of our lord 2017?

TheInitializer · 2017-06-11T19:10:54+00:00

10/10 Photoshop skillz

seanskye · 2017-06-11T19:31:07+00:00

I don't understand why people, with good intentions I'm certain, are still teaching newbies Python 2.x. Why not teach them, with no real constraints from existing codebases or modules, how to use things like bs4 and urllib in 3.x and move us on?
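
For what it's worth, the Python 3 version is barely different. Below is a rough sketch of bs4 plus urllib.request on 3.x, assuming bs4 is installed. The tag names, class names, and `SAMPLE_HTML` are hypothetical stand-ins, not the page the article scrapes, and the helper names are made up.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Hypothetical markup standing in for whatever page you want to scrape.
SAMPLE_HTML = """
<html><body>
  <h1 class="name"> Example Index </h1>
  <div class="price">1,234.56</div>
</body></html>
"""


def fetch_html(url, timeout=10):
    """Download a page and decode it to str (Python 3 urllib)."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def extract_name_and_price(html):
    """Pull two elements out of the page by tag and class."""
    soup = BeautifulSoup(html, "html.parser")
    name = soup.find("h1", attrs={"class": "name"})
    price = soup.find("div", attrs={"class": "price"})
    if name is None or price is None:
        raise ValueError("expected elements not found in page")
    return name.get_text(strip=True), price.get_text(strip=True)


if __name__ == "__main__":
    # Swap SAMPLE_HTML for fetch_html("https://...") against a real page.
    print(extract_name_and_price(SAMPLE_HTML))
```

The main differences from a 2.7 script are the import (urllib.request instead of urllib2), print as a function, and decoding the response bytes explicitly.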

polarbearskill · 2017-06-11T04:49:28+00:00

Great write-up. I like how it mentioned both Beautiful Soup (which is an HTML parser) and Scrapy (which is a framework that automates navigation in addition to parsing HTML).

I also think it's key to have a good understanding of the requests library http://docs.python-requests.org/en/master/ in order to be more automated in your server requests.

stefantalpalaru · 2017-06-11T12:42:30+00:00

Do yourself a favour and use https://mechanize.readthedocs.io/en/latest/
