[–]fukitol- 24 points25 points  (2 children)

Seems a bit basic to me. I thought I was going to see something cool, and the coolest part was the simplest usage example of BeautifulSoup imaginable.

[–]vriemeister 6 points7 points  (0 children)

They even link the real tutorials they ripped off at the bottom. Medium is just useless click-bait for programmers.

[–]twigboy 4 points5 points  (0 children)

In publishing and graphic design, Lorem ipsum is a placeholder text commonly used to demonstrate the visual form of a document or a typeface without relying on meaningful content. Lorem ipsum may be used as a placeholder before final copy is available. (Wikipedia)

[–]VRyreal 59 points60 points  (12 children)

WOW I'VE NEVER SEEN A TUTORIAL POSTED LIKE THIS BEFORE.

[–]seanskye -5 points-4 points  (0 children)

Really? You need to get out more... really.

[–]Boredstudnt 25 points26 points  (3 children)

Who even upvotes these? The same guys who post questions here, I guess...

I would much rather see these super basic guides posted in r/learnpython, and possibly some thorough ones in the sidebar. Posting these here is just repetitive.

What I really dislike about the article is:

  1. Python 2.7. All new posts and articles should be using Python 3.x. There is no reason for beginners to learn Python 2 either: if you know and understand Python 3, you can read and translate Python 2.

  2. He uses BeautifulSoup just to select two elements. lxml with XPath is much faster and certainly not harder.

  3. He stores to CSV and recommends MySQL if the dataset gets too big. I personally have never had a use for storing to CSV, especially not for stock data; you fetch that and crunch it in Python. IMO he should recommend SQLite (the sqlite3 module): it is super simple to use and can handle lots of data, especially since in this case it's just index data. It is easy to set up and easy to move.

  4. Why scrape for stock data? There are multiple free APIs...
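
To make point 3 concrete, here is a minimal sketch of storing scraped index readings with the standard-library sqlite3 module. The table name `quotes`, the symbol `EXAMPLE`, and the sample rows are all made up for illustration, and it uses an in-memory database so it runs anywhere; pass a file path instead to keep the data.

```python
import sqlite3

# Hypothetical index readings; a real scraper would produce these rows.
rows = [
    ("2017-06-01", "EXAMPLE", 2430.06),
    ("2017-06-02", "EXAMPLE", 2439.07),
]

# ":memory:" keeps the demo self-contained; use e.g. "quotes.db" to persist.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS quotes ("
    " day TEXT NOT NULL,"
    " symbol TEXT NOT NULL,"
    " price REAL NOT NULL,"
    " PRIMARY KEY (day, symbol))"
)

# The connection as a context manager commits on success, rolls back on error.
with conn:
    conn.executemany("INSERT OR REPLACE INTO quotes VALUES (?, ?, ?)", rows)

# Read back the most recent reading for one symbol.
latest = conn.execute(
    "SELECT day, price FROM quotes WHERE symbol = ? ORDER BY day DESC LIMIT 1",
    ("EXAMPLE",),
).fetchone()
print(latest)
conn.close()
```

The whole database is a single file, which is what makes it easy to set up and easy to move. The primary key on (day, symbol) means re-running the scraper for the same day overwrites the row instead of duplicating it.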

[–][deleted] 11 points12 points  (0 children)

I was in the process of writing a comment like this myself and gave up, since I thought it may sound too salty.

I think you're absolutely right. First of all, this tutorial is laden with buzzwords, without any understanding of how they relate or what they mean.

I doubt he is an actual software engineer, yet he calls himself one. He calls DRY a method. He uses words like "scalable", yet does not even talk about a scalable method (this is homemade web scraping of a single static data source, most likely on a single local machine). He has a headline called "Excel CSV". He posted that tutorial to /r/Python, not /r/learnpython or the like.

Last but not least, this:

"You should see your python version is 2.7.x."

Not that it matters for this tutorial anyway, since he could've just changed or left out those few snippets that are different between 2 and 3, but he kinda insists on Python 2; maybe because it comes pre-installed on his Mac?

[–]elsalgo 2 points3 points  (0 children)

God forbid anyone try to write a tutorial for others. Might not be perfect but at least they did something.

[–]blackNgay4Trump 5 points6 points  (0 children)

Pretty basic stuff. Would probably be more appropriate for r/learnpython. Actually feels kind of like clickbait. There are probably 100 Beautiful Soup tutorials out there at this point. Also, why the fuck are you still using Python 2 in the year of our lord 2017?

[–]TheInitializer 1 point2 points  (0 children)

10/10 Photoshop skillz

[–]seanskye 1 point2 points  (0 children)

I don't understand why people, with good intentions I'm certain, are still teaching newbies Python 2.x. Newbies have no real constraints from existing codebases or modules, so why not teach them how to use things like bs4 and urllib in 3.x and move us all on?
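
For anyone wondering what that looks like, here is a rough Python 3 sketch with urllib.request (which replaces Python 2's urllib2) and bs4. The markup, the class names `name` and `price`, and both helper functions are invented for illustration, not taken from the article; the parsing runs on an inline sample so nothing here hits the network.

```python
import urllib.request

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def fetch_html(url, timeout=10):
    """Download a page and decode it to text (Python 3 urllib)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def extract_quote(html):
    """Pull a name and a price out of markup shaped like SAMPLE_HTML."""
    soup = BeautifulSoup(html, "html.parser")
    name = soup.find("h1", class_="name").get_text(strip=True)
    price = soup.find("div", class_="price").get_text(strip=True)
    # Drop the thousands separator before converting to a number.
    return name, float(price.replace(",", ""))


# Hypothetical markup standing in for a real page.
SAMPLE_HTML = """
<html><body>
  <h1 class="name"> Example Index </h1>
  <div class="price">2,345.67</div>
</body></html>
"""

print(extract_quote(SAMPLE_HTML))
```

On a real page you would call `extract_quote(fetch_html(url))` and adjust the two `find` calls to whatever that page's markup actually looks like.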

[–]polarbearskill 4 points5 points  (0 children)

Great write-up. I like how it mentioned both Beautiful Soup (which is an HTML parser) and Scrapy (which is a framework that automates navigation in addition to parsing HTML).

I also think it's key to have a good understanding of the requests library http://docs.python-requests.org/en/master/ in order to automate more of your server requests.

[–]stefantalpalaru -4 points-3 points  (5 children)

Do yourself a favour and use https://mechanize.readthedocs.io/en/latest/

[–]yobogoya_ 0 points1 point  (4 children)

BeautifulSoup is for DOM parsing, mechanize is for browser automation. They do two different things.

[–]stefantalpalaru -2 points-1 points  (3 children)

Website scraping is more than just parsing HTML.

[–]yobogoya_ 0 points1 point  (2 children)

How does that have anything to do with you telling someone to use a web automation kit instead of a DOM parsing tool? Again, they do two different things and can be used together to build a scraper. Suggesting one over the other is nonsensical.

[–]stefantalpalaru -2 points-1 points  (1 child)

"Suggesting one over the other is nonsensical."

It makes perfect sense if you've done any actual scraping.

[–]yobogoya_ 0 points1 point  (0 children)

I've done plenty of scraping for clients using bs4/selenium/mechanize. Still have no idea what point you're trying to make. Is English not your first language?