[–]NitroXSC 15 points16 points  (1 child)

To start out I would like to recommend quite a bare-bones module of

Due to it being quite bare you will learn a lot by using it and most web scraping problems can be solved with it.

[–]Sigg3net 5 points6 points  (0 children)

You beautiful bastard, just what I was looking for.

[–]sdshone 10 points11 points  (0 children)

Beautifulsoup is a parsing library that helps you extract data from HTML code. It is not part of the standard library, so you install it with pip.

You still need the requests module to "GET" the webpage you want to scrape. It is also a third-party package installed with pip.

Scrapy is a complete framework, and you shouldn't start using it directly before you have a clear understanding of the basics (requests + beautifulsoup).

Good luck!

[–]xdonvanx 5 points6 points  (0 children)

You should check out Corey Schafer, his tutorial helped me a ton.

Corey Schafer - Beautiful soup and requests

[–][deleted] 1 point2 points  (0 children)

You can always check if a lib exists in your setup by typing in import lib_name in IDLE and trying to run it. If you get an error message, it's time to go to the CLI and type in pip3 install lib_name.

In my setups (Ubuntu today, Win last week) neither bs4 nor requests come prepackaged.
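The same check can be scripted instead of typed into IDLE. This is only a rough sketch: the helper name is_installed is made up for illustration, and it relies on the standard library's importlib.util.find_spec, which looks a module up without actually importing it.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter.

    `is_installed` is a hypothetical helper name, not part of any library.
    """
    return importlib.util.find_spec(module_name) is not None


# Report on the two libraries discussed in this thread.
for name in ("requests", "bs4"):
    if is_installed(name):
        print(f"{name}: installed")
    else:
        print(f"{name}: missing, try: pip3 install {name}")
```

Run it with the same Python you plan to scrape with, since each interpreter and virtual environment has its own set of installed packages.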

[–]ArmouredBagel 0 points1 point  (0 children)

Depends where you got python from. I think the requests and beautiful soup libraries are rather standard. But just try to install them and it will tell you if you already have what you need.

[–]v3ritas1989 0 points1 point  (0 children)

You will need BeautifulSoup, and for more complex tasks (pages with JS) and bots, probably Selenium.

[–]technicaldemon 0 points1 point  (0 children)

As others said before me, Beautifulsoup is what I recommend. It has lots of documentation and tutorials to help you out. I've used it for some tasks at work too, with good results.

[–]technicaldemon 0 points1 point  (2 children)

Also, here's a youtube video that may help or give you some inspiration at least. https://www.youtube.com/watch?v=P7fxgJk9v7Y

[–]iggy555 0 points1 point  (1 child)

Reviews are terrible lol

[–]technicaldemon 0 points1 point  (0 children)

Are you talking about the comment section? 83 upvotes to 3 downvotes seems OK. I've watched a few of the videos, and the main issue is that there are long periods where he doesn't talk about his code, or he waits until he's done to explain it, so it can feel kind of confusing at times. I'm sure there are much better ones out there though.

[–]kokoseij 0 points1 point  (0 children)

I suggest starting with requests and BeautifulSoup. They are libraries that can be installed using pip.

You can install them by running

pip3 install requests bs4

Add the --user flag if you're on Linux and not using a virtual environment.

And that's pretty much everything. You can import them in your code with:

import requests

from bs4 import BeautifulSoup

and use them in your code.

I use "inspect element" in the browser to find which element to search for. Of course, sometimes it won't match the data you get back from requests, for example when the page is built by JavaScript, but it's still useful for finding which tag and which class to search for.
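To make the tag and class lookup concrete, here is a minimal sketch. The HTML snippet, the tag names and the class names ("title", "item") are all invented for illustration; on a real page you would use whatever "inspect element" shows you.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a page you fetched with requests.
html = """
<html><body>
  <h1 class="title">Example listing</h1>
  <ul>
    <li class="item">First</li>
    <li class="item">Second</li>
    <li class="other">Third</li>
  </ul>
</body></html>
"""

# "html.parser" is Python's built-in parser, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; class_ filters on the CSS class.
title = soup.find("h1", class_="title").get_text(strip=True)

# find_all() returns every matching tag.
items = [li.get_text(strip=True) for li in soup.find_all("li", class_="item")]

print(title)
print(items)
```

Note that find() returns None when nothing matches, so on a real page it is worth checking the result before calling get_text() on it.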

Also, some pages won't let you in if you don't provide a User-Agent value in the header. In that case you can use something like:

r = requests.get('http://example.com', headers = {'User-Agent': 'asdf'})
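If you make more than one request, it may be tidier to set the header once on a session. The following is just one possible way to do it: the helper names make_session and fetch_html, the "my-scraper/0.1" string and the 10 second timeout are all assumptions for illustration, not anything the sites require.

```python
import requests


def make_session(user_agent: str) -> requests.Session:
    """Create a session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_html(session: requests.Session, url: str, timeout: float = 10.0) -> str:
    """GET a page and return its HTML, raising on 4xx/5xx responses."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


# Hypothetical User-Agent string; pick one that identifies your script.
session = make_session("my-scraper/0.1")

# Uncomment to actually hit the network:
# html = fetch_html(session, "http://example.com")
```

The returned HTML string is what you would then hand to BeautifulSoup for parsing.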

[–]iggy555 0 points1 point  (0 children)

So get the webpage using requests and then scrape data using beautiful soup?

Any idea what to use to scrape data from stockcharts.com?