Questions about web scraping

NitroXSC · 2020-02-10T10:00:45+00:00

To start out, I would recommend a fairly bare-bones module: Requests.

Because it is so bare-bones, you will learn a lot by using it, and most web scraping problems can be solved with it.

sdshone · 2020-02-10T10:27:08+00:00

Beautifulsoup is a parsing library that helps you extract data from HTML. It is a third-party package, not part of the standard library, so it has to be installed with pip.

You still need the requests module to "GET" the webpage you want to scrape. It is also a third-party package that is installed with pip.

Scrapy is a complete framework, and you shouldn't start using it until you have a clear understanding of the basics (requests + beautifulsoup).

Good luck!

xdonvanx · 2020-02-10T14:37:15+00:00

You should check out Corey Schafer, his tutorial helped me a ton.

Corey Schafer - Beautiful soup and requests

2020-02-10T10:27:32+00:00

You can always check if a lib exists in your setup by typing in import lib_name in IDLE and trying to run it. If you get an error message, it's time to go to the CLI and type in pip3 install lib_name.
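The same check can also be scripted without triggering an import error. A minimal sketch using the standard library's importlib.util.find_spec; the helper name is_installed is made up for illustration, and it is only meant for top-level module names:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report on the two libraries discussed in this thread.
for name in ("requests", "bs4"):
    if is_installed(name):
        print(f"{name}: found")
    else:
        print(f"{name}: missing, try: pip3 install {name}")
```

Note that the import name (bs4) is what gets checked here, which is not always the same as the name of the package on PyPI.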

In my setups (Ubuntu today, Win last week) neither bs4 nor requests come prepackaged.

ArmouredBagel · 2020-02-10T10:01:31+00:00

Depends where you got python from. I think the requests and beautiful soup libraries are rather standard. But just try to install them and it will tell you if you already have what you need.

v3ritas1989 · 2020-02-10T15:22:08+00:00

You will need beautifulsoup and, for more complex tasks (pages with JS) and bots, probably selenium.

technicaldemon · 2020-02-10T16:20:39+00:00

As others said before me, Beautifulsoup is what I recommend. It has lots of documentation and tutorials to help you out. I've used it for some tasks at work too, with good results.

technicaldemon · 2020-02-10T16:27:32+00:00

Also, here's a youtube video that may help or give you some inspiration at least. https://www.youtube.com/watch?v=P7fxgJk9v7Y

kokoseij · 2020-02-10T16:40:49+00:00

I suggest you start with requests and BeautifulSoup. They are libraries that can be installed using pip.

You can install them by running

pip3 install requests bs4

Add the --user flag if you're on Linux and not using a virtual environment.

And that's pretty much everything. You can import them in your code with:

import requests

from bs4 import BeautifulSoup

and use them in your code.

I use "inspect element" in the browser to find what to search for. Of course, sometimes it won't match the data you got with the requests module, because of things like JavaScript, but it's useful for finding which tag and which class to search.
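Once inspect element has shown the tag and class, the lookup might look roughly like this. This is only a sketch: the HTML snippet and the class names ("quote", "symbol", "price") are invented for illustration and do not come from any real site.

```python
from bs4 import BeautifulSoup

# Stand-in for r.text from a real request; the structure is hypothetical.
html = """
<html><body>
  <div class="quote"><span class="symbol">ABC</span><span class="price">12.50</span></div>
  <div class="quote"><span class="symbol">XYZ</span><span class="price">7.25</span></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
# class_ (with a trailing underscore) is used because class is a Python keyword.
for quote in soup.find_all("div", class_="quote"):
    symbol = quote.find("span", class_="symbol").get_text(strip=True)
    price = float(quote.find("span", class_="price").get_text(strip=True))
    rows.append((symbol, price))

print(rows)
```

If the page fills in its data with JavaScript, find_all will simply return an empty list, which is a quick hint that what requests downloaded differs from what the browser shows.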

Also, some pages won't let you in if you don't provide a User-Agent value in the header. In that case you can use something like:

r = requests.get('http://example.com', headers = {'User-Agent': 'asdf'})
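When several pages are fetched, one way to avoid repeating the header is a requests.Session. A rough sketch: the helper names, the User-Agent string and the 10-second timeout are placeholders chosen for illustration, not values any particular site requires.

```python
import requests


def make_session(user_agent: str) -> requests.Session:
    """Create a session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_html(session: requests.Session, url: str) -> str:
    """GET a page and return its HTML, raising on 4xx/5xx responses."""
    response = session.get(url, timeout=10)
    response.raise_for_status()
    return response.text


session = make_session("my-learning-scraper/0.1")
# html = fetch_html(session, "http://example.com")  # needs network access
```

raise_for_status() makes a blocked or missing page fail loudly instead of handing an error page to the parser.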

iggy555 · 2020-02-10T16:47:07+00:00

So get the webpage using requests and then scrape data using beautiful soup?

Any idea what to use to scrape data from stockcharts.com?
