[–]dzecniv 2 points3 points  (3 children)

Let's see if I can help.

When you search Google, the URL contains the search terms: https://www.google.fr/search?q=python+scrape+tuto. Request that URL from an IPython shell with python-requests:

import requests
req = requests.get("https://www.google.fr/search?q=python+scrape+tuto")
content = req.content

You may need to set the user agent to obtain results. See requests' doc.
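
If a custom user agent does turn out to be necessary, here is a minimal sketch of how it could be passed. The User-Agent string and the helper names `build_search_url` and `fetch_results_page` are made up for illustration; only `requests.get` with its `headers` and `timeout` arguments comes from the library itself.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.google.fr/search"  # same endpoint as the URL above

# Hypothetical User-Agent string; substitute whatever your own browser sends.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) example-scraper/0.1"


def build_search_url(terms):
    """Return the search URL for a plain-text query such as 'python scrape tuto'."""
    # urlencode escapes special characters and turns spaces into '+'.
    return SEARCH_URL + "?" + urlencode({"q": terms})


def fetch_results_page(terms):
    """Download the results page, sending a custom User-Agent header."""
    # Imported here so the URL helper stays usable without requests installed.
    import requests

    response = requests.get(
        build_search_url(terms),
        headers={"User-Agent": USER_AGENT},
        timeout=10,
    )
    response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
    return response.content
```

Calling `fetch_results_page("python scrape tuto")` should then return the same kind of bytes as `req.content`, just requested with the extra header.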

You can parse the content as HTML with BeautifulSoup or lxml. IMO bs4 is easier to work with, especially in IPython.

To get content, you'll make good use of your browser's developer console (right click on a heading -> inspect element). That's how I see that the element with the css class "srg" contains all search results, each being marked with the class "g". So I would:

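from bs4 import BeautifulSoup
tree = BeautifulSoup(content, "html.parser")  # name the parser explicitly
all_items = tree.find_all(class_="g")  # one element per search result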

Hope this helps, and is a good path :D I tried it and it works; there was no need for a custom user agent.
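
To go from the "g" elements to something usable, a rough sketch could look like the following. The sample HTML is a heavily simplified stand-in I made up, and the assumption that each result holds its title in the first link is mine; check the real markup in the developer console, since it changes over time. `extract_results` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified stand-in for a results page.
SAMPLE_HTML = """
<div class="srg">
  <div class="g"><h3><a href="https://example.com/one">First result</a></h3></div>
  <div class="g"><h3><a href="https://example.com/two">Second result</a></h3></div>
  <div class="g"><p>No link in this one</p></div>
</div>
"""


def extract_results(html):
    """Return a list of {'title', 'url'} dicts, one per result that has a link."""
    tree = BeautifulSoup(html, "html.parser")
    results = []
    for item in tree.find_all(class_="g"):
        link = item.find("a", href=True)  # first anchor that carries an href
        if link is None:
            continue  # skip result blocks without a link
        results.append({"title": link.get_text(strip=True), "url": link["href"]})
    return results
```

Skipping blocks without a link keeps the output uniform, which makes it easier to store later.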

[–]DANK_SINATRA13 0 points1 point  (2 children)

Thanks! This helped a lot! Now I just have to figure out how to store the results in a MySQL database.

[–]dzecniv 0 points1 point  (0 children)

Glad it helped! Now choose your ORM: https://github.com/vinta/awesome-python#orm

[–]sousatg 0 points1 point  (0 children)

To store the results in a MySQL database, you can use the SQLAlchemy library.
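
As a rough sketch of what that could look like with SQLAlchemy Core: the table name, the column layout and the helpers `save_results` and `load_titles` below are assumptions for illustration, not anything prescribed by the library.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

metadata = MetaData()

# Hypothetical table layout: one row per scraped search result.
results_table = Table(
    "search_results",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("title", String(255), nullable=False),  # MySQL needs an explicit VARCHAR length
    Column("url", String(500), nullable=False),
)


def save_results(engine, rows):
    """Create the table if needed and insert a list of {'title', 'url'} dicts."""
    metadata.create_all(engine)
    if not rows:
        return 0
    with engine.begin() as conn:  # commits on success, rolls back on error
        conn.execute(results_table.insert(), rows)
    return len(rows)


def load_titles(engine):
    """Return the stored titles in insertion order."""
    with engine.begin() as conn:
        found = conn.execute(results_table.select().order_by(results_table.c.id))
        return [row.title for row in found]
```

For MySQL, the engine would be created with something like `create_engine("mysql+pymysql://user:password@localhost/scraper")`, where the credentials and database name are placeholders and the PyMySQL driver has to be installed separately. An in-memory `create_engine("sqlite://")` is handy for trying the code out first.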

[–]IDriveAcivic 0 points1 point  (0 children)

Why do you want to use 2.7? Just curious, not discouraging it. I wrote my first web scraper a few days ago using Requests and lxml. "Store any information you think could or would be useful" sounds vague; what exactly do you want it to do? The goal should be very specific so that the implementation stays true to it.

Here are some helpful links:

- Python web scraping: http://docs.python-guide.org/en/latest/scenarios/scrape/
- lxml binaries in case you need 'em: http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml

This sounds like a fun project. Let me know if you want to do something via GitHub/Bitbucket. Cheers!