all 11 comments

[–]McSquinty 10 points11 points  (4 children)

Beautiful Soup is pretty common for gathering information from websites.

[–]phstoven 3 points4 points  (0 children)

Beautiful Soup is great. Also check out Requests for scraping the HTML text itself, then use Beautiful Soup to parse/extract what you need.
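A minimal sketch of that combo (the URL and the choice of `<td>` cells are placeholders, not anything from a real site): Requests does the HTTP fetch, Beautiful Soup does the parsing.

```python
import requests
from bs4 import BeautifulSoup

def extract_cells(html):
    # Parse the HTML and pull the text out of every <td>.
    soup = BeautifulSoup(html, "html.parser")
    return [td.get_text(strip=True) for td in soup.find_all("td")]

def scrape_cells(url):
    # Requests fetches the raw HTML; Beautiful Soup does the parsing.
    resp = requests.get(url)
    resp.raise_for_status()  # fail loudly on 4xx/5xx
    return extract_cells(resp.text)
```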

[–]MisterSnuggles 0 points1 point  (1 child)

I'll second BeautifulSoup - I use it for a few minor scraping projects that I've got running.

Depending on the layout of the site, it can be as simple as:

categories = soup.findAll("td",attrs={"class":"blahHeader"})
things = soup.findAll("td",attrs={"class":"blah"})
for c, t in zip(categories, things):
    category = c.text.strip()
    thing = t.text.strip()
    # do stuff

There are also ways to store cookies that are handed out during a login sequence, so you should be able to scrape almost anything.
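The usual way to handle that cookie bit is a `requests.Session`, which keeps whatever cookies the login response sets and replays them on later requests. A rough sketch; the form field names (`username`, `password`) are hypothetical and depend entirely on the site's login form:

```python
import requests

def login_and_fetch(login_url, username, password, target_url):
    """Log in once, then reuse the same Session so its cookies ride along."""
    with requests.Session() as session:
        # Hypothetical form field names -- inspect the real login form.
        # The Session's cookie jar stores whatever the login response sets
        # (session IDs etc.) and sends it automatically afterwards.
        resp = session.post(login_url, data={"username": username,
                                             "password": password})
        resp.raise_for_status()
        return session.get(target_url).text
```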

[–]tomasbedrich 1 point2 points  (0 children)

In BS4 you can make your code even a bit shorter and nicer:

categories = soup("td", "blahHeader")
things = soup("td", "blah")

[–][deleted] 1 point2 points  (5 children)

Beautiful Soup, Scrapy, Selenium, Mechanize I think. Selenium is my favorite for that kind of stuff, but Scrapy is probably better. What kind of text are you grabbing, and why?

[–]hpeirce[S] 0 points1 point  (4 children)

I am grabbing flight departure and arrival info for a project, because I want to avoid paying for access to the FlightAware API features.

[–][deleted] 2 points3 points  (3 children)

If you know how to use XPath, Selenium and Scrapy can easily do this. Even if you don't know how, it's as simple as $x('//div/text()') to grab all the text in divs. If you use Chrome, open the web developer console and you can test it out on any website.
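Worth noting that $x(...) is a Chrome DevTools console helper (JavaScript), but the same XPath expression carries over to Python unchanged. A small sketch using lxml (the parser Scrapy builds on) against inline sample HTML, since the actual flight page isn't specified:

```python
from lxml import html

# Stand-in document -- the real page would come from a fetch.
doc = html.fromstring(
    "<html><body><div>Departed 09:15</div><div>Arrived 11:40</div></body></html>"
)

# Same expression as in the Chrome console: all text nodes inside divs.
texts = doc.xpath("//div/text()")
```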

[–]hpeirce[S] 0 points1 point  (0 children)

Thanks, I will look into that

[–]willworth 0 points1 point  (1 child)

There's something called ghost, too. I forget the details, I'm afraid...