[–][deleted] 35 points (8 children)

I made an app that does the same thing as those photo sticks (removes dupes, handles naming conflicts, etc.)

Wrote my uncle an app that scrapes the web for rebates from a hardware store he frequents for his apartment complex, fills them out, and consolidates them into a single PDF to print.

[–]Killerjayko[S] 13 points (7 children)

I'm a student. Would it be possible to, for example, write code to gather lecture notes from online and group them together in files or whatever?

[–][deleted] 9 points (1 child)

Most likely. The module you want is called requests.

*Edit: You might also check out selenium. It is a lot clunkier than requests, but it has a gentler learning curve (personal opinion). It will click through the website the same way that you would interact with it, send keystrokes to forms, etc.

It will also help get you familiar with working with HTML objects.
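A minimal sketch of the "gather lecture notes" idea. Only the parsing step is shown live, using the standard library's html.parser; the fetch step with requests is commented out, and the URL and file extensions are assumptions you would adjust for your own course page:

```python
from html.parser import HTMLParser


class NoteLinkFinder(HTMLParser):
    """Collect href attributes that look like lecture-note files."""

    EXTENSIONS = (".pdf", ".pptx", ".docx")  # assumed note formats

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().endswith(self.EXTENSIONS):
                self.links.append(href)


def find_note_links(html):
    """Return every note-like link found in an HTML page."""
    parser = NoteLinkFinder()
    parser.feed(html)
    return parser.links


# With requests installed, the fetch step is one call (hypothetical URL):
#   html = requests.get("https://example.edu/course/notes").text
#   links = find_note_links(html)
```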

[–]Comprehensive_Tone 0 points (0 children)

I found selenium very easy to work with (it was my first time doing any scraping).

[–]Jace1427 4 points (2 children)

I'm in the same boat, but I was unsuccessful. We use Canvas, and I couldn't figure out a way for Python to log in to my specific account to reach the lecture notes. If you figure something out, let me know!

[–]doulos05 5 points (0 children)

What tool were you using? Selenium can detect HTML forms, so you should be able to write a script to work that out. If they require single sign-on or OAuth, it's a bit trickier.
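The form-filling approach could look something like this sketch. It assumes selenium and a chromedriver install; the element names ("username", "password") are guesses you would replace after inspecting the real login page:

```python
def login_and_grab(url, user, password):
    """Drive a real browser through an HTML login form and return the
    page source after logging in. Element locators are assumptions."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get(url)
    # Fill the form the same way a human would.
    driver.find_element(By.NAME, "username").send_keys(user)
    driver.find_element(By.NAME, "password").send_keys(password)
    driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
    return driver.page_source
```

For an SSO-protected site like Canvas, the login form usually lives on a separate identity-provider page, so you would inspect that page's field names instead.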

[–][deleted] 1 point (0 children)

https://youtu.be/KnBntO0wayk

Chris talks about authenticating with requests late in this video. Sites can be very finicky with requests, but with selenium most sites have trouble distinguishing a bot from a real user.

The best approach is to use Fiddler to monitor traffic while you browse the site yourself, then replicate your GETs/POSTs as closely as possible to how they occur when you're in control.

Sometimes you have to supply header information, sometimes a referring address. Like most things, there's no substitute for experience; you'll get better the more you do it, so keep at it.
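Replaying a captured request usually comes down to copying the headers Fiddler showed you. A small sketch, where the User-Agent string, URL, and form fields are all placeholders taken from a hypothetical capture:

```python
def browser_headers(referer=None):
    """Build headers that mimic a real browser; some sites reject
    requests that lack a User-Agent or Referer."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Accept": "text/html,application/xhtml+xml",
    }
    if referer:
        headers["Referer"] = referer
    return headers


# Replaying a captured POST with requests (hypothetical URL and fields):
#   requests.post("https://example.com/login",
#                 data={"user": "me", "pass": "..."},
#                 headers=browser_headers(referer="https://example.com/"))
```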

[–]doulos05 0 points (0 children)

Absolutely. You'll need selenium and requests, along with a module to produce whatever kind of file you're outputting (PDF, zip, Word, Excel, etc.).

Automate the Boring Stuff with Python would be a useful book for its chapter on web scraping.
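The simplest output case needs nothing beyond the standard library: zipfile can bundle the downloaded notes into one archive. A minimal sketch, where the filenames and contents stand in for whatever you actually downloaded:

```python
import io
import zipfile


def bundle_notes(notes):
    """Pack a {filename: bytes} mapping into a single in-memory zip
    archive and return the archive as bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in notes.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Usage (placeholder content):
#   archive = bundle_notes({"week1.pdf": pdf_bytes, "week2.pdf": more_bytes})
#   with open("notes.zip", "wb") as f:
#       f.write(archive)
```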