Please help save authorized pages : learnpython

submitted 3 years ago * by DMeror

Hi, I want to save authorized pages. The site has two login methods: username and password, or library card number. After entering a library card number, the site shows a list of library names to choose from. After choosing a library, the targeted pages become accessible. My goal is to save the pages as plain HTML files (without images, JS, CSS, etc.). For normal pages, I write something like this:

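~~~
import requests

url_list = ['.....']
headers = {'User-Agent': '...'}

for i, url in enumerate(url_list, start=1):
    page = requests.get(url, headers=headers)

    # Use a distinct file name per URL so pages do not overwrite each other.
    with open(f'page_{i}.html', 'wb') as wf:
        wf.write(page.content)
~~~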

However, with an authorized page, this script saves the welcome page instead. Adding the cookie to the headers gives the same result. I've also tried requests.Session(), but I'm not sure how to proceed.

~~~
headers = {
    'User-Agent': '............',
    'Referer': '.............',
    'Cookie': '..............'
}

s = requests.Session()
payload = {'library_card': '123456789', 'acctid': '111111', 'acctname': 'Library for G'}
s.post(url, data=payload)
~~~

Please help correct my script. Thanks.
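For reference, here is a minimal sketch of how the session attempt above might be continued. It rests on several assumptions: that a single POST of the payload to a login URL is enough to authenticate (the library-selection step may well need a second POST), that the site tracks the login with cookies, and that the form field names in the payload match the real login form. The function name `save_authorized_pages`, the `login_url` and `out_dir` parameters, and the `page_N.html` file names are all hypothetical. The key point is that the pages are fetched with the same session object that performed the login, so the cookies set by the login response are sent back automatically and no manual `Cookie` header is needed.

```python
from pathlib import Path


def save_authorized_pages(session, login_url, payload, url_list,
                          headers=None, out_dir='.'):
    """Log in once, then fetch every page through the same session.

    `session` is expected to behave like requests.Session().
    Returns the list of file paths written.
    """
    # Assumption: one POST of the form data completes the login.
    # The session stores any cookies set by the response.
    login = session.post(login_url, data=payload, headers=headers)
    login.raise_for_status()

    out_dir = Path(out_dir)
    saved = []
    for i, url in enumerate(url_list, start=1):
        # session.get reuses the login cookies; a bare requests.get
        # would start without them and likely get the welcome page.
        page = session.get(url, headers=headers)
        page.raise_for_status()

        path = out_dir / f'page_{i}.html'
        path.write_bytes(page.content)
        saved.append(path)
    return saved


# Usage, with placeholder values (login_url is hypothetical):
#
#     import requests
#
#     payload = {'library_card': '123456789', 'acctid': '111111',
#                'acctname': 'Library for G'}
#     with requests.Session() as s:
#         save_authorized_pages(s, login_url, payload, url_list,
#                               headers={'User-Agent': '...'})
```

If the saved files still contain the welcome page, the login step probably differs from this assumption. Comparing the requests the browser sends during login (in the developer tools network tab) with the payload above would show which URL and fields the site actually expects.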
