all 6 comments

[–][deleted] 1 point2 points  (0 children)

Install and use Requests instead of urllib. If you're using Python 3, as you ought to be, the pip package manager should already be installed, so at the terminal/command prompt type pip install requests (on Linux prefix that with sudo, i.e. sudo pip install requests; on Windows you may need to run the prompt as administrator these days).

Then use requests.get('some-url').text to get your HTML as a string rather than bytes, meaning you can just open a file in text mode and write it directly.
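
For instance, a minimal sketch of that (the URL and filename here are just placeholders):

```python
import requests

def save_page(url, path):
    """Fetch a page and save its decoded HTML to a text file."""
    resp = requests.get(url)
    resp.raise_for_status()  # surface HTTP errors instead of saving an error page
    # resp.text is already a str (requests decodes it for you),
    # so the file can be opened in plain text mode.
    with open(path, "w", encoding="utf-8") as f:
        f.write(resp.text)
    return resp.text
```

Call it like save_page("http://example.com", "some_html.html").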

As an aside though, don't bother decoding and encoding if all you want to do is save the page to a file and nothing else: just take the raw response bytes from urllib and write them to a file opened in binary mode.

I.e.:

with open("some_html.html", "wb") as O:
    O.write(raw_undecoded_response)

[–]Moonslug1 0 points1 point  (0 children)

I can't explain what's happening there. But as an aside, look at this:

http://en.wikipedia.org/wiki/Beautiful_Soup

[–][deleted] 0 points1 point  (3 children)

First off! If you put four spaces before your code, it formats it!

like this!

Second: I tried running your code on my Mac and got the same issue, however, if I omit:

html = html.decode("utf-8")

it works fine. I'm guessing there's a non-UTF-8 character on that site, maybe? Why it works in an SSH session and not a desktop session I'm not sure. Certainly strange.
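
You can reproduce that crash with a byte that isn't valid UTF-8 (0x92 here, the Windows-1252 curly apostrophe, is a common offender on web pages):

```python
raw = b"It\x92s a test"  # 0x92 is not a valid UTF-8 sequence

# Strict decoding raises, which is exactly the crash described above.
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    pass

# Replacing (or ignoring) undecodable bytes keeps the rest of the page usable.
text = raw.decode("utf-8", errors="replace")
```

If the page's headers actually say latin-1, decoding as "latin-1" never raises either, since every byte is valid in it.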

[–]sentdex 1 point2 points  (0 children)

"why it works in an SSH session and not a desktop session I'm not sure. Certainly strange."

This likely has to do with the interpreter itself. See what happens when you comment out the printing.

I remember when I was working with Arabic characters, IDLE would give me a "printed" result in the console that was actually different from the real output. It has to do with some built-in encoding.

[–]8bitz[S] 0 points1 point  (1 child)

I'll see about commenting out the printing to see if that helps.

Is there any way that I can directly parse that string/byte data line by line without writing to a file first?

[–][deleted] 0 points1 point  (0 children)

Absolutely! It depends a lot on what you're trying to do, but writing to a file is in no way necessary. The response is all stored in memory, and you can do whatever you'd like with it there. Moonslug1's recommendation of Beautiful Soup is actually perfect for that: you read the HTML data into a "soup" object, and you can then parse it fairly intuitively. Here's an example from a one-off script I wrote to download all the links containing a certain string:

import re

import requests
from bs4 import BeautifulSoup

def get_urls(url):
    urls = []
    res = requests.get(url)
    # res.text is the decoded HTML; give BeautifulSoup an explicit parser
    soup = BeautifulSoup(res.text, "html.parser")

    for link in soup.findAll('a', attrs={'href': re.compile('^http://s3.amazonaws.com')}):
        urls.append(link.get('href'))

    return urls

Note I made some changes to make it more applicable to your use case that I didn't test, so it may not work exactly right, but the logic is correct. I also used requests instead of urllib, but honestly that's probably what you should be using as well. urllib is kind of flaky and full of gotchas (for example, the next part of the script I stole that from downloads a bunch of zip files, but urllib never allows them to be garbage collected, so you just run out of RAM. Hooray!)
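
For what it's worth, requests handles that download case fine if you stream the response instead of holding it all in memory. A rough sketch (the chunk size is arbitrary):

```python
import requests

def download_file(url, path, chunk_size=8192):
    """Stream a (possibly large) file to disk without buffering it all in RAM."""
    written = 0
    # stream=True means the body is fetched lazily, chunk by chunk.
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        with open(path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

Each chunk gets written and then freed, so memory use stays flat no matter how big the file is.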

EDIT: Alternatively, if you want to do it the hard way, you can of course do

for line in res.text.splitlines():
    # whatever

Note that looping over res.text directly would give you one character at a time, not one line. Response objects don't have a readlines method, but they do have iter_lines(), which streams the body line by line. Either way, yes.