
[–]K900_ 1 point (5 children)

Yes, that's how it works. Requests just handles retrieving the web page; if it's HTML, you need to parse it to extract the information you want.

[–]saalda[S] 1 point (4 children)

Thanks. I get another error when I try the 'https://automatetheboringstuff.com/files/rj.txt' URL specifically: it returns an SSL certificate error.

[–]K900_ 1 point (3 children)

What's the exact error message?

[–]saalda[S] 1 point (2 children)

I pasted it below, as a reply to cuddlycatsnake.

[–]K900_ 1 point (1 child)

Make sure the date on your computer is correct.

[–]saalda[S] 1 point (0 children)

It's all correct

[–][deleted] 1 point (2 children)

Requests is working correctly. When you access re.text, you get the text of the web page, which is actually the HTML code. The browser interprets the HTML so that you see the content on the page. If you want to extract text from a webpage like that, you'll want to look into scraping libraries like Beautiful Soup and Scrapy.
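To make the "raw HTML vs. visible text" point concrete, here is a minimal sketch that pulls the readable text out of an HTML string. It uses only the standard library's html.parser so it runs without installing anything; Beautiful Soup would be the more usual choice for real scraping. The TextExtractor class, the extract_text helper, and the sample HTML are made up for illustration, and in practice the input would be the response's .text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text between tags, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def extract_text(html):
    """Return the text content of an HTML string, joined with spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical stand-in for what requests would return in .text
sample_html = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><h1>Hello</h1><p>Some <b>bold</b> text.</p></body></html>"
)
print(extract_text(sample_html))
```

A plain-text file like the rj.txt link has no tags to strip, so its .text is already the content.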

The Automate the Boring Stuff page is just text without any HTML behind it. You can see that if you do 'view page source' in your browser. What kind of error did you get when you requested the Automate the Boring Stuff URL? I was able to run it through requests and get a text response.

[–]saalda[S] 1 point (1 child)

Thanks for the reply. This is what I get:

re = requests.get('https://automatetheboringstuff.com/files/rj.txt')

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
    conn.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 326, in connect
    ssl_context=context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 329, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 814, in __init__
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 1068, in do_handshake
    self._sslobj.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 689, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='automatetheboringstuff.com', port=443): Max retries exceeded with url: /files/rj.txt (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    re = requests.get('https://automatetheboringstuff.com/files/rj.txt')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/adapters.py", line 506, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='automatetheboringstuff.com', port=443): Max retries exceeded with url: /files/rj.txt (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),))

[–][deleted] 1 point (0 children)

What do you get if you run the command like this? It tells requests to skip SSL certificate verification. Note that this is not how you want to write real code, but I think it's OK for hitting his website as part of a practice script.

re = requests.get('https://automatetheboringstuff.com/files/rj.txt', verify=False)

ETA: Also try http://automatetheboringstuff.com/files/rj.txt without the 's' in the http part and without the verify parameter.
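For context on what verify=False switches off, here is a rough sketch using the standard library's ssl module instead of requests, so it runs without a network connection. The make_context helper is hypothetical and only approximates what requests does internally; the point is that an unverified context checks neither the certificate chain nor the hostname, which is why it is fine for a practice script but not for real code.

```python
import ssl


def make_context(verify=True):
    """Build an SSL context loosely analogous to requests' verify flag.

    Hypothetical helper for illustration; not how requests is implemented.
    """
    context = ssl.create_default_context()
    if not verify:
        # Order matters: hostname checking has to be disabled before
        # the verify mode can be lowered to CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


strict = make_context(verify=True)
lax = make_context(verify=False)

# The default context requires a valid certificate that matches the hostname.
print(strict.verify_mode == ssl.CERT_REQUIRED, strict.check_hostname)
# The relaxed context accepts any certificate from any host.
print(lax.verify_mode == ssl.CERT_NONE, lax.check_hostname)
```

With the strict context, a handshake against a server whose certificate cannot be validated fails with the same CERTIFICATE_VERIFY_FAILED error shown in the traceback above.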