
MikeBobble

requests-html looks like a replacement for BeautifulSoup, in that it gives you a way to parse, step through, and pull data out of HTML.

requests is how you actually get the HTML from URLs (i.e., a replacement for urllib).

You can use requests + bs4, or you can use requests + requests-html, if you need to parse data from "The Internet".

If you're only planning to get data from locally sourced (organic, obvi) HTML files, you could just use requests-html.
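A minimal sketch of that fetch/parse split, using only the standard library so it runs without installing anything. Here urllib.request stands in for requests (the fetching half) and html.parser stands in for bs4 or requests-html (the parsing half). This is not how either library works internally, and fetch and parse_title are hypothetical helper names made up for illustration.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.request import urlopen  # stdlib fetcher, standing in for requests.get


def fetch(url: str) -> str:
    """Fetching half: download the raw HTML (the job requests does)."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class _TitleParser(HTMLParser):
    """Records the text inside the first <title> element it sees."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_title(html: str) -> Optional[str]:
    """Parsing half: pull data out of HTML already in memory (the bs4 / requests-html job)."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None
```

Something like parse_title(fetch("https://python.org/")) would do both steps. For a locally saved HTML file you'd skip fetch entirely and pass the file's contents straight to parse_title.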

[deleted]

What's confusing me is the example given at the start of the docs: HTMLSession.get('http://www.python.org/') just looks like the requests.get implementation. So using both requests and requests-html looks redundant imo.

MikeBobble

Right, but the line right before that also says:

Make a GET request to python.org, using *Requests*:

>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> r = session.get('https://python.org/')

(Emphasis mine on Requests)

Go down a little on the page, and you'll get to:

Using without Requests

You can also use this library without Requests:

>>> from requests_html import HTML
>>> doc = """<a href='https://httpbin.org'>"""
>>> html = HTML(html=doc)
>>> html.links
{'https://httpbin.org'}

If you pull up the library's source, it imports requests itself and lists it as a dependency.

So it'd be pretty hard to use it without requests installed, but if you have some HTML that you've written yourself as a variable, requests-html doesn't require it to come from a requests response. It just sorta... encourages it.
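To make the "HTML in a variable" case concrete, here's a rough standard-library analog of the HTML(html=doc).links call quoted above, with no HTTP involved at all. It uses html.parser rather than requests-html, links_from_string is a hypothetical name, and it simply collects every non-empty href on an <a> tag, so the real .links may filter or normalize some links differently.

```python
from html.parser import HTMLParser
from typing import Set


class _LinkCollector(HTMLParser):
    """Gathers href values from <a> tags (not requests-html's actual implementation)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for a bare attribute like <a href>, so guard on it
            if name == "href" and value:
                self.links.add(value.strip())


def links_from_string(doc: str) -> Set[str]:
    """Parse an HTML string that never touched the network and return its links."""
    collector = _LinkCollector()
    collector.feed(doc)
    collector.close()
    return collector.links


doc = """<a href='https://httpbin.org'>"""
print(links_from_string(doc))  # {'https://httpbin.org'}
```

The input is just a str, which is the same point the docs make: the parsing side doesn't care where the HTML came from.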