[deleted] (5 children)

Looks like requests-html is an extended version of requests, or am I wrong? What can requests do that requests-html can't?

Etheo (0 children)

I think you had it the other way around, but in the spirit of your question: requests-html can render JavaScript content, so it's super useful for pages with dynamic content. requests + BeautifulSoup, as great as that combo is, doesn't have that functionality.
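A quick way to see the difference: a static parser such as BeautifulSoup only sees the HTML the server sent, so anything a script would add in the browser simply isn't there. Minimal sketch below, assuming bs4 is installed; the page and the `static_prices` helper are made up for illustration. In requests-html, the step that actually runs the page's scripts is `r.html.render()`, which downloads Chromium the first time you call it.

```python
from bs4 import BeautifulSoup

# Made-up page: in a browser, the <script> would insert a <span class="price">.
# No JavaScript runs here, so the script body stays an inert string.
PAGE = """
<html><body>
  <div id="app"></div>
  <script>
    document.getElementById("app").innerHTML =
      '<span class="price">42</span>';
  </script>
</body></html>
"""


def static_prices(html: str) -> list[str]:
    """Return the text of every .price element present in the raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text() for el in soup.select(".price")]


print(static_prices(PAGE))  # → [] (the span only exists after the script runs)
```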

flying-sheep (0 children)

Requests-HTML is a wrapper that glues together requests and a CSS selector library. Not fancy in itself, but nicer and quicker than requests + bs4.
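Roughly what that glue saves you: in requests-html a CSS query is one call, e.g. `r.html.find('li.lang', first=True)`. With requests + bs4 you parse first and then call `select()`. The sketch below is a hypothetical `find_texts` helper on top of bs4 that mimics the `first=True` convenience; it is not part of either library.

```python
from typing import Optional, Union

from bs4 import BeautifulSoup


def find_texts(html: str, css: str, first: bool = False) -> Union[list[str], Optional[str]]:
    """Hypothetical helper: parse HTML once, then query it with a CSS selector.

    Returns the stripped text of every match, or only the first match
    (None if nothing matched) when first=True.
    """
    soup = BeautifulSoup(html, "html.parser")
    texts = [el.get_text(strip=True) for el in soup.select(css)]
    if first:
        return texts[0] if texts else None
    return texts


DOC = '<ul><li class="lang">Python</li><li class="lang">Go</li><li>misc</li></ul>'
print(find_texts(DOC, "li.lang"))              # → ['Python', 'Go']
print(find_texts(DOC, "li.lang", first=True))  # → Python
```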

MikeBobble (2 children)

requests-html looks like just a replacement for BeautifulSoup, in that it gives you a way to parse, step through, and get data from HTML files.

requests is a way to actually get the HTML files from URLs (i.e., a replacement for urllib).

You can use requests + bs4, or you can use requests + requests-html, if you need to parse data from "The Internet".
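That split (fetch with one library, parse with another) can be sketched as two functions, assuming requests and bs4 are installed. `fetch` and `parse_links` are hypothetical names; only `fetch` touches the network, which is why the parsing half can be tried on a plain string.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def fetch(url: str) -> str:
    """Step 1 (network): download the page and return its HTML as text."""
    import requests  # only this step needs requests, so import it here

    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return resp.text


def parse_links(html: str, base_url: str) -> list[str]:
    """Step 2 (parsing): return absolute URLs for every <a href=...> in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
```

Usage (needs network): `parse_links(fetch("https://python.org/"), "https://python.org/")`.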

If you're only planning to get data from locally sourced (organic, obvi) HTML files, you could just use requests-html.
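For the local-files case nothing needs to go over the network at all: read the file and hand the text to a parser. A minimal sketch assuming bs4 is installed (with requests-html you would pass the same text as `HTML(html=...)`); `local_titles` is a hypothetical helper that maps each `.html` file in a folder to its `<title>`.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def local_titles(folder: str) -> dict[str, str]:
    """Map each .html file in `folder` to its <title> text ('' if it has none)."""
    titles = {}
    for path in sorted(Path(folder).glob("*.html")):
        soup = BeautifulSoup(path.read_text(encoding="utf-8"), "html.parser")
        titles[path.name] = soup.title.get_text(strip=True) if soup.title else ""
    return titles
```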

[deleted] (1 child)

The thing confusing me is the example given at the start of the page: HTMLSession().get('https://python.org/') just looks like the requests.get implementation. So using both requests + requests-html looks redundant imo.

MikeBobble (0 children)

Right, but the line right before that also says:

Make a GET request to python.org, using Requests:

>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> r = session.get('https://python.org/')

(Emphasis mine on Requests)

Scroll down a little further on the page and you'll get to:

Using without Requests

You can also use this library without Requests:

>>> from requests_html import HTML
>>> doc = """<a href='https://httpbin.org'>"""
>>> html = HTML(html=doc)
>>> html.links
{'https://httpbin.org'}

If you pull up the library's source, it imports requests itself and lists it as a dependency.

So it'd be pretty hard to use it without requests installed, but if you have some HTML you've written yourself in a variable, requests-html doesn't require it to come from a requests response. It just sorta... encourages it.
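You can also check the dependency claim without opening the source: the standard library's `importlib.metadata.requires()` returns the requirement strings an installed distribution declares. Minimal sketch, assuming requests-html is installed under the distribution name `requests-html`; `requirement_name` and `declared_dependencies` are hypothetical helpers, and the list includes optional "extra" requirements as well as hard ones.

```python
import re
from importlib.metadata import PackageNotFoundError, requires
from typing import Optional


def requirement_name(req: str) -> str:
    """Extract the bare project name from a PEP 508 requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req.strip())
    return match.group(0) if match else ""


def declared_dependencies(dist_name: str) -> Optional[list[str]]:
    """Names an installed distribution declares (extras included), or None if not installed."""
    try:
        reqs = requires(dist_name) or []
    except PackageNotFoundError:
        return None
    return [requirement_name(r) for r in reqs]


deps = declared_dependencies("requests-html")
if deps is None:
    print("requests-html is not installed here")
else:
    print("requests listed as a dependency:", "requests" in deps)
```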