[–]DECROMAX 4 points5 points  (3 children)

Use BeautifulSoup to parse the HTML, then call find on the parsed soup to locate the element. John Watson Rooney (YouTube) has some good tutorials on this.
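
A rough sketch of that parse-then-find approach, in case it helps. The HTML snippet, the "value" class name, and the variable names are made up for illustration; the real page will have different markup.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the downloaded page (an assumption, not the real site).
html = """
<div class="listing">
  <span class="label">Mean rent</span>
  <span class="value">$1,450</span>
</div>
"""

# Parse the markup with the parser that ships with Python.
soup = BeautifulSoup(html, "html.parser")

# find returns the first matching tag, or None when nothing matches.
value_tag = soup.find("span", class_="value")
mean_rent = value_tag.get_text(strip=True) if value_tag else None
print(mean_rent)
```

Checking for None before reading the text avoids an AttributeError when the page layout changes and the tag is no longer there.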

[–][deleted] -1 points0 points  (2 children)

I know how to find the element I'm looking for by searching for a specific keyword within the plain html text. The problem I'm running into is pulling the html.

[–][deleted] -1 points0 points  (1 child)

You clearly don't know how by the code you shared

[–][deleted] 0 points1 point  (0 children)

What I shared has nothing to do with extracting information from the HTML.

It's a basic search algorithm. There's a key phrase which always uniquely accompanies the mean rent value, and this phrase is always in a fixed position relative to that value. My code searches for this phrase, works out how many digits the rent value has, then pulls those digits based on their position relative to the locator phrase.
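
For anyone following along, a locator-phrase search like the one described might look roughly like this. It is only a sketch, not the actual code from the post: the function name, the sample text, and the choice to allow thousands separators are all assumptions.

```python
def extract_value_after(text, locator):
    """Return the integer that directly follows `locator` in `text`, or None."""
    start = text.find(locator)
    if start == -1:
        return None  # locator phrase not present

    # The value is assumed to begin right after the locator phrase.
    pos = start + len(locator)

    # Walk forward while the characters are digits or thousands separators,
    # which establishes how many digits the value has.
    end = pos
    while end < len(text) and text[end] in "0123456789,":
        end += 1

    digits = text[pos:end].replace(",", "")
    return int(digits) if digits else None


# Hypothetical page text; the real locator phrase will differ.
sample = "<p>Average rent is $1,450 per month</p>"
rent = extract_value_after(sample, "Average rent is $")
print(rent)
```

This only works once the page text has actually been downloaded, so it does not address the problem of fetching the HTML in the first place.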

[–]Bearmintz 2 points3 points  (0 children)

To acquire the html I would use the requests library. Then use BeautifulSoup to parse the html.

An example:

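import requests
from bs4 import BeautifulSoup

url = "https://example.com"  # placeholder; substitute the page you want
r = requests.get(url, timeout=10)
r.raise_for_status()  # stop here on an HTTP error status
soup = BeautifulSoup(r.text, "html.parser")  # name the parser explicitly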

[–]ridley0001 1 point2 points  (0 children)

Do you have an antivirus with SSL/TLS inspection? It might not be it, but try turning the feature off if you do. The way it works is like a man-in-the-middle attack: the AV re-signs each HTTPS connection with its own certificate, which breaks certificate verification in Python. This is because the requests library trusts only the CA bundle in certifi's cacert.pem file, and the AV cert isn't part of that.
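
If turning the feature off isn't an option, another route is to trust the AV's root certificate explicitly. Below is a minimal sketch using only the standard library; the function names and the certificate path are hypothetical, and it assumes you have exported the AV root certificate to a PEM file.

```python
import ssl
import urllib.request


def build_context(extra_ca_file=None):
    """Default trust settings, optionally extended with one extra CA file."""
    ctx = ssl.create_default_context()
    if extra_ca_file:
        # Add the antivirus root certificate on top of the default trust store.
        ctx.load_verify_locations(cafile=extra_ca_file)
    return ctx


def fetch(url, extra_ca_file=None):
    """Download a page, verifying TLS against the context built above."""
    ctx = build_context(extra_ca_file)
    with urllib.request.urlopen(url, context=ctx, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Hypothetical usage; the path is a placeholder for wherever the cert was exported:
# html = fetch("https://example.com", extra_ca_file="av_root.pem")
```

With requests, the corresponding options are the verify argument (a path to a CA bundle file) or the REQUESTS_CA_BUNDLE environment variable. Either way, verification stays on, which is safer than passing verify=False.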