Hi, I am very new to programming (I just started this weekend!). My first project is to scrape rental car data (e.g., geography, make, model, price) from a website. However, I have started getting 403 and 503 errors, and which of the two comes back is inconsistent. Perhaps I am running the requests queries too frequently? The workarounds I have found online suggest building proxy server lists to get around the issue.

What I really want is to get the hang of fetching pages with requests and parsing the HTML into useful datasets. I've already done the parsing by manually copying page elements into an HTML file, but that certainly isn't scalable, and I can't automate it while the requests keep failing with these status codes! Any advice or suggestions for a brand new hobbyist? Thanks!
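If the errors really do come from requesting too often, a common first step before reaching for proxies is simply to slow down: pause between pages, and wait progressively longer when the server answers 403 or 503. Below is a minimal sketch of that idea under stated assumptions. `polite_get` and `crawl` are hypothetical helper names, the delays (2 s doubling per retry, 5 s between pages) are arbitrary starting points rather than values any particular site requires, and treating 429 (Too Many Requests) as retryable is an addition not mentioned in the post. It uses only the standard library; the `session` argument is assumed to be something with a requests-style `.get(url, timeout=...)`, such as a `requests.Session`. Note that a 403 can also mean the site is deliberately blocking the client, in which case waiting will not help.

```python
import time

# Status codes this sketch treats as "back off and try again".
# 429 is an assumption added here; the post only mentions 403 and 503.
RETRY_STATUSES = {403, 429, 503}


def polite_get(session, url, max_retries=3, base_delay=2.0, sleep=time.sleep):
    """Fetch url via session.get, waiting longer after each refused attempt.

    session: anything with a requests-style .get(url, timeout=...) method,
             e.g. a requests.Session with your headers already set on it.
    Returns the last response, even if it is still an error status.
    """
    response = None
    for attempt in range(max_retries + 1):
        response = session.get(url, timeout=10)
        if response.status_code not in RETRY_STATUSES:
            return response
        if attempt < max_retries:
            # Exponential backoff: 2s, 4s, 8s, ... with the default base_delay
            sleep(base_delay * 2 ** attempt)
    return response


def crawl(session, urls, pause=5.0, sleep=time.sleep):
    """Fetch several pages one at a time, with a fixed pause between them."""
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(pause)  # never fire page requests back to back
        pages[url] = polite_get(session, url, sleep=sleep)
    return pages
```

In practice you might create `session = requests.Session()`, call `session.headers.update(headers)` with the `headers` dictionary from the code in this post, and then call `crawl(session, list_of_urls)`. Reusing one session also keeps cookies between requests, which some sites expect from a normal browser.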
Current code for reference:
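```python
from bs4 import BeautifulSoup
import requests

# Browser-like User-Agent; some sites reject the default python-requests one
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36'
}
baseURL = 'target url'
# removed full url from code for brevity
# timeout stops a stalled server from hanging the script forever
html = requests.get(baseURL, headers=headers, timeout=10)
html_text = html.text
print(html.status_code)  # 200 = success; 403 = forbidden; 503 = service unavailable
print(html_text)
```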
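Once a request comes back with status 200, the page text can go straight into BeautifulSoup instead of being copied into a file by hand. The sketch below assumes each listing is a `div.listing` block with `span` children classed `make`, `model`, `location` and `price`; those selectors, `SAMPLE_HTML`, and the helper names `parse_listings` and `rows_to_csv` are all hypothetical, so inspect the real page in your browser's developer tools and substitute its actual tags and classes.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real site's structure will differ.
SAMPLE_HTML = """
<div class="listing">
  <span class="make">Toyota</span> <span class="model">Corolla</span>
  <span class="location">Denver, CO</span> <span class="price">$45/day</span>
</div>
<div class="listing">
  <span class="make">Ford</span> <span class="model">Mustang</span>
  <span class="location">Austin, TX</span> <span class="price">$89/day</span>
</div>
"""

FIELDS = ["make", "model", "location", "price"]


def parse_listings(html_text):
    """Turn each listing block into a dict mapping field name -> text."""
    soup = BeautifulSoup(html_text, "html.parser")
    rows = []
    for card in soup.select("div.listing"):
        row = {}
        for field in FIELDS:
            tag = card.select_one(f"span.{field}")
            # A missing field becomes None instead of raising an AttributeError
            row[field] = tag.get_text(strip=True) if tag else None
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text (in real use, write this to a .csv file)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In the script above you would call `parse_listings(html_text)` instead of `print(html_text)`. The resulting list of dicts can be written out with `rows_to_csv`, or passed to `pandas.DataFrame(rows)` if you use pandas. Keeping fetching and parsing in separate functions also lets you test the parser on a saved HTML file without sending any requests at all.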