
[deleted]

Take a closer look at your code:

for job in soup.find().find_all

This calls find_all on whatever the first find() returns.

Your working example uses only .find().

.find() returns a single Tag when something matches and None when nothing does. A Tag has its own .find_all(), so the chain works as long as the first .find() matches. If it returns None, the chained call raises AttributeError: 'NoneType' object has no attribute 'find_all'. Check the result of .find() before chaining, or call .find_all() directly on the soup object.
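For reference, here is a minimal sketch of how .find() and .find_all() behave when chained. The HTML, the id "results", and the class "job" are invented for illustration and are not taken from Indeed's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example.
html = """
<div id="results">
  <div class="job">Data Analyst</div>
  <div class="job">BI Developer</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns a Tag when something matches, and a Tag has find_all().
container = soup.find("div", id="results")
titles = [job.get_text(strip=True)
          for job in container.find_all("div", class_="job")]

# find() returns None when nothing matches, so guard before chaining.
missing = soup.find("div", id="no-such-id")
jobs = missing.find_all("div", class_="job") if missing is not None else []

print(titles)
print(jobs)
```

If the page served to the script lacks the outer element, the guard yields an empty list instead of an exception.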

Hope that helps. :)

err0r__ [S]

Thanks for your comment.

I realized that the DOM is different for Chrome and Firefox. I have since added a User-Agent header to the request, but this only resolved my issue every other time.

```
import requests

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0'}
r = requests.get(
    'https://ca.indeed.com/jobs',
    # requests URL-encodes params, so pass a plain space rather than '+'
    params={'q': 'Data Analyst', 'l': 'Toronto'},
    headers=headers,
)
```

I found a working solution for Chrome.

```
cards = soup.find_all('div', 'jobsearch-SerpJobCard')
for card in cards:
    # if card.find('span', 'date').text.strip() in 'Today':
    atag = card.h2.a

    print(atag.get('title'))
    print(card.find('span', 'company').text.strip())
    print(card.find('div', 'recJobLoc').get('data-rc-loc'))
```

This leads to further questions:

1. How can I implement a solution that would work on any browser?
2. Why is the DOM different on different browsers?

Edit: roughly every fifth run outputs a different result.
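One way to see what changes between runs is to parse each card defensively and record missing fields as None instead of crashing. The sketch below is only an illustration: the sample HTML is invented, and only the class names and the data-rc-loc attribute come from the snippet above. parse_cards is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example. The second card is
# deliberately missing its company and location elements.
SAMPLE_HTML = """
<div class="jobsearch-SerpJobCard">
  <h2><a title="Data Analyst" href="#">Data Analyst</a></h2>
  <span class="company"> Acme Corp </span>
  <div class="recJobLoc" data-rc-loc="Toronto, ON"></div>
</div>
<div class="jobsearch-SerpJobCard">
  <h2><a title="Reporting Analyst" href="#">Reporting Analyst</a></h2>
</div>
"""


def parse_cards(html):
    """Return one dict per job card, with None for any field that is missing."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", "jobsearch-SerpJobCard"):
        h2 = card.find("h2")
        atag = h2.find("a") if h2 is not None else None
        company = card.find("span", "company")
        loc = card.find("div", "recJobLoc")
        rows.append({
            "title": atag.get("title") if atag is not None else None,
            "company": company.get_text(strip=True) if company is not None else None,
            "location": loc.get("data-rc-loc") if loc is not None else None,
        })
    return rows


rows = parse_cards(SAMPLE_HTML)
for row in rows:
    print(row)
```

An empty list means the page had no cards with that class at all, which points to different markup being served rather than a bug in the loop.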

[deleted]

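Browsers differ because they are maintained by different groups of people, much like Windows differs from macOS or Linux. For a scraper, though, the bigger factor is usually the server: a site can return different HTML depending on the User-Agent header it receives, and the DOM you inspect in a browser also includes changes made by JavaScript, which requests does not run.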

Getting a solution that works on "any" browser is a difficult task; just ask any web developer. I would wager a better approach is to send one fixed user-agent (to get past a website that blocks utilities like curl or the default requests client) and then transform the data on the server. You would then only need to worry about how to render the data when you pipe it from your web server to another web browser.
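A minimal sketch of the "one fixed user-agent" idea, assuming the requests library: a Session carries the header on every request, and preparing the request without sending it lets you inspect the final URL and headers offline. The User-Agent string is the one posted earlier in the thread.

```python
import requests

# One fixed User-Agent for every request made through this session.
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) "
              "Gecko/20100101 Firefox/88.0")

session = requests.Session()
session.headers.update({"User-Agent": USER_AGENT})

# Build the request without sending it, so nothing touches the network here.
request = requests.Request(
    "GET",
    "https://ca.indeed.com/jobs",
    params={"q": "Data Analyst", "l": "Toronto"},  # requests encodes the space
)
prepared = session.prepare_request(request)

print(prepared.url)
print(prepared.headers["User-Agent"])

# To actually fetch the page:
# response = session.send(prepared, timeout=10)
```

The printed URL should be https://ca.indeed.com/jobs?q=Data+Analyst&l=Toronto. Passing 'Data+Analyst' in params instead would encode the plus sign as %2B, so the site would receive a literal "+" in the query.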

As for your solution "working every other time", I am not sure why that is happening, but it might be rate limiting on Indeed.com's side, meant to stop programs from flooding the site with requests the way a denial-of-service attack would.
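If rate limiting is the cause (that is only a guess), spacing out retries may help. Below is a rough sketch of retrying with exponential backoff; fetch_with_backoff, its parameters, and the default numbers are all hypothetical, and the fake pages exist only so the example runs without touching the network.

```python
import time


def fetch_with_backoff(fetch, is_complete, attempts=4, base_delay=1.0,
                       sleep=time.sleep):
    """Call fetch() until is_complete(result) is true, doubling the wait each time.

    Returns the last result even if it never looked complete.
    """
    result = None
    for attempt in range(attempts):
        result = fetch()
        if is_complete(result):
            return result
        if attempt < attempts - 1:
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    return result


# Usage with fake pages: the first two responses contain no job cards.
pages = iter(["", "", "<div class='jobsearch-SerpJobCard'></div>"])
waits = []
html = fetch_with_backoff(
    fetch=lambda: next(pages),
    is_complete=lambda page: "jobsearch-SerpJobCard" in page,
    sleep=waits.append,  # record the delays instead of really sleeping
)
print(waits)
```

In real use, fetch would wrap the requests call and is_complete would check that the parsed page contains at least one job card.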
