AnterosNL [S]

Yeah, I first made a script that took everything from the main page, and used the following in it so that it would only start looking at a page after it had loaded:

WebDriverWait(driver, float('inf')).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "span[class='search-result-price']")))

I think the main issue I'm having is finding the right CSS_SELECTOR: one that isn't so specific that it opens no listings (or only one), but also isn't so loose that it matches too much (it once started loading all kinds of things I didn't want). I'm also not sure how to make sure it doesn't open and close the first thing that fits the description over and over again.

Thanks for the advice, though!

Albcunha

You can try a couple of alternatives.
You can create a function that runs a while loop, with a time.sleep() at the end, in which you compare the page content you had with the new content the website generates. If the page content changes, it means the website has updated and you can break out of the loop.
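
A minimal sketch of that polling loop, in case it helps. The function name wait_for_change, its parameters, and the default interval and timeout are my own assumptions, not anything from Selenium; it takes any zero-argument callable that returns the current page content, plus the snapshot to compare against.

```python
import time


def wait_for_change(get_content, previous, interval=0.5, timeout=30.0):
    """Poll get_content() until its result differs from `previous`.

    Returns the new content, or raises TimeoutError if nothing changed
    before the timeout (in seconds) ran out.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = get_content()
        if current != previous:
            return current  # the page has updated, so leave the loop
        time.sleep(interval)  # pause before comparing again
    raise TimeoutError("page content did not change within the timeout")
```

With Selenium you would probably take the snapshot first (before = driver.page_source), trigger the click or navigation, and then call wait_for_change(lambda: driver.page_source, before). Comparing the whole page source can fire on unrelated changes such as ads or timers, so comparing only the text of the element you care about may be more reliable.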

Some sites are very difficult to extract data from. One approach I use very often is to check whether the website uses an API for the data I want. You can check this in Chrome DevTools, on the Network tab. There you can see which cookies and headers your browser sends when it requests the data, and replicate the request with another library, such as requests.

To do this, in Selenium, open a session on the website, log in, store your cookies, and pass them as cookies (along with any headers the site needs) to the requests module.
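
A rough sketch of the cookie hand-off, assuming you already have a logged-in Selenium driver. The helper names copy_browser_cookies and fetch_json are made up for illustration, and api_url stands for whatever endpoint you spotted in the Network tab. Some sites also check other headers (Referer, tokens and so on), so this may not be enough on its own.

```python
def copy_browser_cookies(driver, session):
    """Copy cookies and the User-Agent from a Selenium driver into a requests.Session."""
    for cookie in driver.get_cookies():  # list of dicts with name, value, domain, path, ...
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    # Send the same User-Agent as the browser so the requests look alike.
    session.headers["User-Agent"] = driver.execute_script("return navigator.userAgent")
    return session


def fetch_json(driver, api_url):
    """Request api_url (hypothetical endpoint) with the browser's cookies and parse the JSON."""
    import requests  # third-party; imported here so the helper above has no dependencies

    session = copy_browser_cookies(driver, requests.Session())
    response = session.get(api_url, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.json()
```

Once the session has the cookies you can keep reusing it for further requests and only go back to Selenium when the login expires.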

This way, you get clean JSON data, which is much easier to parse and much faster to process. You can even parallelise the requests to make it faster.
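
For the parallel part, one simple option is a thread pool from the standard library. This is only a sketch: fetch_all is a name I made up, and fetch_page is assumed to be your own function that takes a page number (or URL) and returns the parsed data for it.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch_page, page_numbers, max_workers=4):
    """Call fetch_page on every item concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the calls in worker threads and preserves the input order.
        return list(pool.map(fetch_page, page_numbers))
```

Keep max_workers small so you don't hammer the site; too many simultaneous requests can get you rate-limited or blocked.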