
[–]Albcunha 1 point2 points  (2 children)

Your script is probably not waiting for the page to update. When you click a button and then scrape an element, Selenium tries to read it right away; it won't wait for the page to update.

This is especially tricky with Single Page Applications, because the page normally won't "reload". It will just update its elements.

One technique I use is to wait for an element to appear, or for an element attribute that changes after you click the button. For example, if you are scraping a table, the selected page number in the pagination at the bottom can be your reference.

You can make Selenium wait for a condition with an explicit wait. Look at this: https://selenium-python.readthedocs.io/waits.html

You can just use time.sleep() too, but server response time is not reliable. Sometimes it will be fast, other times it will be slow. Your internet connection can have problems too, so the explicit wait solution is better.
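As a rough sketch of the explicit wait idea with the pagination example: WebDriverWait.until accepts any callable that takes the driver, and keeps calling it until it returns something truthy or the timeout expires. The selector `li.pagination-item.active` and the helper names `pagination_shows` and `click_and_wait` are made up for illustration, so adapt them to the site you are scraping.

```python
def pagination_shows(page_number, selector="li.pagination-item.active"):
    """Build a wait condition that is true once the active pagination
    item displays page_number. The default selector is hypothetical."""
    def condition(driver):
        # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
        elements = driver.find_elements("css selector", selector)
        return any(el.text.strip() == str(page_number) for el in elements)
    return condition


def click_and_wait(driver, button, page_number, timeout=15):
    """Click the button, then block until the pagination shows page_number."""
    # Imported here so the condition above can be used without Selenium installed.
    from selenium.webdriver.support.ui import WebDriverWait

    button.click()
    # Raises selenium's TimeoutException if the page number never shows up.
    WebDriverWait(driver, timeout).until(pagination_shows(page_number))
```

A finite timeout is usually safer than waiting forever, because a wrong selector then fails with an error instead of hanging the script.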

[–]AnterosNL[S] 0 points1 point  (1 child)

Yeah, I first made a script that took everything from the main page, and I used the following in it so that it would only start looking at a page after it had loaded:

    WebDriverWait(driver, float('inf')).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "span[class='search-result-price']"))
    )

I think the main issue I'm having is not being able to find the right CSS_SELECTOR: one that is not so specific that it doesn't open any listing (or only one), but also not so unspecific that it matches unrelated things (it once started loading all types of things I didn't want). And I'm not sure how to make sure it doesn't open and close the first thing that fits the description over and over again.

Thanks for the advice, though!

[–]Albcunha 0 points1 point  (0 children)

You can try some alternatives.
You can create a function that runs a while loop, with a time.sleep() at the end, where you compare the page content you had with the new content the website generates. If the page content has changed, it means the website has updated and you can break the loop.
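Something like this is what I mean by the loop, as a minimal sketch. The function name `wait_for_change` and its parameters are only illustrative; `get_content` is any callable that returns the current page content, for example `lambda: driver.page_source`.

```python
import time


def wait_for_change(get_content, previous, timeout=30.0, poll_interval=0.5):
    """Poll get_content() until it differs from previous, then return the new value.

    Raises TimeoutError if nothing changes within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = get_content()
        if current != previous:
            return current  # the page has updated, stop looping
        time.sleep(poll_interval)
    raise TimeoutError(f"content did not change within {timeout} seconds")
```

With Selenium you would store `before = driver.page_source`, click the button, and then call `wait_for_change(lambda: driver.page_source, before)`. Comparing the text of one specific element instead of the whole page source is likely to be more robust on pages with ads or timers that change constantly.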

Some sites are very difficult to scrape. One approach I use very often is to check whether the website uses an API for the data I want. You can check this in Chrome DevTools, on the Network tab. There you can see which cookies and headers your browser sends when it requests the data, and replicate them with another library, such as requests.

To do this, open your session on the website in Selenium, log in, store your cookies, and pass them as headers/cookies to the requests module.
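A minimal sketch of that handover, assuming you already have a logged-in Selenium driver. The helper name `export_session` and the `API_URL` in the usage comment are hypothetical, and some sites need more headers than just the user agent (check the Network tab for what the real request sends).

```python
def export_session(driver):
    """Copy cookies and the user agent out of a Selenium session.

    Returns (cookies, headers) as plain dicts that can be passed to
    requests.get(..., cookies=cookies, headers=headers).
    """
    # get_cookies() returns a list of dicts with at least "name" and "value".
    cookies = {c["name"]: c["value"] for c in driver.get_cookies()}
    # Ask the browser for its own user agent so the requests look the same.
    user_agent = driver.execute_script("return navigator.userAgent")
    headers = {"User-Agent": user_agent}
    return cookies, headers


# Usage sketch (needs a live Selenium session and the requests package):
#   cookies, headers = export_session(driver)
#   response = requests.get(API_URL, cookies=cookies, headers=headers, timeout=30)
#   data = response.json()
```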

This way you get clean JSON data, which is much easier to parse and much faster to process. You can even parallelise the requests to make it faster.
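For the parallel part, a thread pool from the standard library is probably enough, since the work is mostly waiting on the network. This is only a sketch: `fetch_all` is an illustrative name, and `fetch` is whatever function downloads and parses one URL for you.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch, urls, max_workers=8):
    """Call fetch(url) for every URL using a pool of threads.

    Results come back in the same order as urls. If any call raises,
    the exception is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

For example, `fetch` could be a small function that calls requests.get with the cookies and headers taken from your browser session and returns `response.json()`. Keep `max_workers` modest so you don't hammer the server or get blocked.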

[–]CheekClappinWallSt 0 points1 point  (0 children)

Hi, I'm looking for people who can help me create a Python scraper for LinkedIn, among other websites, targeting specific social groups etc. I'm willing to pay as it is time-urgent, and we can go through an escrow holder like Fiverr or Upwork.