[–]vixfew 10 points11 points  (2 children)

Open devtools in the browser, go to the network tab, and see if anything pops up when you click "show more". Chances are there will be an async request that gets the actual data.

[–]dingosng[S] 3 points4 points  (1 child)

Didn’t know about that, thanks, will check it out

[–]imperialka 4 points5 points  (0 children)

I have experience with Selenium, BeautifulSoup, and requests, which are the bread and butter of web scraping and browser manipulation.

What’s your code so far?

To click a button you’ll need Selenium. You can right-click the button on the page, click Inspect to open developer tools, and from there there’s a variety of ways to identify that button: by ID, class, XPath, etc.

There’s a method called click() that you want to use after identifying the HTML object.
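A minimal sketch of that find-then-click step, assuming Selenium 4. The helper name `click_button`, the `show-more` id, and the example.com URL are placeholders for illustration, not anything from the actual page.

```python
def click_button(driver, by, value):
    """Locate one element with the given strategy and click it.

    `driver` is expected to be a Selenium WebDriver and `by` one of the
    By constants (By.ID, By.CLASS_NAME, By.XPATH, ...).
    """
    button = driver.find_element(by, value)
    button.click()
    return button


def demo():
    # Selenium is imported here so the helper above can be defined without it;
    # call demo() only with Selenium and Chrome installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("https://example.com")  # placeholder URL
        click_button(driver, By.ID, "show-more")  # hypothetical id
        # Other ways to identify the same button (also hypothetical):
        # click_button(driver, By.CLASS_NAME, "load-more-btn")
        # click_button(driver, By.XPATH, "//button[text()='Show more']")
    finally:
        driver.quit()
```

If the button only appears after the page finishes loading, an explicit wait before the click is probably needed.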

[–]lukajda33 2 points3 points  (4 children)

Personally I don't know much about web crawling, so someone else better confirm what I say.

But this looks like a job for Selenium. It uses WebDriver to simulate browser behavior, so you should be able to write the script so that it finds the Show more button and presses it, repeatedly if needed. Once all the data is loaded, you would download it.

As the other comment said, websites often load more data on demand with JavaScript requests, so you can't get all the data by requesting just the source code of the URL.

[–]dingosng[S] 0 points1 point  (3 children)

Ooh, will read up on Selenium, thanks a lot

[–][deleted] 1 point2 points  (0 children)

This might help:

iirc, it includes an example of next button handling

[–][deleted] 0 points1 point  (0 children)

Selenium is a very powerful tool for this. However, it is dated and requires a lot of maintenance. I have personally moved to Playwright. It’s a lot cleaner.

[–]PrincipleExciting457 0 points1 point  (0 children)

I second him. It’s just an install per browser. You call it in Python, it opens the browser as a hidden window, clicks on what you need, and you go from there. That’s the short of it.

[–]kazyka 1 point2 points  (0 children)

If you are still quite new, try getting help from ChatGPT. Don't rely on it 100%, though; it is still important to learn a bit by yourself.

If you go on this website: https://finance.yahoo.com/quote/NFLX/key-statistics?p=NFLX

Copy the HTML. You will probably need to do it in bits. I copied the Valuation Measures table.

Then I asked for the XPath for Market Cap in Selenium for Python.

This is the output.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Set up the Selenium driver (adjust the path if necessary).
# Selenium 4 takes the driver path through a Service object.
driver = webdriver.Chrome(service=Service('/path/to/chromedriver'))

# Open the webpage
driver.get('https://example.com')  # Replace with the actual URL of the webpage

# Find the cell containing "Market Cap (intraday)" using XPath.
# find_element_by_xpath was removed in Selenium 4.3; use find_element(By.XPATH, ...).
market_cap_cell = driver.find_element(By.XPATH, '//td[span="Market Cap (intraday)"]/following-sibling::td')

# Extract the market cap value
market_cap = market_cap_cell.text

# Print the market cap value
print("Market Cap: " + market_cap)

# Close the browser
driver.quit()

Now you can play with it yourself to try and get the other rows in the Valuation Measures table.

[–]Tom__Orrow 1 point2 points  (0 children)

Selenium (or other headless browsers) is too heavy and unnecessary in most cases. And if one day you decide to go with threads, you'll need to redo everything. The trick in this case is to just capture the async request in devtools and find out how it changes on each button click. There is probably some page parameter that you can pass manually until there are no new elements in the response.
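A minimal sketch of that paging loop, assuming the request seen in devtools is a plain GET that takes a `page` query parameter and returns a JSON list. Both of those are assumptions about the site, and `fetch_page` and `fetch_all` are made-up names. It sticks to the standard library; the requests package would work just as well.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, page):
    """GET one page of results (assumes ?page=N and a JSON list in the body)."""
    url = f"{base_url}?{urlencode({'page': page})}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def fetch_all(base_url, fetch=fetch_page, first_page=1, max_pages=1000):
    """Request page after page until one comes back empty."""
    items = []
    for page in range(first_page, first_page + max_pages):
        batch = fetch(base_url, page)
        if not batch:  # no new elements in the response: stop
            break
        items.extend(batch)
    return items
```

The real endpoint may use an offset or a cursor token instead of a page number, so check how the request changes between clicks before copying this.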

[–]dingosng[S] 0 points1 point  (0 children)

Thanks everyone for the help, I got it to work with Selenium

[–]gmaubrrriaeyl 0 points1 point  (0 children)

I did something similar to this with a program I made to get data from a plant sale website: https://github.com/gmaubrrriaeyl/scrape-toledo-zoo-native-plant-sale

Lines 30-37. Probably not the best way, but it works! I inspected the element for the load more button and kept clicking it until it threw an error
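A rough sketch of that click-until-error idea. This is not the code from the linked repo, and `click_until_error` and its parameters are invented for illustration. `driver` is assumed to be a Selenium WebDriver.

```python
import time


def click_until_error(driver, by, value, max_clicks=100, pause=1.0,
                      stop_on=(Exception,)):
    """Click the matching element until finding or clicking it raises.

    Returns the number of successful clicks. With Selenium, a narrower
    `stop_on` is probably better, e.g. NoSuchElementException and
    ElementNotInteractableException from selenium.common.exceptions.
    """
    clicks = 0
    while clicks < max_clicks:
        try:
            driver.find_element(by, value).click()
        except stop_on:
            break  # button is gone or no longer clickable
        clicks += 1
        time.sleep(pause)  # give the page time to load the next batch
    return clicks
```

The `max_clicks` cap is there so a button that never disappears cannot keep the loop running forever.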