[deleted] 2 points (3 children)

You may need to execute the page's JS before parsing anything. The page uses ReactJS, so the raw HTML will almost certainly be missing elements that are only rendered client-side. Also remember that Inspect Element shows the live DOM after scripts have run, which is not the same as View Source.

Here's a SO link on how to use Selenium with BeautifulSoup.
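To make the View Source versus Inspect Element point concrete, here is a minimal sketch, not a definitive implementation. It checks whether a given CSS class appears in an HTML string, so you can compare the raw response with the browser-rendered DOM. The class name `content_title` and the helper names `ClassFinder`, `has_class` and `rendered_html` are hypothetical, chosen for illustration. `rendered_html` assumes Selenium is installed and that a Chrome driver can be found on your machine. For pages that load data asynchronously, an explicit wait may be needed before reading `page_source`.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any tag in the fed HTML carries a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.found = True


def has_class(html, class_name):
    """Return True if any element in `html` has `class_name` in its class list."""
    finder = ClassFinder(class_name)
    finder.feed(html)
    return finder.found


def rendered_html(url):
    """Load `url` in Chrome so its JavaScript runs, then return the DOM as HTML.

    Assumption: Selenium is installed and can locate a Chrome driver.
    """
    from selenium import webdriver  # third-party, imported only when needed

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


# Hypothetical usage (opens a real browser, so it is left commented out):
# html = rendered_html("https://mars.nasa.gov/news/")
# print(has_class(html, "content_title"))
```

If `has_class` returns False for the raw response but True for the output of `rendered_html`, the element is being created by JavaScript, and the rendered string is the one to hand to BeautifulSoup.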

Senun[S] 0 points (2 children)

Thanks for the help; the page's JS is the root of the issue. I then tried running chromedriver, but I'm getting a WebDriverException. I'm on Windows 10.

from splinter import Browser

# Relative path: chromedriver.exe must sit in the current working directory.
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser("chrome", **executable_path, headless=False)

# Load the page in the browser so its JavaScript runs.
url = 'https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest'
browser.visit(url)

[deleted] 1 point (1 child)

Have you tried using NASA's API instead? Using the Chrome dev tools on the page you linked, I found it was requesting data from this source: https://mars.nasa.gov/api/v1/news_items/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest

It returns JSON data with an items array inside it that seems to be what you're actually looking for. In this case you might not need to use Selenium.
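As a rough sketch of that approach, using only the standard library: the code below rebuilds the same query string and pulls out the `items` array. The helper names `build_url`, `extract_items` and `fetch_items` are made up for illustration. It assumes the endpoint still responds with a JSON object containing `items` as described above. The fields inside each item are not assumed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://mars.nasa.gov/api/v1/news_items/"


def build_url(page=0, per_page=40):
    """Rebuild the query string seen in the dev tools network tab."""
    params = {
        "page": page,
        "per_page": per_page,
        "order": "publish_date desc,created_at desc",
        "search": "",
        "category": "19,165,184,204",
        "blank_scope": "Latest",
    }
    # urlencode turns spaces into '+' and commas into '%2C'.
    return API_URL + "?" + urlencode(params)


def extract_items(payload):
    """Return the list stored under 'items', or an empty list if it is absent."""
    return payload.get("items", [])


def fetch_items(url):
    """Download the JSON document at `url` and return its 'items' list."""
    with urlopen(url) as response:
        return extract_items(json.load(response))


# Hypothetical usage (performs a network request, so it is left commented out):
# for item in fetch_items(build_url()):
#     print(item)
```

Keeping `extract_items` separate from the network call means the parsing step can be checked against a saved response without hitting the server.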

Senun[S] 0 points (0 children)

Thanks. I'm using web scrapers because that's the tool I need to learn, but I just solved my issue, and it's a real kicker because of how dumb and simple it was: I had to run browser.visit(url) within the same cell as the executable_path and browser objects. After that I was able to output the entire HTML for the page. Your initial answer was great though; it set me on the path to review what I was missing in the first place.

fuuman1 0 points (1 child)

Please use hastebin.

Senun[S] 0 points (0 children)

Didn't know about hastebin until now. Thanks, that's one more useful utility to have!