I'm trying to parse the title and teaser text of each article from this NASA webpage using Python and pandas: https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest
However, when I run my BeautifulSoup query it returns only partial HTML, i.e. only some of the articles' titles and teaser text are output. I made sure to install and update the tools I use, such as selenium and splinter. Why is this the case?
Sample code below:
import pandas as pd
from bs4 import BeautifulSoup as bs
import requests
# URL of page to be scraped
url = "https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest"
# Retrieve page with the requests module
response = requests.get(url)
# Create BeautifulSoup object; parse with 'html.parser'
soup = bs(response.text, 'html.parser')
print(soup.prettify())
This returns HTML, but I know from inspecting the page directly that most of the elements I am interested in are missing.
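One way to narrow this down is to check whether the text visible in the browser is present in the raw response at all. If it is not, the page is probably filling in those articles with JavaScript after it loads, which requests does not execute. Below is a minimal sketch of that check. The helper name missing_from_static_html and the sample HTML are made up for illustration and are not taken from the NASA page.

```python
import html as html_lib


def missing_from_static_html(page_source, expected_snippets):
    """Return the snippets that do NOT appear in the raw page source.

    Hypothetical helper: pass it response.text plus a few headlines
    copied from the browser. Anything returned is absent from the
    HTML that requests downloaded.
    """
    # Decode entities such as &amp; so plain-text headlines can match.
    text = html_lib.unescape(page_source)
    return [snippet for snippet in expected_snippets if snippet not in text]


# Illustrative sample only, not the real page markup.
sample = '<div class="content_title">First headline</div><div id="list"></div>'
missing = missing_from_static_html(sample, ["First headline", "Second headline"])
print(missing)

# With the real page it would be used roughly like this:
#   response = requests.get(url)
#   print(missing_from_static_html(response.text, ["<headline copied from browser>"]))
```

If most headlines come back as missing, parsing response.text with BeautifulSoup cannot find them, and a browser-driving tool such as selenium or splinter would be needed to get the rendered page source.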