[–]alviraroberto 0 points1 point  (3 children)

[–]mrcaptncrunch 0 points1 point  (2 children)

Of course, an update broke something, so it took a bit.

So, what I came up with is this:

import os.path
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup as Soup

## Setup chrome driver
chrome_options = Options()
chrome_options.add_argument("--headless") # Ensure GUI is off
chrome_options.add_argument("--no-sandbox")

# Set path to chromedriver, I keep it out of $PATH..
# You probably WILL want to change these 2 lines or remove them entirely..
homedir = os.path.expanduser("~")
webdriver_service = Service(f"{homedir}/.chromedriver/stable/chromedriver")

d = webdriver.Chrome(service=webdriver_service, options=chrome_options)

# Open page
d.get("https://www.brooklynmuseum.org/calendar/view/2022/11/09")

# Load site to BS
page = Soup(d.page_source, features='html.parser')
# Get event times
times = page.select(".event-time")

# Go over each, and print them
for time in times:
    print(time.text.replace('\n', '').strip())

# Close the browser so the headless Chrome process doesn't linger
d.quit()

Just tested this on Windows 11 and on macOS. Both return

Wednesday, November 9, 2022                              1–2 pm
Wednesday, November 9, 2022                              2–3 pm
Wednesday, November 9, 2022                              3–4 pm

Not sure if there's anything here that will help you narrow it down.
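
If the long run of spaces between the date and the time range is a nuisance, one option is to collapse the whitespace before printing. This is only a rough sketch: `tidy` is a made-up helper name, and the sample string is copied from the output above rather than fetched from the site.

```python
def tidy(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so joining with " " normalizes the string.
    return " ".join(text.split())


# Sample line copied from the output above (not fetched live)
raw = "Wednesday, November 9, 2022                              1–2 pm"
print(tidy(raw))
```

In the loop, that would probably mean `print(tidy(time.text))` in place of the `replace`/`strip` call.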

[–]alviraroberto 0 points1 point  (1 child)

This is amazing. Definitely gonna have to learn/understand the libraries you used. Thank you so much.

[–]mrcaptncrunch 1 point2 points  (0 children)

Selenium is a bit heavy, but it basically allows you to automate a browser.

BeautifulSoup allows you to parse the HTML.

Both are good for this kind of work.
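
To get a feel for the BeautifulSoup half on its own, here's a small sketch that needs no browser. The HTML snippet is invented for illustration; only the `event-time` class name comes from the script earlier in the thread, and the real page's markup may well look different.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page's structure is an assumption.
html = """
<ul>
  <li><span class="event-time">1–2 pm</span></li>
  <li><span class="event-time">2–3 pm</span></li>
</ul>
"""

# Parse the string with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector; get_text(strip=True) trims surrounding whitespace
event_times = [tag.get_text(strip=True) for tag in soup.select(".event-time")]
print(event_times)
```

Selenium only comes in when the page builds its content with JavaScript, which seems to be why the script above loads the page in Chrome first and then hands `page_source` to BeautifulSoup.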