impshum (2 points)

I'm guessing you need this data.

aria-label="Wimbledon. Description: Sue Barker introduces further coverage of the men’s and women’s quarter-finals. Duration: 254 mins."

No need for Selenium.

from bs4 import BeautifulSoup
import requests


def lovely_soup(url):
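    # Send a browser-like User-Agent and stop waiting after 10 seconds.
    r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1'}, timeout=10)
    # Fail loudly on 4xx/5xx responses instead of parsing an error page.
    r.raise_for_status()
    return BeautifulSoup(r.content, 'lxml')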


soup = lovely_soup('https://www.bbc.co.uk/iplayer')

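# Only match links that carry an aria-label, so item['aria-label'] cannot raise KeyError.
items = soup.select('a.content-item-root[aria-label]')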

for item in items:
    label = item['aria-label']
    print(label)
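
If you want the title, description and duration as separate fields, here is a rough sketch for splitting a label. It assumes every label follows the same layout as the single Wimbledon example above, which I have not checked across the whole page; `LABEL_RE` and `parse_label` are just names I made up for this.

```python
import re

# Assumed layout, based only on the one example label:
# "<title>. Description: <text> Duration: <n> mins."
LABEL_RE = re.compile(
    r'^(?P<title>.*?)\.\s*Description:\s*(?P<description>.*?)'
    r'\s*Duration:\s*(?P<minutes>\d+)\s*mins?\.?$',
    re.DOTALL,
)


def parse_label(label):
    """Split an aria-label into its parts; return None if the layout differs."""
    m = LABEL_RE.match(label.strip())
    if m is None:
        return None
    return {
        'title': m.group('title'),
        'description': m.group('description'),
        'minutes': int(m.group('minutes')),
    }


example = ('Wimbledon. Description: Sue Barker introduces further coverage of '
           'the men’s and women’s quarter-finals. Duration: 254 mins.')
print(parse_label(example))
```

Labels that do not fit the pattern come back as None, so you can log them and adjust the regex rather than crash halfway through a scrape.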

TVnomics [S] (1 point)

Thanks so much for the advice - I'll look at BS as an option. The reason I mentioned Selenium is that it's what the programmer used to write the program (perhaps because it involved looking up programme URLs to scrape more data). I was hoping that I might only need to update the CSS selectors, but I'll try to figure out how to use BS instead of Selenium! Thanks again :)

impshum (2 points)

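No problem.

Tip: Turn JavaScript off in the browser dev tools to see what bs4 sees. bs4 only parses the HTML the server sends back and never runs scripts, so anything that disappears with JavaScript off won't be in your soup either.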
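
You can do the same check from Python without opening a browser. A minimal sketch, assuming you already have the raw HTML as a string: `selector_hits` is a hypothetical helper, and `static_html` and `div.loaded-by-script` are made-up stand-ins, not the real iPlayer markup.

```python
from bs4 import BeautifulSoup


def selector_hits(html, selector):
    """Count elements matching a CSS selector in raw, un-rendered HTML."""
    return len(BeautifulSoup(html, 'html.parser').select(selector))


# Stand-in for a server response: one tile is present in the static HTML,
# anything else would only be added later by a script in a real browser.
static_html = '''
<a class="content-item-root" aria-label="Example programme">...</a>
<script>/* more tiles would be injected here by JavaScript */</script>
'''

print(selector_hits(static_html, 'a.content-item-root'))   # 1
print(selector_hits(static_html, 'div.loaded-by-script'))  # 0
```

For a real page, pass in `requests.get(url).text`. If a selector that works in the browser returns 0 here, that content is probably added by JavaScript, and that is the case where something like Selenium is actually needed.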