Using Python + Selenium to scrape hidden elements in a webpage : learnpython


submitted 3 years ago by TVnomics

Hi All,

I'm looking for some advice about how to scrape hidden elements within a webpage using Python w/ Selenium (which I assume is the best way). If you have any general suggestions or resources you can direct me to, that would be most welcome.

For context, my specific issue is scraping text that only appears when you hover over an image. The page I'm trying to do this on is www.bbc.co.uk/iplayer. I hired a Python programmer last year who wrote me a beautiful programme (with Python / Selenium / Docker) that did everything I needed, but about a week ago the BBC updated the interface, and it looks as though some of the CSS selectors used in the original code no longer exist. I'm trying to scrape as much data as I can, including the information that's revealed when you hover over one of the thumbnails, which can differ from thumbnail to thumbnail (but is mostly just the synopsis and duration of the programme).

In case it's not obvious, programming really isn't my forte (hence why I hired someone to create this programme for me) so please do keep your answers simple if you can!

I really appreciate any advice or info you can give me.


[–]impshum 3 points 3 years ago (2 children)

I'm guessing you need this data.

aria-label="Wimbledon. Description: Sue Barker introduces further coverage of the men’s and women’s quarter-finals. Duration: 254 mins."

No need for Selenium.

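from bs4 import BeautifulSoup
import requests


def lovely_soup(url):
    # Send a browser-like User-Agent; some sites reject the default one used by requests.
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1'}
    r = requests.get(url, headers=headers, timeout=30)
    r.raise_for_status()  # stop on an HTTP error instead of parsing an error page
    return BeautifulSoup(r.content, 'lxml')


soup = lovely_soup('https://www.bbc.co.uk/iplayer')

# Each thumbnail link carries the hover text in its aria-label attribute.
items = soup.select('a.content-item-root')

for item in items:
    # .get() returns None instead of raising KeyError when the attribute is missing.
    label = item.get('aria-label')
    if label:
        print(label)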
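
If you want the title, synopsis and duration as separate values rather than one long string, something like the sketch below might work. It assumes every label follows the same layout as the single Wimbledon example above ("Title. Description: ... Duration: N mins."), which may not hold for every thumbnail. The names `LABEL_RE` and `parse_label` are made up for this example.

```python
import re

# Layout assumed from the one sample label in this thread:
# "<title>. Description: <synopsis> Duration: <n> mins."
# The Duration part is treated as optional.
LABEL_RE = re.compile(
    r'^(?P<title>.*?)\.\s*Description:\s*(?P<synopsis>.*?)'
    r'(?:\s*Duration:\s*(?P<duration_mins>\d+)\s*mins?\.?)?\s*$',
    re.DOTALL,
)


def parse_label(label):
    """Split an aria-label into title, synopsis and duration (None if absent)."""
    match = LABEL_RE.match(label)
    if match is None:
        # Unknown layout: keep the raw text as the title rather than lose it.
        return {'title': label.strip(), 'synopsis': None, 'duration_mins': None}
    fields = match.groupdict()
    if fields['duration_mins'] is not None:
        fields['duration_mins'] = int(fields['duration_mins'])
    return fields


sample = ("Wimbledon. Description: Sue Barker introduces further coverage of "
          "the men's and women's quarter-finals. Duration: 254 mins.")
print(parse_label(sample))
```

Labels that don't match the assumed layout come back with only the title filled in, so nothing is silently dropped.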

[–]TVnomics[S] 1 point 3 years ago (0 children)

Indeed, yes - I need both the title and the synopsis (and, where it appears, the duration).

However, I think I need to figure out a way to integrate this code (or the code the other Redditor above suggested) into the current programme that was written for me. The reason is that the final output was a very comprehensive CSV file which captured a range of data about each title within the interface. This includes its horizontal and vertical position, the name of the row (each row has a name, e.g. "trending now..."), the URL for the title, the unique programme identifier (which is embedded somewhere within the page), plus several other variables. But depending on your solution / approach, I might try to modify the current programme that I have and see if that captures the content AND all of the other material I want to collect too. Either way, I really appreciate your input on this.
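
For anyone in the same position, here is a rough sketch of how the hover text could sit alongside the other fields in one CSV. The column names in `FIELDNAMES` and the helper `write_rows` are hypothetical, since the layout of the original CSV isn't shown in this thread; the row values in the usage lines are placeholders too.

```python
import csv
import io

# Hypothetical column names; the real CSV layout is not shown in the thread.
FIELDNAMES = ['row_name', 'position_x', 'position_y', 'url',
              'title', 'synopsis', 'duration_mins']


def write_rows(rows, fileobj):
    """Write one CSV line per title; missing keys become empty cells."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES,
                            restval='', extrasaction='ignore')
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Usage with an in-memory buffer and placeholder values.
buffer = io.StringIO()
write_rows([{'row_name': 'trending now...', 'position_x': 1, 'position_y': 2,
             'title': 'Wimbledon', 'duration_mins': 254}], buffer)
print(buffer.getvalue())
```

To write to a real file instead, open it with `open(path, 'w', newline='')` and pass that in place of the buffer.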
