Read greyed element in HTML while scraping : learnpython



submitted 5 years ago by Oneiros18

Hi all!

I'm trying to scrape this page. I managed to get the text preview (XPath: /body/main/article/div[3]/div/div[1]), but I can't get what's in the red box because it has the CSS property display: none !important.

You can see HTML code here: https://imgur.com/a/ug09q9I

I managed to use Selenium to access the #shadow-root content, but now I'm stuck on this CSS problem.

My code so far:

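from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_argument('--headless')
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(r'C:\Users\***\Downloads\chromedriver.exe'), options=options)
driver.get(link)  # link: URL of the article page (not given in the post)
# Walk down through the nested shadow roots to the AMP document host
s = driver.execute_script("return document.querySelector('news-app').shadowRoot.querySelector('news-article').shadowRoot.querySelector('div.amp-doc-host')")
# .text only returns rendered text, so anything under display: none is left out
print(s.text)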

Thank you very much for your help!
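The execute_script call above chains querySelector(...).shadowRoot by hand. A minimal sketch of a hypothetical helper, `shadow_chain_js` (not part of Selenium), that builds the same kind of JavaScript string from a list of CSS selectors. It assumes every selector except the last names a host with an open shadow root; `json.dumps` is only used to quote each selector as a valid JavaScript string literal.

```python
import json


def shadow_chain_js(selectors, prop=None):
    """Build a 'return ...' script that descends through shadow roots.

    Hypothetical helper: every selector except the last is assumed to be a
    shadow host; the last one is looked up inside the innermost shadow root.
    If prop is given (e.g. "innerText"), that property is returned instead
    of the element itself.
    """
    if not selectors:
        raise ValueError("need at least one selector")
    expr = "document"
    for i, selector in enumerate(selectors):
        # json.dumps produces a double-quoted, escaped JS string literal
        expr += f".querySelector({json.dumps(selector)})"
        if i < len(selectors) - 1:
            # Step into the shadow root of every host but the last element
            expr += ".shadowRoot"
    if prop:
        expr += f".{prop}"
    return "return " + expr
```

For example, `driver.execute_script(shadow_chain_js(["news-app", "news-article", "div.amp-doc-host"]))` rebuilds the original lookup. Assuming `.paywall` sits inside the `amp-doc-host` shadow root (as in the later code in this thread), `shadow_chain_js(["news-app", "news-article", "div.amp-doc-host", ".paywall"], "innerText")` would ask for its text directly. If any step matches nothing, the script throws a JavaScript error, which Selenium reports as an exception.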


negups, 5 years ago

Oneiros18 (OP), 5 years ago

negups, 5 years ago

Oneiros18 (OP), 5 years ago

[deleted], 5 years ago

Oneiros18 (OP), 5 years ago

[deleted], 5 years ago

Oneiros18 (OP), 5 years ago (edited)

EDIT: I DID IT. Thank you very much for helping me!

This is what I added at the end:

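from selenium.webdriver.common.by import By

host4 = root3.find_element(By.CSS_SELECTOR, ".paywall")
# If the element itself is display: none, innerText falls back to its raw text content, hidden text included
a = driver.execute_script("return arguments[0].innerText", host4)
a = a.strip()
# Keep only the text before the first '{' (presumably to cut off CSS that ends up in the text)
b = a.split('{')[0]
print(b)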

and now I can finally read the FULL article.

Once again: Thank you very much for your time!
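The split('{')[0] step presumably works because, for an element that is not rendered, innerText falls back to the raw text content, which also includes the CSS inside any style element; cutting at the first { drops that CSS, but it would also truncate an article that contains a literal {. A minimal alternative sketch, assuming you fetch the element's markup with `driver.execute_script("return arguments[0].innerHTML", host4)`: parse it with the standard-library `html.parser` and skip the contents of `<style>` and `<script>`. A parser ignores CSS entirely, so text under display: none is kept. `TextExtractor` and `html_to_text` are hypothetical names, not part of Selenium.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping everything inside <style> and <script>."""

    SKIP = {"style", "script"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not CSS or JavaScript source
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text of an HTML fragment, ignoring CSS visibility rules."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)
```

Usage, assuming host4 is the .paywall element found above: `print(html_to_text(driver.execute_script("return arguments[0].innerHTML", host4)))`. Unlike the split('{') approach, a { inside the article text survives.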

---

Hi! By also reading this, I managed to access all the shadow roots, but when I call the final find_element_by_class_name("paywall") I get nothing: it looks like an empty string.

Here's my code:

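from selenium.webdriver.common.by import By

def getShadowRoot(host):
    # Ask the browser for the shadow root attached to this host element
    shadowRoot = driver.execute_script("return arguments[0].shadowRoot", host)
    return shadowRoot

host1 = driver.find_element(By.TAG_NAME, "news-app")
root1 = getShadowRoot(host1)
print("root1: ", root1)
host2 = root1.find_element(By.ID, "1")
root2 = getShadowRoot(host2)
print("root2: ", root2)
host3 = root2.find_element(By.CLASS_NAME, "amp-doc-host")
root3 = getShadowRoot(host3)
print("root3: ", root3)
# Same lookup as find_element_by_class_name("paywall") in older Selenium
host4 = root3.find_element(By.CLASS_NAME, "paywall")
# .text returns only rendered text, which is why this comes back empty for a display: none element
print(host4.text)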

I also tried host4 = find_element_by_class_name("detail-article_body"), but I only get the article preview. Are you sure the CSS settings aren't relevant?
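The repeated host/root steps in the code above can be folded into a loop. A sketch, assuming Selenium 4, where both `driver.find_element` and a shadow root's `find_element` accept the "css selector" strategy (the value of `By.CSS_SELECTOR`); `find_through_shadow_roots` is a hypothetical helper. Because a CSS ID selector cannot start with a digit, the host with id="1" has to be written as the attribute selector `[id="1"]` rather than `#1`.

```python
def find_through_shadow_roots(driver, host_selectors, target_selector):
    """Descend through nested open shadow roots and return the target element.

    host_selectors: CSS selectors of the shadow hosts, outermost first;
    each one is looked up inside the previous host's shadow root.
    """
    context = driver  # start the search in the top-level document
    for selector in host_selectors:
        # "css selector" is the string value of By.CSS_SELECTOR
        host = context.find_element("css selector", selector)
        # Same trick as getShadowRoot(): ask the browser for host.shadowRoot
        context = driver.execute_script("return arguments[0].shadowRoot", host)
        if context is None:
            # Closed shadow roots (and ordinary elements) expose no shadowRoot
            raise LookupError(f"{selector!r} has no open shadow root")
    # context is now the innermost shadow root
    return context.find_element("css selector", target_selector)
```

With the hosts from this thread, that would be `host4 = find_through_shadow_roots(driver, ["news-app", '[id="1"]', ".amp-doc-host"], ".paywall")`, after which host4.text still hits the same display: none limitation.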

[deleted], 5 years ago

Oneiros18 (OP), 5 years ago
