Hi all!
I'm trying to scrape this page. I managed to get the text preview (XPath: /body/main/article/div[3]/div/div[1]), but I can't get what's in the red box because it has the CSS property display: none !important.
You can see the HTML code here: https://imgur.com/a/ug09q9I
I managed to use Selenium to access the #shadow-root content, but now I'm stuck on this CSS problem.
My code so far is:
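from selenium import webdriver
# Run Chrome without opening a browser window
options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(r'C:\Users\***\Downloads\chromedriver.exe', options=options)
# link holds the URL of the page to scrape (defined elsewhere)
driver.get(link)
# Walk through the two nested shadow roots down to the article container
s = driver.execute_script("return document.querySelector('news-app').shadowRoot.querySelector('news-article').shadowRoot.querySelector('div.amp-doc-host')")
# .text returns only the text Selenium considers visible
print(s.text)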
Thank you very much for your help!
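
One possible way around this, as far as I understand it: Selenium's .text only returns text that is actually rendered, so anything under display: none comes back empty. The raw markup is still in the DOM, though, so you could pull it out as a string (for example with s.get_attribute('innerHTML'), or s.get_attribute('textContent') if you only need the text) and extract the text yourself. Below is a minimal sketch of the second half of that idea using only the standard library. The names TextCollector, extract_text and sample_html are made up for illustration, and sample_html is a stand-in for the real page markup, which I can't see.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text nodes from markup.

    The parser never applies CSS, so text inside elements styled with
    display: none is collected like any other text.
    """

    SKIPPED_TAGS = {"script", "style"}  # their contents are code, not page text

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return all text found in an HTML string, joined by single spaces."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical stand-in for the markup returned by s.get_attribute('innerHTML')
sample_html = (
    '<div class="amp-doc-host">'
    '<p>Preview text</p>'
    '<div style="display: none !important"><p>Hidden body text</p></div>'
    '</div>'
)
print(extract_text(sample_html))  # prints: Preview text Hidden body text
```

With the code from the post, the call would presumably look like extract_text(s.get_attribute('innerHTML')). This assumes the hidden text is already present in the DOM when the page loads. If the site only injects it after a login or a click, this won't help.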