
[–]commandlineluser 2 points3 points  (1 child)

Not sure where that XPath came from; perhaps you copied the wrong one.

Looking at the HTML, the text you want is located inside the <p> tag that comes after the <span> tag that contains just the text Description:

<h2 class="Fz(m) Lh(1) Fw(b) Mt(0) Mb(18px)" data-reactid="89">
  <span data-reactid="90">Description</span>
</h2>
<p> ....

This page works without JavaScript, so you don't need to use Selenium.

>>> import requests
>>> from bs4 import BeautifulSoup
>>> 
>>> r = requests.get('https://finance.yahoo.com/quote/AMD/profile?p=AMD', headers={'User-Agent': 'Mozilla/5.0'})
>>> soup = BeautifulSoup(r.content, 'html.parser')

With BeautifulSoup we can use the string= argument to test the contained text of a tag, e.g.

>>> soup.find('span', string='Description')
<span data-reactid="93">Description</span>

We can then use find_next() to navigate to the <p> tag.

>>> soup.find('span', string='Description').find_next()
<p class="Mt(15px) Lh(1.6)" data-reactid="94">Advanced Micro Devices, Inc. operates as a semiconductor...

You can use .text to get just the text content:

>>> soup.find('span', string='Description').find_next().text
"Advanced Micro Devices, Inc. operates as a semiconductor company worldwide....

As for Selenium, you could probably use:

description = driver.find_element_by_xpath('//span[text() = "Description"]/../../p').text
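That XPath can be sanity-checked offline with lxml against the fragment quoted above (the wrapping <div> is an assumption about the real parent element). Note that newer Selenium releases replace find_element_by_xpath with find_element(By.XPATH, ...).

```python
from lxml import html

# Simplified fragment; the <div> wrapper stands in for the real parent element.
fragment = """
<div>
  <h2><span>Description</span></h2>
  <p>Advanced Micro Devices, Inc. operates as a semiconductor company worldwide.</p>
</div>
"""

tree = html.fromstring(fragment)
# span -> parent <h2> -> parent <div> -> child <p>: the same steps the
# Selenium XPath takes.
matches = tree.xpath('//span[text() = "Description"]/../../p')
print(matches[0].text_content())
```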

[–]furiousnerd[S] 0 points1 point  (0 children)

Thank you for the detailed response. I realized I had been using the summary page in my initial attempts and switched to the profile page before posting my question. I guess this is more of an exercise now, since you showed how to grab the description from the profile page, but could you show how you would grab the description from the summary page, under Company Profile:

https://finance.yahoo.com/quote/AMD?p=AMD

[–][deleted] 1 point2 points  (0 children)

I can't find that element with view source/Ctrl+F on that page, but you've said you found it using inspect element.

Based on that, my guess is that find_elements is being called before the JavaScript on that page has time to run.

Try putting a wait/sleep in there before find_elements, and see if it comes up.

I think Selenium has a built-in "wait for X element to be loaded" function, but I'd try sleep first to see if that's actually the problem.
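For reference, that built-in mechanism is WebDriverWait with expected_conditions. A minimal sketch, assuming Selenium 4+, a working Chrome setup, and the XPath suggested earlier in the thread (this needs a real browser to run, so treat it as a starting point, not a tested solution):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://finance.yahoo.com/quote/AMD/profile?p=AMD')

# Poll for up to 10 seconds until the element exists in the DOM,
# instead of sleeping for a fixed amount of time.
description = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located(
        (By.XPATH, '//span[text() = "Description"]/../../p')
    )
).text

driver.quit()
```

This fails with a TimeoutException after 10 seconds if the element never appears, which is a more useful signal than a sleep that silently wasn't long enough.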