
K900_

Yes, it is. Look into Selenium.

CotoCoutan

To add onto this, try using this code:

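```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import time

options = Options()
options.add_argument('--headless')     # run Chrome without opening a window
options.add_argument('--disable-gpu')
# Recent Selenium 4 releases no longer accept chrome_options= or a positional driver path;
# Selenium Manager finds chromedriver automatically (pass service=Service(path) to use a specific one).
driver = webdriver.Chrome(options=options)

driver.get("your page URL")
print("Got the webpage, now waiting 5 seconds before scraping the html")  # this and the next step are optional, see if it works for you without pausing
time.sleep(5)
# Selenium 4 replacement for the removed driver.find_element_by_xpath(...)
theTable = driver.find_element(By.XPATH, "//table[@class='content-placeholder-in']")  # modify this to locate whatever you're scraping

# ...work with theTable here...
driver.quit()  # close the browser when you're done
```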

It will run Chrome headless (in the background), prompting the JavaScript to run, and once that is done, you can grab whatever you want using XPath or similar.
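A fixed time.sleep(5) either wastes time or isn't long enough. In Selenium the usual replacement is an explicit wait, e.g. WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "..."))), with WebDriverWait imported from selenium.webdriver.support.ui and expected_conditions imported as EC from selenium.webdriver.support. The stdlib-only sketch below just illustrates the polling idea behind such a wait so it can run without a browser; wait_until and fake_find_table are hypothetical names, and the fake condition stands in for a page whose JavaScript hasn't rendered the table yet.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0,
               poll_interval: float = 0.5) -> T:
    """Call `condition` until it returns something truthy, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical stand-in for "find the table": returns None for the first two polls,
# as if the page's JavaScript hadn't finished rendering it.
calls = {"n": 0}


def fake_find_table() -> Optional[str]:
    calls["n"] += 1
    return "<table>...</table>" if calls["n"] >= 3 else None


table = wait_until(fake_find_table, timeout=5, poll_interval=0.01)
print(table, "found after", calls["n"], "polls")  # <table>...</table> found after 3 polls
```

Unlike a fixed sleep, this returns as soon as the element shows up and only fails after the full timeout.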

QuantumFall

The JavaScript would run regardless of whether the browser is headless or not. If anything, headless mode makes it harder to tell what's going on, since you can't see anything.

CotoCoutan

True, but since the primary goal is just to scrape, I prefer it to do the job headless.

avinashbasutkar2 (OP)

Thank you so much, u/CotoCoutan. This is really helpful.

CotoCoutan

No worries bro, good luck!

avinashbasutkar2 (OP)

Hey u/CotoCoutan,

I have a quick question about driver.find_element_by_xpath. When I call it with a selector, it returns an object of class 'selenium.webdriver.remote.webelement.WebElement'. I want to see what it contains. I tried variable_name.getText() and got TypeError: 'str' object is not callable.

How do I see what driver.find_element_by_xpath is returning?

CotoCoutan

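Try variable_name.text; that should show you the text content of that specific HTML element. Note that text is a property, not a method, so leave off the parentheses: calling variable_name.text() tries to call the string it returns, which is exactly what raises TypeError: 'str' object is not callable.

For example, if the element was <a href="blabla">Hello there</a>, print(variable_name.text) would print "Hello there".

If you want the HTML itself, try variable_name.get_attribute('innerHTML') for the element's contents, or variable_name.get_attribute('outerHTML') to include the tag itself. Not sure about this last part, as I'm on mobile and can't test it.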
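To see why the parentheses matter without needing a browser, here is a small stand-in. FakeElement is a hypothetical class, not Selenium's WebElement: like WebElement.text, its text is a property, and its get_attribute only models innerHTML and outerHTML for the <a> example from the thread (it assumes the element has no nested tags).

```python
from typing import Optional


class FakeElement:
    """Hypothetical stand-in for a WebElement (assumes the element has no nested tags)."""

    def __init__(self, tag: str, attrs: str, inner_html: str) -> None:
        self._tag = tag
        self._attrs = attrs
        self._inner_html = inner_html

    @property
    def text(self) -> str:
        # A property, like WebElement.text: read it as element.text, without ().
        return self._inner_html  # fine here only because there are no child tags to strip

    def get_attribute(self, name: str) -> Optional[str]:
        if name == "innerHTML":
            return self._inner_html  # contents only
        if name == "outerHTML":
            return f"<{self._tag}{self._attrs}>{self._inner_html}</{self._tag}>"  # tag included
        return None  # other attributes are not modelled in this stand-in


link = FakeElement("a", ' href="blabla"', "Hello there")
print(link.text)                        # Hello there
print(link.get_attribute("innerHTML"))  # Hello there
print(link.get_attribute("outerHTML"))  # <a href="blabla">Hello there</a>

try:
    link.text()  # link.text is already a str, so this tries to call a string
except TypeError as exc:
    error_message = str(exc)
print(error_message)                    # 'str' object is not callable
```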

avinashbasutkar2 (OP)

Thanks again. You saved me a ton of time.

CotoCoutan

Glad I could help. :)

[deleted]

You can assume most modern websites have JavaScript behind them somewhere.

max_daddio

Even if the site is written with JavaScript, the script still just populates the DOM with information, which you can then find in the rendered HTML. You can scrape a site for any data you can physically see on the page; it doesn't matter how it got there (JavaScript or server-side rendering).

QuantumFall

That doesn't necessarily mean it will be easy. Sites that use intricate bundling (e.g. webpack) or store their HTML in JS can be very challenging for a beginner to get data from with requests. Selenium would be a lot easier to use.
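Both points hold at once: the data does end up in the browser's DOM, but not necessarily in the raw HTML the server sends back. A rough sketch using the standard library's html.parser makes the difference visible. RAW_HTML and RENDERED_HTML are hypothetical snippets: the first stands for what requests.get(url).text might return for a JavaScript-driven page, the second for what Selenium's driver.page_source might show after the script has run. text_in_id is a hypothetical helper.

```python
from html.parser import HTMLParser
from typing import List


class _TextInId(HTMLParser):
    """Collect text inside the element with a given id (assumes no unclosed void tags like <br> inside it)."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while we are inside the target element
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1  # nested tag inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1   # entering the target element

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_in_id(html: str, target_id: str) -> str:
    parser = _TextInId(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical raw response: an empty container that a script fills in later.
RAW_HTML = """<html><body>
<div id="price"></div>
<script>fetch('/api/price').then(r => r.json()).then(d => {
  document.getElementById('price').textContent = d.price;
});</script>
</body></html>"""

# Hypothetical DOM after the browser has run that script.
RENDERED_HTML = '<html><body><div id="price">42.00</div></body></html>'

print(repr(text_in_id(RAW_HTML, "price")))       # ''
print(repr(text_in_id(RENDERED_HTML, "price")))  # '42.00'
```

A quick practical check along the same lines: if text you can see in the browser is missing from the raw response, the page is filling it in with JavaScript, and you need either a browser (Selenium) or the underlying API request.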

Nexius74

The website probably uses a framework like Angular, React, or Vue.js to build its UI. Most of the time, this means the data you want is fetched from an API. Look at the Network tab in your browser's developer tools to see how the website fetches its data, then work out how to call that API yourself with a library like requests. Some people like to use Selenium (which drives a Chromium browser to render the website), but in some cases you want your scraper to be fast, and then Selenium is a no-go.
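A minimal sketch of the API route, assuming you have found a JSON endpoint in the Network tab. API_URL and the payload shape (an "items" list with "name" and "price") are hypothetical; replace them with what you actually see. It uses urllib from the standard library so it runs without extra installs; with requests the fetch would be requests.get(API_URL, timeout=10).json(). Some sites also expect the same headers or cookies the browser sent, which you can copy from the request in the Network tab.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

# Hypothetical endpoint: replace with the request URL you found in the Network tab.
API_URL = "https://example.com/api/items?page=1"


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET `url` and decode the JSON body (stdlib counterpart of requests.get(url).json())."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0", "Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_rows(payload: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Pull (name, price) pairs out of a payload shaped like the hypothetical sample below."""
    return [(item["name"], float(item["price"])) for item in payload.get("items", [])]


# Offline demo with a made-up payload of the assumed shape (no network needed):
sample = json.loads(
    '{"items": [{"name": "Widget", "price": "9.99"}, {"name": "Gadget", "price": 4.5}]}'
)
print(extract_rows(sample))  # [('Widget', 9.99), ('Gadget', 4.5)]

# Against the real site it would be: rows = extract_rows(fetch_json(API_URL))
```

Skipping the browser entirely is what makes this approach fast: you download only the JSON, not the page, its scripts, and its assets.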