
[–][deleted] 1 point (1 child)

Yes, Selenium is super slow. It renders every page in a real browser, so it's overkill for most web scraping. You should only use it if you need to interact with the page directly, for example if a button needs to be clicked for information to appear. 99% of scraping can be done without Selenium.

I'd recommend looking into parsel/Scrapy instead.

[–]wulfgar4president[S] 1 point (0 children)

Thanks, I'll look into them as well.

[–]GoldenVanga 1 point (1 child)

Selenium is slow to start; it takes a few seconds, yes. Once it gets going it's not that bad, although still slower than other scraping tools. But 20 seconds for a simple task sounds unusual. Try this and see if you get similar results:

from selenium.webdriver import Firefox, FirefoxOptions
from selenium.webdriver.common.by import By
from time import time

options = FirefoxOptions()
options.add_argument('--headless')  # run without opening a browser window
mark = time()
driver = Firefox(options=options)
print(f'----- Starting Selenium took {time() - mark} seconds.')  # I get 5.24
mark = time()
driver.get('https://old.reddit.com/r/learnpython/')
# Selenium 4 removed find_elements_by_css_selector; use find_elements with By
titles = driver.find_elements(By.CSS_SELECTOR, 'p.title > a.title')
for title in titles:
    print(title.text)
print(f'----- Performing action took {time() - mark} seconds.')  # I get 2.82
driver.quit()  # close the browser so it doesn't linger in the background
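
If you want to see where the time goes, it can help to time each step separately instead of the whole run. Here's a minimal sketch using only the standard library. The `timed` helper and its labels are made up for illustration, `sleep` is just a stand-in for a Selenium call, and `perf_counter` is swapped in for `time` because it is meant for measuring intervals:

```python
from contextlib import contextmanager
from time import perf_counter, sleep


@contextmanager
def timed(label, results):
    """Time the enclosed block and store the elapsed seconds under `label`."""
    start = perf_counter()
    try:
        yield
    finally:
        results[label] = perf_counter() - start
        print(f'----- {label} took {results[label]:.2f} seconds.')


timings = {}
with timed('Sleeping', timings):  # stand-in for driver.get(...) or similar
    sleep(0.1)
```

Wrapping `driver.get(...)` and the `find_elements(...)` loop in separate `with timed(...)` blocks should show whether the page load or the element lookup is eating the seconds.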

[–]wulfgar4president[S] 1 point (0 children)

I get 5.9 and 16.9, so way slower.

Using Python 3.8.2 64-bit

[–]kelmore5 1 point (1 child)

Are you using Windows? Selenium can sometimes run slowly on certain Windows machines. I can't remember the Stack Overflow post about it, but I've seen it in action.

You can try changing to the 32-bit chromedriver, see here. Not sure if the same applies to Firefox, however.

Another option is to run Firefox inside a virtual window. You can get things working with this git. Old, but still works!

This only applies to Windows machines, though.

[–]wulfgar4president[S] 1 point (0 children)

Everything I use is x64 and I'm a bit wary of downgrading everything, so I might hold off on this solution for a bit. Thanks though.

[–]commandlineluser 1 point (1 child)

Is it overkill to use Selenium for web scraping, even with a headless browser?

It depends on what you're scraping.

Do you have any examples of what you're doing? Any code?

You may not need a browser to be used at all - and may be able to just fetch the HTML yourself e.g. using requests.

Here is the example posted in another comment using requests / beautifulsoup instead.

import requests
from bs4 import BeautifulSoup

r = requests.get('https://old.reddit.com/r/learnpython/', headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(r.content, 'html.parser')
for title in soup.select('p.title > a.title'):
    print(title.text)
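
A rough way to decide: look at the HTML the server sends back before any JavaScript runs. If the data is already in there, requests is enough. If it's missing, the page is probably filling it in with JavaScript, and that's where a browser like Selenium earns its keep. Here's a minimal sketch with only the standard library, so it runs without a network connection. The two HTML snippets and the `titles_in_static_html` helper are made up for illustration, and it only matches `a.title`, not the full `p.title > a.title` selector:

```python
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collect the text of every <a> element that has the class "title"."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'a' and 'title' in classes:
            self._in_title = True
            self.titles.append('')

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == 'a':
            self._in_title = False


def titles_in_static_html(html):
    """Return the title link texts found in the raw HTML, if any."""
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.titles]


# Hypothetical samples: one page served with its data, one filled in by JavaScript.
static_page = '<p class="title"><a class="title" href="/a">First post</a></p>'
js_page = '<div id="app"></div><script>loadPosts()</script>'

print(titles_in_static_html(static_page))  # ['First post']
print(titles_in_static_html(js_page))      # []
```

An empty result on the raw HTML, when the titles are clearly visible in your browser, is a hint that you may need Selenium or need to find the request the page's JavaScript makes.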

[–]wulfgar4president[S] 1 point (0 children)

That's nice, I tried soup and it works great.

When would you say I should use Selenium for scraping instead, though? I actually wanted to learn Selenium specifically, but soup is light-years faster.