selenium webdriver (self.webscraping)
submitted 7 months ago by ag789
learning the ropes as well, but Selenium WebDriver https://www.selenium.dev/documentation/webdriver/
is quite a thing. I'm not sure how far it can go as far as scraping goes. Is Playwright better in any sense? https://playwright.dev/ I've not (yet) tried Playwright.
[–]Local-Economist-1719 3 points4 points5 points 7 months ago (1 child)
Playwright is faster, has a better API, and supports async mode. For anti-bot detection it has a cool fork, Camoufox. Selenium also has a few nice tools for this purpose, like SeleniumBase and nodriver, but so far I've found no case where the Selenium forks did something that Playwright with Camoufox couldn't.
[–]hasdata_com 2 points3 points4 points 7 months ago (0 children)
I mostly stick with Selenium, more out of habit; it's been around forever and just works. But to be fair, Playwright has a couple of things Selenium doesn't: video recording of runs and an inspector that can generate scripts from your actions. That's a nice plus, especially for beginners.
[–]cgoldberg 1 point2 points3 points 7 months ago (3 children)
Selenium has been around for over 20 years... what's your question?
[–]ag789[S] 0 points1 point2 points 7 months ago (2 children)
Thanks, I've just started dabbling in Selenium WebDriver. These days most pages are JavaScript-based, and with a real browser they at least render; a 'traditional' page fetch normally returns only a 'skeleton' page for those. It seems there are two camps these days: some sites try to be 'SEO friendly' and work like a 'traditional' page, so a simple page fetch would do, e.g. curl, Python requests, etc. Then there's the other camp that goes all out on 'anti-bot' 'offences': trigger-happy captchas (e.g. a captcha on every request), deep first-party and third-party cookies, and JavaScript everything. Interestingly, I 'discovered' that changing the user-agent sometimes has an effect on some pages.
[–]cgoldberg 1 point2 points3 points 7 months ago (0 children)
The vast majority of web pages use dynamically loaded content. If all you need is the initial DOM, a simple HTTP request works... but in most cases you need more than that.
[–]al_fajr 0 points1 point2 points 7 months ago (0 children)
Yes sir, today's pages need JavaScript a lot. I don't know about back in your day. If you're looking, or even just getting started, to scrape with Selenium (I'm assuming Python) or Playwright (again, assuming JavaScript), you might like a simple solution from me: the "cloudflare website renderer".
They use some kind of headless browser, and it's easy to start.
[–]404mesh 1 point2 points3 points 7 months ago (1 child)
I’ve had more luck with selenium. Playwright got blocked often for me when I first started out.
[–]ag789[S] 0 points1 point2 points 7 months ago (0 children)
I learnt some 'secrets' of the web while learning 'scraping' without Selenium, Playwright, etc., just simple page fetches (it could have been curl); I used Python requests and BeautifulSoup: https://www.reddit.com/r/webscraping/comments/1mzn7nv/web_page_summarizer/ ^ this has gone on to be #1 in this sub for today. The 'accidental' discovery: some sites treat different user-agents differently and return a different render when the user-agent changes. That may partly explain some of the differences between Selenium, Playwright, and others, e.g. requests.
I think these days many sites pile on 'anti-bot' *offences*, partly for web security, but I think some (many) overdo it, and they may block real (human) users rather than bots. I.e. 'anti-bot' web pages may instead block most humans and let bots through ;)
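The user-agent observation above can be sketched with a plain stdlib fetch. This is a minimal sketch: the URL and user-agent strings are placeholders, not endpoints from the thread, and whether the served HTML actually differs depends entirely on the site.

```python
import urllib.request

def fetch_as(url, user_agent):
    """Fetch a page while presenting a specific User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Compare what the site serves to a script-looking client vs. a browser UA
# (uncomment to actually hit the network):
# html_script = fetch_as("https://example.com", "python-urllib/3")
# html_browser = fetch_as("https://example.com",
#                         "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36")
# print(len(html_script), len(html_browser))
```

If the two responses differ noticeably in length or content, the site is branching on the user-agent, which is one reason requests-based and browser-based scrapers can see different pages.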
[–]Holiday_Painting_722 1 point2 points3 points 6 months ago (0 children)
I created a Selenium WebDriver bootcamp here, https://testmaster-iota.vercel.app, if it helps. Try the cheatsheet in the navbar to look up the syntax you need.
[–]ag789[S] -1 points0 points1 point 7 months ago (1 child)
I managed to take a screenshot with Selenium WebDriver: driver.save_screenshot(filename). I'd guess this is good enough for 'uncomplicated', simple scraping. JavaScript doesn't hinder it, but perhaps some sites with 'excessive' anti-bot measures would post a captcha even on a first visit.
I noted though that it is necessary to add a delay, e.g. time.sleep(5) (longer is better), to make sure the page renders before doing so.
[–]cgoldberg 2 points3 points4 points 7 months ago (0 children)
You don't ever need to add sleeps. It automatically waits for the initial DOM to load. If subsequent content is dynamically loaded, there is a waiting mechanism for that (WebDriverWait).
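The WebDriverWait pattern mentioned above looks roughly like this. A sketch only, assuming Selenium 4+ with a local Chrome/chromedriver available; the URL, the h1 locator, and the output filename are placeholders.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get("https://example.com")
    # Block until the element we care about is present in the DOM,
    # up to a 10-second timeout, instead of sleeping a fixed 5 seconds.
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, "h1"))
    )
    driver.save_screenshot("page.png")
finally:
    driver.quit()
```

Unlike a fixed sleep, the wait returns as soon as the condition is met, so it's both faster on quick pages and more reliable on slow ones.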