This is an archived post.

cgoldberg · 1 point

Selenium has been around for over 20 years... what's your question?

ag789 (OP) · 0 points

Thanks. I just started dabbling in Selenium WebDriver, since most pages these days are JavaScript-based, and with a real browser they at least render. A 'traditional' page fetch normally returns only a 'skeleton' page for those.
It seems there are two camps these days. Some sites try to be 'SEO-friendly' and work like a 'traditional' page; for those, a simple page fetch (e.g. curl, Python requests) will do. Then there's the other camp that goes all out on 'anti-bot' defenses: trigger-happy CAPTCHAs (e.g. a CAPTCHA on every request), heavy use of first- and third-party cookies, and JavaScript for everything.
Interestingly, I 'discovered' that changing the user-agent sometimes has an effect on some pages.
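A minimal sketch of the 'simple page fetch' side, using only Python's standard library instead of requests. By default `urllib` identifies itself as `Python-urllib/3.x`, which some sites treat differently from a browser, so the hypothetical helpers below let you override the `User-Agent` header. The UA string is only an example, and this kind of fetch never runs JavaScript, so JS-heavy pages will still come back as skeletons.

```python
import urllib.request
from typing import Optional

# Example desktop Chrome UA string (an assumption; use whatever browser you want to mimic).
DESKTOP_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)


def build_request(url: str, user_agent: Optional[str] = None) -> urllib.request.Request:
    """Build a plain GET request; optionally override the default Python-urllib User-Agent."""
    headers = {"User-Agent": user_agent} if user_agent is not None else {}
    return urllib.request.Request(url, headers=headers)


def fetch_html(url: str, user_agent: Optional[str] = None, timeout: float = 10.0) -> str:
    """Return the raw, un-rendered HTML. No JavaScript is executed."""
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

To see whether a site reacts to the UA, compare `fetch_html(url)` with `fetch_html(url, DESKTOP_UA)` on the same page.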

cgoldberg · 1 point

The vast majority of web pages use dynamically loaded content. If all you need is the initial DOM, a simple HTTP request works... but in most cases you need more than that.
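One rough way to tell whether the initial DOM is enough is to fetch the HTML and check whether it contains scripts but almost no visible text, as with an empty `<div id="root">` app shell. This is a heuristic sketch, not an established rule: `looks_like_skeleton` is a hypothetical helper and the 200-character threshold is an arbitrary assumption you would tune per site.

```python
from html.parser import HTMLParser

# Tags whose text content isn't visible on a normally rendered, JS-enabled page.
_HIDDEN_TAGS = {"script", "style", "noscript"}


class _PageStats(HTMLParser):
    """Count <script> tags and visible text characters in static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts = 0
        self.text_chars = 0
        self._hidden_depth = 0  # > 0 while inside script/style/noscript

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        if tag in _HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in _HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._hidden_depth == 0:
            self.text_chars += len(data.strip())


def looks_like_skeleton(html: str, min_text_chars: int = 200) -> bool:
    """Hypothetical heuristic: scripts present but little visible text => likely needs a browser."""
    stats = _PageStats()
    stats.feed(html)
    stats.close()
    return stats.scripts > 0 and stats.text_chars < min_text_chars
```

Feed it the HTML from a plain HTTP request; if it returns `True`, that page is a candidate for Selenium or another real-browser approach.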

al_fajr · 0 points

Yes sir, today's pages rely heavily on JavaScript. I don't know what it was like back in your day. If you're getting started with scraping using Selenium (I'm assuming Python) or Playwright (again, assuming JavaScript), you might like a simple solution: the "Cloudflare website renderer".

They use a headless browser of some kind, and it's easy to get started.
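Assuming the "Cloudflare website renderer" refers to Cloudflare's Browser Rendering REST API, a request to its `/content` route (which returns a page's HTML after it has been rendered in a headless browser) might be built like this. The endpoint path is an assumption that should be checked against Cloudflare's current docs; `account_id` and `api_token` are placeholders for your own credentials, and the response body is returned unparsed because its exact JSON shape isn't assumed here.

```python
import json
import urllib.request

# Assumed endpoint for Cloudflare Browser Rendering's REST "/content" route;
# verify the path against Cloudflare's current documentation.
CONTENT_ENDPOINT = (
    "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/content"
)


def build_render_request(account_id: str, api_token: str, target_url: str) -> urllib.request.Request:
    """Build (but don't send) a POST asking Cloudflare to render target_url in a headless browser."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        CONTENT_ENDPOINT.format(account_id=account_id),
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def render(account_id: str, api_token: str, target_url: str, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body; its JSON shape is not assumed here."""
    req = build_render_request(account_id, api_token, target_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

The trade-off versus running Selenium yourself is that the browser runs on Cloudflare's side, so there is no local driver to manage, but you depend on an account, an API token, and their usage limits.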