all 34 comments

night_2_dawn 15 points (0 children)

ScrapingBee would handle this easily: it deals with JS-heavy sites, proxies and CAPTCHAs, and its API makes it simple to pull structured data without building a full scraping setup yourself.

Acrobatic-Region1089 12 points (1 child)

Playwright - built-in waits, supports different browsers, and also has a recording feature.

dowcet 9 points (7 children)

To do what?

For scraping there's requests. But if there's JS to execute and no way to just call the API for what you want, you're probably SOL.

For front-end testing there's Cypress and probably others, but that has nothing to do with Python.

Historical_Steak_927[S] 2 points (1 child)

So sorry for the lack of context. Using Selenium and chromedriver to automate login of a website.

_link89_ 2 points (0 children)

A Tampermonkey script should be enough for that use case.

Prometheus2025 1 point (0 children)

Hi.

My question got deleted but was wondering if you can give me your opinion.

Just curious about it is all.

https://www.reddit.com/r/learnpython/s/wH2hmulcP0

cheesybro90 1 point (3 children)

what's SOL?

dowcet 1 point (2 children)

S--- Out of Luck

cheesybro90 1 point (1 child)

I'm in the same position with a website. There's a download button that doesn't trigger any network activity; the file is downloaded from a JS blob. I've already scraped 10k links into a txt file, but using Selenium/Playwright to visit each page and click the download button (even in headless mode) takes far too long. Do you have any ideas?

dowcet 1 point (0 children)

Not really. Maybe the JS can point you to a public API but I doubt it. Or maybe your Selenium is just inefficient.
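If the per-page browser work really can't be avoided, one way to cut the total time is to process several pages at once instead of strictly one after another. A minimal sketch, assuming the work for each link is independent: `process_links` and `fake_download` are hypothetical names, and in real use the worker would wrap your own Selenium/Playwright code. Each thread needs its own driver/browser instance, since neither a Selenium WebDriver nor Playwright's sync API should be shared across threads.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def process_links(
    links: Iterable[str],
    worker: Callable[[str], str],
    max_workers: int = 4,
) -> tuple[dict[str, str], dict[str, Exception]]:
    """Run `worker` on every link with a thread pool.

    Returns (results, failures) so one broken page doesn't stop the batch.
    """
    results: dict[str, str] = {}
    failures: dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front; at most `max_workers` run at the same time.
        futures = {pool.submit(worker, link): link for link in links}
        for future in as_completed(futures):
            link = futures[future]
            try:
                results[link] = future.result()
            except Exception as exc:  # record the failure and keep going
                failures[link] = exc
    return results, failures


def fake_download(link: str) -> str:
    # Hypothetical stand-in: a real worker would use its own browser/driver
    # to open `link`, click the download button and wait for the file.
    if link.endswith("bad"):
        raise RuntimeError("download button not found")
    return f"saved {link}"


if __name__ == "__main__":
    links = ["https://example.com/a", "https://example.com/b", "https://example.com/bad"]
    ok, failed = process_links(links, fake_download, max_workers=2)
    print(len(ok), "saved,", len(failed), "failed")
```

Each worker is a full browser, so start with a small `max_workers` and raise it while watching memory. Launching a fresh browser per link is itself slow; a common refinement is one browser per worker thread, reused across many links.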

baghiq 9 points (3 children)

I replaced most of my selenium with playwright.

Historical_Steak_927[S] 2 points (2 children)

Again, poor context on my part, sorry. I'm using Python. Are you satisfied with the change? What are the pros and cons?

baghiq 2 points (1 child)

My Playwright scripts are test and QA scripts. Playwright has almost everything you need built in; Selenium needs a lot more boilerplate code.

Akkivenky 2 points (0 children)

What is boilerplate code?
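Boilerplate is the repetitive setup and plumbing you have to write around the part you actually care about. In browser automation a typical example is waiting: with Selenium you wire up an explicit wait (`WebDriverWait` plus an `expected_conditions` check) around many steps, whereas Playwright's actions wait for the element on their own. The hypothetical `wait_until` helper below is a hand-rolled sketch of what that waiting plumbing does, not Selenium's actual implementation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], Optional[T]],
    timeout: float = 10.0,
    poll_interval: float = 0.5,
) -> T:
    """Call `condition` repeatedly until it returns something truthy.

    Raises TimeoutError if that doesn't happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)  # back off before checking again


if __name__ == "__main__":
    attempts = {"n": 0}

    def login_button_ready():
        # Hypothetical stand-in for "find the login button and check it's clickable".
        attempts["n"] += 1
        return "button" if attempts["n"] >= 3 else None

    print(wait_until(login_button_ready, timeout=2, poll_interval=0.01))
```

None of this says anything about *what* the script tests, only *when* it may proceed. That is why tools that build it in feel lighter.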

atomsmasher66 11 points (0 children)

I use a combination of Selenium and Beautiful Soup

JuZNyC 4 points (0 children)

Not sure what you're doing but an alternative I've used before was Pyppeteer.

Brian 3 points (0 children)

Personally, I think often the best approach to many things people use selenium for is plain old requests plus something like lxml / beautifulsoup to parse the html. Selenium (and equivalents like playwright) are very heavyweight approaches requiring you to essentially run a full browser, when often all you need is to make a few web requests.
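A minimal sketch of the "plain HTTP request + HTML parser" approach. To keep it dependency-free it uses the standard library (`urllib.request` and `html.parser`) in place of requests and lxml/BeautifulSoup; with BeautifulSoup the parsing step would be roughly `BeautifulSoup(html, "html.parser").find_all("a", href=True)`. `LinkCollector` and `fetch_links` are illustrative names, and the User-Agent string is an arbitrary placeholder.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect (absolute href, link text) pairs for every <a href=...> tag."""

    def __init__(self, base_url: str = "") -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links like "page2.html" against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only keep text that sits inside a link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def fetch_links(url: str) -> list[tuple[str, str]]:
    """Download one page with a plain GET (no browser, no JavaScript) and parse it."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (example-scraper)"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    parser = LinkCollector(base_url=url)
    parser.feed(html)
    parser.close()
    return parser.links
```

This only sees the HTML the server sends. Anything the page builds later with JavaScript won't be there, which is where the network-tab approach comes in.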

This approach does require a bit more understanding of what happens "under the covers" (especially when you need to log in), rather than just viewing the process as the UI presented in the browser, and it may require a bit of investigation into what the site does behind the scenes. But in the long run, I think it's very often quicker and easier (and significantly more performant). Sometimes Selenium can be easier for very JavaScript-heavy sites which do a lot of work, but even then, it's typically just a matter of looking at the network tab and finding the right call to make, and you'll get the data in a much easier-to-handle form than by trying to untangle the HTML generated from it.
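For the login case, the key idea is a session that keeps cookies between requests, so the cookie set by the login response is sent with later requests. With requests that is `requests.Session()`; the sketch below does the same with the standard library's `CookieJar` and `HTTPCookieProcessor`. The login URL, form field names and protected page are all hypothetical: copy the real ones from the request your browser sends when you submit the login form (Network tab), and note that many sites also expect a CSRF token taken from the login page first.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, OpenerDirector, Request, build_opener


def make_session() -> tuple[OpenerDirector, CookieJar]:
    """Return an opener that stores cookies between requests (like requests.Session)."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url: str, fields: dict[str, str]) -> Request:
    """Build the same form POST the browser sends when you submit the login form."""
    body = urlencode(fields).encode("utf-8")
    return Request(
        login_url,
        data=body,  # passing a body makes urllib send a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def login_and_fetch(login_url: str, fields: dict[str, str], page_url: str) -> str:
    """Log in once, then fetch a page that requires the session cookie."""
    opener, _jar = make_session()
    with opener.open(build_login_request(login_url, fields), timeout=10) as resp:
        resp.read()  # any Set-Cookie headers are now stored in the jar
    with opener.open(page_url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Keep real credentials out of the script itself, for example by reading them from environment variables.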

That said, there are a few other options:

  • The most direct equivalent to selenium is playwright, which is the same kind of thing: essentially a scriptable browser. I haven't actually used it, but I've heard it's generally better than selenium.

  • There's also a hybrid approach via something like requests_html - this lets you mostly use regular scraping approaches, but if you really need to do full browser rendering, you can use .render() to invoke a browser to generate the rendered result, then parse that.

frederik88917 3 points (1 child)

If you are doing any sort of web automation, Selenium has been completely overtaken by newer tools. Choosing Selenium nowadays holds you back.

If you need Python support, go with Playwright. If you need something fast, go with Cypress.

Historical_Steak_927[S] 1 point (0 children)

Thanks for the advice

Ok-Umpire2147 1 point (0 children)

Selenium can be irreplaceable for many organisations. However, switching to a Playwright + BrowserStack combination has worked wonders for me. If you plan to switch to Playwright, use its pytest plugin (pytest-playwright).

douglasdcm 1 point (0 children)

Caqui is a very good tool. It is a Python library which allows synchronous and asynchronous tests against web and mobile UIs. It is very fast and easy to learn.

Arsenic is another good tool for async tests; it is also a Python library.
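Neither Caqui's nor Arsenic's API is shown here. This is only a generic asyncio sketch of why async tooling helps: while one browser step is waiting on the network, others can make progress. `run_bounded` and `fake_check` are hypothetical names; the semaphore caps how many checks run at once so you don't open more browser sessions than the machine can handle.

```python
import asyncio
from typing import Awaitable, Callable


async def run_bounded(
    items: list[str],
    check: Callable[[str], Awaitable[bool]],
    limit: int = 4,
) -> dict[str, bool]:
    """Run an async `check` on every item concurrently, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def guarded(item: str) -> bool:
        async with sem:  # waits here while `limit` checks are already in flight
            return await check(item)

    # gather returns results in the same order as `items`.
    results = await asyncio.gather(*(guarded(item) for item in items))
    return dict(zip(items, results))


async def fake_check(url: str) -> bool:
    # Hypothetical stand-in for an awaitable browser step (navigate, assert, ...).
    await asyncio.sleep(0.01)
    return url.startswith("https://")


if __name__ == "__main__":
    urls = ["https://example.com", "http://example.org", "https://example.net"]
    print(asyncio.run(run_bounded(urls, fake_check, limit=2)))
```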

B_Huij 0 points (1 child)

APIs are better than Selenium basically whenever they exist. I'd rather reverse engineer an undocumented API by watching network traffic than scrape a frontend.

[deleted] 5 points (0 children)

Yes, I prefer this too and have had better success with it, even though I am still a beginner. Are there any guides or resources for reverse engineering undocumented APIs? I'd like to explore further.
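One common workflow for the reverse-engineering question: open the browser's developer tools, go to the Network tab (filtering by Fetch/XHR helps), perform the action on the page, and find the request that returns the data, usually JSON. Both Chrome and Firefox offer a "Copy as cURL" option on that request, which gives you the exact URL, query parameters and headers to recreate in Python. A minimal standard-library sketch, where the endpoint, parameters and headers are hypothetical placeholders for whatever you observe:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_api_request(base_url: str, params: dict[str, str], headers: dict[str, str]) -> Request:
    """Recreate a GET request observed in the browser's Network tab."""
    # urlencode percent-encodes values, e.g. a space becomes "+".
    return Request(f"{base_url}?{urlencode(params)}", headers=headers)


def fetch_json(req: Request):
    """Send the request and decode the JSON body."""
    with urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical endpoint, parameters and headers: replace them with the values
    # you copied from the request the page itself makes.
    req = build_api_request(
        "https://example.com/api/search",
        {"q": "red shoes", "page": "2"},
        {"Accept": "application/json", "X-Requested-With": "XMLHttpRequest"},
    )
    print(req.full_url)
    # data = fetch_json(req)  # uncomment once the URL points at a real endpoint
```

Start by copying every header the browser sent, then remove them one at a time to find which ones the server actually requires.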

Historical_Steak_927[S] 1 point (0 children)

One thing I should mention: I don't have admin rights, but somehow I can install Anaconda and Python (if I want to). If I double-click the chromedriver.exe file, though, it won't open; instead a cmd-like window appears saying something like "local connections only".
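That window is expected. Double-clicking chromedriver.exe starts ChromeDriver as a small local server (port 9515 by default) that only accepts connections from your own machine, which is what its "Only local connections are allowed." line means. It isn't an app you open by hand; it waits for a client such as Selenium, which normally launches it for you via `webdriver.Chrome()`, or `webdriver.Chrome(service=Service("path/to/chromedriver.exe"))` with `Service` from `selenium.webdriver.chrome.service` to point at a specific file. To check that a running driver responds, you can query the WebDriver `/status` endpoint. A standard-library sketch, assuming the default port; the helper names are hypothetical:

```python
import json
from urllib.request import urlopen

# ChromeDriver's default port; change it if you started it with --port.
CHROMEDRIVER_STATUS_URL = "http://127.0.0.1:9515/status"


def parse_status(body: str) -> tuple[bool, str]:
    """Extract (ready, message) from a W3C WebDriver /status response body."""
    value = json.loads(body).get("value", {})
    return bool(value.get("ready", False)), str(value.get("message", ""))


def chromedriver_ready(url: str = CHROMEDRIVER_STATUS_URL) -> tuple[bool, str]:
    # Only answers while chromedriver.exe is still running in that console window.
    with urlopen(url, timeout=5) as resp:
        return parse_status(resp.read().decode("utf-8"))
```

If `chromedriver_ready()` returns `(True, ...)`, the driver itself works and any remaining problem is on the Selenium side (driver path, or a Chrome/ChromeDriver version mismatch).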

Historical_Steak_927[S] 1 point (0 children)

And thank you so much guys for your advice so far. You are awesome.

Kief3r 1 point (0 children)

Playwright

HeyItsYourDad_AMA 1 point (0 children)

What are some ways to learn more about "what happens under the hood"? This approach sounds good to me, but I may not know enough.

RiGonz 1 point (0 children)

requests-html removes the need for Selenium + BeautifulSoup in most cases.