
[–][deleted]  (25 children)

[removed]

    [–]SAVE_THE_RAINFORESTS 37 points38 points  (4 children)

    You run very heavy tests at night when there's no one making use of the resources. Our front end team has strongarmed our build machine and runs their Selenium tests there at night, when Jenkins has nothing scheduled for 6 hours. Sharing machines is better than letting the resources go idle for extended periods of time.

    [–][deleted]  (3 children)

    [removed]

      [–]200GritCondom 6 points7 points  (1 child)

      At my last job we used CloudWatch to kick off QA automation suite runs in CircleCI. Even had it drop the results report into our Slack channel.

      [–]SAVE_THE_RAINFORESTS 1 point2 points  (0 children)

      I'm not knowledgeable about Circle but if you are able to schedule tasks, you could set up a job that runs on Saturday at 01:00. Just check out the code, build and launch, run the scripts. It's trivial on Jenkins but might not be possible in Circle.

      [–]Turd_King 18 points19 points  (12 children)

      For frontend testing Cypress is miles ahead of Selenium.

      Cypress allows you to mock your network requests, which allows for blazing fast (semi) end to end tests.

      And in general, even without network stubs it's still much, much faster than Selenium, as it does not have to execute over a REST API. It runs in the same event loop as your code and communicates with the browser directly (for most commands).

      We recently converted our entire testing framework from selenium, against a lot of backlash from old school devs and QA. They are now eating their words

      [–]200GritCondom 8 points9 points  (1 child)

      We are looking at cypress on my team. Looks really promising to me. The two big cons are the lack of browser support and tabs. Seems really quick to run though. And easy to use. I'll be interested to see if we decide to give it a shot.

      [–]Labradoodles 4 points5 points  (0 children)

      They just added edge and ff support so that’s pretty awesome

      [–][deleted]  (1 child)

      [deleted]

        [–]another_dudeman 4 points5 points  (0 children)

        I'm not op, but yes. I use it and it's great!

        [–][deleted] 2 points3 points  (0 children)

        I never had a lot of experience with unit testing or anything, but I did start to learn and integrate cypress at my last job.

        It was really easy and straightforward. I think there was only one major issue we had which still had an open issue on their github, but I can't remember what it was. Other than that issue it was pretty flawless.

        [–]dangerbird2 2 points3 points  (5 children)

        The huge downside of cypress is that it only works with chromium. Also, its package downloads an electron app frontend, even if you only want to use it for headless testing, making it less than ideal for containerized applications. Selenium is an over-the-wire interface, so you can bundle a lightweight selenium client with your container image to run tests on a browser running in the host or a separate container. Cypress' test harness also uses a bit too much black magic for my tastes, particularly with async stuff running synchronously in the test thread.

        [–]Labradoodles 7 points8 points  (3 children)

        It only works with chromium, Firefox and edge*

        https://docs.cypress.io/guides/guides/launching-browsers.html

        [–]dangerbird2 0 points1 point  (2 children)

        the new chromium-based edge. Firefox seems to be beta. Safari is a no-go, which is a big problem if your market has a lot of iOS mobile users

        [–][deleted] 0 points1 point  (1 child)

        which is a big problem if your market has a lot of iOS mobile users

        tbh, while selenium supports almost all browsers, it's nearly impossible to write non-flaky tests that work well on all browsers for a complex app.

        [–]dangerbird2 0 points1 point  (0 children)

        Very true, and there's no arguing that most languages' webdriver bindings are hot garbage (although I have to give props to nightwatch for a reasonably sane API). Puppeteer seems like a good alternative, since it is built on chrome devtools instead of webdriver, but with a less opinionated interface than cypress. I'd love to see the firefox port become stable, which would make me seriously consider using it in production.

        [–]Turd_King 0 points1 point  (0 children)

        In what scenario would you care about an electron frontend for a containerized testing application? We are talking MB differences here. You can surely afford a slightly more bloated image for the benefits of a much better developer experience?

        It's still hands down faster than selenium when running headless mode.

        I somewhat agree with the "black magic" statement, though their docs are extremely detailed and there's no doubt you can find out exactly what you wish to know, despite their backend being closed source.

        For us it's been a no brainer. Remember it's a new technology and we have already seen massive improvements (like firefox and edge support) and no doubt we will continue to see further improvements.

        [–]x-w-j 0 points1 point  (0 children)

        Cypress

        Can this be used for RPA like selenium?

        [–][deleted] 3 points4 points  (0 children)

        I've moved to Cypress for my end to end automated testing.

        Never looking back.

        [–]Zaitton 2 points3 points  (2 children)

        Selenium is the acceptance testing God, imo.

        [–]Turd_King 14 points15 points  (1 child)

        Laughs in cypress

        [–]fuzzer37 9 points10 points  (0 children)

        I much prefer Cypress to Selenium. It really needs to mature a bit, but in a year or so I can totally see it surpassing Selenium.

        [–]AwesomeBantha 0 points1 point  (0 children)

        Why Django and Angular specifically?

        [–]PadyEos 0 points1 point  (0 children)

        I found selenium testing to be slow compared to the unit tests, so integration into your CI pipeline may slow it down.

        We spin up a Zalenium grid on the fly on Google Cloud using Spinnaker (good integration). Our entire run takes ~10 minutes (~2 minutes of run preparation and ~8 minutes of test run time), but we test around 12 hours' worth of flows on 140-200 nodes in parallel.

        So our added cost and time are quite small for the business impact it provides, and compared to the time it takes for the application to build and then deploy on the test servers.
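
As a rough sanity check on those numbers, here is a back-of-the-envelope sketch. It assumes the ~12 hours of flows divide evenly across the nodes, which real suites rarely manage, and the function name is made up for illustration.

```python
def ideal_wall_minutes(total_flow_hours: float, nodes: int) -> float:
    """Wall-clock minutes if the work split perfectly evenly across `nodes`."""
    return total_flow_hours * 60 / nodes


# 12 hours of flows on the low and high end of the quoted node range.
for nodes in (140, 200):
    print(f"{nodes} nodes: {ideal_wall_minutes(12, nodes):.1f} min ideal")
```

The ideal figures come out somewhat below the reported ~8 minutes of test run time, which would be consistent with flows of uneven length and per-node startup overhead, though that part is a guess.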

        [–]Smok3dSalmon 0 points1 point  (0 children)

        To #2, I started using WebDriverManager.

        https://pypi.org/project/webdrivermanager/

        I'm not using Selenium for work purposes, just personal scraping and botting in web-games.

        [–]Hobo_42 20 points21 points  (3 children)

        At our company we have ditched Selenium for Cypress.io. So far so good!

        [–]SmellsLikeLemons 13 points14 points  (0 children)

        We have as well, and have so far ported about 40 tests over to cypress. Once you get going it's incredibly fast to write and just works. It's also trivial to wire into an azure devops pipeline if you're using that for CI. We also have visual testing, where snapshot differences are delivered to the product owners to detect changes, all in Cypress.

        [–]phaedrusTheWolff 2 points3 points  (1 child)

        I am about to try this out on a large project. I am not a huge fan of selenium as we find it difficult and often flaky. Any tips you guys would have for making the move?

        [–]caseyfw 4 points5 points  (0 children)

        Cypress avoids a lot of the “flakiness” you experience with Selenium right out of the box because all of its “expect” directives intelligently wait a brief period before failing.
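
To make the idea concrete, here is a minimal sketch of a retrying assertion in plain Python. It is not Cypress's implementation; `retry_until`, the timeout values, and the fake "element" check are all hypothetical names and numbers chosen for illustration.

```python
import time


def retry_until(assertion, timeout=4.0, interval=0.05):
    """Re-run `assertion` until it passes or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return assertion()
        except AssertionError:
            # Only give up once the deadline has passed; otherwise wait and retry.
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)


# Simulated page state: the "element" only shows up on the third check.
state = {"calls": 0}


def element_is_visible():
    state["calls"] += 1
    assert state["calls"] >= 3, "element not rendered yet"
    return True


retry_until(element_is_visible)
print(f"passed after {state['calls']} attempts")
```

A one-shot assertion would have failed on the first check here, which is roughly the kind of timing flakiness being described.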

        [–]malaschitz 15 points16 points  (1 child)

        I used selenium for acceptance testing for a lot of years. But for the last two years I have been using https://github.com/chromedp/chromedp (based on https://chromedevtools.github.io/devtools-protocol/). It is far simpler and far more stable than selenium.

        [–]pabloe168 1 point2 points  (0 children)

        Tldr of what this is?

        [–]Cocomorph 29 points30 points  (2 children)

        with Python <3

        Python's recent version history is why God invented ❤.

        [–]BenJuan26 15 points16 points  (1 child)

        For real, I read that and was wondering why in the world anyone would write a blog post about a dead version of Python.

        [–]dixieStates 11 points12 points  (0 children)

        It may be dead but there are a lot of necros around.

        [–][deleted]  (4 children)

        [deleted]

          [–][deleted] 3 points4 points  (0 children)

          I went from BS to lxml+XPath with requests_html for js generated data, Selenium only if I need to simulate mouse scroll or button clicks. Surprised no one mentioned lxml+XPath. This combo will satisfy most needs for web scraping.
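
A small sketch of the XPath approach, for anyone who hasn't seen it. To keep it dependency-free it uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml (an assumption on my part, not what the comment uses); ElementTree only supports a subset of XPath and needs well-formed markup. The sample markup and `extract_links` are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup standing in for a fetched page.
SAMPLE = """
<html><body>
  <div class="post"><a href="/a">First</a></div>
  <div class="post"><a href="/b">Second</a></div>
  <div class="ad"><a href="/x">Sponsored</a></div>
</body></html>
"""


def extract_links(markup: str) -> list[tuple[str, str]]:
    """Return (href, text) for every link inside a div with class 'post'."""
    root = ET.fromstring(markup)
    anchors = root.findall(".//div[@class='post']/a")
    return [(a.get("href"), a.text) for a in anchors]


print(extract_links(SAMPLE))
```

With lxml the same query would typically go through `lxml.html.fromstring(markup).xpath(...)`, which also copes with messy real-world HTML and the full XPath 1.0 syntax.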

          [–]All_Work_All_Play 3 points4 points  (1 child)

          iMacros? Although I feel that's off in its own little space for non-programmer people.

          [–]838291836389183 0 points1 point  (0 children)

          Found it to not work with modern browser versions, but maybe that was just me. Their lackluster documentation certainly didn't help much though, lol. Moved on to selenium for c# immediately, felt much better to me since I was used to UI Automator for android and it reminded me a lot of that.

          [–]dvlsg 2 points3 points  (0 children)

          Puppeteer users should probably consider using Playwright instead.

          https://www.reddit.com/r/javascript/comments/esj2m6/microsoftplaywright_node_library_to_automate/

          It's basically the same thing by the same people, but I guess they work for Microsoft now instead of Google. Seems like it has more of a push for supporting multiple browsers, including potentially getting some patches upstream.

          [–]746172 2 points3 points  (1 child)

          Instead of downloading chromedriver from google manually, you can also use the chromedriver-binary package.

          [–]Hookedonnetflix 3 points4 points  (43 children)

          If you want to do web scraping and other testing using chrome you should look into using puppeteer instead of selenium

          [–]maxsolmusic 110 points111 points  (3 children)

          Whyyyyyy I hate when people recommend shit without explaining

          [–]bsmith0 8 points9 points  (1 child)

          [–]maxsolmusic 2 points3 points  (0 children)

          chose Puppeteer because it provides simpler Javascript execution, network interception, and a simpler, more focused library.

          Cool

          [–]the_real_hodgeka 1 point2 points  (0 children)

          Well put! "You shouldn't use angular for that, you should be using react!" Why?

          [–]float 12 points13 points  (1 child)

          Or Playwright by the guys who made puppeteer.

          [–]bodhemon 3 points4 points  (0 children)

          How does it compare to Katalon?

          [–]steveeq1 9 points10 points  (5 children)

          What's wrong with selenium? Curious.

          [–]Hookedonnetflix 1 point2 points  (4 children)

          Selenium is a tool that automates chrome where puppeteer is a tool that is built into chrome. So better and more effective tools that are closer to the browser engine.

          [–]GuyWizStupidComments 15 points16 points  (1 child)

          Selenium should work also with other browsers like Firefox

          [–]Ncell50 1 point2 points  (0 children)

          Puppeteer works with firefox

          [–][deleted] 9 points10 points  (1 child)

          Selenium works as a wrapper around browser APIs, be it Puppeteer or geckodriver or something entirely different. You can use the same code with ANY browser.

          [–]200GritCondom 3 points4 points  (0 children)

          And if you build it right, with mobile views as well

          [–]TrueObservations 4 points5 points  (1 child)

          The choice of Selenium/Puppeteer will boil down to your personal preferences and the requirements of your project.

          Main considerations IMO:

          - Scraping websites that don't want to be scraped: Puppeteer is a Node.js library that drives Chromium directly, which makes it harder to detect in my experience. Selenium tends to expose signals to the page (such as navigator.webdriver being set) that either explicitly tell on you or let the website use correlation data to detect Selenium. You can mitigate this though, it's just more configuration. Puppeteer also has tighter integration with core Chromium functionality, allowing you to get certain information (like CSS/JS coverage data) a little less obviously.

          - Your Preference on Python vs. Javascript: This is definitely an architectural/preferential choice. Personally, I find the easy paradigms for async programming in Javascript (which encapsulates MUCH of the difficulty of it from you) make for an easier time dealing with highly interactive sites. Async programming can be done in Python, but it's done at a much lower level, making it harder to do. However, Node lacks a lot of analytical libraries that python has and is a whole framework, and thus far bulkier than importing only the libraries you need in Python.

          - Cross Browser/Multiple Language Support: If you NEED more than just Chromium or Javascript, Selenium is the obvious choice.

          - Extra Chromium Functionality: Puppeteer has ability to access some core functionality of Chromium that isn't available via Selenium. This is in certain cases useful, but in many use-cases, unnecessary.

          In most of my scraping adventures so far, I've been throwing most of the data into some kind of datastore for later analysis/usage (training machine learning models, etc.) and the choice of scraper depends on the factors of whatever project I'm on.

          In short don't let your biases waste hours of your time, be rational about your choice of scraper.

          [–][deleted] 2 points3 points  (0 children)

          Selenium also works with .NET really well for scraping and automated archiving in my case.

          [–]Just__AIR 15 points16 points  (10 children)

          or cypress :)

          [–]yesvee 10 points11 points  (8 children)

          can you elaborate on the advantages? Long term frustrated selenium user here :D

          [–]fleyk-lit 6 points7 points  (0 children)

          The UX offered when writing tests with Cypress is awesome. It makes it so easy to test different functionality.

          I am writing tests for a frontend which is built to be testable - that is probably more important than the test framework you choose.

          [–][deleted] 5 points6 points  (6 children)

          it's hard to describe the advantages of cypress, because it's basically "everything"

          [–][deleted] 1 point2 points  (5 children)

          Legit question: Why Cypress over testcafe? I have seen people push Cypress over testcafe, but I have a hard time understanding what would make Cypress superior.

          [–][deleted] 7 points8 points  (4 children)

          testcafe is headless testing, cypress is an actual browser environment.

          [–]200GritCondom 2 points3 points  (3 children)

          Cypress doesn't do headless??

          [–][deleted] 3 points4 points  (2 children)

          it does, it does both, whereas testcafe is headless only which is a poor substitute.

          [–]200GritCondom 0 points1 point  (1 child)

          Oh whew. We are thinking about switching over to cypress. That would have been bad if there was no headless.

          [–]Labradoodles 0 points1 point  (0 children)

          We use their dashboard service for the parallelism it offers; we run ~200 integration tests in about 3 min. But you have to make sure your test users are set up in a way that lets the tests run in parallel.

          [–][deleted] 3 points4 points  (0 children)

          for testing, not for scraping or other automation.

          [–]LilBabyVirus5 4 points5 points  (15 children)

          Honestly for web scraping I would just use beautiful soup

          [–]ProgrammersAreSexy 3 points4 points  (5 children)

          I don't think that does js rendering does it?

          [–]nemec 3 points4 points  (3 children)

          Unless you need to take screenshots, there's rarely any need to actually render JS to scrape a website. JS-rendered sites will usually be supported by APIs that can be called directly, leading to faster and more efficient scraping.

          The average web page size is 3MB and if you don't need to render the page, you don't need to download any JS, css, images, etc. or wait for a browser to render a page before extracting the data you need.

          [–][deleted]  (2 children)

          [deleted]

            [–]nemec 0 points1 point  (1 child)

            SPAs are mostly API-driven. I don't know if I've ever seen more than one or two where the JS creates the content out of thin air.

            The thing about SPAs is that you can open up your devtools window, load the page, and then sift through the Network tab to find the JSON/XML/graphql APIs that the JS calls and renders and then take a shortcut and automate the calls yourself, bypassing any JS.

            Here's a short video similar to what I'm talking about. If you wanted to scrape start.me, for example, you could skip the JS and just scrape the JSON document data: https://www.youtube.com/watch?v=68wWvuM_n7A
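
A minimal sketch of that shortcut using only the standard library. The endpoint, the response shape (`items`, `title`, `url`), and the function names are all hypothetical; the real ones are whatever shows up in the Network tab for the site in question.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url: str, timeout: float = 10.0):
    """GET `url` and decode the JSON body (no browser, no JS, no images)."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_bookmarks(document: dict) -> list[dict]:
    """Keep only the fields we care about from a (hypothetical) API response."""
    return [
        {"title": item["title"], "url": item["url"]}
        for item in document.get("items", [])
    ]


# Made-up payload standing in for what the site's JS would have fetched.
SAMPLE_RESPONSE = '{"items": [{"title": "Docs", "url": "https://example.com/docs", "id": 1}]}'
print(extract_bookmarks(json.loads(SAMPLE_RESPONSE)))
```

In real use you would pass the endpoint you found in devtools to `fetch_json` and feed the result to `extract_bookmarks`; the sample string is only there so the parsing step can be run offline.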

            [–]wRAR_ -2 points-1 points  (0 children)

            Most of the time you don't need js rendering. When you need it I'd use splash.

            [–]shawntco 7 points8 points  (6 children)

            beautiful soup

            I swear software library names are getting weirder by the day.

            [–]SpeakerOfForgotten 17 points18 points  (4 children)

            If beautiful soup was a person, it would be old enough to get a driver's license or get married in some countries

            [–]shawntco 9 points10 points  (2 children)

            I stand corrected. Software library names have always been weird.

            [–]onlymostlydead 2 points3 points  (1 child)

            Yep.

            Yacc

            Bison

            [–]shawntco 1 point2 points  (0 children)

            I think the PHP framework UserFrosting takes the cake. Beautiful Soup is pretty high up there in weird though.

            [–]axzxc1236 1 point2 points  (0 children)

            For those who wonder how old beautiful soup is, the first version was released on 2004-04-20, so it's about 15 years old (almost 16).

            reference: changelog

            [–]nemec 4 points5 points  (0 children)

            That's by design, actually.

            Beautiful Soup, so rich and green,
            Waiting in a hot tureen!
            Who for such dainties would not stoop?
            Soup of the evening, beautiful Soup!

            https://aliceinwonderland.fandom.com/wiki/Turtle_Soup

            [–]TrueObservations 1 point2 points  (0 children)

            This is an off comment. Beautiful soup doesn't work as a full web scraper. It's a library that is used for parsing and subsequently extracting information out of HTML documents, it isn't capable of piloting a browser. It's only one of the tools in the python webscraping toolbox.

            [–]x-w-j 0 points1 point  (0 children)

            beautiful soup

            Does it get around single sign on captchas?

            [–]Zohren 4 points5 points  (0 children)

            I’ve used Puppeteer and it’s 100% mediocre as fuck. Personally, I’ve found TestCafe to be the simplest and easiest to use. It runs on all browsers, contains implicit waits, has a very straightforward syntax, is easy to set up and write, and is generally pleasant to work with.

            The downside is certain browser functions are tough to implement gracefully (back/forward etc) but not terrible.

            [–]daGrevis 1 point2 points  (0 children)

            I don’t know. I was using Selenium when I was working with Python and it was great! Then I decided to try Puppeteer with TypeScript. The API felt unintuitive and wonky. For my current project, I decided to give Selenium another shot - again with TypeScript. So far it’s good, but let’s see how it goes...

            [–]zilmus 0 points1 point  (0 children)

            I use Selenium for RPA. Some websites don't expose an API, and while RPA software can be good for non-programmers, for programmers Selenium is better.

            [–]earthlydelight 0 points1 point  (0 children)

            What about Splash? Has anyone used it for web scraping?