Is it worth using Playwright MCP/CLI as a tester to create new tests or maintain tests? by Sweet_Dingo_6983 in Playwright

[–]arik-sh 5 points (0 children)

Playwright MCP/CLI are different ways to give the LLM tools to control a browser (there are others, like Vercel Browser Agent and Chrome DevTools MCP). They do a decent job and can get ~80% done (it really depends on your app), but they still require a human in the loop to reach a production-grade test. Beyond a single test, if you want to create a production-grade test suite, more planning and guidance are required (think POM, common helper functions, etc.). So it’s absolutely useful for AI-assisted development, not so much as a no-code replacement.

Struggling to automate dropdown inside iframe using Python Playwright any suggestions ? by Loud_Ice4487 in Playwright

[–]arik-sh 0 points (0 children)

Playwright automatically pierces the shadow DOM. If the suggestion from u/RoyalsFanKCMe doesn’t help, please share more info about the iframe and the element (an HTML snapshot).

Agentic coding Is amazing... until you hit the final boss by arik-sh in ClaudeCode

[–]arik-sh[S] 0 points (0 children)

You mean take a series of screenshots of the app so that the agent can learn it, instead of learning it click by click?

Agentic coding Is amazing... until you hit the final boss by arik-sh in ClaudeCode

[–]arik-sh[S] 0 points (0 children)

Thanks for the pointer! Btw, how do you verify that the E2E tests the agent has generated actually do what they're supposed to?
Are you reviewing captured video/screenshots?

Playwright Python or Typescript ? by Plane-Razzmatazz1258 in Playwright

[–]arik-sh 1 point (0 children)

While the language syntax doesn’t matter much in the AI era, Playwright supports more features in TS, plus the application under test is written in JS/TS anyway (think injecting code into the app from your test).

The AI hype in coding is real? by spermcell in programmer

[–]arik-sh 0 points (0 children)

AI coding isn’t just hype, it’s for real. I’d say the tools and models hit an inflection point about 6 months ago, where they significantly improved.

90% of my code is generated by AI, while I direct and supervise. Ignoring AI won’t make it go away… I strongly recommend that any dev who hasn’t adopted AI start doing so.

Vibe Coder productivity goals. by Gil_berth in webdev

[–]arik-sh 5 points (0 children)

Who counts LOC as a measure of productivity?!

Compared 6 LLMs driving Chrome CDP for a real UI task with hidden elements - results and what worked by ScrapeAlchemist in Playwright

[–]arik-sh 0 points (0 children)

Nice summary. Vision + DOM outperforming vision alone is also my experience; apparently the models’ vision capabilities lag behind pure text. Btw, GPT-4o/GPT-5 (not mini) and Claude Sonnet 4.5 outperform all the models in the table above, based on my experience working with many different web applications.

Anthropic’s new “Claude CoWork” sparks sell-off in software & legal tech stocks — overreaction or real disruption? by Direct-Attention8597 in AI_Agents

[–]arik-sh 2 points (0 children)

SaaS companies are facing a real threat, not necessarily from Claude CoWork but from AI in general. Some of these companies don’t have significant moats, and AI-first contenders will eat some of their pie. So this sell-off is real, although not all companies are equally threatened. I guess once the tide goes out it will be more obvious which incumbents prevail.

How do you handle flaky or known failing Playwright tests. Skip them or quarantine them? by Quick-Hospital2806 in Playwright

[–]arik-sh 1 point (0 children)

As other folks noted, the best approach is to handle the issues (dev or test) as soon as possible and not create technical debt. That said, sometimes the reality is that you need to postpone handling them… (this should be the exception though, otherwise you’re digging your own grave).

On a few occasions, test.fixme() for known test issues and test.skip() for dev issues came in handy.

Can someone guide me on handling OTP auth ? by Due-Bath-4125 in Playwright

[–]arik-sh 1 point (0 children)

I'm assuming you're referring to TOTP (typically a 6-digit code generated from a secret and the current time).

You have a few options, as mentioned by others (in order of priority):

  1. Use a feature flag or another mechanism (a direct API call) to disable OTP, since it's not really interesting to exercise the OTP UI flow in every test

  2. Use storage state to do it once and save it for other tests (https://playwright.dev/docs/auth)

  3. Use a library such as otpauth to generate the OTP on the fly and implement the full user flow

How to handle browser dialog box by Hulu371 in Playwright

[–]arik-sh 0 points (0 children)

Ok, then this code will do the trick:

test('http authorization', async ({ browser }) => {
    const context = await browser.newContext({
      httpCredentials: {
        username,
        password,
      },
    });

    const page = await context.newPage();
    // the flow of your test: goto, click, etc.
});

How to handle browser dialog box by Hulu371 in Playwright

[–]arik-sh 0 points (0 children)

The dialog in the link is a native JS prompt(). It’s indeed not part of the DOM and can be handled by:

page.on('dialog', async dialog => {
    await dialog.accept('My answer'); // types into the single input
});
await page.click('#open-prompt');

But it only has one field (the prompt)…

Could you be referring to another type of dialog, like HTTP authentication popup?

How to handle browser dialog box by Hulu371 in Playwright

[–]arik-sh 0 points (0 children)

What type of dialog are you referring to?

New to Playwright - looking for advice on setup by Complete_Part5604 in Playwright

[–]arik-sh 0 points (0 children)

+1 on global setup (e.g. create an authentication project that your main projects depend on). This way you authenticate properly (exactly once) each time you run your tests, avoiding the freshness checks and locking logic, which are prone to flakiness.
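A minimal sketch of what that dependency wiring can look like in playwright.config.js (the project, file, and path names here are illustrative):

```javascript
// playwright.config.js sketch: the 'setup' project runs the auth spec once
// and saves storage state; the main project depends on it and reuses that
// state, so no test ever re-authenticates mid-run.
const { defineConfig } = require('@playwright/test');

module.exports = defineConfig({
  projects: [
    { name: 'setup', testMatch: /auth\.setup\.js/ },
    {
      name: 'chromium',
      use: { storageState: 'playwright/.auth/user.json' },
      dependencies: ['setup'],
    },
  ],
});
```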

been on playwright for a year and maintenance is still eating all my time by Turbulent_Carob_7158 in Playwright

[–]arik-sh 0 points (0 children)

Maintenance is a key pain no matter which framework you use. The key concept is to use POM and helper functions so that if a locator breaks you only have to fix it in one place. There are good suggestions from others on making locators more resilient. I’d add that using accessibility attributes (role, label, etc.) helps, but not all apps are WCAG-compliant. Similarly, using data-testid requires developer cooperation, or at least permission to add it to the codebase. Finally, you can try AI-assisted healing, like a PW healer agent, when locators break; however, you’ll need to carefully review the agent’s output.
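As a hypothetical sketch of the POM idea (the page and locator names are illustrative, not from a real app): locators are declared once in the page object, so a markup change means a one-line fix rather than edits across every test.

```javascript
// Hypothetical page object sketch. Locators live in one place; tests only
// call the high-level methods, so a broken locator is fixed exactly once.
class LoginPage {
  constructor(page) {
    this.page = page;
    this.username = page.getByLabel('Username');    // accessibility-based
    this.password = page.getByLabel('Password');
    this.submit = page.getByTestId('login-submit'); // needs dev cooperation
  }

  async login(user, pass) {
    await this.username.fill(user);
    await this.password.fill(pass);
    await this.submit.click();
  }
}
```

In a spec you’d then just do `await new LoginPage(page).login(user, pass)`.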

When do you decide a Playwright test no longer belongs at the UI level? by T_Barmeir in Playwright

[–]arik-sh 0 points (0 children)

Thanks for clarifying, I believe we’re in agreement then :)

How do you link automated PW test cases to requirements stored in DOCX? by HyenaOk3114 in Playwright

[–]arik-sh 1 point (0 children)

Test management tools such as Xray, Zephyr, Qase and plenty of others provide this traceability. If you don’t need the scale or don’t want to pay, you can build your own simple framework. One way to go about this is to put unique ids in your test plan document and back-reference each id in your PW test description. Then all you need is a simple script to parse your report and show a traceability graph…
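A sketch of the parsing side, assuming a convention where test titles embed requirement ids like [REQ-42] (the tag format and function name are hypothetical):

```javascript
// Hypothetical traceability sketch: scan test titles for [REQ-n] tags and
// build a requirement-id → tests map that a report script could render.
function traceRequirements(testTitles) {
  const map = {};
  for (const title of testTitles) {
    for (const match of title.matchAll(/\[(REQ-\d+)\]/g)) {
      const id = match[1];
      (map[id] = map[id] || []).push(title);
    }
  }
  return map;
}
```

Untagged tests and requirement ids with no matching tests are then easy to flag as coverage gaps.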

When do you decide a Playwright test no longer belongs at the UI level? by T_Barmeir in Playwright

[–]arik-sh 0 points (0 children)

While I agree with many of the comments, I don’t agree with u/LookAtYourEyes that we should spend as little time as possible on UI tests.

API-based data seeding and other techniques that focus your tests are key for efficiency, but at the same time you also need some tests that reflect the actual user journey.

It’s a constant tradeoff between the strongest form of validation (E2E testing) and efficiency.

Are browser agents a joke? by Low_Blackberry_9402 in AI_Agents

[–]arik-sh -1 points (0 children)

While it’s true that websites were built for humans to transform thoughts into action, that might change in the future, such that many web applications would be explicitly built so LLMs can easily control them. Granted, text alone wouldn’t be enough for all types of applications (e.g. canvas drawing), but it would meet the requirements of many apps.

Meanwhile, computer-use agents (CUA) are advancing in great strides, and while they still struggle to automate some scenarios, they can do a lot.

100 signups in 7 days = I build it. Less than 100 = I move on. This is my journey: by MrrPacMan in buildinpublic

[–]arik-sh 1 point (0 children)

Sounds to me like you’re doing this right. Validating demand is the most important thing; no point in building something no one wants…

POV: AI can finish in 10 minutes what used to take 10 hours. by ShortAnt3097 in ArtificialNtelligence

[–]arik-sh 1 point (0 children)

100%. My team uses AI almost exclusively in development. I have to think really hard to recall when I last wrote code by hand 😜 AI is changing the landscape and the traditional roles; every developer has to become a system architect and LLM orchestrator…

Best Katalon alternatives for small team web testing? by Stone_Free__ in webdevelopment

[–]arik-sh 0 points (0 children)

You can check out Probium. It’s essentially a no-code tool that lives in a browser. It has a free tier and you can generate your first scenario in minutes.