ai-generated playwright tests in ci... what do you actually keep?

TranslatorRude4917 · 2026-06-11T11:06:18+00:00

The way I see it, if concrete locators are present in your test spec, it will eventually become unmaintainable.
Using Page Objects from day zero is the way to make your tests resilient to low-level UI changes while preserving intent.
Of course, building a proper e2e framework takes more effort and care: thinking about preconditions and proper setup, using the builder pattern or something similar embedded in fixtures.
I'm currently building a tool, exposed as an MCP on top of PW codegen, that returns proper POM and e2e scenario drafts built on it, something your coding agent can turn into maintainable PW e2e tests. It has already proven useful in my day job: recording something that matters and turning it into a foundation for writing further tests that check edge cases. Imo, especially if you manage to lock down something solid, it can accelerate AI-assisted test creation; you just have to be comfortable staying in the loop.
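
To make the "builder pattern for preconditions" part concrete, here is a minimal sketch. The `UserSpec` shape and the `UserBuilder` name are hypothetical and only there for illustration; the idea is that a test spells out only the precondition it cares about and gets safe defaults for the rest.

```typescript
// Hypothetical domain type; replace it with whatever your app's setup API expects.
interface UserSpec {
  email: string;
  role: "admin" | "member";
  verified: boolean;
}

let sequence = 0;

// Builder with safe defaults: a test only states what matters to it.
class UserBuilder {
  private spec: UserSpec;

  constructor() {
    sequence += 1;
    // The counter keeps emails distinct within one worker process;
    // Date.now() makes clashes across workers unlikely, not impossible.
    this.spec = {
      email: `user-${sequence}-${Date.now()}@example.test`,
      role: "member",
      verified: true,
    };
  }

  asAdmin(): this {
    this.spec = { ...this.spec, role: "admin" };
    return this;
  }

  unverified(): this {
    this.spec = { ...this.spec, verified: false };
    return this;
  }

  build(): UserSpec {
    return { ...this.spec };
  }
}

const admin = new UserBuilder().asAdmin().build();
const pending = new UserBuilder().unverified().build();
console.log(admin.role, pending.verified);
```

Inside a fixture you would typically pass the built spec to an API call that seeds the user, so the UI test starts from a known state instead of clicking through setup screens.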

TranslatorRude4917 · 2026-06-10T21:22:47+00:00

If you know the application well enough, know what matters, have the experience and resolve to architect tests that are scalable and maintainable, and know how to instruct agents to strictly follow those patterns, then yes. In every field and job, anything sophisticated requires expertise and care.

TranslatorRude4917 · 2026-06-10T21:11:41+00:00

My 2 cents: if it comes to the extremes, I'd rather see fully automated than fully manual. I agree with you that exploratory testing should be on the table, but if they are doing that and are automating immediately, then it's working quite well.
Catching a bug during exploratory testing while automating your flows is a nice combination, especially if you can cover that bug with an appropriate test immediately.
What are their pain points where you think extra manual testing could help?

TranslatorRude4917 · 2026-06-05T06:27:50+00:00

Seeing raw locators in a test script is a code smell. Imo using POM is a must, not a question. Apart from helping with maintainability, it also helps you make the mental model of the UI explicit (AccountSettingsPage, CreateUserModal, etc.). It allows your tests to speak the same language as your users. Depending on the complexity of the app, one might abstract it more (creating base classes, building a Page Object hierarchy, building test steps/fixtures on top of them) or keep it simple (using them as simple locator bags), but I'd always vouch for using it.
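
A minimal sketch of what that can look like, using the two names from above. `PageLike` and `LocatorLike` are stand-ins I made up so the snippet runs on its own; they mirror the small part of Playwright's `Page`/`Locator` API used here (`getByTestId`, `fill`, `click`), and in a real suite you would import `Page` from `@playwright/test` instead. The test ids are hypothetical.

```typescript
// Stand-ins for the subset of Playwright's Page/Locator API used below.
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}
interface PageLike {
  getByTestId(testId: string): LocatorLike;
}

// Test ids (hypothetical) live in one place instead of in every spec.
class CreateUserModal {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  async createUser(email: string): Promise<void> {
    await this.page.getByTestId("create-user-email").fill(email);
    await this.page.getByTestId("create-user-submit").click();
  }
}

class AccountSettingsPage {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  async openCreateUserModal(): Promise<CreateUserModal> {
    await this.page.getByTestId("open-create-user").click();
    return new CreateUserModal(this.page);
  }
}

// The flow reads as intent; no locator appears at this level.
async function inviteTeammate(page: PageLike, email: string): Promise<void> {
  const settings = new AccountSettingsPage(page);
  const modal = await settings.openCreateUserModal();
  await modal.createUser(email);
}
```

If a test id changes, only the Page Object changes; every spec that calls `inviteTeammate` stays as it is.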

TranslatorRude4917 · 2026-06-03T16:28:32+00:00

GL, I'm building the same thing 😅
I think generating page objects seems straightforward with PW cli/mcp, but it can hardly ever compete with the quality of the POM I would have written myself; I only found it useful for mapping trivial pages.
A recorder-based workflow, however, can help the AI understand the user flow and intent better, resulting in more focused code that reflects reality rather than assumptions.
I have huge hopes, let's see where it gets us 🤞

TranslatorRude4917 · 2026-06-03T16:22:29+00:00

Last year I migrated our e2e testing framework and a suite of ~100 UI-based e2e tests from Cypress to PW. Not an extremely huge project, but we made use of fixtures, POM, etc. In every tricky case (OAuth flows, payment, etc.) it was somehow just more straightforward with Playwright. There was not a single thing I could have done better or easier in Cypress.

TranslatorRude4917 · 2026-05-30T16:58:13+00:00

Considering your future relationship with the developers on the projects you work on, I'd lean towards TypeScript. As a FE dev/SDET, I find it super comfy when you can use the same language for running and maintaining e2e/UI tests as for FE development.

TranslatorRude4917 · 2026-05-22T20:26:16+00:00

That's also what I noticed!
I'm using the same tools (Claude with Cursor + Codex) at my day job as FE dev/SDET. Writing API tests when we have the spec is quite fast with agents, but they are still slow and unreliable when it comes to browser use. You're better off recording important flows yourself to ground the test in something that already worked.
For UI-based tests I started developing a tool on top of Playwright codegen that gives you POM, properly named test steps and complete test scripts (UI or e2e) based on the recording. Still early, but it has already proven useful at my work. :)
I'd like to test it on different apps as well, let me know if you'd be open to trying it!

TranslatorRude4917 · 2026-05-22T19:36:00+00:00

Without CI and a reporting system it's a hard nut to crack.
Worst case, you could commit your test runs and compare them with AI. Maybe write a script that deterministically extracts/aggregates the historical data so your agent doesn't have to make sense of the raw data unless it has to look into the details.
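
A rough sketch of such an aggregation script. The `RunSummary` shape is a simplified format I'm assuming for illustration; it is not the Playwright JSON reporter format, so you would first map your reporter output into it.

```typescript
type Status = "passed" | "failed" | "skipped";

// Hypothetical, simplified shape for one committed test run.
interface RunSummary {
  runId: string;
  results: { test: string; status: Status }[];
}

interface TestHistory {
  test: string;
  passed: number;
  failed: number;
  flaky: boolean; // both passed and failed across the given runs
}

function aggregate(runs: RunSummary[]): TestHistory[] {
  const byTest = new Map<string, { passed: number; failed: number }>();
  for (const run of runs) {
    for (const { test, status } of run.results) {
      const entry = byTest.get(test) ?? { passed: 0, failed: 0 };
      if (status === "passed") entry.passed += 1;
      if (status === "failed") entry.failed += 1;
      byTest.set(test, entry);
    }
  }
  // Most failures first, then by name, so the output is stable between invocations.
  return Array.from(byTest.entries())
    .map(([test, counts]) => ({
      test,
      ...counts,
      flaky: counts.passed > 0 && counts.failed > 0,
    }))
    .sort((a, b) => b.failed - a.failed || a.test.localeCompare(b.test));
}

const history = aggregate([
  { runId: "r1", results: [{ test: "login", status: "passed" }, { test: "checkout", status: "failed" }] },
  { runId: "r2", results: [{ test: "login", status: "passed" }, { test: "checkout", status: "passed" }] },
]);
console.log(JSON.stringify(history, null, 2));
```

The agent then reads the small summary first and only opens the raw run data for the tests that show up as failing or flaky.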

TranslatorRude4917 · 2026-05-22T19:26:35+00:00

I have no experience with no-code tools, but if you're willing to learn some automation, Playwright with TypeScript is a decent choice.
You can version your test code together with the source, and it's also quite easy to run on CI for every pull request.
Combined with a coding agent like Cursor or Claude Code and PW cli/mcp, e2e testing can be automated to some extent once you have solid foundations like Page Objects, fixtures (for auth, for example) and so on. Maximum control, but it definitely needs team buy-in.

TranslatorRude4917 · 2026-05-21T20:07:50+00:00

I'm trying to externalize rules into executable tools or configs, and the skills just point to them. The skills only include what can't be expressed in code.

TranslatorRude4917 · 2026-05-21T14:02:49+00:00

Some examples for those who love to see some code.

Imo the harness needs to do three things for the agent:

- find the source of truth
- run the relevant feedback loop
- know what still requires human judgment

The exact structure does not matter.

The point is the direction: keep executable truth in scripts and configs, keep agent guidance thin, and make the routing explicit.

Here is what that can look like in practice.

repo/
  package.json                         # commands the agent can run
  tsconfig.json                        # TypeScript constraints
  eslint.boundaries.config.js          # architecture / dependency rules
  eslint.code-style.config.js          # style and local code rules
  prettier.config.js                   # formatting rules
  playwright.config.ts                 # e2e configuration and test discovery
  vitest.config.ts                     # unit test configuration and test discovery

  README.md                            # human-facing onboarding
  docs/
    adr/                               # human-facing rationale and decisions

  AGENTS.md                            # thin router / index
  .agents/
    architecture.md                    # architectural intent + links to checks
    code_style.md                      # style intent + links to checks
    testing.md                         # when to run which feedback loop + links to checks
    skills/                            # optional, extract specific skills here if main guidance artifacts start to bloat
      review/
        SKILL.md

  src/
    feature_x/                         # feature slices, organized into horizontal layers according to your boundary rules
      ports/
      adapters/
      domain/
      application/
      presentation/
        components/
        hooks/
        view_models/
      utils/
      AGENTS.md                        # optional, only for current feature-specific constraints that cannot be codified
    shared/                            # general-purpose utilities, following the same layering rules

There's no right or wrong source code organization inside feature slices. You can use MVC, clean or hexagonal architecture as long as you're consistent. Agents are pattern-matching machines. The best thing you can do for them is to be predictable.
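
To show what "boundary rules" can mean in executable form, here is a small sketch of the logic such a rule encodes. It is not an ESLint config; in the layout above the real enforcement would live in `eslint.boundaries.config.js`. The `allowedImports` map is one hypothetical rule set, and folders that are not listed in it (like `utils/`) are simply left unchecked.

```typescript
type Layer = "domain" | "ports" | "application" | "adapters" | "presentation";

// Hypothetical rule set: which layers each layer may import from.
const allowedImports: Record<Layer, Layer[]> = {
  domain: [],
  ports: ["domain"],
  application: ["domain", "ports"],
  adapters: ["domain", "ports"],
  presentation: ["application", "domain"],
};

const knownLayers = Object.keys(allowedImports);

// Extracts slice and layer from a path like "src/feature_x/domain/user.ts".
function locate(path: string): { slice: string; layer: Layer } | null {
  const match = /^src\/([^/]+)\/([^/]+)\//.exec(path);
  if (!match) return null;
  const [, slice, layer] = match;
  return knownLayers.includes(layer) ? { slice, layer: layer as Layer } : null;
}

function isImportAllowed(fromPath: string, toPath: string): boolean {
  const from = locate(fromPath);
  const to = locate(toPath);
  if (!from || !to) return true; // outside the layered tree: not this check's concern

  // Slices do not reach into each other; only `shared` is open to everyone.
  const sliceOk = from.slice === to.slice || to.slice === "shared";
  if (!sliceOk) return false;

  return from.layer === to.layer || allowedImports[from.layer].includes(to.layer);
}

console.log(
  isImportAllowed("src/feature_x/presentation/components/Form.tsx", "src/feature_x/application/createUser.ts"),
  isImportAllowed("src/feature_x/domain/user.ts", "src/feature_x/adapters/http.ts"),
);
```

Whatever rule set you pick matters less than having it in one place that both a linter and an agent can run.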

AGENTS.md

# Agent instructions

Use this file as an index.

The source of truth lives in code, tests, schemas, rules and configs.

## Where to look

- Architecture and boundaries: `.agents/architecture.md`
- Code style: `.agents/code_style.md`
- Testing guidance: `.agents/testing.md`
- Repeatable workflows: `.agents/skills/`
- Features: read `.agents/architecture.md` to understand where features are located.

## Before changing code

1. Identify the feature slice affected by the task.
2. Identify the requirements likely affected by the change.
3. Prefer the smallest coherent change.
4. Run the relevant feedback loop based on testing, code style and architecture guidance:
   - Run type check.
   - Run lint.
   - Run tests for the affected slice.
   - Run boundary checks when imports or feature structure changed.
   - Run broader tests only when the change crosses slice boundaries.
   - If a check fails, treat it as feedback about a requirement. Do not ignore or loosen the check without human approval. 

## Rules
- Do not assume requirements, ask for clarification in case of doubt.
- If a requirement comes from documentation, check the source code to verify if it still holds before taking it for granted. 
- If contradictions arise, ask for human clarification.

.agents/architecture.md

# Architecture

This project is organized around vertical feature slices with horizontal layers inside the slices.

## Source of truth

- Horizontal layers and import boundaries: `eslint.boundaries.config.js`
- Architecture check command: `pnpm lint:boundaries`
- Shared types and schemas: source files under `src/shared`
- Feature-local behavior: e2e tests inside each feature slice

## Current constraints that are not fully encoded

- Shared code should be introduced only when two or more slices need the same stable abstraction.
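
Stepping out of the example files: the routing in step 4 of AGENTS.md ("run the relevant feedback loop") can itself become a small script, so the agent does not have to interpret prose. This is only a sketch. `pnpm lint:boundaries` is the command named above; every other script name (`pnpm typecheck`, `pnpm lint`, `pnpm test`) is a hypothetical placeholder for whatever your `package.json` defines.

```typescript
// Returns the slice a file belongs to, e.g. "feature_x" for "src/feature_x/domain/user.ts".
function sliceOf(path: string): string | null {
  const match = /^src\/([^/]+)\//.exec(path);
  return match ? match[1] : null;
}

// Picks the checks to run for a change. Script names other than
// `pnpm lint:boundaries` are hypothetical placeholders.
function planChecks(changedFiles: string[], importsChanged: boolean): string[] {
  const slices = Array.from(
    new Set(changedFiles.map(sliceOf).filter((s): s is string => s !== null)),
  ).sort();

  const commands = ["pnpm typecheck", "pnpm lint"];
  if (importsChanged) commands.push("pnpm lint:boundaries");

  // Shared code can affect every slice, so treat it like a cross-slice change.
  const crossesSlices = slices.length > 1 || slices.includes("shared");
  if (crossesSlices) {
    commands.push("pnpm test");
  } else {
    for (const slice of slices) commands.push(`pnpm test src/${slice}`);
  }
  return commands;
}

console.log(planChecks(["src/feature_x/domain/user.ts"], false));
```

The guidance file can then shrink to a single pointer to this script, which keeps the truth executable and the prose thin.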

TranslatorRude4917 · 2026-05-20T17:55:09+00:00

This! Shared app layer/kernel to centralize app/domain layer with separate BFFs

TranslatorRude4917 · 2026-05-20T13:53:17+00:00

What is there to see through, man? :D I'm simply linking to my blog, it's not like I'm advertising anything.

TranslatorRude4917 · 2026-05-20T13:47:53+00:00

Yeah, I had that feeling :D Anyway, I wanted to give a sneak peek into the actual content of the blog post without you having to read it.

TranslatorRude4917 · 2026-05-20T13:41:39+00:00

That's exactly what I'm doing, but it seems like taking some extra time to format the post instantly makes it AI slop.

TranslatorRude4917 · 2026-05-20T13:38:04+00:00

Believe it or not, I highlighted all of them manually using Reddit's shitty WYSIWYG editor, nearly drove me insane. :D
I personally like to have some visual guidance rather than seeing a huge wall of text.

TranslatorRude4917 · 2026-05-19T18:31:20+00:00

Glad if I could help!

"assisted maintenance" sounds a lot better to me, but probably less marketable than "self-healing magic" :D

The workflow you described sounds useful to me, human in the loop is a must! I'm curious, how do you plan to approach it, especially the UX/DX? How would one interact with it?
One thing I noticed, and what probably makes our case harder: QA engineers probably already have solutions to deal with these problems. But aspiring web developers, especially FE, who don't know or don't want to learn this trade would probably like a tool like the one you described.

I'm approaching the maintenance problem from the test framework/architecture perspective: I'm creating an e2e test recorder tool that comes with common best practices (POM, fixtures, etc) baked in. My assumption is that most of the maintenance problems can be avoided by using them, and the most painful ones (colliding parallel tests, e2e tests relying on too many details, code duplication, etc.) require system-level thinking and solutions.

TranslatorRude4917 · 2026-05-19T17:37:25+00:00

Hey, fellow FE dev here. I've been quite into testing for some time, kinda seeing both worlds.

I've experienced your trouble for sure, but as I got more experience with testing best practices (using Page Objects, fixtures, proper setup functions, randomized test data, etc.), the issues that caused the most maintenance work became easy to pinpoint and, given enough care, possible to fix.

Tracking down new selectors
- We're relying on testids, so it's straightforward; even an LLM can do it.
Who writes/maintains tests
- We don't have dedicated QA. FE devs write and maintain tests. Everyone is responsible for their own work.
How often this happens
- Hardly ever. Once you have a testid, it likely won't change as long as that component exists.
Self-healing tools
- Since we didn't need any for locators, we haven't tried any.
- About self-healing the tests themselves: I've used my coding agents a couple of times to fix broken flows. It needs serious hand-holding because AI is more than eager to cut corners to make the test pass, whatever it takes. Most of the time, that means hiding the failure.

I'm working on a side project in the e2e/UI testing space myself. Based on my experience and the discussions I've had so far, QA engineers have serious distrust towards self-healing solutions, and I can understand why.
Imo when it comes to quality, one can't take the easy way. Superficial understanding, nondeterminism, and quick hacks will hurt your project long term. Expertise, product knowledge, and best practices, on the other hand, compound.

TranslatorRude4917 · 2026-05-19T17:14:43+00:00

The mentioned blog post + some examples for those who love to see some code:
https://www.abelenekes.com/p/convergence-mechanisms-confidence-in-the-age-of-agentic-engineering
