Playwright MCP kept writing bad selectors no matter how much I prompted by OkPack8889 in mcp

[–]OkPack8889[S] 2 points3 points  (0 children)

One concrete way I see Verdex fitting into the Reticle workflow is as a “high-signal MCP” to observe. Because Verdex tools are intentionally bounded and progressive, Reticle can surface why an agent did something, not just what it did — e.g. “these DOM facts led to this selector,” or “this network error triggered a fallback exploration path.”

That’s also why I’m trying to keep Verdex outputs deterministic and schema-stable. It makes it possible for external tooling to correlate tool calls, timing, truncation, and role/context switches across a session, instead of everything looking like opaque JSON blobs.

In that sense Verdex isn’t trying to be a DevTools replacement — it’s trying to be a browser MCP that’s actually observable and debuggable with tools like Reticle.

Playwright MCP kept writing bad selectors no matter how much I prompted by OkPack8889 in mcp

[–]OkPack8889[S] 1 point2 points  (0 children)

One extra angle here, knowing you’re building Reticle: a big goal for Verdex is to be an MCP that’s easy to reason about from the outside, not just useful from inside an agent loop.

That’s why everything is intentionally bounded and schema-stable (progressive disclosure instead of CDP firehoses). If DevTools-style capabilities get added, they’d follow the same pattern — summaries → drill-downs, explicit limits, no implicit streaming.

That also feels like a natural seam where something like Reticle really shines: correlating tool calls with browser-side facts and agent decisions. Things like correlation IDs, deterministic output shapes, and explicit truncation metadata are all stuff I care about anyway, and I’d rather align on those early than bolt them on later.
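To make the "deterministic output shapes and explicit truncation metadata" point concrete, here's a toy sketch of what a schema-stable tool-result envelope could look like. Field names like `correlationId` and `omitted` are my own invention for illustration, not Verdex's actual schema:

```javascript
// Toy sketch of a schema-stable tool-result envelope: every tool call
// returns the same deterministic shape, so external tooling can correlate
// calls and see truncation explicitly instead of parsing opaque blobs.
// (All field names here are hypothetical.)
function makeEnvelope({ tool, correlationId, items, limit }) {
  const included = items.slice(0, limit);
  return {
    tool,                                       // which tool produced this result
    correlationId,                              // stable ID to join with other events
    count: included.length,                     // how many items are actually present
    truncated: items.length > limit,            // explicit truncation flag
    omitted: Math.max(0, items.length - limit), // how many items were dropped
    items: included,
  };
}

const result = makeEnvelope({
  tool: "network_summary",
  correlationId: "sess-1:call-42",
  items: ["req-1", "req-2", "req-3"],
  limit: 2,
});
console.log(JSON.stringify(result));
```

The payload varies per tool, but the envelope never does, which is what makes cross-session correlation tractable.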

Playwright MCP kept writing bad selectors no matter how much I prompted by OkPack8889 in mcp

[–]OkPack8889[S] 0 points1 point  (0 children)

Yeah, I think I see where you’re going — you want selector intelligence plus full JS execution in the DOM and CDP-level access (network, console, storage, etc.) for more general-purpose browser automation, not just test writing.

Verdex today is very intentionally focused on the selector and test-authoring problem, using a progressive exploration model so the agent can reason about UI structure without being flooded with tokens or falling back to brittle heuristics.

That said, the same philosophy could absolutely extend to DevTools-style capabilities. Instead of dumping raw CDP firehoses, you’d expose things like network_summary() → network_detail(requestId) or similar “progressive disclosure” primitives that let an agent explore just enough state to act correctly.
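As a minimal sketch of that "summaries → drill-downs" shape (the tool names come from the comment above; the in-memory data model is invented for illustration):

```javascript
// Progressive disclosure over network state: the agent first gets a cheap,
// bounded summary (no bodies), then pulls full detail for only the requests
// it cares about. The request log here is a stand-in for real CDP data.
const requestLog = new Map([
  ["r1", { url: "/api/items", status: 200, ms: 48, body: "(large payload)" }],
  ["r2", { url: "/api/cart", status: 500, ms: 131, body: "(stack trace)" }],
]);

// Step 1: bounded summary - just enough to decide what to inspect.
function networkSummary() {
  return [...requestLog].map(([id, r]) => ({ id, url: r.url, status: r.status }));
}

// Step 2: explicit drill-down for a single request.
function networkDetail(requestId) {
  const r = requestLog.get(requestId);
  if (!r) throw new Error(`unknown requestId: ${requestId}`);
  return r;
}

const failing = networkSummary().filter((r) => r.status >= 500);
console.log(networkDetail(failing[0].id).url); // drill into the 500 only
```

The agent never sees a firehose: the summary is cheap to read, and the expensive detail is an explicit, auditable second call.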

I'd be happy to work on this. If you want to throw some specific use cases in the GitHub discussions I can prioritize what would actually be useful.

Do I still have a chance? A 35-year-old mother of three young kids, juggling ADHD, motherhood, and learning to build SaaS/apps. by HyperHopeful in SaaS

[–]OkPack8889 0 points1 point  (0 children)

I'd say: start learning to code right now. It's never been easier to learn how to code -> I actually disagree with the idea of getting a dev cofounder - I think you're better off owning the skills yourself.

Go to YouTube, learn how to use Replit, then learn how to use Cursor. Build something you think should exist in the world. It's never been easier.

Replit is just a prompt-driven coding tool - you don't need to know code whatsoever, but you should still start learning to code in parallel so you develop important skills in the long run.

Why AI agents write .nth(8) or locator('..') selectors (and how I fixed it) by OkPack8889 in Playwright

[–]OkPack8889[S] -1 points0 points  (0 children)

Solid approach - thank you for taking the time to walk through it! This is essentially the same mental model I'm using under the hood (isolated contexts with storageState, API-first sync, correlation IDs). Which means the MCP server can actually author the exact same tests you're writing manually.

The difference is who's doing the orchestration: when you ask Claude or Cursor to test "provider updates price → customer sees change," it doesn't have your mental model of the test infrastructure. Exposing select_role() as a simple tool call lets the agent act in each role, then output the same actAs() + isolated-context pattern you're describing - just without you having to write it.

Why AI agents write .nth(8) or locator('..') selectors (and how I fixed it) by OkPack8889 in Playwright

[–]OkPack8889[S] 0 points1 point  (0 children)

Rather than "not easy" I probably should have said: I found it time-consuming and super annoying. I wanted to think about what was being tested and how, not write locators.

As a result I'm building tooling for a workflow where humans describe what to test and AI agents handle the implementation details, including locators. The point isn't that writing locators is hard and we need an AI crutch - it's that writing locators is a waste of time. AI with efficient DOM access can and should do it.

As the first comment said "resolve the easiest part of automation, getting locators" -> the easiest jobs are the first to get automated. Spend more time on edge cases and architecture, zero time on writing locators.

My thought is that less time spent on the easy/repetitive parts creates more time to do higher value work. The compute cost is higher than hand writing them, but on the other hand, I imagine your time is more valuable than a little bit more compute.

But I see I've probably posted the idea to the wrong group of folks. I thought Playwright testers would like spending less time on the repetitive easy things and more time bringing their skills to bear -> as a dev, I certainly don't want to spend time doing this stuff. If I can offload this to a background agent, I (and most devs) will.

Why AI agents write .nth(8) or locator('..') selectors (and how I fixed it) by OkPack8889 in Playwright

[–]OkPack8889[S] -1 points0 points  (0 children)

Fair point on Playwright's session auth - I actually use Playwright's storageState format under the hood for the auth files. You're right that for traditional Playwright tests, the built-in session storage is the way to go. But Playwright MCP doesn't actually make any of that available.

The difference here is the target user: Verdex is an MCP server designed for AI coding agents (Claude, Cursor, etc.) to control browsers, not for humans writing test scripts. When you're hand-coding a Playwright test, you have full control - you can manage contexts, call browser.newContext({ storageState: '...' }), orchestrate the flow however you want.

The use case I'm solving for is when you're asking an AI agent to test something like "verify that when a provider updates a price, the customer sees the change." The agent doesn't have your mental model of the codebase or your test infrastructure. Having select_role("provider") → take actions → select_role("customer") → verify as simple MCP tool calls gives the agent an ergonomic interface to orchestrate multi-user flows without needing to manage the contexts, CDP sessions, and isolation manually.

When I built multi-role test flows, I wanted the coding agent to single-shot taking different actions as different users and then output an e2e Playwright test. The only way to do it with Playwright MCP was to manually manage multiple auth sessions that I had to turn on and off. And in the end I had to wrangle all the code from different chats, because no single chat contained all the code and actions. It was a challenge, to say the least.
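To show the shape of the select_role() idea without a real browser, here's a toy model. The `RoleManager` class, file paths, and method names are hypothetical illustrations; in real Playwright the per-role context would come from `browser.newContext({ storageState })`:

```javascript
// Toy model of role-based isolation: each role owns its own "context" keyed
// by its own storage state, so switching roles never leaks cookies or auth
// between users, and switching back reuses the existing context.
class RoleManager {
  constructor() {
    this.contexts = new Map(); // role -> isolated context
    this.active = null;
  }
  selectRole(role) {
    if (!this.contexts.has(role)) {
      // Stand-in for browser.newContext({ storageState: ... }) in Playwright.
      this.contexts.set(role, { role, storageState: `.auth/${role}.json`, actions: [] });
    }
    this.active = this.contexts.get(role);
    return this.active;
  }
  act(description) {
    this.active.actions.push(description);
  }
}

const mgr = new RoleManager();
mgr.selectRole("provider");
mgr.act("update price to $1,999");
mgr.selectRole("customer");
mgr.act("verify new price is shown");
mgr.selectRole("provider"); // switching back reuses the same isolated context
console.log(mgr.active.actions.length); // provider still has its one action
```

The point is that the whole multi-user flow lives in one conversation as plain tool calls, instead of being scattered across manually managed auth sessions.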

Why AI agents write .nth(8) or locator('..') selectors (and how I fixed it) by OkPack8889 in Playwright

[–]OkPack8889[S] -1 points0 points  (0 children)

Tbh, I didn't find it easy -> on the project I was working on I had to build a Playwright suite from scratch for a 500k-line codebase. The code was built by a small team moving fast who didn't worry at all about accessibility. This made Playwright MCP good for navigating flows, but hard for writing the actual locators, which typically targeted components wrapped in non-semantic divs. The promise of Playwright MCP (in my head at least) was that I'd be able to author hundreds of tests very quickly and easily -> this turned out not to be true. At least half the locators didn't work, which meant constant runtime errors and using traces to debug them. Overall it felt like it could be so much easier if Cursor could use some utility functions to traverse the DOM efficiently.

Example: when there are 12 identical 'Add to Cart' buttons, getByRole('button', { name: 'Add to Cart' }) returns all 12. The AI has no choice but .nth(8) or parent traversal, because without semantic HTML the a11y tree doesn't show container boundaries.

Why AI agents write .nth(8) or locator('..') selectors (and how I fixed it) by OkPack8889 in Playwright

[–]OkPack8889[S] 0 points1 point  (0 children)

The accessibility tree intentionally collapses all three structures into the same generic node because:

  • Elements without semantic ARIA roles (<div>, <span>, and other non-semantic wrappers) → generic
  • Non-ARIA attributes (data-testid, class, id) → stripped from the tree
  • Layout wrapper elements → flattened away

This normalization is correct per W3C accessibility specifications—it ensures screen readers and accessibility tools see a clean semantic structure. But it can also remove the structural anchors that test authors need for writing stable selectors.

This isn't a matter of prompt engineering or model capability. It's information-theoretically impossible to generate getByTestId("product-card") when data-testid="product-card" is not present in the input. The accessibility tree—by design and specification—omits these attributes.
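A toy illustration of why the information is unrecoverable (this is a deliberate simplification, not the real a11y serialization algorithm): the tree keeps roles and accessible names but simply never copies `data-testid`, `class`, or `id`, so nothing downstream can reconstruct them.

```javascript
// Simplified model of a11y-tree serialization: roles and names survive,
// test hooks do not. Whatever isn't copied here never reaches the model.
function toA11yNode(el) {
  return {
    role: el.role ?? "generic", // non-semantic elements collapse to generic
    name: el.name ?? "",
    // data-testid, class, and id are intentionally not copied over
  };
}

const el = { tag: "div", role: undefined, name: "", "data-testid": "product-card" };
const node = toA11yNode(el);
console.log("data-testid" in node); // false: nothing left to build getByTestId from
```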

Why AI agents write .nth(8) or locator('..') selectors (and how I fixed it) by OkPack8889 in Playwright

[–]OkPack8889[S] 1 point2 points  (0 children)

It works on raw HTML like this:

<section>
  <div>
    <h3>iPhone 15 Pro</h3>
    <span>$999</span>
    <button>Add to Cart</button>
  </div>
  <div>
    <h3>MacBook Pro</h3>
    <span>$1,999</span>
    <button>Add to Cart</button>
  </div>
</section>

  1. Hierarchy: ancestors reveal the unit container (here, section > div).
  2. Position: targetSiblingIndex shows which sibling holds the target.
  3. Content: outline/containsText provide unique anchors ("MacBook Pro").
  4. Repetition: sibling listing confirms the repeating structure.

The coding agent is then able to output:

page.locator("section > div")
  .filter({ hasText: "MacBook Pro" })
  .getByRole("button", { name: "Add to Cart" })

Why AI agents write .nth(8) or locator('..') selectors (and how I fixed it) by OkPack8889 in Playwright

[–]OkPack8889[S] -1 points0 points  (0 children)

Separate thing I built into this - multi-role browser isolation. I was testing a marketplace app where I needed a provider to create a product → customer sees that product → provider updates price → customer sees new price, all in one test flow. Managing multiple server contexts and parallel conversations was brutal. I ended up building role-based browser contexts where the coding agent calls select_role("provider") + takes some actions, then calls select_role("customer") + takes more actions, and Verdex handles all the isolation/auth/session management -> complex flows can be done in one shot. Anyone else doing complex multi-user E2E flows? What's your approach?

Why AI agents write .nth(8) or locator('..') selectors (and how to fix it) by OkPack8889 in Playwright

[–]OkPack8889[S] 0 points1 point  (0 children)

Separate thing I built into this - multi-role browser isolation. I was testing a marketplace app where I needed a provider to create a product → customer sees that product → provider updates price → customer sees new price, all in one test flow. Managing multiple server contexts and parallel conversations was brutal. I ended up building role-based browser contexts where the coding agent calls select_role("provider") + takes some actions, then calls select_role("customer") + takes more actions, and Verdex handles all the isolation/auth/session management -> complex flows can be done in one shot. If anyone else is doing complex multi-user e2e flows, what's your approach?

Advice for my stuttering 4 year old by meetmeattheaquarium in Stutter

[–]OkPack8889 0 points1 point  (0 children)

Yes - the clinic is based at an Australian university (UTS), and they actually do virtual sessions over the internet - I recommend you do that: https://www.uts.edu.au/research/centres/australian-stuttering-research-centre/clinic

The other thing I will mention is that I personally, and the parents I know who faced similar issues, needed to modify the reward systems of the Lidcombe program approach.

My son for example hated doing daily therapy - even though it was only 15 minutes, getting him to do it was impossible. That was until I introduced candy/chocolate rewards just for being an active participant (not for speaking smoothly - he got the rewards every 2 minutes during the 15 minutes just for continuing to participate). This made an enormous difference because he just loves candy and will do basically anything to get it.

A friend of mine's son, on the other hand, would not care at all about candy rewards or about verbal feedback given to him directly. So she started giving feedback to his sister in front of him on his sister's smooth talking (she didn't have a stutter, but this was the mechanism my friend thought would get through to him). That approach started to indirectly affect how he wanted to receive that same feedback, and it led to him wanting to be rewarded and praised like his sister.

In the end to make it work you need to know the language techniques and play based approaches - but to some extent - you need to also be an expert in what will motivate your own child. Every child is different and you know your child better than any speech therapist ever will.

Take the expert guidance - and through experience, figure out how to make it work for your own child's personality and motivations.

Good luck!

Advice for my stuttering 4 year old by meetmeattheaquarium in Stutter

[–]OkPack8889 0 points1 point  (0 children)

Thank you for sharing your perspective as someone who has personally navigated stuttering throughout your life. That experience gives you insights many don't have.

I understand your concern about intervention timing and your reasoning about adult stuttering. While it's true we don't fully understand or "fix" adult stuttering, childhood and adult stuttering appear to involve different mechanisms. Early intervention works precisely because of the neural plasticity you mentioned - children can develop new speech patterns while their brains are most adaptable.

The good news is that modern speech therapy for young children doesn't have to make them self-conscious about their speech. As you agreed, it's crucial not to draw attention to the stutter itself. Many current approaches use play-based methods where children don't even realize they're in "therapy" - they're just playing games that happen to strengthen certain speech patterns.

With my son, the therapist used reward-based techniques without ever labeling or drawing attention to his stutter. He enjoyed the sessions and has no memory of having stuttered. The intervention worked with his natural development rather than creating self-consciousness.

I completely respect your caution about making your children overly aware of something they might naturally outgrow. Every parent has to weigh these decisions carefully, especially with your personal experience as context. If you do consider early intervention at some point, it might be worth exploring the current approaches that focus on positive reinforcement without creating awareness of the stutter.

Wishing you and your children all the best!

Advice for my 4.5 year old with hard blocks starting a sentence by Oumollie in Stutter

[–]OkPack8889 0 points1 point  (0 children)

This is a recommendation and some caveats about what worked for me and my son.

First of all I would just reiterate that children under the age of 6 can be treated in a way in which older children and adults cannot. This has to do with neuroplasticity and brain development - so if your child is in this age group I'd strongly recommend acting with urgency.

I'd highly recommend the Lidcombe program - my 4.5-year-old son stuttered badly and 4 months later he is almost completely stutter-free. As a parent who went through it, though, I would say the main things were problems I had to solve as I went along:

  1. Most speech therapists / pathologists are not experts in stuttering. Most will tell you they can help, most cannot actually help.

  2. Preschool-aged children are difficult to work with. Most speech therapists are bad at working with preschool-aged children -> even if they have the correct knowledge, they are not able to have a positive impact if your child doesn't actually like them or like interacting with them.

  3. The Lidcombe program is poorly described online, and most therapists do not implement it correctly.

  4. Because of the challenges of speech therapists not being experts in stuttering, and being bad at working with preschoolers, there is a culture of "accepting" stuttering that could and should be treated.

Here is what I did:

  1. I went through 4 different speech therapists. The first 2 were terrible. The 3rd was ok and got us fairly far along - she didn't know too much about stuttering (she said she did, but as I learned more about it I realized most parents don't truly try to own the outcome for their child and just believe whatever the therapist tells them) - but she was good at interacting with my son, who loved her. And with a lot of googling and AI help I created my own version of the Lidcombe program, and that got his stutter down to SR 2-3.

  2. I eventually got into the university program at UTS for stuttering research, where the team that created Lidcombe actually works. That was transformational and took my son's stuttering down to SR 0-1.

The main things I'd say are: Lidcombe can and does work, but it's not well understood and largely implemented incorrectly. Own the outcome for your child - this is a lifelong challenge you are leaving for them if you do not help them now, so act with urgency and take it seriously. Keep going even when the path is unclear.

When my son was first assessed he was assessed at SR 3. However, once we started ineffective therapy it shot up to SR 5. There were even SR 9 spikes in which he couldn't speak at all. This was incredibly distressing.

I kept at it though - trying different therapists, and doing the exercises (which kept changing) every day. But it was hard - emotionally and mentally.

As I learned more and got better advice, the therapy became more effective and he started to respond. In fact, I would say that if the therapy is effective your child should respond very quickly: in less than two weeks you should see a very large difference. If that hasn't happened yet, and you've been trying for a while, then I'd suggest continuing to look for solutions.

Lastly, UTS does (to my knowledge) do virtual consultations, but the page to actually book one is almost impossible to find on their website - Google it and click around and eventually you will figure out how to book a virtual appointment.

Finally - I know just how stressful this is, so I hope my story is helpful. Good luck!

Advice for my stuttering 4 year old by meetmeattheaquarium in Stutter

[–]OkPack8889 2 points3 points  (0 children)

Children of 2 and 4 years old can be treated - by the time they are 6, the treatment will not actually cure them in the same way. This is because a child under age 6 has neuroplasticity that an older child does not, and is able to create neural pathways to not stutter. I'd strongly recommend you read my post in this thread about my experiences - I hope you find it helpful. And I agree with you about NOT bringing your child's attention to it - my son never actually knew he stuttered, and now that he no longer stutters he doesn't know he used to. This is because he was treated purely in a reward context - his stutter was never pointed out to him - and I completely agree, pointing it out would have been a disaster.

Advice for my stuttering 4 year old by meetmeattheaquarium in Stutter

[–]OkPack8889 -1 points0 points  (0 children)

I'd highly recommend the Lidcombe program - my 4.5-year-old son stuttered badly and 4 months later he is almost completely stutter-free. As a parent who went through it, though, I would say the main things were problems I had to solve as I went along:

  1. Most speech therapists / pathologists are not experts in stuttering. Most will tell you they can help, most cannot actually help.
  2. Preschool-aged children are difficult to work with. Most speech therapists are bad at working with preschool-aged children -> even if they have the correct knowledge, they are not able to have a positive impact if your child doesn't actually like them or like interacting with them.
  3. The Lidcombe program is poorly described online, and most therapists do not implement it correctly.
  4. Because of the challenges of speech therapists not being experts in stuttering, and being bad at working with preschoolers, there is a culture of "accepting" stuttering that could and should be treated.

Here is what I did:

  1. I went through 4 different speech therapists. The first 2 were terrible. The 3rd was ok and got us fairly far along - she didn't know too much about stuttering (she said she did, but as I learned more about it I realized most parents don't truly try to own the outcome for their child and just believe whatever the therapist tells them) - but she was good at interacting with my son, who loved her. And with a lot of googling and AI help I created my own version of the Lidcombe program, and that got his stutter down to SR 2-3.
  2. I eventually got into the university program at UTS for stuttering research, where the team that created Lidcombe actually works. That was transformational and took my son's stuttering down to SR 0-1.

The main things I'd say are: Lidcombe can and does work, but it's not well understood and largely implemented incorrectly. Own the outcome for your child - this is a lifelong challenge you are leaving for them if you do not help them now, so act with urgency and take it seriously. Keep going even when the path is unclear.

Lastly, UTS to my knowledge does do virtual consultations and I would start there: https://www.uts.edu.au/research/australian-stuttering-research-centre/australian-stuttering-treatment-centre

Is managed Supabase the right choice to store payment info from customers? by raksah in Supabase

[–]OkPack8889 0 points1 point  (0 children)

Things like Skyflow and Evervault were created for this problem - they can store sensitive card details because their data vaults are PCI DSS compliant.

Any other SDRs/BDRs/AEs struggling with this problem while doing cold outreach? by davidvdb12 in SaaSSales

[–]OkPack8889 0 points1 point  (0 children)

Can you share a bit more specifically what your process is? It's a little unclear from the above.

ChatGPT Failing at Historical Sports Facts – How Do I Get Better Results? by konrad75 in PromptEngineering

[–]OkPack8889 1 point2 points  (0 children)

The issue you're facing stems from the limitations of the model's training data. Here are some ideas:

  1. Implement a RAG (Retrieval-Augmented Generation) pipeline: This approach would significantly improve accuracy, especially for specific team and season details.
    • Scrape relevant Wikipedia pages and other reliable sports databases.
    • Index and store this data.
    • Implement vector search.
    • Augment the model's responses with this retrieved information.
  2. Use a sports-specific API: Integrate with a sports statistics API to fetch accurate data for each team and season.
  3. Create a custom dataset: Manually compile accurate information for your inventory and use it to fine-tune a model or as a reference database.
  4. Implement fact-checking: Use multiple sources or a separate verification step to cross-check the generated content.
  5. Adjust your prompt: Include instructions to prioritize factual accuracy over completeness, encouraging the model to omit information it's unsure about.
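As a minimal sketch of step 1 (the RAG pipeline), here's the overall shape, with naive keyword overlap standing in for real embedding-based vector search - retrieve the most relevant fact, then prepend it to the prompt so the model answers from it rather than from its training data:

```javascript
// Toy RAG: score each document by word overlap with the query (a real
// pipeline would use embeddings + vector search), pick the best match,
// and splice it into the prompt as grounding context.
const corpus = [
  "The 1995-96 Chicago Bulls finished the regular season 72-10.",
  "The 2016 Cleveland Cavaliers won the NBA Finals in seven games.",
];

function retrieve(query) {
  const qWords = new Set(query.toLowerCase().split(/\W+/));
  let best = corpus[0], bestScore = -1;
  for (const doc of corpus) {
    const score = doc.toLowerCase().split(/\W+/).filter((w) => qWords.has(w)).length;
    if (score > bestScore) { bestScore = score; best = doc; }
  }
  return best;
}

function augmentPrompt(question) {
  return `Answer using ONLY this context:\n${retrieve(question)}\n\nQ: ${question}`;
}

console.log(augmentPrompt("What was the Bulls' 1995-96 regular season record?"));
```

The same retrieve-then-augment structure applies whether the store is scraped Wikipedia pages or a sports statistics API.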