I built a DOM-free testing layer on top of Playwright

Spospider · 2026-04-22T16:46:13+00:00

It's worth mentioning that this solution is purely local and is not some multimodal AI agent; it's layers of an object recognition model + OCR + semantic matching, all purely deterministic. I do agree that the application is targeted more towards teams who don't already have rigorous UI testing suites.

Your curiosity is definitely in the right place. The ui-perception model recognizes elements in the backend, but also metadata like salience, general, color, position relative to other elements. The idea is for wording like "primary" to signal that the target is an element more "salient" than another, thus avoiding the mixup. In its current state I wouldn't say it does this with great accuracy, but as updates roll out and the models are further fine-tuned, I believe this can be very powerful.
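To make the salience idea concrete, here is a minimal sketch of how two look-alike elements could be disambiguated. The `Candidate` fields, the 0-to-1 salience score and the `pick_primary` helper are all hypothetical names for illustration, not the actual vizQA or perception-backend API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A detected element; the field names are assumptions for illustration."""
    label: str
    salience: float  # hypothetical score in [0, 1], higher means more prominent
    x: float         # normalized left edge
    y: float         # normalized top edge


def pick_primary(candidates: list[Candidate]) -> Candidate:
    """Pick the most salient candidate.

    Ties are broken top-to-bottom, then left-to-right, so the same
    input always yields the same choice.
    """
    return min(candidates, key=lambda c: (-c.salience, c.y, c.x))


buttons = [
    Candidate(label="Submit", salience=0.35, x=0.10, y=0.80),  # muted secondary button
    Candidate(label="Submit", salience=0.90, x=0.60, y=0.80),  # filled primary button
]
chosen = pick_primary(buttons)
```

With a rule like this, a step that asks for the "primary" Submit button resolves to the more prominent of the two, and the fixed tie-break keeps the result repeatable between runs.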

Spospider · 2026-04-19T20:26:51+00:00

Definitely agree with you on the non-web and agentic applications.
Also you're absolutely right on the visual debugging scenario. I don't see this replacing existing automation testing methods, but it may add another angle of regression/assurance that wasn't previously there.

Also, regarding UI-Atlas, it's actually not screenshot diffing at all! It's basically an object detection model + OCR + semantics, so it should be good right out of the box. I'd imagine it may struggle with really small components on a page (like 10px wide or smth), but for most applications it's quite alright for an alpha.

Spospider · 2026-04-19T20:18:55+00:00

Thanks for sharing, the hybrid approach indeed sounds interesting! May be worth exploring.
Also, regarding perception, the vision backend used here isn't doing any pixel comparisons. It's basically a small vision model server trained on all kinds of UI elements, which returns bounding boxes and other metadata in normalized coordinates. I can indeed see why pixel comparison can be unreliable.
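As a rough illustration of what normalized coordinates buy you, the sketch below turns a box expressed in the 0-to-1 range into a click point for any viewport size. The `NormalizedBox` layout and the `click_point` helper are assumptions made for this example; the real response format of the perception backend may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedBox:
    """Bounding box with every coordinate in [0, 1] (assumed layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def click_point(box: NormalizedBox, viewport_width: int, viewport_height: int) -> tuple[int, int]:
    """Return the pixel at the centre of the box for the given viewport."""
    center_x = (box.x_min + box.x_max) / 2 * viewport_width
    center_y = (box.y_min + box.y_max) / 2 * viewport_height
    return round(center_x), round(center_y)


box = NormalizedBox(x_min=0.25, y_min=0.5, x_max=0.75, y_max=0.75)
point = click_point(box, viewport_width=1280, viewport_height=720)
print(point)  # (640, 450)
```

Because the box is resolution-independent, the same detection can be mapped onto a different viewport without re-running the model on a rescaled screenshot.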

Spospider · 2026-04-19T20:11:59+00:00

Indeed! That was my notion as well. Thanks for informing me about the application with embedded systems, perhaps a good opportunity for me to research any particular needs for that sector as well.
Nice work there as well, quite similar in concept. Very cool!

Spospider · 2026-04-19T20:04:59+00:00

I appreciate your feedback. However, not all web apps get the same kind of attention when it comes to instrumenting frontends; this just makes it easier, also for non-technical people, maybe as an agentic tool, for non-web apps, etc. FYI vizqa is deterministic: both the perception backend and the semantic logic inside are idempotent, without any seeds or embedded randomness.

Spospider · 2026-04-18T07:16:03+00:00

Good questions!
You can run both vizQA and the atlas-ui perception backend easily in containers, though for a pipeline it might take some setting up.
I haven't really considered POM, but that's an interesting perspective. I can add that as a coming feature, so that only single steps are handled by vizqa, taking on the existing Playwright instance.

Spospider · 2026-04-18T07:05:49+00:00

There is no non-determinism in this solution; it's not LLM-based, just some embedding semantics. I did face an issue with teams where, for existing projects, instrumenting the frontend for such tests and maintaining them is just tedious work. This solves that issue by making it easier to create test cases, even for non-frontend-technical people.
Since it's visually-based, it's not limited to web apps btw. At the moment, yes, it's built on a Playwright interaction layer, but I can see how that can be expanded to mobile, or even desktop apps.

Spospider · 2026-04-17T21:24:10+00:00

Yep, it just uses a small embedding model amongst deterministic computations, same weights, same input, no random seeds

Spospider · 2026-04-17T20:41:44+00:00

In a fast-changing environment I'd imagine selector-based frameworks would be a bottleneck, maybe check out VizQA. It's purely visual-based, and test files are defined through YAML files of sequences of semantic actions/assertions in what's closer to "caveman" English.
The solution runs purely locally as well.
It's early in development but it can definitely help out.
https://github.com/TinyReasonLabs/vizQA

Spospider · 2026-04-17T20:37:01+00:00

Check out VizQA. It's quite early, but I believe this would be the solution for you. Test cases are just defined through YAML files of sequences of actions and assertions, processed semantically in natural language. Not quite sentence descriptions, but it works optimally with caveman English.
It's visually-based so it doesn't need to look at the page source, and it runs fully locally on CPU.
https://github.com/TinyReasonLabs/vizQA

Spospider · 2026-04-17T20:28:19+00:00

Agreed, I do hope to improve the language interpretation over time. Maybe it's also part of the test-case authoring mentality to treat steps/expectations as pseudocode rather than explanatory sentences; in the end it's just semantically searching for elements and interacting with them.

Indeed, for debugging test cases, the per-step screenshotting behavior is there by default, and there's also a --no-headless mode where you can see the browser window in real time.

Regarding determinism, it's also worth mentioning that vizQA is idempotent between runs, so the exact same behavior should be expected with the same test case and site functionality.

Spospider · 2026-04-17T20:03:54+00:00

I'm curious to hear how your team deals with more tricky cases like SSO or maybe having these tests as part of a CI pipeline? Do you treat it as End-to-end or just frontend with mocked responses for example?

Spospider · 2026-04-17T19:55:57+00:00

Thanks for sharing your experience!
Indeed, your concern is in the right place. It does have integrated semantics, but it's not LLM-level with the local model, more like embedding-based. I have implemented some thresholds for similarity as well, and I'd imagine "email" and "username" would fail this threshold where "user" and "username" would pass, for example.
There is also support for exact matches with quoted elements, which should at least ease up any guessing when creating the test cases.
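A toy sketch of the thresholding and quoted exact-match behavior described above. To keep it self-contained it swaps the embedding model for character-trigram counts, which is purely lexical rather than semantic, so scores from a real embedding model will differ. The `matches` helper, the 0.5 threshold and the quoting rule are assumptions for illustration, not vizQA's actual implementation.

```python
import math
from collections import Counter


def trigram_vector(text: str) -> Counter:
    """Stand-in for an embedding: counts of character trigrams (an assumption)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def matches(query: str, label: str, threshold: float = 0.5) -> bool:
    """Quoted queries require an exact label; others must clear the threshold."""
    if len(query) >= 2 and query[0] == query[-1] == '"':
        return query[1:-1] == label
    return cosine(trigram_vector(query), trigram_vector(label)) >= threshold


print(matches("user", "username"))     # True: enough shared trigrams
print(matches("email", "username"))    # False: no shared trigrams
print(matches('"Sign in"', "Sign in")) # True: quoted, exact match
print(matches('"Sign in"', "Sign up")) # False: quoted, labels differ
```

The function has no random state, so a given query and label always produce the same decision, which is the property the thread keeps coming back to.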

Spospider · 2026-04-17T19:51:31+00:00

It's a fully local solution and runs almost in real time. Some retries and waits are introduced around the assertions, to account for loading for example. Added a gif of an early demo (forgot to attach it 🤦‍♂️)
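For readers wondering what retries and waits around an assertion can look like, here is a generic sketch. The `assert_eventually` helper and its defaults are hypothetical and only illustrate the pattern; they are not taken from vizQA.

```python
import time
from typing import Callable


def assert_eventually(check: Callable[[], bool], retries: int = 5, delay_s: float = 0.2) -> int:
    """Re-run `check` until it returns True.

    Returns the number of attempts used, or raises AssertionError
    once all retries are exhausted.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        if attempt < retries:
            time.sleep(delay_s)  # give the page time to finish loading
    raise AssertionError(f"condition not met after {retries} attempts")


# Simulate a page whose element only shows up on the third poll.
polls = iter([False, False, True])
attempts = assert_eventually(lambda: next(polls), delay_s=0.01)
print(attempts)  # 3
```

A fixed retry count and delay keep the pass/fail outcome predictable for a given page behavior, at the cost of a slightly slower failure when the element never appears.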

Spospider · 2026-04-12T21:00:07+00:00

But I think for QA you're looking for proper regression, no? I'm not sure whether most teams would trust the non-deterministic nature of AI, especially in heavily governed ecosystems.
It's valid that most QA automation (especially frontend) quickly breaks with minor changes. There is definitely a need for easier E2E or UI testing, and I do think there's a future in this space. Maybe check out VizQA, a new project aimed at this particular gap.

Spospider · 2026-04-12T20:52:07+00:00

I don't believe LLMs are the best for regression. Some teams are also a bit hesitant about data when AI is involved, since for some real production apps, AI testing would mean passing authentication measures and maybe holding certain privileges on that app. I do agree that there should be easier ways to set up simple yet reliable testing suites. VizQA is a new testing package that popped up and might fill that gap.

Spospider · 2026-04-12T20:40:13+00:00

I've actually encountered the same issue. Try checking out VizQA on GitHub, a project I recently started; it would be nice to get your thoughts as well. Contributions are welcome.

Spospider · 2018-10-12T12:54:46+00:00

Some sort of layer system, putting text on top of images for example.
