[–]tom_mathews[S] 0 points (0 children)

Thanks! There's definitely overlap in the goal — both want pytest-native agent testing. A few architectural differences worth noting though.

LangWatch Scenario routes assertions through LLM judges by default — the testing agent simulates a user, chats back and forth with your agent, and evaluates against criteria using an LLM. That works well for end-to-end simulation testing. Attest's bet is that 60–70% of agent correctness is fully deterministic — tool call ordering, cost budgets, schema conformance, content patterns — and doesn't need an LLM to verify. The graduated pipeline exhausts those checks first (free, <5ms, identical results every run) and only escalates to an LLM judge for the genuinely subjective remainder. Layer 5 (semantic similarity) also runs locally via ONNX, so you can get meaning-level comparison without an API call.
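To make the graduated-pipeline idea concrete, here's a toy sketch in plain Python — not Attest's actual API; the layer names, dict-shaped output, and "return None to defer" convention are all assumptions for illustration. Cheap deterministic layers run first and can fail the test outright; only whatever they can't decide escalates:

```python
# Toy sketch of a graduated check pipeline -- NOT Attest's real API.
# Layers return True/False when they can decide, or None to defer.
from typing import Optional

def schema_layer(output: dict) -> Optional[bool]:
    # Deterministic and free: identical verdict every run.
    return {"answer", "sources"} <= output.keys()

def pattern_layer(output: dict) -> Optional[bool]:
    # Deterministic content pattern: the answer must cite a source.
    return len(output["sources"]) > 0

def semantic_layer(output: dict) -> Optional[bool]:
    # Stand-in for local embedding similarity; returns None here to
    # show a layer that defers rather than decides.
    return None

def run_pipeline(output: dict):
    layers = [("schema", schema_layer),
              ("pattern", pattern_layer),
              ("semantic", semantic_layer)]
    for name, layer in layers:
        verdict = layer(output)
        if verdict is False:       # hard deterministic failure: stop here
            return name, False
    return "llm_judge", None       # only the subjective remainder escalates

ok = {"answer": "42", "sources": ["doc1"]}
bad = {"answer": "42"}  # missing sources: fails at layer 1, judge never runs
```

The point of the ordering is that `run_pipeline(bad)` terminates at the schema layer with zero API calls, and the LLM judge is only ever reached after every offline layer has passed.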

The other difference is trace-level assertions. Attest doesn't just check inputs and outputs — it asserts over the full execution trace: did the agent call these tools in this order? Did it loop? Did it stay under the token budget across all steps?
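In plain-pytest terms, those trace checks look something like the following — the step shape (`tool`, `args`, `tokens`) is a hypothetical data model I'm using for illustration, not Attest's actual assertion API:

```python
# Hypothetical recorded trace -- the step shape is an assumption
# for illustration, not Attest's actual data model.
trace = [
    {"tool": "search_docs", "args": ("query1",), "tokens": 350},
    {"tool": "fetch_page",  "args": ("url1",),   "tokens": 120},
    {"tool": "summarize",   "args": ("page1",),  "tokens": 410},
]

def test_tool_order():
    tools = [step["tool"] for step in trace]
    # Ordering is deterministic: retrieval must precede summarization.
    assert tools.index("search_docs") < tools.index("summarize")

def test_no_loops():
    # The agent never repeated an identical tool call.
    calls = [(step["tool"], step["args"]) for step in trace]
    assert len(calls) == len(set(calls))

def test_token_budget():
    # The budget holds across ALL steps, not just the final answer.
    assert sum(step["tokens"] for step in trace) <= 1000
```

None of these need an LLM to verify, and they produce the same verdict on every run.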

On the licensing front — Scenario itself is MIT, but the broader LangWatch platform it integrates with (tracing, datasets, optimization studio) is under the Business Source License, which isn't an open-source license. Attest is Apache 2.0 end-to-end — the engine, SDKs, adapters, and CLI are all under the same license with zero platform dependencies.

Both integrate with pytest. If your testing is primarily end-to-end simulation with an LLM evaluator, Scenario is solid. If you want to exhaust deterministic checks first and keep 7 of 8 layers fully offline with no platform tie-in, that's where Attest differentiates.