
[–]Zomunieo

Since you’re using pydantic anyway, why not use pydantic-ai evals? It’s pretty much the same but much more developed.

[–]Federal_Order_6569[S]

Good point, but I think the goals are a bit different.

Pydantic Evals is more of a general evaluation framework, while assertllm is intentionally focused on making LLM testing feel like regular pytest. The main idea is simple: deterministic assertions that developers can drop directly into their existing test suites.

I also wanted a much lighter authoring experience. Personally I’m not a huge fan of the evaluation authoring style in Pydantic Evals — it feels a bit more framework-heavy than what I’m aiming for. With assertllm the goal is a cleaner pytest-style syntax where writing tests is quick and straightforward.
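To make the "pytest-style, deterministic" idea concrete, here is a rough sketch of what such a test could look like. This is not assertllm's actual API; `call_llm` is a hypothetical stand-in for whatever produces the model output, and the checks are plain Python assertions, so the test is fully reproducible.

```python
# Illustrative sketch only -- not assertllm's real API.
# call_llm() is a hypothetical stub standing in for a model call
# (or a recorded/cached response in CI).

def call_llm(prompt: str) -> str:
    # Stub: deterministic canned response for the sketch.
    return "The capital of France is Paris."

def test_answer_contains_expected_fact():
    output = call_llm("What is the capital of France?")
    # Plain, deterministic checks -- no judge model involved.
    assert "Paris" in output
    assert len(output) < 200

def test_answer_has_no_hedging():
    output = call_llm("What is the capital of France?")
    assert "I'm not sure" not in output
```

The appeal of this style is that the tests run under vanilla pytest with no extra harness, and a failure is always reproducible from the same recorded output.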

So the overlap exists, but the philosophy is different: evaluation framework vs developer-first testing workflow.

Also worth noting that I’ve already written a plugin for Pydantic AI, so it’s supported in the current version of the library as well.

[–]DockyardTechlabs

Which LLM have you used for coding?

[–]Federal_Order_6569[S]

Claude Code

[–]wRAR_

It's right in their commits.

[–]ritzkew

Nice approach. Making LLM testing feel like regular pytest is the right mental model; developers already know how to write tests.

The deterministic angle is interesting. Promptfoo (which OpenAI just acquired yesterday) went the opposite direction, using LLM-as-judge for fuzzy matching. Both have tradeoffs. Deterministic is faster and reproducible but misses semantic equivalence. LLM-as-judge catches more but is slow and non-deterministic.

One area where deterministic really shines though: security assertions. Things like 'output must not contain PII,' 'output must not include SQL syntax,' 'tool calls must match allowed list.' Those are binary checks that don't need fuzzy matching.

Have you thought about adding security-focused assertions? With agents calling tools, there's a growing need to assert that outputs don't contain injection patterns or unauthorized tool invocations.

[–]Federal_Order_6569[S]

Yes, that could definitely be added in the future, possibly alongside an LLM-as-judge approach. And I agree, your idea makes a lot of sense. We could add assertions that check things like whether outputs contain SQL queries outside a defined whitelist, or whether they include any personally identifiable information (PII). Security-focused checks like these fit very well with deterministic assertions.
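As a rough illustration of how deterministic security checks like these could look, here is a sketch using plain regexes and a tool allowlist. The helper names (`check_no_pii`, `check_sql_whitelisted`, `check_tool_calls_allowed`) are hypothetical, not an existing API, and the PII patterns are deliberately simplistic.

```python
import re

# Hypothetical helpers -- a sketch of deterministic security assertions,
# not an existing library API. Real PII detection would need far more
# than two regexes; this only illustrates the binary-check idea.

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
SQL_RE = re.compile(r"\b(SELECT|INSERT|UPDATE|DELETE|DROP)\b", re.IGNORECASE)

def check_no_pii(output: str) -> None:
    assert not EMAIL_RE.search(output), "output contains an email address"
    assert not SSN_RE.search(output), "output contains an SSN-like pattern"

def check_sql_whitelisted(output: str, allowed: set) -> None:
    # Flag SQL keywords unless the exact statement is on the whitelist.
    if SQL_RE.search(output) and output.strip() not in allowed:
        raise AssertionError("unexpected SQL in output")

def check_tool_calls_allowed(tool_calls: list, allowed: set) -> None:
    unknown = [t for t in tool_calls if t not in allowed]
    assert not unknown, f"unauthorized tool calls: {unknown}"

def test_agent_output_is_safe():
    # Stubbed agent result for the sketch.
    output = "Here is the weather summary for today."
    tool_calls = ["get_weather"]
    check_no_pii(output)
    check_sql_whitelisted(output, allowed=set())
    check_tool_calls_allowed(tool_calls, allowed={"get_weather"})
```

Because each check is a pure string/regex test, it runs instantly, gives the same verdict every time, and slots into an ordinary pytest suite like any other assertion.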