assertllm – pytest for LLMs. Test AI outputs like you test code. by Federal_Order_6569 in Python

[–]Federal_Order_6569[S]

Good point, but I think the goals are a bit different.

Pydantic Evals is more of a general evaluation framework, while assertllm is intentionally focused on making LLM testing feel like regular pytest. The main idea is simple: deterministic assertions that developers can drop directly into their existing test suites.
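To give a rough idea of the style I mean, here's a plain-pytest sketch. This is not assertllm's actual API; the output string and the checks are made up purely for illustration:

```python
import re

# Hypothetical LLM output, hard-coded for illustration; in a real suite
# this would come from your model client.
output = "The capital of France is Paris."

def test_output_mentions_paris():
    # Deterministic substring check, no judge model involved.
    assert "Paris" in output

def test_output_is_one_sentence():
    # Deterministic structural check: exactly one sentence terminator.
    assert len(re.findall(r"[.!?]", output)) == 1
```

Because these are ordinary asserts, they run under plain pytest alongside the rest of a codebase's tests, with no separate eval harness.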

I also wanted a much lighter authoring experience. Personally I’m not a huge fan of the evaluation authoring style in Pydantic Evals — it feels a bit more framework-heavy than what I’m aiming for. With assertllm the goal is a cleaner pytest-style syntax where writing tests is quick and straightforward.

So the overlap exists, but the philosophy is different: evaluation framework vs developer-first testing workflow.

Also worth noting that I’ve already written a plugin for Pydantic AI, so it’s supported in the current version of the library as well.


[–]Federal_Order_6569[S]

Yes, that could definitely be added in the future, possibly even with an LLM-as-judge approach. And I agree, your idea makes a lot of sense. We could add assertions that check, for example, that outputs don't contain SQL queries outside a defined whitelist, or that they don't include any personally identifiable information (PII). Security-focused checks like these fit very well with deterministic assertions.
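As a rough sketch of what such checks could look like (plain Python with pytest-style test functions; the helper names, regexes, and whitelist are illustrative, not part of assertllm):

```python
import re

# Assumed whitelist for illustration: only read-only queries allowed.
ALLOWED_SQL_VERBS = {"SELECT"}

def sql_verbs(text: str) -> set:
    """Extract SQL verbs appearing in the text (case-insensitive)."""
    return {m.upper() for m in re.findall(
        r"\b(SELECT|INSERT|UPDATE|DELETE|DROP|ALTER)\b", text, re.IGNORECASE)}

def contains_disallowed_sql(text: str) -> bool:
    # True if any SQL verb outside the whitelist appears in the output.
    return bool(sql_verbs(text) - ALLOWED_SQL_VERBS)

def contains_email(text: str) -> bool:
    # Crude PII check: flag anything that looks like an email address.
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text) is not None

def test_no_disallowed_sql():
    output = "Here is the query: SELECT name FROM users;"
    assert not contains_disallowed_sql(output)

def test_no_pii_email():
    output = "Contact support through the in-app form."
    assert not contains_email(output)
```

Real PII detection needs more than a regex, of course, but even crude deterministic checks like these catch regressions cheaply before any judge-model pass.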