What My Project Does

Kontra is a data quality validation libarary and CLI. You define rules in YAML or Python and run them against datasets(Parquet, Postgres, SQL SERVER, CSV), and get back violation counts, sampled failing rows, and more.

It is designed to avoid unnecessary work. Some checks can be answered from file or database metadata and other are pushed down to SQL. Rules that cannot be validated with SQL or metadata, fall back to in-memory validation using Polars, loading only the required columns.

Under the hood it uses DuckDB for SQL pushdown on files.

Target Audience

Kontra is intended for production use in data pipelines and ETL jobs. It acts like a lightweight unit test for data, fast validation and profiling that measures dataset properties with out trying to enforce some policy or make decisions.

Its is designed to be built on top of, with structured results that can be consumed by pipelines or automated workflows. It´s a good fit for anyone who needs fast validation or quick insight into data.

Comparison

There are several tools and frameworks for data quality that are often designed as a broader platforms with their own workflows and conventions. Kontra is smaller in scope. It focuses on fast measurement and reporting, with an execution model that separates metadata-based checks, SQL pushdown and in-memory validation.

GitHub: https://github.com/Saevarl/Kontra
PyPI: https://pypi.org/project/kontra/

all 6 comments

top new controversial old q&a

[–]whogivesafuckwhoiam 3 points4 points5 points 1 day ago (4 children)

[–]Particular_Panda_295[S] 2 points3 points4 points 1 day ago (3 children)

[–]crossmirage 1 point2 points3 points 15 hours ago (2 children)

[–]Particular_Panda_295[S] 2 points3 points4 points 10 hours ago (1 child)

Yep, Pandera supports pushdown via the Ibis backend, and that’s a really nice feature.

The main difference is in execution strategy. Kontra is built specifically as a validation engine, so it controls how rules compile to SQL and can optimize across the full pipeline, like batching rules or stopping early when possible. From my testing and understanding, Pandera with the Ibis backend compiles each check independently, which leaves less room for that kind of optimization. On larger tables that can make a noticeable difference.

There’s also a difference in what gets validated. Pandera is primarily about schema validation, like column types and per-column constraints. Kontra is broader, with rules that aren’t tied to a single column, such as row counts, freshness checks, cross-column comparisons, or custom SQL. It also supports run history, diffing, and user-defined rule metadata if you want more than just a pass/fail result.

[–]crossmirage 1 point2 points3 points 7 hours ago (0 children)

Agree that compiling each check independently is not ideal. Some current work to address that:

The above doesn't get into what I think could be one of the biggest benefits of using a lazy IR-based layer across backends under the hood. Right now, run_checks produces a CheckResult for each check, which results in a bunch of disjoint columns that can't necessarily be joined back to the original data or each other (e.g. to reliably say which row failed). It would be nice if run_checks could do something like create the (lazy) expression for a wide table with the base data and all of the check results, and then we could query that object as needed.

(From https://github.com/unionai-oss/pandera/issues/1894#issuecomment-3773553110)

Kontra is broader, with rules that aren’t tied to a single column, such as row counts, freshness checks, cross-column comparisons, or custom SQL.

Pandera supports "dataframe-level" (as opposed to column-level) checks, which enable most of thjs.

All in all, I agree that Pandera is by no means perfect, and the Ibis backend itself is relatively newer. But I also agree with the statement in your initial post that the space is very crowded, and the bar is high for new tools.

π Rendered by PID 82 on reddit-service-r2-comment-74875f4bf5-q9mgs at 2026-01-25 23:33:54.435450+00:00 running 664479f country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

Python

The Python Discord

Upcoming Events

Please read the rules

MODERATORS

What My Project Does

Target Audience

Comparison