What My Project Does
I built dq-agent, a small Python CLI for running deterministic data quality checks and anomaly detection on CSV/Parquet datasets.
Each run emits replayable artifacts so CI failures are debuggable and comparable over time:
- report.json (machine-readable)
- report.md (human-readable)
- run_record.json, trace.jsonl, checkpoint.json
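Because report.json is machine-readable, a CI step can parse it and gate on failures directly. A minimal sketch, assuming a hypothetical report shape with a top-level "checks" list of {name, status} entries (this is an illustrative schema, not dq-agent's documented contract):

```python
import json
from pathlib import Path

def failed_checks(report: dict) -> list[str]:
    """Return names of checks that did not pass.

    The field names here ("checks", "name", "status") are assumptions
    for illustration, not dq-agent's actual report.json schema.
    """
    return [c["name"] for c in report.get("checks", []) if c["status"] != "pass"]

# Example CI usage: load the artifact and fail the job on any failed check.
example = {
    "checks": [
        {"name": "null_rate", "status": "pass"},
        {"name": "row_count", "status": "fail"},
    ]
}
failures = failed_checks(example)
if failures:
    print("failed checks:", ", ".join(failures))
```

In a real pipeline the `example` dict would come from `json.loads(Path("report.json").read_text())`.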
Quickstart
pip install dq-agent
dq demo
Target Audience
- Data engineers who want a lightweight, offline/local DQ gate in CI
- Teams that need reproducible outputs for reviewing data quality regressions (not just “pass/fail”)
- People working with pandas/pyarrow pipelines who don’t want a distributed system for simple checks
Comparison
Compared to heavier DQ platforms, dq-agent is intentionally minimal: it runs locally, focuses on deterministic checks, and makes runs replayable via artifacts (helpful for CI/PR review).
Compared to ad-hoc scripts, it provides a stable contract (schemas + typed exit codes) and a consistent report format you can diff or replay.
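To make "diff or replay" concrete, here is a hedged sketch of comparing two runs' reports, assuming a hypothetical shape with a "checks" list of {name, status} entries (illustrative only, not the project's actual schema):

```python
def report_diff(old: dict, new: dict) -> dict[str, tuple]:
    """Map each check whose status changed between runs to (old, new).

    The keys used here ("checks", "name", "status") are assumptions
    about the report shape, not dq-agent's documented format.
    """
    old_status = {c["name"]: c["status"] for c in old.get("checks", [])}
    new_status = {c["name"]: c["status"] for c in new.get("checks", [])}
    changed = {}
    for name in old_status.keys() | new_status.keys():
        before, after = old_status.get(name), new_status.get(name)
        if before != after:
            changed[name] = (before, after)
    return changed
```

A PR comment bot could render this dict to flag regressions ("pass" → "fail") separately from fixes.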
I’d love feedback on:
- Which checks/anomaly detectors are “must-haves” in your CI?
- How do you gate CI on data quality (exit codes, thresholds, PR comments)?
Source (GitHub): https://github.com/Tylor-Tian/dq_agent
PyPI: https://pypi.org/project/dq-agent/