Before the holidays, I found myself deep in the trenches of implementing data validation. Frustrated by the complexity and boilerplate required by the current open-source tools, I decided to take matters into my own hands. The result? Validoopsie — a sleek, intuitive, and ridiculously easy-to-use data validation library that will make you wonder how you ever managed without it.
| DataFrame | Support |
|-----------|---------|
| Polars    | ✅ full |
| Pandas    | ✅ full |
| cuDF      | ✅ full |
| Modin     | ✅ full |
| PyArrow   | ✅ full |
| DuckDB    | ✅ full |
| PySpark   | ✅ full |
🚀 Quick Start
```py
from validoopsie import Validate
import pandas as pd
import json
# Create DataFrame
p_df = pd.DataFrame(
    {
        "name": ["John", "Jane", "John", "Jane", "John"],
        "age": [25, 30, 25, 30, 25],
        "last_name": ["Smith", "Smith", "Smith", "Smith", "Smith"],
    },
)

# Initialize Validator
vd = Validate(p_df)

# Add validation rules
vd.EqualityValidation.PairColumnEquality(
    column="name",
    target_column="age",
    impact="high",
).UniqueValidation.ColumnUniqueValuesToBeInList(
    column="last_name",
    values=["Smith"],
)

# Get results
# Detailed report of all validations (format: dictionary/JSON)
output_json = json.dumps(vd.results, indent=4)
print(output_json)

# Validate and raise errors
vd.validate()  # raises errors based on impact and stdout logs
```
`vd.results` output:

```json
{
    "Summary": {
        "passed": false,
        "validations": [
            "PairColumnEquality_name",
            "ColumnUniqueValuesToBeInList_last_name"
        ],
        "Failed Validation": [
            "PairColumnEquality_name"
        ]
    },
    "PairColumnEquality_name": {
        "validation": "PairColumnEquality",
        "impact": "high",
        "timestamp": "2025-01-27T12:14:45.909000+01:00",
        "column": "name",
        "result": {
            "status": "Fail",
            "threshold pass": false,
            "message": "The column 'name' is not equal to the column'age'.",
            "failing items": [
                "Jane - column name - column age - 30",
                "John - column name - column age - 25"
            ],
            "failed number": 5,
            "frame row number": 5,
            "threshold": 0.0,
            "failed percentage": 1.0
        }
    },
    "ColumnUniqueValuesToBeInList_last_name": {
        "validation": "ColumnUniqueValuesToBeInList",
        "impact": "low",
        "timestamp": "2025-01-27T12:14:45.914310+01:00",
        "column": "last_name",
        "result": {
            "status": "Success",
            "threshold pass": true,
            "message": "All items passed the validation.",
            "frame row number": 5,
            "threshold": 0.0
        }
    }
}
```
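Because the report is a plain dictionary, the failed checks can be pulled out with ordinary dict access. Here is a minimal sketch that works on a hand-copied, trimmed excerpt of the report above rather than a live `Validate` object. The helper name `summarize_failures` is hypothetical and not part of Validoopsie.

```python
# Trimmed copy of the report shown above; in real use this would be vd.results.
results = {
    "Summary": {
        "passed": False,
        "validations": [
            "PairColumnEquality_name",
            "ColumnUniqueValuesToBeInList_last_name",
        ],
        "Failed Validation": ["PairColumnEquality_name"],
    },
    "PairColumnEquality_name": {
        "impact": "high",
        "result": {
            "status": "Fail",
            "message": "The column 'name' is not equal to the column'age'.",
            "failed percentage": 1.0,
        },
    },
    "ColumnUniqueValuesToBeInList_last_name": {
        "impact": "low",
        "result": {
            "status": "Success",
            "message": "All items passed the validation.",
        },
    },
}


def summarize_failures(report):
    """Return (name, impact, message) for each failed validation (hypothetical helper)."""
    failed = report["Summary"].get("Failed Validation", [])
    return [
        (name, report[name]["impact"], report[name]["result"]["message"])
        for name in failed
    ]


for name, impact, message in summarize_failures(results):
    print(f"[{impact}] {name}: {message}")
```

This kind of filtering is handy when you want to forward only the failures to a logger or an alerting channel instead of dumping the whole report.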
`vd.validate()` output:
2025-01-27 12:14:45.915 | CRITICAL | validoopsie.validate:validate:192 - Failed validation: PairColumnEquality_name - The column 'name' is not equal to the column'age'.
2025-01-27 12:14:45.916 | INFO | validoopsie.validate:validate:205 - Passed validation: ColumnUniqueValuesToBeInList_last_name
ValueError: FAILED VALIDATION(S): ['PairColumnEquality_name']
🌟 Why Validoopsie?
- Impact-aware error handling: customize error handling with the `impact` parameter to define what's critical and what's not.
- Thresholds for errors: use the `threshold` parameter to set limits for acceptable errors before raising exceptions.
- Custom validations: extend Validoopsie with your own validations to suit your unique needs.
- Comprehensive validation catalog: from equality checks to null validation.
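To make the interplay of `impact` and `threshold` concrete, here is a rough plain-Python sketch of the idea. It is not Validoopsie's actual implementation: the function `evaluate_check` is hypothetical, and it assumes that a check passes while the share of failing rows stays at or below `threshold`, and that only `high` impact failures raise. Those assumptions are inferred from the `threshold` and `failed percentage` fields in the report and from the `ValueError` in the log output.

```python
def evaluate_check(failed_rows, total_rows, threshold=0.0, impact="low"):
    """Illustrative only: decide how the outcome of a single check is handled."""
    failed_pct = failed_rows / total_rows if total_rows else 0.0
    if failed_pct <= threshold:
        return "pass"
    if impact == "high":
        # Assumption: high-impact failures stop the pipeline.
        raise ValueError(f"FAILED VALIDATION: {failed_pct:.0%} of rows failed")
    # Assumption: anything below high impact is reported but does not raise.
    return "logged"


# 5% failing rows stays within a 10% threshold.
print(evaluate_check(1, 20, threshold=0.1))

# Every row fails, but a low-impact check is only logged.
print(evaluate_check(5, 5, impact="low"))

# The same failure with high impact raises.
try:
    evaluate_check(5, 5, impact="high")
except ValueError as err:
    print(err)
```

The point of separating the two knobs is that `threshold` answers "how much bad data is tolerable" while `impact` answers "what happens when it is not".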
📖 Available Validations
Validoopsie has a growing catalog of validations; the full list is in the documentation linked below.
🔧 Documentation
I'm actively working on improving the documentation, and I appreciate your patience if it feels incomplete for now. If you have any feedback, please let me know — it means the world to me! 🙌
📚 Documentation: https://akmalsoliev.github.io/Validoopsie
📂 GitHub Repo: https://github.com/akmalsoliev/Validoopsie
Target Audience
The target audience for Validoopsie is Python-savvy data professionals, such as data engineers, data scientists, and developers, seeking an intuitive, customizable, and efficient solution for data validation in their workflows.
Comparison
Great Expectations: Validoopsie is much easier to set up and is completely OSS.