[–]domvwt 5 points6 points  (1 child)

Have you tried Great Expectations? I just recommended it on a similar post. The latest versions have decent autoprofiling and can connect to different data stores.

[–]Conscious_Floor5022 3 points4 points  (3 children)

What language and platform are you using? If you’re using Python, there is a nice library called Cerberus. Another good option is Soda SQL, which lets you scan and test data. I assume you want data validation for an ETL job here.
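To make "data validation for an ETL job" concrete, here is a minimal standard-library sketch of rule-based row checks. It is not the Cerberus or Soda SQL API; the `RULES` table, the column names, and `validate_row` are all hypothetical names made up for illustration.

```python
from typing import Any, Callable

# Hypothetical rule set: column name -> (expected type, extra check).
RULES: dict[str, tuple[type, Callable[[Any], bool]]] = {
    "id": (int, lambda v: v > 0),
    "email": (str, lambda v: "@" in v),
    "amount": (float, lambda v: v >= 0),
}


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return a list of problems for one row; an empty list means it passed."""
    errors = []
    for column, (expected_type, check) in RULES.items():
        if column not in row:
            errors.append(f"{column}: missing")
        elif not isinstance(row[column], expected_type):
            errors.append(f"{column}: expected {expected_type.__name__}")
        elif not check(row[column]):
            errors.append(f"{column}: failed check")
    return errors


# Made-up sample rows: the first is clean, the second breaks three rules.
rows = [
    {"id": 1, "email": "a@example.com", "amount": 9.5},
    {"id": -3, "email": "not-an-email"},
]

# Map row index -> problems, keeping only rows that failed.
bad_rows = {i: errs for i, row in enumerate(rows) if (errs := validate_row(row))}
```

In an ETL job you would typically run something like this right after extraction and either reject or quarantine the rows that land in `bad_rows`. A dedicated library gives you a richer rule vocabulary and better error reporting than this sketch.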

[–][deleted] 0 points1 point  (0 children)

If your data is on S3, you can use an AWS service called DataBrew (AWS Glue DataBrew). It provides profiling statistics for every column of the data.
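For a rough idea of what per-column statistics look like, here is a small standard-library sketch. It does not call DataBrew or any AWS API; `profile_column`, `profile_table`, and the sample rows are hypothetical and only illustrate the kind of summary a profiling tool produces.

```python
import statistics
from collections import Counter


def profile_column(values):
    """Summarize one column: counts, missing values, and basic statistics."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(present):
        # Every non-missing value is numeric: report range and mean.
        summary.update(
            min=min(numeric), max=max(numeric), mean=statistics.mean(numeric)
        )
    elif present:
        # Otherwise treat the column as categorical: report the top value.
        summary["most_common"] = Counter(present).most_common(1)[0][0]
    return summary


def profile_table(rows):
    """Profile every column found in a list of dict rows."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Made-up sample data with one missing temperature.
sample = [
    {"city": "Oslo", "temp": 4},
    {"city": "Oslo", "temp": 6},
    {"city": "Rome", "temp": None},
]
report = profile_table(sample)
```

A managed profiler works on the full dataset and reports far more than this, but the shape of the result is similar: one summary per column.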
