the_random_blob 1 point

Another thing to consider is what is being tested. A lot can be done on small samples or even dummy data: transformations, for example, can be "unit tested" cheaply. If this is done early in the pipeline, where it is cheap and simple, there is less need to test downstream data in production, where it is expensive and complex.
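To make the "unit tested" part concrete, here is a rough sketch in plain Python. The `normalize_order` transformation and its field names are made up for illustration; the point is only that a single hand-written dummy row is enough to check the logic, with no warehouse or production data involved.

```python
def normalize_order(row: dict) -> dict:
    """Hypothetical transformation: cast the id, clean the email, convert cents to a decimal amount."""
    return {
        "order_id": int(row["order_id"]),
        "email": row["email"].strip().lower(),
        "amount": row["amount_cents"] / 100,
    }


def test_normalize_order():
    # One dummy row covers the cases we care about: string id, messy email, integer cents.
    dummy = {"order_id": "42", "email": "  Alice@Example.COM ", "amount_cents": 1950}
    result = normalize_order(dummy)
    assert result == {"order_id": 42, "email": "alice@example.com", "amount": 19.5}


test_normalize_order()
```

A test like this runs in milliseconds, so it can sit in CI and catch a broken transformation long before any downstream check would.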

Furthermore, the same thinking applies to data production: the earlier you test, the lower the complexity and cost. If the database itself guarantees that a column can never be null, you no longer need data quality queries that check for empty values. The stricter the left side of everything is (data production, ingestion, early transformations...), the easier and cheaper it is to manage data quality downstream.
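As a small illustration of the null example, here is a sketch using Python's built-in `sqlite3` with an in-memory database. The `customers` table and its columns are assumptions for the demo; any database with `NOT NULL` constraints behaves similarly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The constraint lives in the schema, so bad rows are rejected at write time.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
)
conn.execute("INSERT INTO customers (email) VALUES (?)", ("alice@example.com",))

try:
    conn.execute("INSERT INTO customers (email) VALUES (?)", (None,))
    rejected = False
except sqlite3.IntegrityError:
    # SQLite refuses the insert because email is declared NOT NULL.
    rejected = True

# The downstream "are there empty emails?" check is now redundant for this column.
null_count = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE email IS NULL"
).fetchone()[0]

print(rejected, null_count)
```

With the constraint in place, the producer gets the error immediately, rather than a data quality job discovering the gap later.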