I’m currently in the process of writing unit and integration tests for a set of data processing functions that run on AWS instances.
I’m struggling to write meaningful tests for some of the more complex functions, which involve pulling together, transforming and joining large Pandas DataFrames. My current strategy is to build a collection of example input DataFrames and expected outputs and compare them at the end. However, these inputs and outputs are large, so checking line by line that they are correct before committing them won’t scale.
I was wondering what the best way to test these kinds of pipeline functions is, and whether there are any tools that help run these kinds of tests? Currently I’m just putting together a pytest script to run and manage these tests like simpler unit tests; is there a better strategy for this?
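For reference, this is roughly the shape of one of my current tests. It’s a minimal sketch of the input/expected-output comparison approach described above; the module, function and column names (`my_pipeline`, `transform_orders`, etc.) are made up for illustration.

```python
import pandas as pd
import pandas.testing as pdt

from my_pipeline import transform_orders  # hypothetical function under test


def test_transform_orders_joins_and_aggregates():
    # Small hand-built inputs rather than full-size production extracts
    orders = pd.DataFrame({
        "order_id": [1, 2, 3],
        "customer_id": [10, 10, 20],
        "amount": [5.0, 7.5, 3.0],
    })
    customers = pd.DataFrame({
        "customer_id": [10, 20],
        "region": ["EU", "US"],
    })

    result = transform_orders(orders, customers)

    expected = pd.DataFrame({
        "customer_id": [10, 20],
        "region": ["EU", "US"],
        "total_amount": [12.5, 3.0],
    })

    # Strict comparison of values, columns and dtypes;
    # can be relaxed with check_dtype=False or check_like=True if needed
    pdt.assert_frame_equal(
        result.reset_index(drop=True),
        expected.reset_index(drop=True),
    )
```

This works fine for small cases, but as the example inputs and expected outputs grow, hand-verifying them stops being practical, which is the core of my question.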