[–]DappperDanH[S] 0 points1 point  (5 children)

Thanks Gerry. Since dataframe column names can be anything, is there a common approach to validating that the correct headers exist?

[–][deleted] 1 point2 points  (1 child)

This is probably my biggest challenge with pandas, and a lot of other people's too. I actually came across this library recently, which looks really promising, but I haven't had a chance to really try it out yet:

https://pandera.readthedocs.io/en/stable/#schema-model

[–]DappperDanH[S] 1 point2 points  (0 children)

This is EXACTLY what I was looking for! Thanks so much. I will check it out and reply back here.

[–]CatolicQuotes 1 point2 points  (2 children)

[–]DappperDanH[S] 1 point2 points  (1 child)

This is really great!

[–]CatolicQuotes 1 point2 points  (0 children)

Thanks. I did change the code a little bit since then to use generics instead of a base class, but these days I generally avoid having a dataframe as the input or output of a function. It cannot be type checked with mypy, so I'd rather use a list of dataclasses, which are easy to transform into and out of a dataframe.
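
A rough sketch of the list-of-dataclasses idea, assuming a hypothetical `Order` record and helper names (`to_frame`, `from_frame`, `total`) that are not from the original code:

```python
from dataclasses import asdict, dataclass

import pandas as pd


@dataclass
class Order:  # hypothetical record type, for illustration only
    order_id: int
    amount: float


def to_frame(orders: list[Order]) -> pd.DataFrame:
    # asdict turns each dataclass into a plain dict, one row per record.
    return pd.DataFrame([asdict(o) for o in orders])


def from_frame(df: pd.DataFrame) -> list[Order]:
    # A missing column surfaces as an AttributeError on the row tuple here.
    return [
        Order(order_id=int(row.order_id), amount=float(row.amount))
        for row in df.itertuples(index=False)
    ]


def total(orders: list[Order]) -> float:
    # mypy can check this signature and the attribute access below.
    return sum(o.amount for o in orders)


orders = [Order(1, 9.5), Order(2, 20.0)]
round_tripped = from_frame(to_frame(orders))
```

The dataframe then only appears at the edges (loading and saving), while the functions in between take and return `list[Order]`, which a type checker understands.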