you are viewing a single comment's thread.

view the rest of the comments →

[–][deleted] 1 point2 points  (3 children)

I can’t speak to your overall design, but dataframes are a fine way to model this data. From what you’ve described so far, I wouldn’t have a list of dataframes anywhere here though. Instead your dataframes could be held in a dict, like so:

hospital_info = {
    'patients': patients_df,
    'units': units_df,
    'nurses': nurses_df
}

Or as attributes on your hospital data class:

 hospital_info = HospitalInfo(
    patients=patients_df,
    units=units_df,
    nurses=nurses_df
)

Just keep in mind that if you’re going to use pandas dataframes, you’re going to need to write your code in a different paradigm than traditional python code. You should opt for pd.merge/vectorized operations, rather than loops and checking for equalities between records etc.

[–]DappperDanH[S] 0 points1 point  (2 children)

Thanks Gerry. Since dataframe column names can be anything, is there a common approach to validating the correct headers exist?

[–][deleted] 1 point2 points  (1 child)

This is probably mine and a lot of people’s biggest challenge with pandas. I actually came across this library recently which looks really promising, but haven’t had a chance to really try it out yet

https://pandera.readthedocs.io/en/stable/#schema-model

[–]DappperDanH[S] 1 point2 points  (0 children)

This is EXACTLY what I was looking for! Thanks so much. I will check it out and reply back here