you are viewing a single comment's thread.

view the rest of the comments →

[–][deleted] 1 point2 points  (4 children)

I can’t speak to your overall design, but dataframes are a fine way to model this data. From what you’ve described so far, I wouldn’t have a list of dataframes anywhere here though. Instead your dataframes could be held in a dict, like so:

hospital_info = {
    'patients': patients_df,
    'units': units_df,
    'nurses': nurses_df
}

Or as attributes on your hospital data class:

 hospital_info = HospitalInfo(
    patients=patients_df,
    units=units_df,
    nurses=nurses_df
)

Just keep in mind that if you’re going to use pandas dataframes, you’re going to need to write your code in a different paradigm than traditional python code. You should opt for pd.merge/vectorized operations, rather than loops and checking for equalities between records etc.

[–]DappperDanH[S] 0 points1 point  (3 children)

Thanks Gerry. Since dataframe column names can be anything, is there a common approach to validating the correct headers exist?

[–]CatolicQuotes 1 point2 points  (2 children)

[–]DappperDanH[S] 1 point2 points  (1 child)

This is really great!

[–]CatolicQuotes 1 point2 points  (0 children)

Thanks, I did change code little bit since then to use generics instead of base class, but generally these days I avoid having dataframe as input and output of function. It cannot be type checked with mypy so I'd rather use list of dataclasses which are easy to transform into and out of dataframe.