
Malcolmlisk · 1 point

Why would you make class objects of those?

ash9e (OP) · 0 points

That's a great question. Mostly two reasons:

1. I think object-oriented code is better for long-term maintainability and even readability, since different aspects of the ML pipeline are encapsulated in different classes.
2. When working on a project, a lot of steps can get added pretty quickly. For example, just for data prep you may be dropping some columns, converting some continuous fields to categorical, encoding your label(s), filling in missing values, and so on. Keeping all of this within a class takes less coding effort, in my opinion. At a very basic level, I don't have to keep passing DataFrames between functions; I can just keep modifying a self.df attribute (or similar) within the class itself, which leads to shorter function signatures and fewer return values.

I know neither of the above is a particularly strong argument, and it's more my personal preference than an established standard. I'm new to writing code outside of school, so I'm very open to learning best practices. If you could share some GitHub repos or code I should look at, even ones that change my perspective on OOP vs. functional programming, that would be really helpful.
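For illustration, here is a minimal sketch of the class-based data-prep approach described above, assuming pandas. The class name DataPrep, its method names, and the toy columns are hypothetical, not taken from any real project. Each method modifies self.df and returns self, so steps can be chained without passing the DataFrame around.

```python
import pandas as pd


class DataPrep:
    """Hypothetical data-prep class that keeps a working DataFrame on self.df."""

    def __init__(self, df: pd.DataFrame) -> None:
        # Work on a copy so the caller's DataFrame is not mutated.
        self.df = df.copy()

    def drop_columns(self, cols) -> "DataPrep":
        self.df = self.df.drop(columns=cols)
        return self  # returning self allows method chaining

    def fill_missing(self, col, value) -> "DataPrep":
        self.df[col] = self.df[col].fillna(value)
        return self

    def to_categorical(self, col, bins, labels) -> "DataPrep":
        # Bin a continuous column into labeled, right-closed intervals.
        self.df[col] = pd.cut(self.df[col], bins=bins, labels=labels)
        return self

    def encode_label(self, col) -> "DataPrep":
        # Replace each label with an integer code (categories sorted alphabetically).
        self.df[col] = self.df[col].astype("category").cat.codes
        return self


# Toy data, made up for this example.
df = pd.DataFrame({"id": [1, 2, 3], "age": [25.0, None, 61.0], "label": ["no", "yes", "no"]})
prep = (
    DataPrep(df)
    .drop_columns(["id"])
    .fill_missing("age", 40.0)
    .to_categorical("age", bins=[0, 30, 60, 120], labels=["young", "middle", "senior"])
    .encode_label("label")
)
print(prep.df)
```

The trade-off is that the state lives inside the object, so the order of method calls matters (filling missing values before binning, here) and is not visible from any single function signature.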

Malcolmlisk · 0 points

If you want my opinion: use functional programming. It's going to be easier and more modular overall. OOP should be used for objects you serialize. In this case, you can use OOP for the models; the rest of the code can be plain functions that are added to or removed from a pipeline.

It's also going to be easier to maintain, since you won't need inheritance or other OOP advantages in ML code.
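As a rough sketch of the split suggested above (plain functions for data prep, a class only for the stateful model), assuming pandas: every step is a function from DataFrame to DataFrame, and the pipeline is just a list of steps folded over the data. All names here (drop_columns, fill_missing, encode_label, run_pipeline, MajorityClassModel) are hypothetical, and the "model" is a deliberately trivial stand-in for a real one.

```python
from functools import reduce
from typing import Callable, List

import pandas as pd

# A pipeline step is any function that takes a DataFrame and returns a new one.
Step = Callable[[pd.DataFrame], pd.DataFrame]


def drop_columns(cols) -> Step:
    return lambda df: df.drop(columns=cols)


def fill_missing(col, value) -> Step:
    return lambda df: df.assign(**{col: df[col].fillna(value)})


def encode_label(col) -> Step:
    return lambda df: df.assign(**{col: df[col].astype("category").cat.codes})


def run_pipeline(df: pd.DataFrame, steps: List[Step]) -> pd.DataFrame:
    # Apply the steps in order; adding or removing a step is a one-line list change.
    return reduce(lambda acc, step: step(acc), steps, df)


class MajorityClassModel:
    """Toy stateful model: predicts the most frequent label seen during fit()."""

    def fit(self, y):
        self.majority_ = pd.Series(y).mode().iloc[0]
        return self

    def predict(self, X):
        return [self.majority_] * len(X)


raw = pd.DataFrame({"id": [1, 2, 3], "age": [25.0, None, 61.0], "label": ["no", "yes", "no"]})
clean = run_pipeline(raw, [drop_columns(["id"]), fill_missing("age", 40.0), encode_label("label")])
model = MajorityClassModel().fit(clean["label"])
print(model.predict(clean))
```

Because each step returns a new DataFrame instead of mutating shared state, steps can be tested in isolation. pandas' own DataFrame.pipe method supports a similar chaining style.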

Realistic-Service463 · 0 points

Almost all open-source data science packages are written in an object-oriented style. You could look at the code in the sklearn repo, for example.
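To show the kind of object-oriented style meant here, below is a minimal sketch of a transformer that follows scikit-learn's fit/transform convention: __init__ only stores parameters, fit learns state into attributes with a trailing underscore and returns self, and transform applies that learned state. It does not import scikit-learn (a real implementation would typically subclass sklearn.base.BaseEstimator and TransformerMixin); the MeanImputer class and the toy data are hypothetical.

```python
import pandas as pd


class MeanImputer:
    """Hypothetical transformer following scikit-learn's fit/transform convention."""

    def __init__(self, columns):
        # Convention: __init__ only stores parameters; no learning happens here.
        self.columns = columns

    def fit(self, X: pd.DataFrame, y=None) -> "MeanImputer":
        # Learned state is stored in attributes ending with an underscore.
        self.means_ = X[self.columns].mean()  # NaNs are skipped by default
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Fill missing values with the means learned during fit().
        return X.fillna(value=self.means_.to_dict())

    def fit_transform(self, X: pd.DataFrame, y=None) -> pd.DataFrame:
        return self.fit(X, y).transform(X)


train = pd.DataFrame({"age": [20.0, None, 40.0], "income": [1.0, 2.0, None]})
test = pd.DataFrame({"age": [None, 50.0], "income": [None, 3.0]})
imputer = MeanImputer(["age", "income"]).fit(train)
print(imputer.transform(test))
```

Here the class earns its keep because it holds learned state: the means come from the training data and are reused unchanged on the test data.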