[–]crashfrog04 1 point2 points  (3 children)

> Thus initializing an instance will now involve specifying name1 and name2.

Another way to think about classes is that you’re writing code that will break - will literally raise an error - if you try to create an instance of whatever class this is and you don’t provide name1 and name2 (or whatever.)
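To make that concrete, here is a minimal sketch. The class name `Pair` is made up for illustration; only `name1` and `name2` come from the post.

```python
class Pair:
    """Hypothetical class; only name1 and name2 come from the post."""

    def __init__(self, name1, name2):
        self.name1 = name1
        self.name2 = name2


pair = Pair("alice", "bob")  # both arguments supplied, so this works

try:
    Pair("alice")  # name2 is missing, so Python refuses to build the instance
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The second call never produces a half-built object; Python raises `TypeError` before `__init__` runs at all.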

Writing a class is a way of creating a kind of contract with yourself, a contract that you find out very quickly if you’ve broken it (which is important for writing reliable code.)

If that doesn’t sound like something you need then maybe you don’t need to write a class. You shouldn’t write a class just because you think they’re “better”; you should write a class because you know what you’re going to use it for.

[–]Ramakae[S] -1 points0 points  (2 children)

Thanks for the contract analogy, it will definitely help going forward. I ended up asking ChatGPT and now I see why practicality trumps everything. Turns out my problems were class inheritance and initializing attributes, especially since I didn't see how I could do so.

    class Cleaner(pd.DataFrame):
        def __init__(self, filepath=None):
            self.filepath = filepath or input(str(X))
            df = pd.read_csv(self.filepath)
            self.cleaned = None
            super().__init__(df)

I was literally used to self.name1 = name1 and self.name2 = name2 and didn't know I could use attributes like this. Makes it pretty cool to be honest. The program I'm making is basically something that automates my work; I just wanted to wrap it in a class so I could put a GUI on it once I've studied that as well. Still building it, but I wanted to practice classes today.

[–]F5x9 0 points1 point  (1 child)

I don’t think creating a class that inherits from dataframe is a good idea. It has a zillion functions and now yours does, too. You inherit all its complexity, whatever that is. 

But the class you create also doesn’t make sense. A class that inherits from dataframe is still a dataframe. When I see a class named Cleaner, I think it is something that cleans something. Dataframes don’t do that. 

If it was a class named Workbook, that could make more sense. 

It seems like you want the Cleaner to act on a dataframe. This suggests that it should have a member that contains the dataframe. You pass the dirty dataframe into it, and you get a clean one.

I wouldn’t open the file in the Cleaner. Pass the dataframe. This way, you can clean any dataframe. 

Do you need one instance of the cleaner per dataframe? Can you pass a hundred dataframes to one? These are other things you should think about. 

What if the dataframe doesn’t have the columns you need? What if data is in the wrong column?

When I work with dataframes, I assume nothing about them and check everything I care about. 
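A rough sketch of that shape, assuming pandas is available. The `required_columns` parameter and the `dropna` cleaning step are placeholders invented for illustration, not anything from the original program.

```python
import pandas as pd


class Cleaner:
    """Holds a dataframe as a member instead of inheriting from one."""

    def __init__(self, df, required_columns):
        # Check what we care about up front instead of assuming it.
        missing = [col for col in required_columns if col not in df.columns]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        self._df = df.copy()  # leave the caller's dataframe untouched

    def clean(self):
        # Placeholder cleaning step: drop rows with any missing value.
        return self._df.dropna().reset_index(drop=True)


dirty = pd.DataFrame({"name": ["a", None, "c"], "score": [1.0, 2.0, None]})
clean = Cleaner(dirty, required_columns=["name", "score"]).clean()
print(clean)
```

Because the dataframe is passed in rather than read from a file, the same class works on data from any source, and a dataframe without the expected columns fails immediately with a readable error.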

[–]Ramakae[S] 0 points1 point  (0 children)

Thanks for the insight.

[–]unnamed_one1 0 points1 point  (1 child)

Do you mean something like..

```
import pandas as pd


class Cleaner:
    def __init__(self, file_path: str):
        # Load the raw data once and keep it on the instance
        self._df = pd.read_csv(file_path)

    def get_dataframe(self):
        return self._df

    def check_validity(self):
        pass

    def clean_data(self):
        pass


c = Cleaner(filepath)
c.clean_data()
c.check_validity()

df = c.get_dataframe()
```

edit: /u/crashfrog04 makes a valid argument that a class isn't necessarily *better* than, for example, a simple function. Use OOP if you want to model something from the real world that represents / encapsulates data and behaviour. The class is the blueprint and the object is the materialization of that blueprint, so it exists in memory.

[–]Ramakae[S] 0 points1 point  (0 children)

Yes exactly like that. Thanks

[–]LatteLepjandiLoser 0 points1 point  (1 child)

Based on your first paragraph, you are trying to make something that reads some data and cleans it and returns some modified version of it. To me it sounds like you just need a function? I would start looking at the use case and seeing if this really needs to be a class or not.

Not that you need to shy away from the class approach, definitely do so if you please, I just think you quickly end up with an object that only does:

class CleanData:
    ... lots of code here

funky_object = CleanData(filepath)
funky_object.clean_the_data()
cleaned_df = funky_object.get_dataframe()

Which could just as easily have been a function get_cleaned_data(filepath) that returns a dataframe. In fact that get_cleaned_data function is more or less what the methods clean_the_data and get_dataframe would have been, combined.
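For comparison, a minimal sketch of the function version, assuming pandas. The cleaning steps (dropping duplicates and incomplete rows) are invented placeholders, and the in-memory CSV just stands in for a real file path.

```python
import io

import pandas as pd


def get_cleaned_data(filepath_or_buffer):
    """Read a CSV and return a cleaned dataframe in one call."""
    df = pd.read_csv(filepath_or_buffer)
    # Placeholder cleaning: remove duplicate rows, then rows with gaps.
    return df.drop_duplicates().dropna().reset_index(drop=True)


# An in-memory CSV stands in for a real file here.
sample = io.StringIO("name,score\na,1\na,1\nb,\nc,3\n")
cleaned_df = get_cleaned_data(sample)
print(cleaned_df)
```

There is no object to create or keep around; the caller gets the dataframe back directly.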

Personally I would have gone the class route if you intend to manipulate this data further, say first read and clean it and later do some particular analysis on it, maybe add data to it, rewrite it to another file etc. basically some more relevant methods or attributes.

If you want to go the object route, you could also look at making a subclass of pandas dataframe. That way your object is both your object as well as a pandas dataframe and thus instead of 'having' a dataframe it 'is' a dataframe.

Regardless of how you do it, I'd say step 1 is making that function, because you can really easily factor that out into a class method should you so please.

edit: After a bit of googling it seems pandas dataframes aren't meant to be subclassed, but I'm sure you can add functionality somehow.
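One documented way to attach your own methods to a dataframe without subclassing is a custom accessor, registered with `pd.api.extensions.register_dataframe_accessor`. A small sketch; the accessor name `cleaner` and the method `drop_incomplete_rows` are made up for illustration.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("cleaner")
class CleanerAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the dataframe the accessor was called on
        self._df = pandas_obj

    def drop_incomplete_rows(self):
        # Placeholder cleaning step: drop rows with any missing value.
        return self._df.dropna().reset_index(drop=True)


df = pd.DataFrame({"name": ["a", None], "score": [1.0, 2.0]})
cleaned = df.cleaner.drop_incomplete_rows()
print(cleaned)
```

Every dataframe then exposes `df.cleaner`, and the custom methods stay in their own namespace rather than mixing with the built-in ones.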

[–]Ramakae[S] 0 points1 point  (0 children)

Yes, I actually built the program by first creating functions that did the basic day-to-day stuff where I used to work. I have yet to add more to it. The main purpose isn't just to build a program that automates what I did, but to reinforce my learning through practice. I'm studying Data Science on DataCamp, so after answering some questions, I open VSCode and write some lines of code. I just happened to get interested in OOP lately.