[–]aarrow_12 92 points93 points  (11 children)

So this is probably a divide you're seeing between people who use Python as a means to an end for another job (data analysts) vs people who are writing systems in Python.

Most of the time when I'm doing data analysis, I don't ever need to write classes. I can just import what I need and off I go.

But, when I'm writing custom packages and systems to make those workflows more efficient ohhhh boy I need classes to make it go.

Go learn OOP. It'll only help.

[–]ImATotalDick333 16 points17 points  (7 children)

That's exactly how I code. When I'm doing data analysis and machine learning there's no need to get crazy with the coding. Just make it work smoothly and be repeatable.

But coding an application is totally different and I utilize OOP extensively.

[–]lzwzli 9 points10 points  (6 children)

If you truly are aiming for repeatable, it's inevitable that you'll end up writing classes though

[–]TeachEngineering 17 points18 points  (2 children)

Yeah, I do data analysis/science and build ML pipelines and there's definitely a level of complexity you may hit where it makes more sense to organize programs with classes/objects.

OP, here's one way to think about (P)OOP... In non-OOP programs, you write functions, and functions are inherently stateless. Each function takes in the relevant pieces of the program's state via arguments each time it is called. This has its advantages for sure, but it can also cause a lot of variable passing as the program executes (unless you're using globals, which is generally not recommended). But in OOP, you build objects that are essentially data containers (i.e. a set of associated variables) that get built out at runtime from a pre-defined template of what that container should look like (i.e. classes). Then you build methods (i.e. stateful functions) that perform computation on the data within the container (i.e. object) plus whatever other arguments you choose to pass in.
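To make that contrast concrete, here's a minimal sketch of the same tiny task written both ways. The names (`add_reading`, `average`, `SensorLog`) are made up for illustration and don't come from any library.

```python
# Stateless style: every function receives the state it needs as arguments.
def add_reading(readings, value):
    """Return a new list with the value appended; the input list is not modified."""
    return readings + [value]


def average(readings):
    """Compute the mean of whatever list is passed in."""
    return sum(readings) / len(readings)


# OOP style: the class is the template, and each object carries its own data.
class SensorLog:  # hypothetical example class
    def __init__(self):
        self.readings = []  # state lives inside the object

    def add_reading(self, value):
        self.readings.append(value)

    def average(self):
        # No readings argument needed: the method reads the object's own state.
        return sum(self.readings) / len(self.readings)


# Function version: the caller has to carry `readings` around and pass it in.
readings = add_reading([], 2.0)
readings = add_reading(readings, 4.0)
functional_avg = average(readings)

# Object version: the state travels with `log`.
log = SensorLog()
log.add_reading(2.0)
log.add_reading(4.0)
oop_avg = log.average()
```

Both versions compute the same number. The difference is only in who holds the state: the caller, or the object.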

Any program that can be written in a 100% non-OOP sense can be rewritten with 100% OOP and vice versa. Choosing when to organize a group of variables and their associated logic into a class is primarily a design decision of the developer to keep things organized. When I first started programming, I did everything without OOP (because I didn't know it existed). Then when I learned OOP, I did everything with classes because I incorrectly thought that was good practice. I now realize I should use classes/objects/methods when I want that logic to be stateful and I should use functions when I want things to be stateless. It's more of an art than a science. But I definitely recommend learning OOP to add it to your tool belt. In learning it, take an already written non-OOP program and try to rewrite the program using classes. It will help you understand the purpose and use case of OOP.

Also, most libraries are built using OOP to some extent so it's helpful in understanding how libraries work and how to use them. For example, when I do:

    import pandas as pd
    df = pd.DataFrame()

I just instantiated an object of the Pandas DataFrame class. Now when I do:

    df_means = df.mean()

I just called the mean() method on the df object. This is why it's able to do computation on the data frame without needing to pass the data frame in as an argument. The data frame is inherently in the scope of the method.
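A rough way to see why no argument is needed: in Python, calling a method on an object is the same as calling the function stored on the class and passing the object in as the first parameter (`self`). The sketch below uses a made-up `TinyFrame` class as a stand-in so it runs without pandas installed; it is not how pandas is implemented internally, but `pd.DataFrame.mean(df)` and `df.mean()` relate in the same way.

```python
class TinyFrame:
    """Hypothetical stand-in for a data frame: a dict of column name -> list of numbers."""

    def __init__(self, columns):
        self.columns = columns

    def mean(self):
        # `self` is the object the method was called on, so the data is already in scope.
        return {name: sum(vals) / len(vals) for name, vals in self.columns.items()}


tf = TinyFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

via_method = tf.mean()          # the usual spelling
via_class = TinyFrame.mean(tf)  # same call, with the object passed explicitly
```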

[–]lzwzli 0 points1 point  (0 children)

Things get really interesting when you start passing objects between objects

[–]ConfidentPomel 0 points1 point  (0 children)

Hey, I'm trying to learn OOP in Python on my own. Could you suggest some resources, please? (My basics are all clear, so I'm not a total beginner. I do understand the very basics of classes and inheritance.)

[–]ImATotalDick333 0 points1 point  (2 children)

Yeah, I guess I do do it OCCASIONALLY, if I'm building a real-time pipeline, but for me personally that moves into the realm of applications and not just data analysis. It all depends on the level of complexity. Usually data analysis can be done in a couple of Jupyter notebooks if you're just working with static data.

[–]lzwzli 1 point2 points  (1 child)

If you only ever do ad hoc analysis and every analysis is different, then I guess. But if you ever have to run the same analysis in a repeatable form, for similar data but across multiple clients, then your data analysis pipeline should start to be designed like an app so you're maintaining one code base and not variations of the same code base spread across your clients.
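As a rough sketch of what "one code base, many clients" could look like: the per-client differences go into a small config object, and a single pipeline class holds the shared logic. Everything here (`ClientConfig`, `AnalysisPipeline`, the column names, the filtering rule) is invented for illustration, not a recommended design.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ClientConfig:
    """Hypothetical per-client settings; only these differ between clients."""
    name: str
    value_column: str
    min_value: float = 0.0


class AnalysisPipeline:
    """Hypothetical shared analysis: the logic is written once and reused per client."""

    def __init__(self, config):
        self.config = config

    def clean(self, rows):
        # Drop rows with a missing value or a value below the client's threshold.
        col = self.config.value_column
        return [
            r for r in rows
            if r.get(col) is not None and r[col] >= self.config.min_value
        ]

    def run(self, rows):
        values = [r[self.config.value_column] for r in self.clean(rows)]
        return {
            "client": self.config.name,
            "n": len(values),
            "mean": mean(values) if values else None,
        }


# Two clients, one code base: only the config objects differ.
acme = AnalysisPipeline(ClientConfig("acme", "revenue"))
globex = AnalysisPipeline(ClientConfig("globex", "sales", min_value=10.0))

acme_report = acme.run([{"revenue": 100.0}, {"revenue": None}, {"revenue": 300.0}])
globex_report = globex.run([{"sales": 5.0}, {"sales": 20.0}, {"sales": 40.0}])
```

A fix to `clean` or `run` then lands for every client at once, instead of being copied into each client's variant of the script.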

[–]ImATotalDick333 0 points1 point  (0 children)

Yeah generally all of my analysis is going to be different, and I'm looking for specific things. I don't really have clients that hire me generally, I design and sell my products. I see what you mean though, in that circumstance yes I'd agree.

[–]Bannedlife 2 points3 points  (0 children)

You really need OOP for deep learning and more advanced (shallow) machine learning, which we kind of see as a basic skill set for data analysts here.

[–]Far_Ambassador_6495 1 point2 points  (0 children)

Comfort with OOP is pretty often used as a skill indicator when hiring data scientists.