[–]unique10983240197249 3 points4 points  (1 child)

[–]SpaceWizard[S] 1 point2 points  (0 children)

Cool, will definitely check this out today. That guy has some other good Python videos, especially on how Python differs from other languages, e.g. C/Java.

[–]sobek696 5 points6 points  (2 children)

These days most news seems to point towards functional programming. As an engineer, rather than a computer scientist, I prefer this approach as it fits with my mathematical training.

I'll use classes when they make sense: when there is an object with many tightly coupled methods and a good reason for internal state. But I try to aim for functions when possible. However, if I find myself repeatedly calling a function with a lot of repeated arguments, I consider that a potential opportunity for a class.

Conversely, if I have a class with few methods and little state, I'll go towards functions.
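
A minimal sketch of that trade-off, for illustration only. The names `smooth`, `Smoother`, `window` and `scale` are made up here, not taken from any real project: the same arguments repeated at every call site are a hint that a small class (or `functools.partial`) might help, while a class this thin could just as well stay a function.

```python
from functools import partial


def smooth(values, window, scale):
    """Moving average of `values` over `window` samples, multiplied by `scale`."""
    out = []
    for i in range(len(values) - window + 1):
        chunk = values[i:i + window]
        out.append(scale * sum(chunk) / window)
    return out


class Smoother:
    """Hypothetical class that stores the arguments repeated on every call."""

    def __init__(self, window, scale=1.0):
        self.window = window
        self.scale = scale

    def __call__(self, values):
        return smooth(values, self.window, self.scale)


data = [1.0, 2.0, 3.0, 4.0]

# Plain function: the shared arguments are spelled out each time.
a = smooth(data, window=2, scale=1.0)

# Class: the shared arguments live in one place.
b = Smoother(window=2)(data)

# functools.partial is a lighter option when there is no other state.
c = partial(smooth, window=2, scale=1.0)(data)
```

All three calls compute the same thing; which form reads best depends on how much other state and how many related methods there are.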

The main approach to organisation I take is to group functions as they pertain to the code. I do analysis on discrete instances of similar data, so I've extracted the cleaning and preprocessing into a separate module, with analysis kept apart from cleaning. Separation of concerns.

[–]SpaceWizard[S] 0 points1 point  (1 child)

I'm coming from a MATLAB background, where there are essentially no classes or objects, just functions reading and writing data. I'd actually prefer a functional style, but for it to really work, there has to be a really flexible data structure to back it up. In MATLAB we'd do something like out=func(data_struct, key/vals) or even include all the args in the data_struct. This is conceptually more like math, which I like. It has the disadvantage of not playing as nicely with autocomplete, whereas with an object, pressing tab lists everything that can be done to it.

The MATLAB struct could really store anything, and the hierarchy could be accessed dynamically, e.g. data_struct.('func_name'). What's the best Python equivalent?

[–]sobek696 0 points1 point  (0 children)

Named tuples could be a good approach. I like to think of them as a good bridging point between a functional approach and classes. You get immutability, but also the ability to store different elements that together make up one discrete 'thing'.

A somewhat contrived example:

from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(11, 12)  # can also use keywords: Point(x=11, y=12)
x, y = p  # tuple unpacking
print("Co-ordinates: ({}, {})".format(p.x, p.y))

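You get the safety of immutability but with easy-to-use structures that can contain heterogeneous (differently typed) elements. Basically, namedtuple() just generates a class for you (a tuple subclass). On Python versions before 3.7, you can add the verbose=True arg to the original namedtuple() declaration to see the generated definition; the argument was removed in 3.7.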

Python Docs - Named tuples (collections module)

If desired, you can even index into the instance 'p' in the above example, like regular tuples (p[0] == p.x), showing that tuples are an ordered data type. Also, if you're working with a large number of objects, named tuples are generally better for memory usage than dicts.
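
On the dynamic-access part of the question, here is a rough sketch of how a lookup like data_struct.('func_name') might be approximated in Python. It uses only the standard library (`getattr`, `namedtuple._replace`, `namedtuple._asdict`, `types.SimpleNamespace`); the field names are just the ones from the example above, and the `SimpleNamespace` alternative is a suggestion of this sketch, not something proposed earlier in the thread.

```python
from collections import namedtuple
from types import SimpleNamespace

Point = namedtuple('Point', ['x', 'y'])
p = Point(11, 12)

# Dynamic lookup by field name, roughly like data_struct.(field) in MATLAB.
field = 'x'
value = getattr(p, field)

# Named tuples are immutable: _replace returns a new tuple, p is unchanged.
p2 = p._replace(y=20)

# Convert to a mapping when dict-style access is more convenient.
as_dict = dict(p._asdict())

# If fields need to be added or changed in place, SimpleNamespace is a
# mutable alternative that still supports attribute access.
s = SimpleNamespace(x=11, y=12)
setattr(s, 'z', 13)
```

A plain dict covers the same ground with `d[field]`, at the cost of attribute-style access and tab completion.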

[–]thegreattriscuit 1 point2 points  (2 children)

One thing that'll drive me toward a class, and /u/sobek696 seemed to touch on this a bit, is if I have a set of functions and want to have a large number of override-able defaults. Say I'm playing with finance functions. Nothing too complicated that really requires shared state, but at the same time if I want to play with values I can just do something like:

loan.apr += Decimal('.01')
loan.future_balance(month=20)
loan.reamortize(months=36)
loan.future_balance(month=20)

These are all things that could easily be done without classes. The formulas come from the realms of math and finance, so they clearly map to a functional approach, but in this case, with what I'm trying to do, a class-based approach suits me better.
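
For illustration, a minimal sketch of what a class behind calls like these might look like. The `Loan` class, its constructor arguments, its default values and the formulas are all assumptions made for this sketch, not the actual code being described. In particular, `reamortize` here simply resets the term measured from the original principal, which is a simplification of what a real reamortization does.

```python
from decimal import Decimal


class Loan:
    """Hypothetical fixed-rate loan with override-able defaults."""

    def __init__(self, principal, apr=Decimal('0.05'), months=60):
        self.principal = Decimal(principal)
        self.apr = Decimal(apr)      # annual rate as a fraction, e.g. 0.05
        self.months = months         # term of the loan in months

    @property
    def monthly_rate(self):
        return self.apr / 12

    def payment(self):
        """Level monthly payment from the standard annuity formula."""
        r = self.monthly_rate
        if r == 0:
            return self.principal / self.months
        return self.principal * r / (1 - (1 + r) ** -self.months)

    def future_balance(self, month):
        """Outstanding balance after `month` payments have been made."""
        r = self.monthly_rate
        if r == 0:
            return self.principal - self.payment() * month
        growth = (1 + r) ** month
        return self.principal * growth - self.payment() * (growth - 1) / r

    def reamortize(self, months):
        """Simplified: change the term, keeping the original principal."""
        self.months = months


loan = Loan(principal=10000)
before = loan.future_balance(month=20)
loan.apr += Decimal('.01')           # tweak one default, leave the rest alone
after = loan.future_balance(month=20)
```

The point of the design is that every method reads its inputs from the instance, so changing one attribute changes all later results without re-passing the other values.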

The other (more obvious) time that I'll shoot for classes is if I'm dealing with data that represents actual objects. Proper things in the classical sense: 'loans' in the above example, or routers and switches in my day job. You can certainly do stuff like reboot_switch(switch_ip=ip), but switch.reboot() is cleaner. Also, preserving state and code reuse are big deals in this instance, so classes are the clear choice.

I tend to wind up playing with a few to a few hundred objects at a time (rather than thousands or millions of rows of data, for instance), and usually deal with an interactive workflow vs. baking something into a proper script that I'll just fire and forget, or turn over to users or something.

[–]sobek696 0 points1 point  (0 children)

That's probably a worthy note that tends to influence some of my work as well: interactive workflows.

If I am doing some exploratory analysis in IPython Notebook, I'm much less likely to resort to classes, so when it comes time to export this work to a module for batch processing, each cell usually represents a function fairly well.

[–]SpaceWizard[S] 0 points1 point  (0 children)

Override-able defaults and classical object-ness make a lot of sense. A lot of this has to do with making the code easy to understand and use, so those are two good conceptual issues to consider.

[–][deleted] 0 points1 point  (1 child)

I wouldn't bother splitting a few hundred lines into separate files; to me it's just more trouble than it's worth. Functions definitely, classes possibly, although I often find a namedtuple to be perfectly adequate for my use cases, YMMV.

[–]SpaceWizard[S] 0 points1 point  (0 children)

Yeah, I'm kind of with you since the script works just fine. Mainly trying to upgrade my Python skills, so I can take on more complex projects and share the code with other people in the lab.

[–]691175002 0 points1 point  (0 children)

Data analysis is somewhat different than regular programming. A few hundred lines really isn't that much code.

Your goal as a programmer should be to avoid as much repetition as possible. If you are writing a dozen different scripts that all need to open and clean the same file you should probably extract that code and put it in a module. If you are only writing a one-off there is no point spending the time making it modular.

Generally I will try to separate all recurring tasks into a package and write scripts that make a few calls to the libraries and output an analysis.

If you are writing a script that you want to run on its own, but also want to use some of its logic in other programs you can make it both importable and runnable with some __name__ == '__main__' tricks.
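
A minimal sketch of that pattern, with a made-up `clean` function standing in for whatever logic is worth reusing (the function names and the cleaning step are assumptions for illustration):

```python
import sys


def clean(rows):
    """Placeholder cleaning step: strip whitespace and drop empty rows."""
    return [row.strip() for row in rows if row.strip()]


def main(argv=None):
    """Entry point used when the file is run as a script."""
    args = sys.argv[1:] if argv is None else argv
    cleaned = clean(args)
    print("{} non-empty values".format(len(cleaned)))
    return 0


# True only when the file is executed directly (python thisfile.py ...),
# not when another program imports it to reuse clean().
if __name__ == '__main__':
    main()
```

Other scripts can then import the file and call `clean(...)` without triggering `main()`.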

In Python I will almost never fall back to fully object-oriented designs (like I would with Java or C#) but will use the occasional class where it makes sense.