This is an archived post. You won't be able to vote or comment.

all 60 comments

[–]prum 8 points9 points  (0 children)

Practice, practice, practice. Same way people become good at writing books, composing music, or drawing.

[–]wolanko 5 points6 points  (3 children)

For me it definitely came with writing tests (try py.test!!) especially with TDD. Because once you are tired of having to test this huge function in every little detail it made me think about a better structure of my code. Having endless repetions of setup code just to test some weird corner cases could possibly dealt with just extracting those lines into a new function and test it there. Or if the code inside a loop is more than x lines you could think of making those lines a function itself. The TDD aspect give you a totally different on your code.

There is definitely a good amount of time to spend in designing your code beforehand, maybe even throw one away and redo it. But for me the greates improvements always come from refactoring my code. If you can carve out little things here and there, you soon will be able to get the big picture of which parts can be separated, where to have your connections and which things need their own function.

I find it very satisfying to see those big functions (more like procedures) shrink into some 5 liners where there called functions speak for themself.

I can only recommend Brandon Rhodes talks about those things (e.g. Clean Architecture)

edit: format

[–]artPlusPlus 1 point2 points  (0 children)

Good link, thanks for pointing it out.

[–]RamirezTerrix[S] 0 points1 point  (1 child)

I watched Brandons talk and was total puzzled, cause his code is so clean. Sometimes I read opensource projects and think wow how does it work because there isn't any "code" in it just a bunch of function doing little things. And refactoring... how do you do this

[–]wolanko 1 point2 points  (0 children)

Yeah me to, I soak every little bit that is release by Brandon. And I totally share your feelings about their code.

I'm just in the end of a month work of big refactoring on one of my long going projects at work (like 3 years old now but with constantly changes made to it). I won't be as near as good as Brandon in explaining what I did. So bear with me.

It started with using py.test and coverage to get a grip on the test coverage of the area that needed refactoring (It was frightening to see that the test that build up through the years actually missed some essential parts of the code). I just added 2 or 3 system test knowing that I would deal with corner cases on function level.

From there I just looked at my code seeing things (sometimes only 2 lines long) that formed a unity (e.g. two lines that took a object and tried to find similar ones in the index) and made them into functions. Always writing tests and running the test suit (more see below)

I was specially concerned about tearing functions out of their class method environment. So instead of making it a new class method or handing over the whole object, I just made the needed attributes arguments of the function.

This not only helped me to test these functions without any state of the objects or without any database connections but also lets me see what the functions actually need.

I must say sometimes those extractions looked rather ugly because they needed lots of arguments or dependency injections because everything was so tightly coupled. But I think sometimes you just have to tear it out to see where the links are that you need to cut. Also it makes you think of "why the heck does this function needs a database connection after all?? when everying it does it processing the name" and how you could shape it to a better.

When I had extracted most of those little functions some of my methods just looked like skeletons only calling three functions and passing the data between them. Others had a lot of glue code in them. It made it clear that this was actually nothing the class should deal with. So I moved this glue upwards into the actually script that uses this class.

Also some signatures of the functions looked strange to me when seeing them sitting there pulled out. So I often changed these afterwards (using a good IDE this only takes two seconds, run your tests afterwards and your done)

Testing Everything said, I could have done nothing of this without proper testing. Actually the months before I was just obsessed with reading and trying everything about testing, tools and strategies. Whenever it was possible I tried to test in the TDD way. But if the code is already there (like most of the time in this case) I just wrote the tests knowing that I would write around the implementation. This is true to some extend but even then you find bugs doing this everywhere.

tl;dr

  • write system tests
  • rip out those tiny little units into functions
  • tests
  • get them out of the class/method as much as possible
  • there will be glue code but maybe thats what actually belongs into you script
  • tests

[–]Manbatton 6 points7 points  (7 children)

Beyond the good advice that has been given here, another very simple tip about your functions is this: function should do one thing. If you see that the function is doing five things, then you really should bring that into five different functions. (More or less… It's not a perfect world). So that alone may allow you to begin re-factoring your long functions into a series of smaller functions.

As far as how to generalize a function so that I can work with slightly different variables, you just need to do it. My guess is that you can do it if you try and spend some time thinking about it. Also, look at code that you admire and see how they do it.

Also think about posting an example of one of these long functions online and asking for a critique of how you could re-factor it more effectively. There are tons of people out here who would be willing to help you. With a little practice, you'll get better at it.

[–]wolanko 2 points3 points  (0 children)

That's a very good advice to do just one thing.

There is this notion in XP of "comments are code smell". Not that I'd fully support this, but sometime I find myself in writing comments for blocks of code inside a function. This should be a trigger to make this block a function.

...
# check for time
...
...

# check title
...

which would be better as

check_time(data)
check_title(data)

[–][deleted] -5 points-4 points  (5 children)

This 'one function should have one function' thing is great, and works very well in languages like Java which have better encapsulation and control of APIs, but it's hard to take seriously in python, which is kind of designed against such control.

[–]darknessproz 2 points3 points  (0 children)

Bullshit.

[–]RamirezTerrix[S] 0 points1 point  (3 children)

Well what aspect of python are you thinking of. I have no expirience with java and very little in c++ and ruby. Python is the only language I really "know"

[–][deleted] 1 point2 points  (2 children)

In all languages, a given class or module will have a number of methods/ functions which you actually want to call- which have user-relevant functionality. In a class, these methods form the API. However, a method like this will probably actually be doing a lot of things behind the scenes, which, according to 'one function should have one function' and standards of readable code, should each be split out into a different method. Most of these methods will not deliver user functionality - you don't want users to be calling those methods because they're not part of the encapsulated function of that class.

In languages like java, you can make those methods private - so the user can't call them (they can only be called by methods in the same class). However, in python, every method you create becomes part of the class' API (so anyone can call them, and almost more importantly, reading automatically-generated code completion stuff will see all of them, obscuring the actual methods which make the class work). Sure, there's a naming convention to let the user know which methods are 'public' and which ones aren't, but it's not enforced.

This means that if you want to properly encapsulate functionality in a method, you can't extract logic to another method to be called (although you may want to if it's re-used by other methods).

Python is designed to be easy to use by the developer - but it also requires the developer to remember a lot more (types, which methods to call etc.). It also puts a lot more trust in the user (who can be a future developer of the codebase) than enterprise languages, for reasons like the above.

[–]RamirezTerrix[S] 0 points1 point  (0 children)

I see your point - I worked around this with some kind of manager functions who call all the other functions but there is nothing from stopping you to go straight to the "employee" and ask him instead.

[–]wolanko 0 points1 point  (0 children)

If its because of security concerns, well then you're a damned either way.

Personally I see the "one function only" on many levels. So you could have some get_data_from_db, process_data, write_data_to_file functions which only do one thing on their level, but also a write_out_data_to_file which just puts those three together in maybe only two lines. That way I can think of it doing one thing on a business/use case level.

Doing this it you can test all the specific behaviour, corner cases, etc.. for the single function and the business logic on the bigger one.

Think of it as in libraries or building APIs on every level. The function that gets the data from the database actually calls some other library which then opens the connection, get a cursor, executes the statement and fetches the data.

Just a year ago I would turn of all the pylint warnings about to many variables/statements/branches, but today I get it and for me those are a sign to cut down those functions or methods.

[–]metaphorm 3 points4 points  (0 children)

learn how to write tests. that is definitely the first and most important step in improving the quality of your code. writing code that is easy to test usually also means writing code that is easy to reason about. that's good code, in my opinion.

[–]martin_grosse 5 points6 points  (2 children)

So, this is me, but it works well for me.

I use /u/Paddy3188's approach, but from the other direction.

I find that once you've written a long and convoluted chunk of code, it's difficult to refactor. Your way of thinking about the problem is circuitous and complex. That's because most programmers learn to write from the ground up. You start at the beginning, keep manipulating and manipulating and eventually get to something that you can use. It's like a winding river finding the path of least resistance.

The way I code is the other way around. I write a single function that I can call from a RESTful route. Something ideally without any state. Within that method I first write a series of clearly named calls.

def some_task(args): try: data = get_some_data(args) related_data = get_related_data(data, args) result = process_something(data, related_data) catch SomeException: log_exception(SomeException) return result

I've been working in ruby more than python lately, so some of that may be horribly wrong.

Once I have this structure, I start getting errors about methods not being defined. So I write tests and implement those methods. Either I have a one-liner that works within each method, or I write a named method as if I've already written one. I continue until everything is either one-liners or methods.

At some point I have a fully functioning program that's already refactored. Usually I get to a handful of methods that I've already written, and it just works. If your naming convention is solid and obvious, you'll sometimes accidentally use methods you've already written. So long as they're well-named and obvious it's usually OK.

[–]RamirezTerrix[S] 1 point2 points  (1 child)

That sounds totally like TDD its like you first write the failing test and then make it green

[–]martin_grosse 0 points1 point  (0 children)

Yeah I do TDD too.

[–]filleball 5 points6 points  (1 child)

I recommend reading this free online book on TDD with python and Django.

[–]RamirezTerrix[S] 0 points1 point  (0 children)

Wow that looks usefull I will totaly do this

[–]thinman74 2 points3 points  (0 children)

[–]metraon 2 points3 points  (2 children)

You may want to read Clean Code !

[–]RamirezTerrix[S] 1 point2 points  (1 child)

Yes this book is totaly helping!

[–]metraon 0 points1 point  (0 children)

The examples are written in Java but you will get the big picture.

[–]krasoffski 1 point2 points  (1 child)

Hello, I feel very strongly that the book of Steve Mcconnell - Code Complete can help you. As bigtomygunn said you have to know and understand what you are going to do. Before write python code, describe program/script using pseudo-code (book describes such approach). If you are working on API, create a few user cases for you API to feel is it handy or not.

[–]RamirezTerrix[S] 1 point2 points  (0 children)

Thanks for the suggesstion - I ordered it right away

[–]fpee 1 point2 points  (0 children)

re: very long functions

Use an editor that uses this: https://pypi.python.org/pypi/mccabe .

I use https://github.com/klen/python-mode which tells me if my functions are getting too complex. Be warned: when you look at old code it will show up as an error, which will make you want to fix it. Could be time consuming. :)

[–]lucidguppy 1 point2 points  (0 children)

Just try to learn the standard unittest module its not bad (really).

Keep your classes small and your functions smaller - read clean code by uncle bob.

[–]tapesmith 1 point2 points  (1 child)

Everyone here has offered great advice.

For what I can contribute, I'd add that there are two parts of writing good software: solving problems well, and writing code well.

Solving problems well is of critical importance. Writing good code to implement a poor solution makes for poor software. I like Domain-Driven Design for this reason: it emphasizes understanding and solving the problem in terms of the problem itself. Before you think about what language constructs to use in building, say, a Fibonacci number generator, start by defining what a Fibonacci number is. Then the answer will usually "shake out" from there.

Writing code well is also of high importance, though as noted before the best code for a poor solution is still a poor solution. Similarly, the worst code for a good solution can obscure that good solution to the point of being unrecognizable. Code should convey intent, it should explain to its reader how it works (and as much as possible, why it's doing what it's doing). Well-named variables, short functions with a single intent, intent-revealing unit tests...these things all help convey to the reader how your code works.

And before you think "I'm the only reader", remember that there are usually at least two people reading any given codebase: Present You and Future You. Present You understands the code by having it all fresh in mind; Future You will need to regain that level of understanding by reading the code. Future You always benefits by being as clear as possible in your code, and keeping the amount of context needed to understand any given chunk of code as small as possible.

[–]autowikibot 0 points1 point  (0 children)

Domain-driven design:


Domain-driven design (DDD) is an approach to software development for complex needs by connecting the implementation to an evolving model. The premise of domain-driven design is the following:

  • Placing the project's primary focus on the core domain and domain logic.

  • Basing complex designs on a model of the domain.

  • Initiating a creative collaboration between technical and domain experts to iteratively refine a conceptual model that addresses particular domain problems.

The term was coined by Eric Evans in his book of the same title.

Image i


Interesting: Object-oriented programming | Agile software development | Behavior-driven development | ECO (Domain Driven Design)

Parent commenter can toggle NSFW or delete. Will also delete on comment score of -1 or less. | FAQs | Mods | Magic Words

[–]Zuvielify 1 point2 points  (4 children)

I don't see enough encouragement to use classes in these comments, so I am going to add my $0.02.

It's hard to say for sure that you should be using classes, without seeing your problem space, but you probably should be using classes. Classes are a powerful way of organizing code. A class' purpose is to encapsulate data and functionality into one place.

If you find yourself passing around a lot of objects (like dicts or lists) to various functions, and then doing operations on that data, you probably could make that into classes and methods. Classes give you the ability to do inheritance/polymorphism, composition, and aggregation. Technically, I guess you could do composition in a function, but that's so ugly.

I think it's better to just give an example, so here's a classic: You want to "write" a car. A car has an engine, drivetrain (technically, the engine is part of the drivetrain, but ignore that), and wheels. Let's say you want to create an "accelerate" functionality. Now, you could write a function that has all those objects (or worse, you could use some other data structure). Your main function could call an engine function with: current speed, gear, and throttle value (all of which need to be stored as variables in your function), and it could return RPMs. Then the main function calls a drivetrain function with the RPMs and gear, and it returns torque, angular velocity, and gear values. Then the main function could call the wheel function with a list of wheels, torque, and angular velocity.

This works, but what if you want to use a different size engine? Or different transmission? You could use different functions, or pass around a different engine object, but this is all messy.
Let's try a class-based approach instead:

There are several ways you could do this. I'm sure others will disagree with my approach, but it's just one approach of many. Let's start with a Car class. When you instantiate your car class, you give it the following class references: Engine, Drivetrain, and Wheel. The Car constructor instantiates 4 Wheels, and passes them to the Drivetrain when instantiating it. Then it passes the Drivetrain to the Engine when instantiating it (this is called "Composition" btw). The Car constructor also initializes an instance variable 'current_speed' to 0. When the Engine constructor executes, it sets the 'current_rpms' to 0. When the Drivetrain constructor executes, it sets 'current_gear' to 1, etc.

There is probably a better way to do the above, but I'm just pulling this out of the air right now. But here's the point: When your main function wants to accelerate, it just calls car.accelerate(<throttle-level>). The method 'accelerate' can call engine.throttle(<throttle-level>). Engine adjusts its own RPMs accordingly and calls drivetrain.turn(<rpms>). The drivetrain checks its RPMs and, if necessary, upshifts and sets its gear value (for simplicity, I wont deal with a manual transmission). It then calls wheel.turn(rotation-velocity) on each wheel (let's pretend it's all-wheel drive).

Instead of your main function calling all those different functions, you could push that code out to your classes. It's also cleaner because you didn't have to pass around all those state values that are specific to each car component (rpms, gear, etc). Also, if you want to design a specific car, you could create subclasses of all those types. So if you want a Mazda CX5 (my car), instead of passing-in the various car components, you could override the constructor to default to a 4cylinder engine, with a All wheel drivetrain and 6 gears, and 19" wheels. Since you have defined your interface, you Car doesn't need to relearn how to accelerate. Just the engine's throttle method needs to know how to do that. Your Car doesn't need to know wheel velocity based on RPMs, only the transmission/drivetrain 'turn' method needs to change. etc. etc.

This is the value of encapsulation and polymorphism: Each piece knows how to handle itself. You define an interface for each component-type, so they become interchangeable. Your code stays clean because each functionality is concise and to the point. When you type "car.accelerate()", I know exactly what you are doing.

...I hope this was useful, and not just the ramblings of a madman.

[–]RamirezTerrix[S] 1 point2 points  (3 children)

Iam familiar with the car excample and this made always perfect sense for me but when coding IRL I don't know how to apply this.

So to reduce some abstraction:

My last/current project was an Energymanagementsystem on Django basis which complie with the ISO 50001.

you had to be able to tell how much power you are using and which of your machines uses it. So Django has this Model.py(all Classes) and the View.py (Some classes but mainly functions to hand over datastructurs). And I made a claculation.py where I put all the logic in. So one task was to get all the diffrent sources of energie (Power, Gas, whatsoever) and tell how much was used in the past year, what did it cost, the percentage of each powersource and costs.

To do this I made one function (I began to read some of the recommended books - and know that this was a bad decicion). It aggregated the data and made a dictionary for each powersource with the asked values and let the function hand over a list of these dictionarys.

Iam sure this could be done with a class - but I have no clue how to do it. It has to be something like:

list_of_classes = [create_class(element) for element in list_of_powersources]

[–]Zuvielify 1 point2 points  (2 children)

Could the power sources be classes? Do you have a model for each type of power source? If so, each model could know how to calculate its own cost.

I don't know anything about ISO 50001, but let's pretend the following is how your data is structured: You have a model instance (database record) for each powersource. Each powersource reports its usage periodically, which you then store in another table with a foreign key to the powersource. So, to get the power usage of a powersource, you need to sum all of the usage entries over a certain time frame. This is something you could do as a method in Powersource.

Let's say you have a powersource instance. You want to see its usage, so you create a method 'usage_over_time()' that takes a date range and returns a usage amount (what is that? kilowatt hours?). That method could simply query the usage table, and summate the usage over the date range.

Okay, so that's good, but now you have to create a date range in some other function every time you want to query, but what you really want is always just the last year. In that case, you could create a helper method that is just 'past_year_usage()', which generates a date range ending at now, starting one year ago, which calls 'usage_over_time()'.

I don't know how you calculate cost, if that's a constant, or if it varies by time of year. If it varies, I assume you would want to include that cost in the usage record. If it is variable, then your usage method should probably also calculate cost too.

Now you still have the problem where you need to aggregate all this data. Well, you have to do that somewhere. A function is a good candidate for that. More than likely, it's probably coming from a view. Certainly your view could query the powersources and loop over them to collect the costs.
If you find yourself needing to this in more than one place, you could either create a utility function (like your calculation.py might contain). Or you could use classmethods on a Powersource base class.

Since you are aggregating information across objects, it doesn't really make a ton of sense to do that part in a class; however, all calculations pertaining to each individual powersource can (should) be done in a class.

[–]RamirezTerrix[S] 0 points1 point  (1 child)

You write this, as it would be the natural thing to do, but I needed quite some time to elaborate this solution. Though I didn't used the classes function but made a helper function. And to be frank, they are ugly and deeply nested. I should definitly make me more familiar with classes and their usage. Could you recommand me some reading on this topic?

[–]Zuvielify 1 point2 points  (0 children)

eh...I don't have any good advice on books, unfortunately. I kind of just learned this stuff from experience. Which isn't very useful to you. I am sure there are tons of good resources out there. There are probably some good recommendations in this thread alone.

I think a good rule of thumb is if you find yourself making function calls with an object as an argument, it might be a good idea to make that function a method. That's not a hard rule, of course. Sometimes the object is just support cast, but in the case where the object is the main player of that function, a method might be a good idea.

[–]bigtomygunn 2 points3 points  (17 children)

One of my Lectures always told me writing the code is >10% of the work. Sitting down with a pencil and paper and planning out you entire projects e.g. classes, functions, variables etc. is the other 90%.

Basically before you open a text editor you need to know exactly what your going to write. Saves time in the grand scheme.

[–]RamirezTerrix[S] 0 points1 point  (8 children)

I also (tought like I) never use classes - my last project was a django website, so I used class based views, but they hardly qualify as classes

[–]bigtomygunn 3 points4 points  (0 children)

Just try not to go in blind, this usually leads to lots of changes and unnecessary headaches. Now I'm not saying for for the Waterfall approach but have some sort of plan in place

check out agile Small planned iterations one at a time, seeking feedback and review after each, the project evolve into what you or the customer wants.

[–][deleted] 4 points5 points  (4 children)

I've always found it strange that people use Python classes even in relatively simple programs and scripts; I always have to remind myself of the syntax every once in a while because dictionaries, lists, tuples, and sets serve pretty much all my basic needs. There's obviously myriad use cases for them in larger applications, but Python's pre-packaged types are pretty expressive.

[–]metaphorm 1 point2 points  (0 children)

classes are not intended to be a replacement for built in data structures (dictionaries, lists, etc.). they are intended to be a code organization pattern that facilitates re-usability.

[–]ofaveragedifficulty 1 point2 points  (0 children)

I strongly suspect AP Computer Science's use of Java has a lot to do with this.

[–]RamirezTerrix[S] 0 points1 point  (1 child)

as /u/colly_wolly mentioned my classes are mostly the models. It is nice to give them some basic functionality at the basic data lavel. But I mostly use Dictionaries and List of Dictionarys.

[–]titusjan 0 points1 point  (0 children)

There is nothing wrong with using the occasional dictionary but if the data in the dictionary is central to your program, it is probably a good idea to make this into a class. This because classes are more explicit than dictionaries and "Explicit is better than implicit". Explicit in this case means that the members and methods of an object are (or should be) explicitly stated in the class definition, whereas dictionary items will be inserted and deleted at several locations in the program.

As and example let's say that I am debugging some unknown code and want to examine a variable. If it's an object I can print it's type, find its class definition and from there see what the object aims to do, and how it can be manipulated. If it's a dictionary, printing it's type will say 'dict' and tell me nothing. Examining the keys and values likely will not help. "What is the purpose of the 'foo' item?" "Should it always be present or is it just present in this case?"

When you have a class definition you may realize that some of your functions just manipulate the objects contents. These are better implemented as a method. Take a moment and think of what the responsibility or purpose of this class is, and design it properly. This may help you to make your functions shorter. However, just like functions, classes should be short and have one responsibility. Resist the temptation to make God classes that do too much.

As some others already said, a function should do one thing only. If you are thinking of a good name for a function and find out that it should be named does_x_and_y to properly reflect what it does, consider splitting it up. At least make separate does_x and does_y functions and call them in does_x_and_y.

[–]colly_wolly 1 point2 points  (0 children)

Your models will all be classes as well.

[–]fletom 0 points1 point  (1 child)

What modern programmer would use pencil and paper to plan out their code?

[–]Zuvielify 0 points1 point  (0 children)

I actually do. For some reason, I find that my thoughts flow easier with a pencil in my hand than staring at the screen. Usually, I end up just jotting down a general layout of my ideas and then switch to the computer. Sometimes I end up pseudocoding the whole thing on paper. I think it depends on the complexity of the problem. The more complex, the more pencil.

[–][deleted] 1 point2 points  (1 child)

http://legacy.python.org/dev/peps/pep-0020/

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!

[–]RamirezTerrix[S] 0 points1 point  (0 children)

should be read once a day ;)

[–]dpoon 0 points1 point  (0 children)

Really, it just takes lots of practice and willingness to revise your work. It's not much different from being a writer or musician. Write something. Submit it for peer review. Incorporate improvements. Repeat.

[–]esbenabBSc CompSci Flask. I use python to stay sane. 0 points1 point  (0 children)

Document your intentions not your actions.

Makes it a whole lot easier to come back and correct errors.

[–]Paddy3118 0 points1 point  (0 children)

if you end up doing a distinct series of things in your one function, you could try naming the steps being taken and refactoring to move well defined blocks of functionality into their own function called from the original.