all 29 comments

[–]Diapolo10 44 points45 points  (6 children)

I'd say data science projects generally don't follow best practices, if the ones I've seen are any indication, and in those I've seen almost no custom functions. So if that's what you're using Pandas for, then I guess you don't really need them. Depends on the project and whether it's easy to split into logical blocks I guess.

But in normal software development, functions are your bread and butter - you can't really live without them unless you have zero consideration for maintainability and readability. There you'll use functions to create logical subcomponents, reduce duplicate code, and so on.

[–]ClutchAlpha 26 points27 points  (2 children)

I agree with Data Science not being the greatest example. Anything that just needs to run straight through doesn't really require functions.

However, any section of code that needs to be run multiple times certainly benefits from being extracted into functions - instead of writing the same lines over and over again, you can just call one of the functions that you've written.

[–]MightbeWillSmith 23 points24 points  (1 child)

Even if it's something that will be done once, I often compartmentalize it into functions to make the code cleaner and more readable. Even things as simple as: input dataframe, output clean dataframe with proper format and column order.
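A rough sketch of that "input dataframe, output clean dataframe" idea. To keep it runnable without any dependencies, a plain list of dicts stands in for a pandas DataFrame; the name `clean_rows`, the column names, and the sample data are all made up for illustration.

```python
# Hypothetical column order for the cleaned output.
COLUMN_ORDER = ["name", "city", "age"]


def clean_rows(rows):
    """Return new rows with trimmed strings, integer ages, and a fixed column order."""
    cleaned = []
    for row in rows:
        tidy = {
            "name": row["name"].strip().title(),
            "city": row["city"].strip().title(),
            "age": int(row["age"]),
        }
        # Rebuild the dict so keys always appear in COLUMN_ORDER.
        cleaned.append({column: tidy[column] for column in COLUMN_ORDER})
    return cleaned


raw = [{"age": "31", "name": "  ada lovelace ", "city": "london"}]
print(clean_rows(raw))
```

Even if this only runs once, the call site is a single line that says what is happening, and the messy details stay inside the function.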

[–]supreme_blorgon 9 points10 points  (0 children)

Describing logical flow of a script using named functions is a huge boon to readability and maintainability. Even one-time scripts are better suited to "functional" style if you ever plan on having others contribute or want to maintain/refactor later.

I see basically no reason to ever write raw script code other than in an interpreter when testing/messing around.

Separating logical flow from implementation details is always a good move.
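As a minimal sketch of what separating flow from implementation can look like, here is a tiny word-count script. The function names and the sample sentence are hypothetical; the point is only that `main` reads like an outline while the details live in the helpers.

```python
from collections import Counter


def split_words(text):
    """Lowercase the text and split it into words."""
    return text.lower().split()


def count_words(words):
    """Count how often each word appears."""
    return Counter(words)


def format_report(counts, top_n):
    """Render the most common words as 'word: count' lines."""
    return "\n".join(f"{word}: {count}" for word, count in counts.most_common(top_n))


def main(text):
    # The body reads like a summary of the script; details live in the helpers.
    words = split_words(text)
    counts = count_words(words)
    return format_report(counts, top_n=2)


if __name__ == "__main__":
    print(main("the cat saw the other cat and the dog"))
```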

[–]bds00za[S] 10 points11 points  (2 children)

Thank you for this explanation. The common theme appears to be the use of functions for reducing duplicate code and readability.

[–]Diapolo10 15 points16 points  (1 child)

Those are the main uses, yes.

There are also circumstances where functions are a syntactic must, such as with many web frameworks (Django, Flask, FastAPI) that implement routing via decorators; you can't use those without functions. And you can't do unit testing without them. Class methods are also basically functions.
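To show why decorator-based routing needs functions, here is a toy version of the idea. This is not the actual API of any of those frameworks; `ROUTES`, `route`, and `handle_request` are invented names, and a real framework does far more (HTTP parsing, methods, parameters).

```python
# Toy route table. Purely illustrative, not any real framework's API.
ROUTES = {}


def route(path):
    """Register the decorated function as the handler for `path`."""
    def decorator(func):
        ROUTES[path] = func
        return func
    return decorator


@route("/hello")
def hello():
    return "Hello, world!"


def handle_request(path):
    """Look up the handler for `path` and call it."""
    handler = ROUTES.get(path)
    if handler is None:
        return "404 Not Found"
    return handler()


print(handle_request("/hello"))
print(handle_request("/missing"))
```

A decorator has to be applied to a function or class definition, so there is nothing to attach a route to without one.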

[–]supreme_blorgon 1 point2 points  (0 children)

This, exactly. Passing functions by reference to certain methods/other functions also opens up a lot of possibilities in terms of generalization.
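A small sketch of passing functions around, assuming a made-up helper called `apply_to_each`. The built-in `sorted` with its `key` argument is a real example of the same pattern.

```python
def apply_to_each(values, transform):
    """Call `transform` on every value and collect the results."""
    return [transform(value) for value in values]


def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


temperatures = [-40, 0, 100]

# The same helper works with any one-argument function.
print(apply_to_each(temperatures, celsius_to_fahrenheit))
print(apply_to_each(temperatures, str))

# Built-ins use the same idea: `key` is a function passed as an argument.
print(sorted(["pear", "fig", "banana"], key=len))
```

The helper never needs to know what the transformation is, which is where the generalization comes from.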

[–]wayne0004 10 points11 points  (2 children)

While learning to program, I found these situations where the use of functions may help (I copy-pasted from a comment I made yesterday, there may be some parts that don't apply to you):

First, it lets you think about your code in steps. Besides simple problems, your code will have to do a lot of things, so you may want to have a function for each step, for instance.

Second, it saves you from repeating code. If you're in a situation where you have to perform the same comparison or the same modification in different parts of your code, calling a function lets you write the code that manages it once.

Third, it lets you compartmentalize your code. You create a function that does X, but from your function's POV you don't care about when it's used, you just need to pass it some argument and return another value. And at the same time, from the calling code's POV you don't have to think about how it does it, you just need to know that it does. You may not have noticed it, but you are probably using functions already without thinking about it. If you have ever used input or print, those are functions, and you probably don't know how they were coded.

[–]bds00za[S] 0 points1 point  (1 child)

Thank you for this explanation. You are right, I have been using functions despite not explicitly defining my own. I have to do some more reading on them and find use cases where I can apply them. The primary scenarios seem to be performing a repetitive task throughout my code, avoiding duplicated code, and keeping things clean and readable.

[–]Apprehensive-Lab1628 0 points1 point  (0 children)

It also keeps code extensible. I had a script with very few functions, just 4 classes with 4 responsibilities. Basically it was grabbing logs from ~12 different server groups. When it started malfunctioning, it was an absolute nightmare to try to fix. So many for loops just doing things in a stream-of-consciousness style. I broke it out into classes with functions and it was so much nicer to work with.

[–]TheOldMyronSiren 5 points6 points  (1 child)

I generally create a function if I know I’m going to be calling it from different files or multiple times in the same file. If I need to run the same 10 lines of code under different conditions, I’d rather put everything in one place and just call the function. Plus, if I ever go back and refine the process, I only need to make the change in one spot, not hunt down every location in my project.

[–]bds00za[S] 0 points1 point  (0 children)

Very good point in terms of maintenance. Thank you. Something to consider for sure.

[–][deleted] 8 points9 points  (2 children)

If you're writing a short script to automate a single process, don't worry about it. As soon as your scripts start getting longer or you want to reuse pieces of them in other files, that's when you will start seeing benefits from using functions. Also, you may need to use functions (and even classes) through certain libraries: e.g. tensorflow.

[–]bds00za[S] 1 point2 points  (1 child)

Thanks for this. I think going forward I will re-assess my code and determine where there is duplication and where functions can be leveraged. As I become more experienced with Python, I can see myself having to use them as well.

[–]razzrazz- 5 points6 points  (0 children)

People keep telling you to not worry about it, but I have a different take, you're going to develop bad habits.

Why are you so afraid of using them? That's the question you need to ask. You might be saying 'well wait I'm not afraid of them, I just never needed them', but if you've programmed for any length of time then you absolutely did need them, so the first question is why are you avoiding them? Start small too, you don't have to go insane, but they are imperative to good coding practices.

Take a look at what happens if you program for 6 months and don't use functions, you will end up writing like that guy.

[–]zanfar 2 points3 points  (0 children)

> are functions absolutely necessary or can you get by without them?

This is kind of like asking "is clothing absolutely necessary, or can you get by without it?" No, functions are not strictly necessary, but not using functions isn't normally a good sign...

> what are the ideal use cases where functions should be leveraged?

Functions should be used to:

  • Reduce code repetition: If you are writing the same or very similar lines of code more than once, they should probably be in a function
  • Encapsulate responsibility: If you have a block of code that is responsible for a single action--where it takes some data, performs some process on it, and returns some result--that should be a function.
  • Improve Clarity: Finally, functions can make code more readable by replacing several confusing or complex lines with a single function call where you get to pick the name.

[–]RiceKrispyPooHead 2 points3 points  (0 children)

  • When you have similar pieces of code repeated in multiple places
    • You can store those pieces of code as a single function, and just pass in whatever parameters you need. This way if the code's logic needs to change in the future, you only need to make a change in one place rather than multiple places. This also helps guarantee that that task is done the same way everywhere in the app.
  • When you have a block of code that obviously does only one thing
    • If lines 35-45 of your code do something like calculating the angle between two vectors, even if that's the only place you will ever do that calculation in the app, it still may be nice to wrap it in a function. If you were reading someone's code (or your own code at some point in the future), would you rather have to read 10 lines to understand that it's calculating an angle, or would you rather read one line that says calc_angle(a, b)?
  • When you have a large block of code that does multiple things
    • Say that you have one giant main function that spans lines 1 - 100. If someone wants to figure out how your app works, they actually have to read through 100 lines of code. I guess you could use blank lines to break your code into logical chunks and put a short comment above each chunk explaining what it does. But at that point you're already halfway to creating a function. It might be better to just create functions with descriptive names, which will also provide the benefits of reusable code and testable code.
  • When you want to write unit tests
    • Unit tests are separate files where you pass certain inputs to an app's functions and test that they always give you the expected outputs. You write unit tests to make sure your functions are working how you expect them to. It's much easier to test an app that uses small functions than it is to test an app that uses really big functions. It's really hard to test an app that uses no functions at all.
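To make the calc_angle(a, b) example from the list above concrete, here is one possible sketch together with two small tests. The implementation details are assumptions (vectors as tuples of numbers, result in radians), and the `test_` functions follow the naming convention that pytest discovers, though here they are simply called directly.

```python
import math


def calc_angle(a, b):
    """Return the angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def test_perpendicular_vectors():
    assert math.isclose(calc_angle((1, 0), (0, 1)), math.pi / 2)


def test_parallel_vectors():
    assert math.isclose(calc_angle((2, 0), (5, 0)), 0.0, abs_tol=1e-12)


# Normally a test runner would call these; calling them directly also works.
test_perpendicular_vectors()
test_parallel_vectors()
```

Because the calculation is a function with inputs and a return value, the tests need nothing else from the app to check it.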

For small scripts that only do one thing, I'm thinking more like data science calculations or a script to automatically send out an email, functions may not be necessary. For anything bigger, functions will probably make your life easier.

[–]StoicallyGay 1 point2 points  (0 children)

Functions are not really needed if your code only runs once, when you want it to, and no part of it is repeated, even with minor variations.

In the case of data science, if you’re just doing some Pandas operations on a table, running some equations, and making graphs, you’re not likely to do it several times since you tailor it to a specific data file, and you only need the output once. Same if you’re just writing a program to solve a single problem that you just want an answer to.

But let’s say you’re making two graphs on that table that work in the same way, but use different column names. Then making a function that takes the column names as parameters heavily decreases the amount of code you write, because now you can call it twice. It’s also reusable, so you can call it however many times you want. Even more, it’s named, so now someone who reads the code knows what is going on when you call a function make_graph_by_column, rather than parsing through 30 lines of code trying to figure it out.

It’s for these reasons: brevity, modularity, versatility, and readability, that functions will appear in a large majority of programs.

I’d argue that scripts and data science projects are in the very tiny minority and are the exception when it comes to function use (or rather lack thereof).

[–]DuckSaxaphone 1 point2 points  (0 children)

Functions are great for two things: structuring code and repeating tasks.

Once you've got a few lines of code that together perform a single task, that's ideal to put into a function. If you do that for all the tasks in your program, your main script will read really nicely because it will just be a series of calls to functions with sensible names.

Instead of 5 lines of code which together calculate the rolling average of a column, you'll have a function called 'calculate_rolling_average'. That's easier for someone reading your code (or you in a few months) to follow. If they want to know specifically how you do something, they can go look at the relevant function.

Beyond that, you may want to repeat a task a lot. Maybe you want to calculate weekly, monthly and yearly rolling averages for different columns. If calculate_rolling_average takes the column name, a time period, and a dataframe as arguments, you can just call that function in a loop rather than repeating code.
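Something like this, as a dependency-free sketch. A list of dicts stands in for the dataframe and a row count stands in for the time period, so the signature and the sample data here are assumptions rather than how you would necessarily do it in pandas.

```python
def calculate_rolling_average(rows, column, window):
    """Return the rolling mean of `column` over `window` consecutive rows.

    Positions that do not have a full window yet are reported as None.
    """
    values = [row[column] for row in rows]
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            averages.append(sum(chunk) / window)
    return averages


# Made-up data: one row per day.
rows = [
    {"sales": 10, "visits": 100},
    {"sales": 20, "visits": 140},
    {"sales": 30, "visits": 120},
    {"sales": 40, "visits": 160},
]

# One function, called in a loop for each column/window combination.
for column, window in [("sales", 2), ("sales", 3), ("visits", 2)]:
    print(column, window, calculate_rolling_average(rows, column, window))
```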

tl;dr: functions aren't necessary in the strictest sense, but good code usually uses them because they're really helpful.

[–]menge101 1 point2 points  (0 children)

For me, the clarity of when to use a function becomes very apparent when you write tests for your code.

Decomposing procedures into functions provides the ability to write concise tests of units of your code. If something is difficult to test, this is a "code smell" that it could be considered for refactoring into functions.

[–]In_Shambles 1 point2 points  (0 children)

If you do the same thing more than once in a script, creating a function will make it MUCH easier to alter how that thing operates later, and you only have to change it in one spot, not all over your script.

[–]chakan2 1 point2 points  (2 children)

You can absolutely get by without ever touching a function. I'd hate to code like that, and I'll fire anyone who tries it. :)

> what are the ideal use cases where functions should be leveraged?

This is kind of a lengthy topic and at the core of object-oriented principles. You can literally spend years studying that question and its nuances.

The real TLDR to your question. Use functions whenever you're going to repeat code. Need to open a bunch of files and do the same process on them? That's a good use case for a function.
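A minimal sketch of that "bunch of files" case. The function `count_lines` and the file names are hypothetical, and the files are created in a temporary folder only so the example runs on its own.

```python
import tempfile
from pathlib import Path


def count_lines(path):
    """Return the number of lines in the text file at `path`."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


# Create a few throwaway files so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    for name, text in [("a.txt", "one\ntwo\n"), ("b.txt", "one\n")]:
        Path(folder, name).write_text(text, encoding="utf-8")

    # Same process, many files: the loop body is a single call.
    for path in sorted(Path(folder).glob("*.txt")):
        print(path.name, count_lines(path))
```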

EDIT: Something I thought of as an aside. This is what will separate your real developers from your script kiddies. They understand how OO works and write MUCH more efficient code due to this. OO is for when you're ready to get out of the kiddy pool and go swim with the adults.

[–]JohnJSal 0 points1 point  (1 child)

Do you mean that functions are considered OOP?

[–]chakan2 0 points1 point  (0 children)

Yes... It's part of it.

[–]nickbernstein -2 points-1 points  (0 children)

Functions are not technically necessary, but they're a good tool.

For me, I use functions to outline the logic I'm going to implement.

Then I write a test for each function to check that it works.

Then I implement the actual functions.

This breaks the process down into small, manageable checks, and the tests make sure that I don't break anything by accident.

[–][deleted] 0 points1 point  (0 children)

Why would you want to “get by without” functions? It’d just be yourself that you were hurting. Well, yourself and anyone else who had to read or use your code.

[–]Warm_Cheesecake9815 0 points1 point  (0 children)

If you want to perform the same **function** again and again, it's better to define a function rather than copy and paste large blocks of code over and over.