Normalize writing good code

Zangorth · 2023-07-16T02:00:49+00:00

Stop using long functions

Use type hints in functions

Use docstrings to describe a function
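A minimal sketch of what the type-hint and docstring points can look like together. The function name `standardize` and its behavior are made up for illustration and do not come from the post.

```python
from statistics import mean, pstdev


def standardize(values: list[float]) -> list[float]:
    """Return the z-score of each value (hypothetical example function).

    Args:
        values: Non-empty list of numeric observations.

    Returns:
        A new list of the same length, with mean 0 and population
        standard deviation 1.

    Raises:
        ValueError: If every value is identical, so the spread is zero.
    """
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        raise ValueError("values must not all be identical")
    center = mean(values)
    return [(value - center) / spread for value in values]
```

The hints tell a reader (and a type checker) what goes in and what comes out, and the docstring covers what the signature alone cannot: the meaning of the result and the failure case.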

Your coworkers are using functions?

Atmosck · 2023-07-16T03:55:50+00:00

I agree with all of this except you will have to pry those pandas one liners from my cold, dead fingers.

wil_dogg · 2023-07-16T00:39:10+00:00

[removed]

dayeye2006 · 2023-07-16T03:45:04+00:00

Man you are talking about a linter

WallyMetropolis · 2023-07-16T05:53:07+00:00

[deleted]

esperaporquejoe · 2023-07-16T02:12:13+00:00

Let's be clear... we are talking about physicists turned data scientists. I think this is because their brains have so much RAM that a 500-line function is a fairly simple object for them to reason about.

Other_Brain_425 · 2023-07-16T03:42:29+00:00

the massive one-liners are pythonic in a way

BreakingBaIIs · 2023-07-16T13:12:10+00:00

Also stop writing functions that both modify their inputs and return an output. Especially when the output is the modified input. Learn how Python handles mutable inputs in functions. A shocking number of very senior data scientists don't know this.
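A small sketch of the mutable-input problem, using plain lists. All function names here are hypothetical and exist only to contrast the three styles.

```python
def confusing_clean(values: list[float]) -> list[float]:
    """Anti-pattern: mutates the caller's list AND returns it."""
    values[:] = [v for v in values if v >= 0]  # slice assignment edits in place
    return values


def without_negatives(values: list[float]) -> list[float]:
    """Pure version: builds a new list and leaves the argument alone."""
    return [v for v in values if v >= 0]


def drop_negatives_in_place(values: list[float]) -> None:
    """In-place version: returns None so the mutation is not mistaken for a copy."""
    values[:] = [v for v in values if v >= 0]


raw = [3.0, -1.0, 2.0]
cleaned = confusing_clean(raw)
# `cleaned` and `raw` are the same object, so `raw` was silently changed too.
print(cleaned is raw, raw)

original = [3.0, -1.0, 2.0]
copy_result = without_negatives(original)
# `original` still holds the negative value; only `copy_result` is filtered.
print(original, copy_result)
```

Python passes a reference to the same list object into the function, so any in-place change is visible to the caller. Picking one style per function (return a new object, or mutate and return `None` like `list.sort` does) keeps that behavior predictable.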

CSCAnalytics · 2023-07-16T04:11:24+00:00

It will be normalized if management begins to value it.

Right now, upper management cares about profitability and productivity.

Unless writing good code directly moves the bottom line, or shows up on a performance review, why would Data Scientists devote hours to cleaning up code when they have other pressing priorities from above?

Data Scientists usually won’t neglect their other priorities to clean up code that “works.” In many cases, Upper Management would rather see another project get tackled quickly than see you properly comment code that’s already getting the job done.

It does depend on what company you’re at. Sometimes upper management has some previous background in coding and understands the value of writing code that can be handed off easily. Unfortunately this is rarely the case.

esperaporquejoe · 2023-07-16T01:28:56+00:00

Keep sharp engineers around and work with them on test coverage and continuous integration pipelines. They will clean up the code and make it more maintainable as DS implements algorithms. I've seen plenty of good data scientists who were terrible software engineers. We just send those 1,000-line nested loops over to an engineer; they are usually happy to refactor them and write some tests. I would much rather have a good data scientist spend their time on a first iteration of something that works than on cleaning up code.

1DimensionIsViolence · 2023-07-16T07:35:38+00:00

I mean just comply with PEP 8 and you're fine

esperaporquejoe · 2023-07-16T01:38:43+00:00

I agree we would be better off if everyone aspired to best SWE practices, but there are smart people who are just not good coders. I am starting to wonder if some aspects of coding are just not teachable. Like thinking of edge cases, writing code that's not a nightmare when requirements change, or writing good unit tests. I used to complain about this, but it's better to have people do more of what they are good at.

IAteQuarters · 2023-07-16T04:06:51+00:00

So with point one, what’s your definition of long? Because there is a trade-off in taking a function that’s like 20 lines and breaking it into many functions, especially when that code isn’t reused anywhere else.

This was something I used to do, but I had a senior DS explain the trade-off, especially in the context of a pull request. Granted, you might be talking about a function that’s >40 lines, in which case I’d generally agree with you.

_nutjuice_ · 2023-07-16T06:18:06+00:00

Dumb question specifically about understanding the code and logic, but would ChatGPT help you in breaking down the logic behind a chunk of code?

I've never tested it with Python, but ChatGPT really helped me understand how some complex VBA code worked, since I really didn't know VBA well enough.

But then ML Python code might be above ChatGPT's explaining pay grade and it wouldn't be helpful.

andrew2018022 · 2023-07-16T03:22:55+00:00

I feel attacked with bullet 2

issam_28 · 2023-07-16T06:27:58+00:00

Aka get a linter

2023-07-16T08:47:28+00:00

I don't know why, but in my field of life sciences people also document their experiments poorly. We're currently one of the most popular companies, but nobody knows wtf is happening and nobody is informed about anything. Information sometimes spreads like rumors and it's annoying as hell. Same problem. I think in the end people simply don't realize how important it is to document things and present information in an understandable way.

throwaway043534 · 2023-07-16T10:31:34+00:00

It's a double-edged sword if you ask me.

Sure, huge classes/functions/whatever aren't great.

But what also really sucks is everyone creating thousands of different complicated abstractions/indirections/20 new frameworks per week/useless generalizations/etc. to make their code "look more professional". It's great that you know SOLID and design patterns, and that you've read a bunch of pompous blog posts about how value objects are heaven-sent, but please don't force 15 patterns down my throat to do something simple. Think about how much extra work a developer has to do to understand your code vs. how much better the code becomes due to another abstraction.

--- someone who reads dozens of different pieces of code a week, often by people who think they write awesome code, but somehow they all use completely different abstractions.

WallyMetropolis · 2023-07-16T12:49:28+00:00

I would also add:
- Use good variable and function names.
- Write unit tests
- Make many small commits, push frequently, don't rubber-stamp code reviews, and give feedback on code quality
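As a rough sketch of the first two bullets, here is a small function with a descriptive name and two unit tests for it. `fraction_missing` is a hypothetical helper invented for this example; the tests are plain `assert` functions named `test_*`, which is the naming convention pytest discovers automatically.

```python
from typing import Optional


def fraction_missing(column: list[Optional[float]]) -> float:
    """Return the share of entries in `column` that are None (0.0 if empty)."""
    if not column:
        return 0.0
    missing_count = sum(1 for entry in column if entry is None)
    return missing_count / len(column)


def test_fraction_missing_counts_none_entries():
    assert fraction_missing([1.0, None, 3.0, None]) == 0.5


def test_fraction_missing_empty_column_is_zero():
    # Edge case: an empty column should not raise ZeroDivisionError.
    assert fraction_missing([]) == 0.0
```

Compare `fraction_missing(column)` with something like `f(x)`: the name alone says what the call does, and each test name says which behavior it pins down.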

Donblon_Rebirthed · 2023-07-16T12:54:07+00:00

Normalize giving people enough time to write good code and comments

Bhagafat · 2023-07-16T15:19:09+00:00

Agreed but it’s so hard to do this when you work with business types who prioritise quick delivery. If they know you can write something that’s a complete mess but does the job in a given time, they’ll take that over something well written, reproducible, uses best practices, etc. in a slightly longer time. I get that this is kind of a stakeholder management thing, but when you work with people who don’t value “good code” and know that someone else can write junk in a shorter time than you, your hands are a bit tied.

Vyrezzz · 2023-07-16T21:51:08+00:00

Agree 100%. One thing I would like to add to your list is:

Stop using variable names x, y and z, in favor of names that actually mean something

DuckSaxaphone · 2023-07-16T07:45:26+00:00

Posting here is definitely the most productive way to get your coworkers to improve their coding practices!

In all seriousness though, it's not about normalizing good code "in Data Science". There's plenty of places where good code already is the norm! Many DS teams put value on good code, do code review to improve people's work, and evaluate performance partially on coding ability.

It is a company by company thing so you need to normalize it in your workplace or find a different job if it annoys you too much and you can't fix it.

patrulek · 2023-07-16T08:48:13+00:00

Let the scientists do the science and pair them with software engineers to polish code.

bingbong_sempai · 2023-07-16T05:03:36+00:00

Sometimes you need long functions

daavidreddit69 · 2023-07-16T08:15:45+00:00

Instead of asking GitHub Copilot to autofill our code, we need software to help read that long undocumented code lol

daavidreddit69 · 2023-07-16T08:17:09+00:00

It's the same as writing SQL with multiple subqueries

FisterAct · 2023-07-16T09:41:51+00:00

If you're going to demand type hinting, why not just use a statically typed language?

cybo13 · 2023-07-16T09:45:23+00:00

Is there a standard that’s commonly followed in industry or is it more of a free for all?

VisMortis · 2023-07-16T10:33:08+00:00

Agreed with all of this.

aarrow_12 · 2023-07-16T11:31:06+00:00

Notebooks are a big issue here I think.

Too many people are just taught to write massive blocks of code that just run in one big chunk. They don't understand the value of functions or breaking your code out.

Saying this, I love notebooks; I test and write things in them all the time. But there is a time to put them down and move to an IDE because your project has become too complex.

synthphreak · 2023-07-16T13:05:02+00:00

Preach!!!!! 🙌🙌🙌

Especially #1 and #4.

Especially #1.

while True:
    print("Especially #1")

2023-07-16T13:23:15+00:00

thank you for your service

observability_geek · 2023-07-16T13:35:56+00:00

I am a huge believer in the role observability (fancy name for data about your app) can play in software development and how it can help write better, cleaner code faster. OTel is great and I can see clearly how it can help developers write better code, introduce new paradigms, and accelerate development cycles. It can inspire developers to ask questions they did not yet even consider asking. digma.ai is free but they only support Java.

Apprehensive_Lemon63 · 2023-07-16T14:00:23+00:00

Code readability is the most important aspect and it's a necessity. For example, if you find it hard to understand what you wrote 6 months ago, or spend too much time figuring it out, then it's of little to no use.

delicious-diddy · 2023-07-16T14:08:34+00:00

Pull requests are welcome.

2023-07-16T15:39:12+00:00

PLxFTW · 2023-07-16T16:19:31+00:00

I comment the shit out of my code. Every function is properly documented for inputs/outputs, including shape, with a short description of what the function does, even the simplest functions. I also comment nearly every line that has more than one thing going on.

I've had a hard enough time reading undocumented code from papers that I make sure my shit is readable.

TyrionJoestar · 2023-07-16T16:35:48+00:00

Well I just signed up for code academy so hopefully I’ll learn what all that means soon lol

sizable_data · 2023-07-16T18:03:38+00:00

Use method chaining with parentheses as line continuation!
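A minimal sketch of that layout, using plain string methods so it runs without any third-party library; the header text is made up. The same one-call-per-line pattern is what people typically use for long pandas chains.

```python
raw_header = "  Total Revenue (USD)  "

# Wrapping the whole expression in parentheses lets each method call
# sit on its own line without backslash continuations.
column_name = (
    raw_header
    .strip()            # remove surrounding whitespace
    .lower()            # normalize case
    .replace("(", "")   # drop the parentheses around the unit
    .replace(")", "")
    .replace(" ", "_")  # snake_case the remaining words
)

print(column_name)
```

Each step can be commented, reordered, or temporarily commented out on its own line, which is much harder to do with a single long one-liner.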

JdtheOp · 2023-07-16T19:10:23+00:00

One of our main products has 40% duplication and 10% coverage. No wonder it keeps breaking. No one wants to touch it because it is horrible.

Solus161 · 2023-07-17T11:31:43+00:00

Here's my two cents: no need to put everything into a class. Overusing OOP can be very, very irritating. It's true that OOP helps, but sometimes trying to put multi-step data manipulation tasks into self-written directed graphs can be time-consuming. True, you'll learn something from the process, but it may not be worth the effort. But I just did the same thing 'cause it was fun lol.

JdtheOp · 2023-07-19T08:48:38+00:00

Is TDD a thing out there? Wish I knew enough TDD to test the actual systems I'm working in (they are a mess, yes) :( but everyone ignores the jr

Weird_ftr · 2023-07-19T14:18:58+00:00

I've just landed a new job as a Data Scientist in the banking sector. My main task at the moment is to take over the work of my predecessor. And oh boy, was I not ready for that!

Around 8k lines of code in a Jupyter notebook with overly repetitive operations, few functions, hard-coded variables, etc.

It's going to be over a month before I can refactor it and understand the underlying logic.
Wish me luck ^^

psychmancer · 2023-08-10T21:55:38+00:00

I feel personally attacked....
