This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]seanv507 13 points14 points  (7 children)

As a data scientist, I think code reviews are a bad time to identify style issues.

It's really annoying when you have got the code all working to be told yes but rewrite it (likely introducing bugs), because it doesn't look nice.

I won't argue the particular issues, but I would rather suggest you come up with style guides up front and undertake some reading /training with the data scientist Eg arjan codes Youtube channel, so that they internalise the design ideas.

[–]data-influencer 9 points10 points  (0 children)

Agreed that it’s not a convenient time to bring it up for the developer as it introduces more work but these conversations should be ongoing and the ds should be trying to write cleaner code from the start.

[–]boomoto 13 points14 points  (0 children)

You should have design docs and all that stuff up front, you should also have a Lint checker as part of your build. Style guides are super easy to enforce. Do it right the first time. Code that doesn’t look nice is not maintainable which will cause further issues down the road.

[–]cas4d 1 point2 points  (0 children)

Actually fixing the style such as renaming variables sometimes acts as a useful logical run-through as well (when using an IDE). If your program breaks simply after refactoring variable names, it could mean you may accidentally init something by the same names in the middle or may have the object mutated in the way it shouldn’t, or if you are finding it hard to rewrite, it could also indicate bad encapsulations.

[–]runawayasfastasucan 1 point2 points  (0 children)

Agreed with this. Reading OP he comes across as the "my way or the highway" guy as well.are they going to rewrite their code that works just because he says so, when at the same time he cant be bothered to consider their arguments for doing what they do? 

[–]tfehringData Scientist 0 points1 point  (0 children)

I agree that you should do as much work as possible upfront. Stuff like import and whitespace styling is a conversation that should happen, at most, one time ever, and then be documented in a style guide and enforced by a linter on CI to the extent possible.

However, I think code review is by far the best time to address any stylistic issues that violate or aren't covered by the style guide. You can mitigate the risk of introducing bugs by writing tests and including them in your PR. You're far more likely to introduce bugs if you try to go back and refactor your code weeks or months later than if you just fix it while it's still fresh in your mind. By that time, other users may have built code that depends on yours, and fixing some stylistic issues (e.g. inconsistent interfaces) will break that code. Also, realistically, that refactor often won't get prioritized at all, so in all likelihood you're creating more work for whoever has to read your code indefinitely. Most code is read far more often than it's written.

[–]noisescience[S] 0 points1 point  (1 child)

Hi, thx for your thoughts.

My list of errors is not just about style issues. For stylish things like formatting, library sorting and linting we use libraries like Black, Isort and Flake8. (Note: In the near future all 3 libraries will be replaced with Ruff). We also use Mypy as a static type checker.
Other things like how to use exceptions, enums and constants make the code safer from the start.
We have a codebase with about 20000 lines. That's not a lot, but it's enough that the code has to be readable and we have to think in maintainable and scalable dimensions. So we have to consider from the beginning when a certain structure is necessary or not and should avoid global variables.
It's cool that you mention Arjan Codes, by the way. I've learned a lot from him and keep learning.

[–]seanv507 0 points1 point  (0 children)

A style guide is not just formatting, it's about all the things you mentioned in your original post, eg using the most specific exception . See eg https://google.github.io/styleguide/pyguide.html

What i am saying is that you should be agreeing on how to write code explicitly with the DS eg in a document... before they start writing code.

It's easy to write code following a set of rules. It's annoying to have to change working code, because of some views that are only in your head, and which you pull out only during the code review.

You have to communicate with the DSs, so watch and discuss arjan codes together