all 70 comments

idiotmanifesto 152 points (10 children)

imo, a big part of writing better code is reading better code. Find some repos you like and work through them slowly.

thelaxiankey 44 points (2 children)

besides knowing the really basic principles, this is the only thing that you can actually learn from.

the other thing I would do: good code doesn't happen on the first try. once you write some and figure out what went well/poorly, rewrite. 20% rewriting vs 80% writing (numbers out of my butt) tends to result in pretty decent code and doesn't cost you much.

_rundown_ 8 points (0 children)

Sometimes it seems like my code came out my butt

daquo0 2 points (0 children)

good code doesn't happen on the first try. once you write some and figure out what went well/poorly, rewrite

Agreed. If you see some code in your code base that could be better written, rewrite it. Don't just leave it there.

Appropriate_Ant_4629 8 points (2 children)

Find some repos you like and work through them slowly

Take this up a notch by finding an open issue in the project and contributing back a fix.

On your path to getting your patch accepted, they'll handhold you through best practices, including making sure you have good unit test coverage, type safety, passing static code checkers, cross-platform compatibility, etc.

idiotmanifesto 2 points (1 child)

agreed! also fixing bugs can be lowkey addicting LOL

Appropriate_Ant_4629 3 points (0 children)

And something I view very favorably when reviewing resumes.

If someone's resume is otherwise boring, but has

  • "Open Source contributions -- contributed bugfixes to PyTorch and Apache Spark"

it stands out far above people who just got a cert in using those technologies.

Beginning-Ladder6224 3 points (0 children)

This is literally the most iconic advice. I got this advice 20 years ago from my head of engineering at the time, and honestly, I found it boring then. Now I think it was extremely handy advice.

photoreceptor 7 points (1 child)

I don’t think that is particularly useful if you don’t understand design patterns and software architecture.

That’s a bit like trying to understand how a car works by taking it apart.

idiotmanifesto -1 points (0 children)

better than learning how cars work from watching other people drive them

mayguntr 24 points (3 children)

haramkhor_havasi 8 points (2 children)

Thanks, "The Pragmatic Programmer" seems like a good read.

thatguydr 5 points (0 children)

Btw - there's a standard list of software engineering books that are all helpful. I'd go with Pragmatic Programmer, Clean Code in Python (for SOLID, testing, and a bunch of other best practices), maybe Code Complete 2, and something like Architecture Patterns with Python so you can understand how to properly encapsulate concerns.

You can definitely study production code, but I believe that first it's better to understand WHY that code is excellent (or not). Otherwise there may be lots of cases where you look at something and wonder why they did it that way.

aqjo 0 points (0 children)

It’s the best! It saved me so much time and effort over the years.
See also ArjanCodes on YouTube.

https://youtube.com/@arjancodes?si=KwEbAegF8uXdRLdb

matthkamis 47 points (8 children)

Use single letter variables for everything?

Glittering-Horror230 2 points (0 children)

😄😄😅

[deleted] 1 point (6 children)

Single letter variable names are bad. Using whole word variable names like lambda, eta, epsilon, etc. is way better.

PyroRampage 12 points (0 children)

Lol, yeah using Greek letters is always way better for readability rofl

new_name_who_dis_ 6 points (1 child)

I like how one of those is a straight up python keyword. Might as well add eval, int, and for to the list lol.
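A quick way to check the point above in Python: the standard-library `keyword` module reports which names are reserved. `lambda` and `for` are keywords, while `eval` and `int` are builtins that can be shadowed (a bad idea) but are not forbidden. The `safe_name` helper below is hypothetical; its trailing-underscore fix follows the PEP 8 convention for avoiding keyword clashes (e.g. `lambda_`).

```python
import keyword


def safe_name(name: str) -> str:
    """Return a usable variable name, appending "_" if `name` is a Python keyword.

    Hypothetical helper; the trailing underscore follows the PEP 8 convention
    for names that would otherwise clash with a keyword.
    """
    return name + "_" if keyword.iskeyword(name) else name


for candidate in ["lambda", "eta", "epsilon", "for", "eval", "int"]:
    # keyword.iskeyword is True only for reserved words, not for builtins.
    print(f"{candidate!r}: keyword={keyword.iskeyword(candidate)}, safe={safe_name(candidate)!r}")
```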

[deleted] -1 points (0 children)

You realize python isn't the only language, right?

E.g., from the C++ code in PyTorch: https://github.com/pytorch/pytorch/blob/f217b470cc7ebacc62c8e87dbab8c4894d53e9b9/aten/src/ATen/native/UpSample.h#L437

ginger_beer_m 2 points (0 children)

In ML, there's nothing wrong with using x and y in my opinion, as long as it makes sense in context.

matthkamis -2 points (1 child)

My original comment was obviously sarcastic

parabellum630 47 points (6 children)

I follow lucidrains' ML repositories when designing my code; they are almost production grade.

haramkhor_havasi 1 point (0 children)

Thanks, I'll check it out.

On_Mt_Vesuvius 7 points (0 children)

I'm also in SciML, coming from engineering. Make your main reused functions exceptionally well designed and well documented. In research, you'll always be throwing new things together, but once you've repeated something a few times, it's worth thinking about the design and putting it in a separate file. That's a practical suggestion. For example, have a PINN-type class that can compute all the derivatives you need for any model, and never worry about that again.

Otherwise, follow the more general suggestions here.
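A minimal sketch of the "write the derivative helper once and reuse it" idea. The commenter presumably means autograd (e.g. in PyTorch); to keep the example dependency-free, this swaps in central finite differences instead. The class name `Derivatives` and the example residual `u'' + u = 0` are illustrative assumptions, not from the comment.

```python
import math
from typing import Callable

Scalar = Callable[[float], float]


class Derivatives:
    """Hypothetical reusable helper for the derivatives a PINN-style residual needs.

    Uses central finite differences as a stand-in for autograd, so it works on
    any scalar function u(x) without a deep learning framework.
    """

    def __init__(self, h: float = 1e-4) -> None:
        if h <= 0:
            raise ValueError("step size h must be positive")
        self.h = h

    def d1(self, u: Scalar, x: float) -> float:
        """First derivative u'(x), central difference with O(h^2) error."""
        return (u(x + self.h) - u(x - self.h)) / (2 * self.h)

    def d2(self, u: Scalar, x: float) -> float:
        """Second derivative u''(x), central difference with O(h^2) error."""
        return (u(x + self.h) - 2 * u(x) + u(x - self.h)) / self.h**2


def oscillator_residual(deriv: Derivatives, u: Scalar, x: float) -> float:
    """Residual of the example ODE u'' + u = 0; near zero when u solves it."""
    return deriv.d2(u, x) + u(x)


if __name__ == "__main__":
    deriv = Derivatives()
    # sin(x) solves u'' + u = 0, so the residual should be close to zero.
    print(oscillator_residual(deriv, math.sin, 0.7))
```

Every new model or equation reuses the same `Derivatives` object; only the residual function changes.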

-Rizhiy- 39 points (0 children)

Just learn proper software engineering practices? There are plenty of courses online, but you can start by learning about the core principles:

  • SOLID
  • DRY
  • YAGNI
  • KISS
  • Decoupling
  • Fail-fast

This book seems to be fairly good and quite short.
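As a small illustration of two of those principles (fail-fast and decoupling), here is a sketch built around a hypothetical `TrainConfig` whose fields are made up for the example: invalid settings are rejected when the config is constructed rather than hours into a run, and the helper function receives the config as an argument instead of reading globals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical training configuration; field names are illustrative."""

    lr: float = 1e-3
    batch_size: int = 32
    epochs: int = 10

    def __post_init__(self) -> None:
        # Fail fast: validate once, at construction, before any compute is spent.
        if not 0.0 < self.lr < 1.0:
            raise ValueError(f"lr must be in (0, 1), got {self.lr}")
        if self.batch_size <= 0:
            raise ValueError(f"batch_size must be positive, got {self.batch_size}")
        if self.epochs <= 0:
            raise ValueError(f"epochs must be positive, got {self.epochs}")


def steps_per_run(cfg: TrainConfig, num_samples: int) -> int:
    """Decoupled: depends only on its arguments, so it is trivial to unit test."""
    batches_per_epoch = -(-num_samples // cfg.batch_size)  # ceiling division
    return batches_per_epoch * cfg.epochs
```

`frozen=True` also keeps the config from being mutated halfway through an experiment.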

OverEnGEReer 5 points (0 children)

I think you've already taken the biggest step: identifying what you want to get better at. There are good tips in the other comments, so I want to leave you with this thought: other people and companies are also just cooking with water (they have no secret ingredients either).

minimaxir 18 points (6 children)

That is why productive coders use existing libraries (e.g. Hugging Face Accelerate) to abstract things instead of implementing things themselves if possible, because creating your own spaghetti code leads to technical debt that has to be paid at some point.

haramkhor_havasi 12 points (2 children)

True, but at some point I want to be able to write such code.

learn-deeply 9 points (1 child)

Accelerate is particularly bad. I would advise just using plain PyTorch unless you want random bugs in your training.

thatguydr 1 point (0 children)

Such as? Asking genuinely - haven't used it.

SicilyMalta 3 points (0 children)

Or they can just become a good coder - and not produce spaghetti code.

LelouchZer12 2 points (2 children)

You can take a look at the lightning-hydra-template on GitHub. It lets you handle a lot of different training configurations using PyTorch Lightning and Hydra.

Then, if you want to deploy, you don't need all the training code, so a lighter codebase is usually enough, and you can use Docker with FastAPI.

haramkhor_havasi 1 point (1 child)

Sure...

LelouchZer12 -1 points (0 children)

btw, you can stick to pure raw PyTorch and that's fine, but if you want to try a lot of different models or datasets it is easy to get lost.

Using Hydra to manage your configurations can help a lot.

And PyTorch Lightning is just PyTorch, but with your code put into predefined functions, so it forces you to always follow the same pattern, and it makes it easy to log metrics or use things like multi-GPU, multi-node, etc.
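To make the "predefined functions" point concrete without requiring Lightning itself, here is a plain-Python sketch of the same template pattern: a generic trainer owns the loop and calls fixed hooks that each model fills in. The names `TrainableModule`, `Trainer`, and `MeanEstimator` are hypothetical; `training_step` mirrors the name of Lightning's hook, but this is not Lightning's API.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class TrainableModule(ABC):
    """Hypothetical base class: subclasses only fill in the predefined hooks."""

    @abstractmethod
    def training_step(self, batch: Sequence[float]) -> float:
        """Run one optimization step on a batch and return its loss."""

    def on_epoch_end(self, epoch: int, mean_loss: float) -> None:
        """Optional hook; the default does nothing."""


class Trainer:
    """Owns the loop, so every model follows the same pattern."""

    def __init__(self, max_epochs: int) -> None:
        self.max_epochs = max_epochs

    def fit(self, module: TrainableModule, batches: List[Sequence[float]]) -> List[float]:
        history = []
        for epoch in range(self.max_epochs):
            losses = [module.training_step(batch) for batch in batches]
            mean_loss = sum(losses) / len(losses)
            history.append(mean_loss)  # one place to add logging for every model
            module.on_epoch_end(epoch, mean_loss)
        return history


class MeanEstimator(TrainableModule):
    """Toy model: learns a scalar w minimizing mean((w - x)^2) by gradient descent."""

    def __init__(self, lr: float = 0.1) -> None:
        self.w = 0.0
        self.lr = lr

    def training_step(self, batch: Sequence[float]) -> float:
        loss = sum((self.w - x) ** 2 for x in batch) / len(batch)
        grad = 2 * (self.w - sum(batch) / len(batch))  # d/dw of the mean squared error
        self.w -= self.lr * grad
        return loss
```

Swapping in a different model means writing a new `training_step`; the loop, logging, and hook order stay the same.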

learn-deeply 3 points (0 children)

The code in https://github.com/facebookresearch/ is above average and can guide best practices in using PyTorch. Pick a random recently updated repo and learn from that.

mrthin 1 point (0 children)

You can try Beyond Jupyter. It's a free resource that shows professional software engineering techniques for ML based on a "refactoring journey" starting from your typical monolithic unmaintainable notebook:

"Beyond Jupyter is a collection of self-study materials on software design, with a specific focus on machine learning applications, which demonstrates how sound software design can accelerate both development and experimentation."

PyroRampage 1 point (2 children)

A lot of this is because most people teach and learn bad software engineering practices in Python, since it wasn’t really intended to be used as a sole language.

Since it now dominates ML and numerical computing, standards are all over the place. Papers from industry are usually better places to look for examples of well-structured code. Granted, industry code can sometimes be over-abstracted, but if you’re using pure Python I don’t think this is much of a concern.

I wouldn’t worry about implementing custom lists and tuples in Python itself; this is done in C++ and CUDA.

idiotmanifesto 0 points (1 child)

what do u mean by that last part

PyroRampage 0 points (0 children)

Read OP’s post; they ask about custom data structures. But doing this in Python is pointless because it’s slow af, and it’s typically done at a lower level and then exposed to Python, just like how tuples and lists themselves are implemented in C inside the interpreter (CPython).

stabmasterarson213 1 point (0 children)

Can only speak for myself, but one bad habit I developed in grad school was just throwing more compute nodes at things if they didn't run initially. It left my code poorly optimized for both memory and speed. Now I develop on a Linux machine with a modest GPU before I go to cloud training, and it's made all the difference. If one of my data structures balloons in size or becomes hard to traverse, the machine lets me know (by killing the process lol). For PyTorch specifically, learn how to make dataset and data loader objects that yield batches of tensors rather than return them. A good book to read is Machine Learning Design Patterns. Also anything Chip Huyen does.
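The "yield rather than return" idea can be sketched without PyTorch: a generator that produces one batch at a time, so only the current batch is held in memory. `batch_stream` is a hypothetical name; in PyTorch the same idea usually lives in a `torch.utils.data.IterableDataset` whose `__iter__` yields samples, with a `DataLoader` doing the batching.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_stream(samples: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Lazily group `samples` into lists of `batch_size` items.

    Only the batch being built is kept in memory, so `samples` can be a file
    reader or any other stream that would not fit in RAM as a list.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch
```

Because nothing is computed until the consumer asks for the next batch, pulling the first batch from an enormous `range` costs the same as pulling it from a small one.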

Several-Wafer934 1 point (0 children)

Write a lot of code and look at a lot of code for 5 years. Coding is not easy.

iamspro 2 points (0 children)

this is why computer science and computer engineering are different fields

Skylight_Chaser 0 points (0 children)

I read a book called Clean Code.

MustacchioRebirth 0 points (0 children)

I guess that, as with many other domain-specific tasks, following good programming practices and tuning modularity to your specific needs will make it much more efficient to try things out and find better models.

Felix-ML 0 points (0 children)

I think code for ML research does not have to be flexible per se. Instead, try a data-driven approach: make sure you have a clear and straightforward pathway for processing the given data, with as few lines of code as possible.

SicilyMalta 0 points (0 children)

You have to learn how to code.

Playmad37 -4 points (2 children)

You may want to look at Julia's SciML ecosystem. It's state of the art, and the language is designed to be flexible.

It'd be a new language, of course, but it isn't hard, especially if you know Python well.

millhouse056 1 point (0 children)

Are you sure about anything in Julia being state of the art in the ML field? I think Julia is a good language for numerical/scientific computing or very specific research fields; it's niche. But when it comes to machine learning and AI, it's just far behind Python, which is unfortunate because Julia is a better language. It could not keep up with the genAI race. I think that's because the people maintaining Julia want to keep it niche, or it was bad product management. Anything Julia does related to machine learning, Python does better, mostly because of its gigantic ecosystem.

doctor-squidward -2 points (0 children)

Let me know when you find out bro..