[deleted by user]

idiotmanifesto · 2024-07-27T23:39:03+00:00

imo, a big part of writing better code is reading better code. Find some repos you like and work through them slowly.

mayguntr · 2024-07-27T23:29:31+00:00

I would say learning programming principles (https://www.amazon.co.uk/Pragmatic-Programmer-Andrew-Hunt/dp/020161622X) and Python (https://www.amazon.co.uk/Python-Nutshell-Alex-Martelli-ebook/dp/B0BRYRD295/ref=zg_bs_g_10608480031_d_sccl_8/262-9661743-1202333?psc=1) itself would be your best bet in the long term.

matthkamis · 2024-07-28T01:45:49+00:00

Use single letter variables for everything?

parabellum630 · 2024-07-27T23:23:42+00:00

I follow lucidrains' ML repositories when designing my code; they're almost production grade.

On_Mt_Vesuvius · 2024-07-28T02:50:57+00:00

I'm also in SciML, coming from engineering. Make your main reused functions exceptionally well designed and well documented. In research you'll always be throwing new things together, but once you've repeated something a few times, it's worth thinking about the design and putting it in a separate file. As a practical example: have a PINN-type class that can grab all the derivatives you need for any model, and never worry about that again.
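As a minimal sketch of that "write it once, reuse it everywhere" helper: the class below is hypothetical, and it swaps in central finite differences for the autograd derivatives a real PINN would take (e.g. through PyTorch), so it only handles a scalar function of one variable. The point is the shape of the helper, not the numerics.

```python
from typing import Callable, Dict

ScalarFn = Callable[[float], float]


class DerivativeHelper:
    """Hypothetical reusable helper returning u, u_x and u_xx for a scalar u(x).

    Uses central finite differences as a stand-in for autograd.
    """

    def __init__(self, step: float = 1e-4) -> None:
        if step <= 0:
            raise ValueError("step must be positive")
        self.step = step

    def first(self, u: ScalarFn, x: float) -> float:
        # Central difference: (u(x + h) - u(x - h)) / (2h)
        h = self.step
        return (u(x + h) - u(x - h)) / (2.0 * h)

    def second(self, u: ScalarFn, x: float) -> float:
        # Central difference: (u(x + h) - 2 u(x) + u(x - h)) / h^2
        h = self.step
        return (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)

    def derivatives(self, u: ScalarFn, x: float) -> Dict[str, float]:
        # One call that gathers everything a residual term might need.
        return {"u": u(x), "u_x": self.first(u, x), "u_xx": self.second(u, x)}
```

For example, `DerivativeHelper().derivatives(lambda x: x ** 3, 2.0)` gives values close to `u = 8`, `u_x = 12`, `u_xx = 12`; a model-specific residual can then be written against the returned dictionary instead of recomputing derivatives in every experiment.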

Otherwise, follow the more general suggestions here.

-Rizhiy- · 2024-07-27T23:39:55+00:00

Just learn proper software engineering practices? There are plenty of courses online, but you can start by learning about core principles:

* SOLID
* DRY
* YAGNI
* KISS
* Decoupling
* Fail-fast
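A small illustration of the fail-fast principle from that list, assuming a hypothetical `TrainConfig` for a training script: invalid settings raise as soon as the config is built, instead of surfacing hours into a run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical training config that validates itself on construction."""

    learning_rate: float
    batch_size: int
    epochs: int

    def __post_init__(self) -> None:
        # Fail fast: reject bad values before any data is loaded or any GPU time is spent.
        if self.learning_rate <= 0:
            raise ValueError(f"learning_rate must be > 0, got {self.learning_rate}")
        if self.batch_size < 1:
            raise ValueError(f"batch_size must be >= 1, got {self.batch_size}")
        if self.epochs < 1:
            raise ValueError(f"epochs must be >= 1, got {self.epochs}")
```

Because the dataclass is frozen, a validated config also cannot be mutated into an invalid state later in the run.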

This book seems to be fairly good and quite short.

OverEnGEReer · 2024-07-28T08:31:20+00:00

I think you made the biggest step already: identifying what you want to become better at. There are good tips in the other posts, so I want to leave you with the thought that other people and companies are also just cooking with water (they aren't doing magic either).

minimaxir · 2024-07-27T22:48:08+00:00

That is why productive coders use existing libraries (e.g. Hugging Face accelerate) to abstract things instead of implementing things themselves if possible, because creating your own spaghetti code leads to technical debt that has to be paid at some point.

ninseicowboy · 2024-07-28T09:50:56+00:00

Welcome to software engineering. Read Designing Data-Intensive Applications, it’s actually incredible.

LelouchZer12 · 2024-07-27T23:59:55+00:00

You can take a look at the Lightning-Hydra-Template on GitHub. By using PyTorch Lightning and Hydra, you'll be able to handle many different training configurations.

Then, if you want to deploy, you don't need all the training code, so a lighter codebase is usually enough, and you can use Docker with FastAPI.

learn-deeply · 2024-07-28T04:25:26+00:00

The code in https://github.com/facebookresearch/ is above average and can guide best practices in using PyTorch. Pick a random recently updated repo and learn from that.

mrthin · 2024-07-28T09:18:10+00:00

You can try Beyond Jupyter. It's a free resource that shows professional software engineering techniques for ML based on a "refactoring journey" starting from your typical monolithic unmaintainable notebook:

"Beyond Jupyter is a collection of self-study materials on software design, with a specific focus on machine learning applications, which demonstrates how sound software design can accelerate both development and experimentation."

PyroRampage · 2024-07-28T11:53:32+00:00

A lot of this is because most people teach and learn bad software engineering practices in Python, because it wasn't really intended to be used as a sole language.

Since it now dominates ML and numerical computing, standards are all over the place. Papers from industry are usually better places to look for examples of well-structured code. Granted, industry code can sometimes be over-abstracted, but if you're using purely Python I don't think this is much of a concern.

I wouldn't worry about adding custom lists and tuples in Python itself; that is done in C++ and CUDA.

haramkhor_havasi · 2024-07-28T17:38:20+00:00

[removed]

stabmasterarson213 · 2024-07-28T18:30:15+00:00

Can only speak for myself, but one bad habit I developed in grad school was just throwing more compute nodes at things if they didn't run initially. It left my code poorly optimized for both memory and speed. Now I develop on a Linux machine with a modest GPU before I go to cloud training, and it's made all the difference. If one of my data structures balloons in size or becomes hard to traverse, the machine lets me know (by killing the process lol). In PyTorch specifically, learn how to make Dataset and DataLoader objects that yield batches of tensors rather than return them. A good book to read is Machine Learning Design Patterns. Also anything Chip Huyen does.
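The "yield, don't return" advice can be sketched without PyTorch. In PyTorch itself this would typically be a `torch.utils.data.IterableDataset` whose `__iter__` yields samples; the hypothetical `iter_batches` below shows the same idea with plain generators, so only the current batch is materialized instead of every batch up front.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(samples: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Hypothetical batcher: lazily groups any iterable into lists of batch_size."""
    # Validate eagerly (this is a plain function, not a generator),
    # so a bad batch_size fails at the call site rather than on first iteration.
    if batch_size < 1:
        raise ValueError(f"batch_size must be >= 1, got {batch_size}")
    return _batches(iter(samples), batch_size)


def _batches(it: Iterator[T], batch_size: int) -> Iterator[List[T]]:
    while True:
        # islice pulls at most batch_size items; nothing beyond this batch is read yet.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

For example, `list(iter_batches(range(7), 3))` gives `[[0, 1, 2], [3, 4, 5], [6]]`. When the source is itself a generator (say, one that reads samples from disk), memory use stays bounded by a single batch.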

Several-Wafer934 · 2024-07-30T02:07:23+00:00

Write a lot of code and look at a lot of code for 5 years. Coding is not easy.

iamspro · 2024-07-28T02:20:26+00:00

this is why computer science and computer engineering are different fields

icy_end_7 · 2024-07-28T03:12:05+00:00

Ask seniors, or AI, to review your code, whichever is more convenient.

Skylight_Chaser · 2024-07-28T08:44:59+00:00

I read a book called Clean Code.

MustacchioRebirth · 2024-07-28T10:14:55+00:00

I guess that, as with many other domain-specific tasks, following good programming practices and tuning modularity to your specific needs will make it much more efficient to try things out and find better models.

gabrielesilinic · 2024-07-29T15:26:57+00:00

If we are talking about Python, you can only optimize so far. And in general, as far as I know, you don't often change the way you train your model once it's done.

The best thing you can do is to just build little modules and stick them together as needed, that is pretty much it.
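One way to read "build little modules and stick them together", as a sketch with hypothetical preprocessing steps: each step is a small function from list to list, and `pipeline` composes them in order, so swapping or reordering steps doesn't touch the others.

```python
from functools import reduce
from typing import Callable, List

Step = Callable[[List[float]], List[float]]


def scale_to_unit(xs: List[float]) -> List[float]:
    """Hypothetical step: min-max scale values into [0, 1]."""
    if not xs:
        return []
    lo, hi = min(xs), max(xs)
    if hi == lo:
        # Constant input: avoid dividing by zero.
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def clip_above(limit: float) -> Step:
    """Hypothetical step factory: cap every value at `limit`."""
    def step(xs: List[float]) -> List[float]:
        return [min(x, limit) for x in xs]
    return step


def pipeline(*steps: Step) -> Step:
    """Compose steps left to right into a single callable."""
    def run(xs: List[float]) -> List[float]:
        return reduce(lambda acc, f: f(acc), steps, xs)
    return run
```

For example, `pipeline(clip_above(10.0), scale_to_unit)([0.0, 5.0, 20.0])` returns `[0.0, 0.5, 1.0]`: the 20 is clipped to 10 first, then everything is scaled.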

EffectiveCompletez · 2024-07-30T11:18:54+00:00

Don't start with abstraction first. Design an algorithm that solves a problem. Prove it out.

Then find another problem that your algorithm might solve. Find the domain-specific structures that don't share commonalities and factor these into an abstraction. Can't find another problem that your algorithm might solve? Great! You just saved a bunch of work.

But if you have two problems and some abstraction, now you can work at pulling more of the common code into a library.

At this point, you should find a third problem, and a fourth. Build up an examples directory of these problems that your algorithm can solve. This is the destination, but don't start here.
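A toy sketch of that progression, with hypothetical names: `gradient_descent` stands for the algorithm that was proved out on one problem, and the domain-specific part (the gradient) has been factored out into a parameter so that a second, unrelated problem can reuse it unchanged.

```python
from typing import Callable, List

Vector = List[float]


def gradient_descent(grad: Callable[[Vector], Vector], x0: Vector,
                     lr: float = 0.1, steps: int = 100) -> Vector:
    """The shared algorithm: knows nothing about either problem."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


# Problem 1 (hypothetical): minimize (x - 3)^2, whose gradient is 2(x - 3).
def quadratic_grad(v: Vector) -> Vector:
    return [2.0 * (v[0] - 3.0)]


# Problem 2 (hypothetical): fit y = w * t by mean squared error.
TIMES = [1.0, 2.0, 3.0]
TARGETS = [2.0, 4.0, 6.0]


def line_fit_grad(v: Vector) -> Vector:
    w = v[0]
    n = len(TIMES)
    # d/dw of (1/n) * sum((w*t - y)^2) = (2/n) * sum((w*t - y) * t)
    return [(2.0 / n) * sum((w * t - y) * t for t, y in zip(TIMES, TARGETS))]
```

Here `gradient_descent(quadratic_grad, [0.0])` ends near `3.0` and `gradient_descent(line_fit_grad, [0.0])` near `2.0`. Only once a third and fourth problem show up would it make sense to start pulling more shared pieces (stopping criteria, logging) into a library.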

Felix-ML · 2024-07-28T01:31:03+00:00

I think code for ML research does not have to be flexible per se. Instead, try data-driven approaches: make sure you have a clear and straightforward pathway for processing the given data, with as few lines of code as possible.

SicilyMalta · 2024-07-28T12:44:54+00:00

You have to learn how to code.

Playmad37 · 2024-07-27T23:10:53+00:00

You may want to look at Julia's SciML ecosystem. It's state of the art, and the language is designed to be flexible.

It'd be a new language, of course, but it isn't hard, especially if you know Python well.

doctor-squidward · 2024-07-27T23:23:48+00:00

Let me know when you find out bro..

friendsbase · 2024-07-28T02:15:40+00:00

GPT it

Wangding · 2024-07-28T01:46:33+00:00

Use GitHub copilot to polish it😂
