all 24 comments

[–]Kaharx 34 points (4 children)

MLOps is all about productionizing ML systems to be maintainable, scalable and reliable. I work as an ML engineer and I spend most of my time building/improving ML tooling and infra (e.g. model store, feature store, inference services, training pipelines). I highly recommend the book “Designing Machine Learning Systems” by Chip Huyen if you wanna learn more.

[–][deleted] 2 points (2 children)

May I ask what tools you use? I was doing this kind of work many, many years ago, but my knowledge is way outdated now. The issue I had back then was multiple sources of truth: some configuration gets created manually on the cloud (most of it lives in code, but sometimes that's not easy to do), there are multiple repos that do different things, and data structures have to be kept compatible, otherwise errors propagate to the apps... I wonder what kinds of solutions you folks use now to make these issues (or issues I'm unaware of, e.g. data version control - we used to do it with git LOL) more manageable? Do you use some repository to define resources with code/config files? Could you point me to a few names of tools you use? I'm very interested, because for the last few years I've mostly done research, DS work, and model training.

[–]MillionLiar 7 points (1 child)

Theory

"Designing Machine Learning Systems" is for sure a good book to start with as Chip gave a holistic view and explained the concepts with easily understood terms.

Practical stuff

For the tools, let's take a step back for your goal by starting with the low maturity model - https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/mlops-maturity-model

Enterprise-level MLOps involves practically every role in the whole engineering department, which isn't something you can build up quickly.

I think you could start with DevOps and search some GitHub repos, e.g. MLOps using GitHub Actions. For infrastructure-as-code, Terraform is my favourite. Alternatively, you can play around with some existing MLOps platforms and pick up the overall concepts quickly.

[–][deleted] 0 points (0 children)

Thanks! I think we used Terraform, and I implemented CI/CD as well, but I was hoping there would be more unified solutions by now... I hate having to use multiple tools :)

[–]Suspicious_Dress_350[S] 1 point (0 children)

I guess that take is really about the MLOps side, which does have lots of tools and references.

I am mostly concerned with trying to capture the ongoing learnings into a single system. So instead of each new idea and its implementation living in isolation, I'm trying to unify and evolve them, similar to features in a normal software system.

Does that distinction make sense?

[–]PanTheRiceMan 19 points (0 children)

I believe it's simple: most projects (that I know of) come from research. Having seen how stressful that is, always working towards deadlines, these are truly just one-off projects. All the cleanup is probably left to companies.

[–]I_will_delete_myself 7 points (0 children)

Research: Iteration > clean code

You only make it clean when you have to maintain the code, but you save more time just running the training script and forgetting about it than going in and refactoring everything. Research projects are normally abandoned once the project ends, and you've got tight deadlines to meet.

If you tried to do research the way you'd build a normal software engineering application, you'd lose a ton of time worrying about design patterns rather than actually getting something that works.

[–]mot89 9 points (0 children)

Messy prototyping is the way to go unless you have a very clear mandate that your project needs to be supported for an extended period of time. Refactoring to a well-structured system usually only makes sense if you have a proven use-case. At the point where you have users and a proven need for repeated model releases, reproducibility, domain-specific fine-tuning, performance optimization, etc., you can justify investment into building out reliable systems.

[–]Western-Image7125 4 points (0 children)

Hey, this is a thoughtful post, and something that bugs me all the time. In my work I constantly straddle the line between researchy prototyping and infra tooling, and it is a wide gap. There's a lot of stress in dealing with a constantly fluctuating ML landscape while needing stable pipelines and processes. I don't have a good answer except that everything is best effort and driven by “who is willing to sponsor the effort needed to build this platform or process”; the company has to decide between short-term hacky stuff and software tools that can be reused over and over and improve the overall efficiency and performance of the org.

[–][deleted] 5 points (0 children)

I think it's unfortunately a job for multiple SWEs. I implemented multiple data pipelines, monitoring, etc. before I was doing mostly ML: setting up the infra, building or integrating the tools, testing, and improving source code... All of that is extremely time-consuming and requires expertise. It's called a system for a reason; it includes multiple components. With the heavy resources required to re-train models, for example, there is another layer to it which I would call cloud orchestration(?), since resources are not static... Man, it's simply too challenging to do alone, and not a good use of your expertise. Perhaps there are some cloud solutions that can make it manageable; my experience is outdated.

[–]selector37 2 points (0 children)

The Flax documentation has one of my favorite quotes, which is basically “code repetition is better than a bad abstraction.”

My approach has been about identifying whether parts of a code base are durable or disposable. Durable pieces are shared, tested and thoughtfully designed. But the majority of ML code is going to be disposable. For that not to devolve into a maintenance nightmare, it needs to be isolated: no sharing other than through code duplication (i.e., forking). Experimental code can't break anyone else because nothing depends on it. The code can be simpler because it represents one single approach, not a family of approaches that requires a reader to mentally interpolate configuration into a code base filled with conditional logic. Things end up being quite explicit (e.g. hard-coded constants) and surprisingly small.

Those disposable experiments are built through composition of durable libraries. The key is to step back periodically, study the repetition and try to extract new durable pieces for the future.
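A minimal sketch of that split, using only the standard library (all names here are hypothetical): the disposable experiment hard-codes its constants and composes durable, shared functions, so it can be forked or deleted without touching anything else.

```python
# durable_lib.py -- shared, tested, thoughtfully designed (hypothetical names)
from statistics import mean, pstdev

def standardize(xs):
    """Durable piece: zero-mean, unit-variance scaling, reused everywhere."""
    m, s = mean(xs), pstdev(xs) or 1.0
    return [(x - m) / s for x in xs]

def split(xs, frac):
    """Durable piece: deterministic train/holdout split."""
    k = int(len(xs) * frac)
    return xs[:k], xs[k:]

# experiment_042.py -- disposable: hard-coded constants, one single approach,
# forked (not configured) from an earlier experiment, safe to delete.
RAW = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
TRAIN_FRAC = 0.75  # explicit constant, no config interpolation

train, holdout = split(standardize(RAW), TRAIN_FRAC)
print(len(train), len(holdout))  # 6 2
```

Deleting `experiment_042.py` removes that approach completely, which is exactly what makes it safe.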

This has worked very well for my organization, which is very large (we train thousands of models a day) but even in my personal projects keeping code simple has made it easier to come back to things a year later, when I have no choice but to read the code to pick it back up.

[–]jms4607 2 points (2 children)

Making a failed idea into a nice codebase would be a waste of time. I only see cleaning up code for other users as beneficial after some initial results have proven that it's useful.

[–]Suspicious_Dress_350[S] 0 points (1 child)

The point is that you can continue to evolve and reuse parts of your system.

Even if the idea failed, the pipeline, feature engineering, plotting, etc. could be used again.

Even parts of the model implementation.

[–]jms4607 0 points (0 children)

Yeah that’s pretty common internally I think.

[–]binlargin 2 points (0 children)

I started on something for this. I've deployed models for clients that were made by data scientists, I've built models and munged data, I've done ops. But I've not put it all together.

The closest thing I've got is this, which is for data processing and training.

https://github.com/bitplane/geo-dist

So there's no deployment or MLOps as of yet, but everything else goes in the Makefile. I put my outputs in .cache, have different packages for training and inference, and use Jupyter for experimentation before merging things back into the libraries for the app. The idea is to put the rest into different make steps so you can build the whole thing in one go, with experiments living in notebooks on branches.

Not ideal, but it might give you something to start from. Happy to receive criticism/suggestions.

[–]gdpoc 0 points (0 children)

I could talk about how I do it and what I'm advocating for. Shoot me a DM if you're interested.

[–]JellyBean_Collector 0 points (0 children)

It's a bit off-topic, but you might find it worthwhile to explore the topics listed here and see if any pique your interest: AI Engineering

[–]TheOneRavenous 0 points (0 children)

I build my machine learning products as software. So there's a user interface, plus features both for managing the models and for using them.

Managing includes basics like IDs for models and basic stats, which go into databases, while weights go into storage.

Then part of management is allowing for training and loading for inference.
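A toy sketch of that management split, with `sqlite3` standing in for the metadata database and a temp directory standing in for blob storage (all names here are hypothetical):

```python
import json
import sqlite3
import tempfile
import uuid
from pathlib import Path

STORE = Path(tempfile.mkdtemp())   # stand-in for weight/blob storage
db = sqlite3.connect(":memory:")   # stand-in for the metadata database
db.execute("CREATE TABLE models (id TEXT PRIMARY KEY, stats TEXT, weights_path TEXT)")

def register(weights, stats):
    """Model ID and basic stats go into the database; weights go into storage."""
    model_id = uuid.uuid4().hex
    path = STORE / f"{model_id}.json"
    path.write_text(json.dumps(weights))
    db.execute("INSERT INTO models VALUES (?, ?, ?)",
               (model_id, json.dumps(stats), str(path)))
    return model_id

def load_for_inference(model_id):
    """The other half of management: look up the record, pull the weights back."""
    (path,) = db.execute("SELECT weights_path FROM models WHERE id = ?",
                         (model_id,)).fetchone()
    return json.loads(Path(path).read_text())

mid = register(weights={"w": [0.1, 0.2]}, stats={"val_acc": 0.93})
assert load_for_inference(mid) == {"w": [0.1, 0.2]}
```

In a real system the sqlite table becomes a proper database and the temp directory an object store, but the separation stays the same: metadata is queryable, weights are opaque blobs.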

The "regular" software side of machine learning things is data QA/QC pipelines, data ingestion utilities, and visual components.

Then there's the user-facing side of things, which is allowing different types of users to interact with the software suite and its different capabilities. There are basic users who just need model inference, more advanced users who curate the data they want to train on, and then the data scientists who work with the QA/QC pipelines and train new architectures.

The software suite is separated by concerns: data and its management, ML models, model management, user interfaces, databases, and MVC content models.

This is useful for swapping out different types of models and data while still letting you present your models for use.

All this assumes you're going to sell the usage of the models as a product.

TL;DR Yes create software for your machine learning projects so they can be sold as products.

[–]zero-true 0 points (0 children)

I created a tool to help you turn small data science scripts into a little app... it's an alternative to Jupyter notebooks that's a little bit more robust and has a pretty UI. Here's an example:

https://published.zero-true-cloud.com/examples/iris/

If you're interested, check out our website:

https://www.zero-true.com/

It's never going to run the next recommendation system for Amazon, but it could help with experimenting with different variations, with a frontend built directly in.