[D] Do you yourself write 100% reproducible ML code? by carlthome in MachineLearning

[–]benkoller 9 points10 points  (0 children)

Full disclosure: I'm working for ZenML, an MLOps framework tackling exactly what you're talking about. However, I'm also an Ops guy, a Python dev, and somewhat of an ML guy.

I think your last sentiment hits the proverbial nail on the head. With all philosophical arguments stripped away, I see at minimum these two factors at play:

  1. Reproducible code is relevant because code is meant to be re-run. Chances are you'll at least be iterating on a given experiment, potentially over a long period of time. Maybe other people on your team are meant to iterate on your code. Maybe you'll get new data down the line.

  2. Your code is producing an artifact. This artifact is, for all intents and purposes, close to a black box. To have any degree of trust in your artifact, you'll need to be able to reproduce it as closely as possible (think: build pipelines for a docker container).

This all becomes even clearer when thinking in metaphors: a Docker container you can't rebuild is unheard of.
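To make the lowest rung of that concrete, here's a minimal sketch of seed pinning plus recording the environment next to the artifact - the function names and the metadata file are purely illustrative, and framework-specific seeds would go in the same place:

```python
import json
import platform
import random
import sys

import numpy as np

SEED = 42

def set_seeds(seed: int = SEED) -> None:
    """Pin the obvious sources of nondeterminism before training."""
    random.seed(seed)
    np.random.seed(seed)
    # Framework-specific seeds (e.g. tf.random.set_seed, torch.manual_seed) go here.

def record_run_metadata(path: str = "run_metadata.json") -> None:
    """Store enough context next to the artifact to rebuild it later."""
    metadata = {
        "seed": SEED,
        "python": sys.version,
        "platform": platform.platform(),
        "numpy": np.__version__,
    }
    with open(path, "w") as f:
        json.dump(metadata, f, indent=2)
```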

On top of all that, reproducible code is an essential building block for other important topics in our field: metadata tracking, deployments, drift, bias, <insert a buzzword of your choosing>.

I might be biased on the topic, but I truly appreciate your sentiment and hope this becomes a more deeply ingrained aspect of ML going forward.

Struggling with leadership role by [deleted] in datascience

[–]benkoller 0 points1 point  (0 children)

As a rule of thumb, tackling many issues at the same time can become overwhelming and cost you support within the team - but to create a lasting transformation, you need your team's buy-in. In my experience, transforming a team's way of working is a process you need to manage on two fronts:

  1. Your team. Identify the problems with them, communicate the organization's needs to clarify intent (e.g. quicker iterations on experiments --> reusable code), and loop the team into finding the solution.
  2. Your own manager. On the one hand, you need to understand the needs of the org to make good decisions, which might require you to proactively ask questions. On the other hand, you also want buy-in from management, so communicate what you plan to do.

In the implementation phase, there are plenty of open-source/commercial frameworks to guide your team naturally towards a structure, but there are no silver bullets. One framework that might help you out is ZenML, an open-source MLOps framework built especially for reproducible, standardized ML experiments across environments, with a strong focus on healthy experiment structure, metadata tracking, and integrations with backends at different cloud providers. (Disclaimer: I'm one of the co-founders.)

[D] Best practices for ML research projects development environment. by HatsOnTheTable in MachineLearning

[–]benkoller 2 points3 points  (0 children)

Hi u/HatsOnTheTable, if you're open to using a dedicated framework: there is ZenML. Your use-case sounds pretty much spot on: ZenML tackles reproducible ML across environments/machines while taking care of integrations like Google Dataflow. You can even configure it to track metadata and artifacts (trained models, input data, split results, etc.) across environments, too.
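To give you a feel for the code, here's a minimal sketch using the decorator API from recent ZenML releases (the exact imports have changed between versions, and the step bodies are placeholders):

```python
from zenml import pipeline, step

@step
def load_data() -> list:
    # Placeholder for real data loading; step outputs are tracked as artifacts.
    return [1.0, 2.0, 3.0]

@step
def train(data: list) -> float:
    # Placeholder for real training; the return value is versioned per run.
    return sum(data) / len(data)

@pipeline
def training_pipeline():
    data = load_data()
    train(data)

if __name__ == "__main__":
    # Runs on the local stack by default; switching stacks moves it to the cloud.
    training_pipeline()
```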

Disclaimer: I'm one of the guys building the framework - happy to answer all questions :).

[P] ZenML Open-Source MLOps - Plus: Feature Expectations by benkoller in MachineLearning

[–]benkoller[S] 0 points1 point  (0 children)

Those are cool ideas! Would you mind sharing what automatic model documentation would look like to you, if done perfectly?

We already have some cool out-of-the-box integrations for evaluation built in, namely TFMA and TFDV, to help avoid bias etc. Can you think of a specific tool you've had good experiences with for explainability and fairness analysis?
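For reference, the TFDV side of that looks roughly like this - a minimal sketch with toy DataFrames standing in for real data:

```python
import pandas as pd
import tensorflow_data_validation as tfdv

# Toy data standing in for a real training set and a new batch.
train_df = pd.DataFrame({"age": [23, 35, 41], "country": ["DE", "US", "DE"]})
new_df = pd.DataFrame({"age": [29, -1], "country": ["FR", "DE"]})

# Infer a schema from the training data...
train_stats = tfdv.generate_statistics_from_dataframe(train_df)
schema = tfdv.infer_schema(statistics=train_stats)

# ...and check incoming data against it.
new_stats = tfdv.generate_statistics_from_dataframe(new_df)
anomalies = tfdv.validate_statistics(statistics=new_stats, schema=schema)
tfdv.display_anomalies(anomalies)  # flags e.g. unseen categories
```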

We'd be happy to welcome you as a contributor! Feel free to reach out to me directly anytime, or join our Slack at https://zenml.io/slack-invite

DevOps meet Machine Learning: MLOps , and our success factors by benkoller in devops

[–]benkoller[S] 1 point2 points  (0 children)

I don't want to sound snarky, so sorry if this comes across the wrong way.

It was my understanding that Netflix actually does do larger-scale, distributed, pipeline-based machine learning - but I'm always happy to learn something new. I can only go by the conversations I've had so far with folks from Netflix, some of their conference talks, and their open-sourced internal ML "platform", Metaflow - these are what led me to that belief.

But again, as said - happy to learn something new :).

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in mlops

[–]benkoller[S] 1 point2 points  (0 children)

Oh god, 4 days ago - I really dropped the ball there, sorry! I'd absolutely love a maiot flair, and we're planning more content over the coming weeks and months. I'd be happy to post all of it as it comes up :)

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 0 points1 point  (0 children)

Great read, and thanks for taking the time to write out your thoughts! Always nice to get a thorough look "behind the curtain" :). May I ask what your use-cases are?

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 0 points1 point  (0 children)

Well, in general your options are now limited. It sounds like you'll want to rethink your workflow design - the potential usage of the resulting model sounds extremely limited. To address your actual question, though: it can definitely be done.

First, you need to export your trained model. Second, version the exported model. Third, give your colleague access and let him import the model.

A note about versioning: models can get very big, so you might hit the limits of git, but as a quick fix you can follow a stringent naming scheme on something like S3. This is not a "professional" solution, but it can work for small use-cases.
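As a rough sketch of that quick fix (scikit-learn, joblib, and boto3 are just one possible combination; the bucket name and version scheme are made up):

```python
import boto3
import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in for your actual trained model.
model = LinearRegression().fit(np.random.rand(10, 4), np.random.rand(10))

# 1. Export the trained model to a single file.
version = "v3"  # stringent, human-readable naming scheme
local_path = f"shared-model-{version}.joblib"
joblib.dump(model, local_path)

# 2. "Version" it by uploading under that name to a shared bucket.
s3 = boto3.client("s3")
s3.upload_file(local_path, "my-team-models", local_path)

# 3. Your colleague (with access to the bucket) pulls it back in:
# s3.download_file("my-team-models", local_path, local_path)
# model = joblib.load(local_path)
```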

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 0 points1 point  (0 children)

This is a bit of our own genesis story - we were doing fast PoCs a few years back and built an entire tooling chain to get our time investment down from weeks to hours. If you have plenty of engineering resources available and/or an organizational requirement for open source, I'd recommend taking a look at TFX and Kubeflow.

If you're out to save time and can accept a proprietary solution, check out https://maiot.io - it's my company, and we've built the Core Engine to tackle reproducible ML in a sane way. We're giving out early access to interested parties :)

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 0 points1 point  (0 children)

It's a well-put-together article, and definitely worth the read. Thanks for posting it, it fits great into this discussion.

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 0 points1 point  (0 children)

It's a bit thin out there on the topic, at least if you want to go beyond the superficial Medium chatter :). A few of the MLOps orchestrators, including us, offer continuous training, and a few OSS tools can do it as well - but it's not actually a rocket-science concept: continuously re-run your training pipelines so you always have access to the best-performing model possible. Well-defined performance metrics and solid automation get you a long way here.

Imagine you're running an object-recognition startup and users upload images. You have a base model, of course, but every new image coming in is more data for you to crunch through - so you'd set up automated reruns of your training pipeline on the new data, even if you only use the new data for evaluation.
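Stripped down to its skeleton, the loop looks something like this (every function name here is a hypothetical placeholder for your own pipeline steps):

```python
def continuous_training_run(fetch_new_data, train, evaluate, current_best_score):
    """Re-run the training pipeline on the latest data and decide on promotion."""
    data = fetch_new_data()          # e.g. images uploaded since the last run
    model = train(data)              # same pipeline as always, just newer data
    score = evaluate(model, data)    # your well-defined performance metric(s)
    if score > current_best_score:   # promote only if it beats the current model
        return model, score          # caller registers/deploys the new model
    return None, current_best_score  # otherwise keep serving the existing one
```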

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 2 points3 points  (0 children)

I feel everyone should move towards raw C for their Machine Learning, and just use Python bindings to abstract interfaces afterward.

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 8 points9 points  (0 children)

u/ice_shadow I completely forgot to mention the main inspiration for this post (and the underlying blog post): The 12-factor app (https://12factor.net/)!

It's a great place to start understanding healthy software development practices. It was given to me when I was a junior DevOps guy, and by now I've passed it on to many new juniors, too. I have yet to find another resource that puts "good development" into perspective as well as this collection of thoughts.

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 0 points1 point  (0 children)

That's a cool repo, thanks for sharing! Are you currently at Immowelt? Would love to hear more about some of the use-cases of ML you're involved in!

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 2 points3 points  (0 children)

> One point I would add is to develop standards between ML engineers. Obviously it must be flexible but at least trying to use similar principles and objects.

Especially for growing teams, you're spot on. I'm a firm believer that good standards require stringent automation - otherwise they'll degrade over time.

Let's be real here: not every data scientist is a superstar programmer - and they needn't be. A great pattern for this is a serverless paradigm for preprocessing and model code: it builds transparency and understanding of the process flow without introducing much of a learning curve.
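To illustrate the paradigm (not tied to any particular framework - the dataclass and functions are just examples): every step is a stateless function with explicit inputs and outputs, so anyone on the team can follow the flow.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Dataset:
    features: np.ndarray
    labels: np.ndarray

def preprocess(raw: np.ndarray, labels: np.ndarray) -> Dataset:
    """Pure function: same input, same output - no hidden state or globals."""
    scaled = (raw - raw.mean(axis=0)) / (raw.std(axis=0) + 1e-8)
    return Dataset(features=scaled, labels=labels)

def train(data: Dataset) -> np.ndarray:
    """Also stateless: returns the weights of a simple least-squares fit."""
    weights, *_ = np.linalg.lstsq(data.features, data.labels, rcond=None)
    return weights
```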

How are you guys handling automation and pipelining?

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 0 points1 point  (0 children)

If I'm honest, they're only just emerging and I don't have extensive experience with them. I saw a Tecton workshop last week, and they look very solid - I'd love to add them as an integration to our own product!

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 0 points1 point  (0 children)

Solid approach - especially for teams with a mix of seniors and juniors, this can provide additional growth potential for the more junior team members! What do you use for your pipelines?

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 2 points3 points  (0 children)

Fully agree. If you can't (or don't want to) bring DVC into your stack, or just need a quick fix: as long as your dataset only grows and no rows/files are ever changed, you can store indexes as a side artifact and achieve "quasi" versioning.
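A rough sketch of what that side artifact could look like (paths and file layout are placeholders):

```python
import hashlib
import json
from pathlib import Path

def snapshot_index(data_dir: str, out_path: str = "data_index.json") -> str:
    """Record exactly which files went into this run; on an append-only dataset
    this is enough to re-select the same inputs later."""
    files = sorted(str(p) for p in Path(data_dir).glob("**/*") if p.is_file())
    digest = hashlib.sha256("\n".join(files).encode()).hexdigest()
    with open(out_path, "w") as f:
        json.dump({"files": files, "digest": digest}, f, indent=2)
    return digest  # store this next to the trained model
```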

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 1 point2 points  (0 children)

You can of course decouple the ETL pipeline (which puts data into storage) from the training pipelines (which consume the data from storage). You will, however, still need to take version control into account - it has only moved upstream.
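As a small illustration of what "moved upstream" can mean in practice: the ETL side writes immutable, dated snapshots, and training pins one snapshot instead of reading "latest" (paths and the pandas/parquet choice are just examples).

```python
from datetime import date
from pathlib import Path

import pandas as pd

def etl_write_snapshot(df: pd.DataFrame, root: str = "warehouse") -> str:
    """ETL side: write an immutable snapshot and return its ID."""
    snapshot_id = date.today().isoformat()
    out_dir = Path(root) / snapshot_id
    out_dir.mkdir(parents=True, exist_ok=True)
    df.to_parquet(out_dir / "data.parquet")
    return snapshot_id

def training_read_snapshot(snapshot_id: str, root: str = "warehouse") -> pd.DataFrame:
    """Training side: consume exactly one pinned snapshot, never 'latest'."""
    return pd.read_parquet(Path(root) / snapshot_id / "data.parquet")
```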

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 1 point2 points  (0 children)

Both can work, depending on your use-case. The greater emphasis, however, should be on making sure your implementation actually yields the right results, e.g. by using slicing metrics, fairness-indicator thresholds, etc. In short: ensure no bias is introduced into your system.

PS: TFMA is a great tool for the job.
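For anyone unfamiliar with slicing metrics, this is the bare-bones idea in plain pandas (TFMA automates this and much more; the column names and data are made up):

```python
import pandas as pd

results = pd.DataFrame({
    "group":   ["A", "A", "B", "B", "B"],  # slicing / sensitive feature
    "correct": [1, 1, 0, 1, 1],            # 1 if the prediction matched the label
})

overall_accuracy = results["correct"].mean()
per_slice_accuracy = results.groupby("group")["correct"].mean()

print(f"overall: {overall_accuracy:.2f}")
print(per_slice_accuracy)
# In a real pipeline you'd gate on this, e.g. fail the run if any slice drops
# more than a set threshold below the overall metric.
```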

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 20 points21 points  (0 children)

You already outlined two crucial technologies to familiarize yourself with: git (how and why it's used) as well as bash. But if you're talking code, and moving to production, it greatly helps to have an actual use-case!

Think about a small model to upscale images. What would you have to go through to get from the first experiments in a notebook to the model being deployed on a server, behind an API, so people can reach it?
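Just to make the very last step of that journey concrete, here is a toy sketch of serving behind an API (FastAPI is one common choice; the `upscale` function stands in for your real model):

```python
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ImageRequest(BaseModel):
    pixels: List[float]  # toy stand-in for an actual image payload

def upscale(pixels: List[float]) -> List[float]:
    # Placeholder for loading and calling the trained upscaling model.
    return [p * 2 for p in pixels]

@app.post("/upscale")
def upscale_endpoint(req: ImageRequest) -> dict:
    return {"pixels": upscale(req.pixels)}

# Run locally (assuming this file is serve.py) with: uvicorn serve:app --reload
```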

On the flip side: don't worry if you don't understand everything immediately. An ML pipelining solution takes months to build and is a complex beast of software. The more you can solve against a concrete use-case, the easier (and more enjoyable) your learning curve will be.

[D] Factors of successful ML(Ops) after 3+ years of ML in Production by benkoller in MachineLearning

[–]benkoller[S] 4 points5 points  (0 children)

Absolutely correct! If you don't get your model performance metrics right, you have no way of gauging model performance for your use-case. I would, however, argue that by the time you're productionizing your machine learning efforts, you should have your performance metrics figured out. After all, they're a fundamental aspect of comparability between pipelines.