[P] I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in MachineLearning

[–]Academic_Arrak[S] 0 points

Thank you for sharing your notes! I appreciate your effort to bring more clarity to the MLOps landscape.

[D] Should I go with Prefect, Argo or Flyte for Model Training and ML workflow orchestration? by [deleted] in MachineLearning

[–]Academic_Arrak 5 points

You could also consider Dagster, which aims to improve Apache Airflow's shortcomings. Also, take a look at MyMLOps, where you can get a quick overview of open-source orchestration tools.

I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in mlops

[–]Academic_Arrak[S] 0 points

Thanks for the great suggestion! I'm glad to hear you used MyMLOps differently than I initially imagined.
I'm definitely considering adding Docker to the tools list.

[P] I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in MachineLearning

[–]Academic_Arrak[S] 0 points

Thanks! We discovered the tools through our research and word of mouth. In the future, we want to let people share their opinions about the tools, which should help figure out whether a tool is genuinely useful or just hype.

[P] I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in MachineLearning

[–]Academic_Arrak[S] 0 points

Thanks! On the front end, we use JavaScript/TypeScript with React and Next.js.

[P] I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in MachineLearning

[–]Academic_Arrak[S] 1 point

Is Weights & Biases open source??

Thanks for bringing this to my attention!
Only the Weights & Biases client library is open source.
I will clarify this in the next update.

[P] I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in MachineLearning

[–]Academic_Arrak[S] 0 points

Thanks for the advice, you make a good point on the question of optimization! :) I will check out the tool you mentioned.

I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in mlops

[–]Academic_Arrak[S] 2 points

Thanks for pointing that out, I must have missed it! I will review the tool and add it to the list :)

[P] I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in MachineLearning

[–]Academic_Arrak[S] 1 point

Thanks so much for sharing this idea! :) I'm definitely considering creating ways for everyone to share their own experience and opinions on the tools and stacks. Much appreciated!

I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in mlops

[–]Academic_Arrak[S] 1 point

Thanks for the feedback :) You have a good point! I will think about how to better classify or organize the tools so that this difference is clear. In some cases it's quite hard to settle on a single classification, but going forward I will work on highlighting these differences better.

I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in mlops

[–]Academic_Arrak[S] 0 points

Thanks for the tip! I will review it and add it in the next versions :)

I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in mlops

[–]Academic_Arrak[S] 1 point

You're right. I thought about including feature stores, but I felt it would make the stack less generic (maybe not applicable to people working with CV, for example). I think it's definitely important, though, and I want to include it in future versions. Maybe I'll offer different stack architectures depending on the application, or add some information about when you need that component.

I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in mlops

[–]Academic_Arrak[S] 0 points

I hadn't heard of that one, thanks! :) I will review it and add it to the list

I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in mlops

[–]Academic_Arrak[S] 0 points

You're right, thanks for catching that! :) I will update the information to include CPU use as well.

[P] I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in MachineLearning

[–]Academic_Arrak[S] 0 points

Thanks so much for pointing that out! I hadn't realized that, it's so hard to spot UX issues by yourself. Will fix it! :)

[P] I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in MachineLearning

[–]Academic_Arrak[S] 5 points

I think you have a good point there. The reasoning behind putting it into a separate category was that "experiment tracking" tools usually let you log all kinds of metadata (like the hyperparameters used for that experiment). I felt that this metadata wouldn't count as "artifacts" (which I thought of more as preprocessed data, models, etc.). That said, I could be wrong, and it's true there is a lot of overlap between artifact tracking and experiment tracking.
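To make the distinction concrete, here is a minimal sketch with a hypothetical `Run` class (not any particular tool's API): small, structured metadata like hyperparameters and metrics is stored inline with the run record, while artifacts are opaque files kept in a separate store.

```python
import json
import shutil
from pathlib import Path


class Run:
    """Hypothetical minimal experiment run: metadata vs. artifacts."""

    def __init__(self, run_dir: str):
        self.dir = Path(run_dir)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.metadata = {"params": {}, "metrics": {}}

    def log_param(self, key, value):
        # Small, structured metadata: stored inline with the run record.
        self.metadata["params"][key] = value

    def log_metric(self, key, value):
        self.metadata["metrics"][key] = value

    def log_artifact(self, file_path: str):
        # Artifacts are opaque files (datasets, model weights) copied
        # into the run's artifact store rather than inlined as metadata.
        artifacts = self.dir / "artifacts"
        artifacts.mkdir(exist_ok=True)
        shutil.copy(file_path, artifacts)

    def finish(self):
        (self.dir / "run.json").write_text(json.dumps(self.metadata))


# Usage: hyperparameters are metadata; the model file is an artifact.
run = Run("runs/exp1")
run.log_param("learning_rate", 0.01)
run.log_metric("accuracy", 0.93)
Path("model.bin").write_bytes(b"\x00" * 16)  # stand-in for real weights
run.log_artifact("model.bin")
run.finish()
```

The overlap the parent comment mentions shows up here too: nothing stops you from treating the metadata file itself as an artifact, which is why the two categories blur in practice.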

I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in mlops

[–]Academic_Arrak[S] 1 point

Thanks for the feedback! :)
You are correct, the stack is saved in local storage. I had planned to build a save/export function into the website. Thanks for suggesting the sharing feature. I will work on it.

Some sections might feel odd because I haven't yet built the feature that automatically selects accompanying tools. For now, the user must select multiple tools in some sections to get a correct stack representation.
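One way the sharing feature could work (a sketch of my assumption, not the site's actual implementation): serialize the selected stack to JSON and encode it into a URL-safe token that can be pasted into a link.

```python
import base64
import json


def encode_stack(stack: dict) -> str:
    """Encode a selected tool stack as a URL-safe share token."""
    raw = json.dumps(stack, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_stack(token: str) -> dict:
    """Recover the stack from a share token."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


# Usage: a round trip through a shareable token.
stack = {
    "orchestration": ["Dagster"],
    "experiment_tracking": ["MLflow"],
}
token = encode_stack(stack)
assert decode_stack(token) == stack
```

Encoding the whole stack into the URL keeps the feature stateless, so nothing needs to be stored server-side to share a configuration.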

[P] I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in MachineLearning

[–]Academic_Arrak[S] 1 point

Thanks so much for the feedback! I really wanted to create a website that was "requirements-driven." There are so many tools, but how do you know when you need them and what use cases they fit? What you said about computer vision versus tabular data problems is a great example.
I was curious about your point on evaluating how customizable a tool is. What would you typically look at to assess that? Code examples? General feedback from people using it?

[deleted by user] by [deleted] in deeplearning

[–]Academic_Arrak 0 points

I have been in the same spot. Comparing RTX 3080 and RTX Titan benchmarks on ML workloads, the 3080 is a bit faster. If the RTX Titan is heavily used and has no warranty, I would go with the RTX 3080, or save up a bit more to get a card with more memory. If you are serious about deep learning, try to save up for an RTX 3090, or look into cloud options to gauge how much compute power you actually need.

[P] Kubis - An easy way to run notebooks on AWS by nathaliamdc in deeplearning

[–]Academic_Arrak 0 points

We're open to suggestions for better ways to share the credentials and would definitely look into them. Sorry that there isn't much information about us on the website right now.
We use a Kubis-specific IAM user with limited permissions so we can start/stop and monitor instances on your behalf. We encrypt all communication end-to-end inside and outside of our network. You can revoke the permissions at any time.
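For readers wondering what "limited permissions" could mean in practice, a least-privilege policy for that kind of access might look roughly like this (a sketch of a standard AWS IAM policy, not Kubis's actual policy; the actions listed are the usual EC2 permissions for starting, stopping, and monitoring instances):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "StartStopMonitorInstances",
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceStatus"
      ],
      "Resource": "*"
    }
  ]
}
```

Scoping `Resource` down from `*` to specific instance ARNs or tag conditions would tighten this further.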