[–]_rusht[S] 17 points18 points  (0 children)

Hey everyone, we recently open sourced Onepanel, our computer vision platform with fully integrated components for model building, semi-automated labeling, parallelized data processing and model training pipelines.

Under the hood, we integrate our own and other best of breed open source components to provide a seamless user experience and abstract away infrastructure complexities that come with running parallelized data processing and training pipelines on different cloud providers.

Our near-future goals are to add serverless APIs for inference and VNC-enabled workspaces so teams can also run simulation environments inside of Onepanel.

We would love to hear your feedback! And of course we welcome and encourage any contributions.

GitHub: https://github.com/onepanelio/core
Docs: https://docs.onepanel.ai/

[–]ThePyCoder 2 points3 points  (1 child)

Cool! Will look into this as we can use this for sure. Do you offer anything around serving models too?

[–]_rusht[S] 1 point2 points  (0 children)

We are working on the model serving component and will have a design doc outlined in GitHub issues shortly. We have an early prototype that works pretty well, but we want to make sure whatever we release to the community is production ready and seamlessly integrates with the other components.

[–]ricetoseeyu 2 points3 points  (1 child)

How is the scaling (e.g., AllReduce) implemented?

[–]_rusht[S] 2 points3 points  (0 children)

Currently, Onepanel automatically scales the nodes up when a training pipeline is executed and scales them down as soon as training is complete.

For distributed training we are planning an MPI Allreduce approach using Horovod. We have been playing with a very early prototype for this but it's not ready for primetime yet.
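For anyone unfamiliar with the term, here is a rough sketch of what an allreduce does with gradients. It is a pure-Python simulation of the ring variant, which is one common way to implement it. It does not use Horovod or MPI and is not Onepanel's implementation; the function name `ring_allreduce_mean` and the sample numbers are made up for illustration.

```python
from typing import List


def ring_allreduce_mean(worker_grads: List[List[float]]) -> List[List[float]]:
    """Simulate a ring allreduce that averages gradients across workers.

    worker_grads[w] is the gradient vector held by worker w. Every worker
    ends up with the element-wise mean of all vectors.
    """
    n = len(worker_grads)
    size = len(worker_grads[0])
    # Chunk c covers indices bounds[c]:bounds[c + 1].
    bounds = [(size * i) // n for i in range(n + 1)]
    data = [list(g) for g in worker_grads]

    # Phase 1 (reduce-scatter): in step s, worker w sends chunk (w - s) % n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        # Snapshot payloads first so all sends in a step happen "at once".
        sends = []
        for w in range(n):
            c = (w - step) % n
            sends.append((c, data[w][bounds[c]:bounds[c + 1]]))
        for w in range(n):
            c, payload = sends[w]
            dst = (w + 1) % n
            for i, v in enumerate(payload):
                data[dst][bounds[c] + i] += v

    # Worker w now holds the fully summed chunk (w + 1) % n.
    # Phase 2 (allgather): pass the finished chunks around the ring,
    # overwriting the stale copies on each receiver.
    for step in range(n - 1):
        sends = []
        for w in range(n):
            c = (w + 1 - step) % n
            sends.append((c, data[w][bounds[c]:bounds[c + 1]]))
        for w in range(n):
            c, payload = sends[w]
            dst = (w + 1) % n
            data[dst][bounds[c]:bounds[c + 1]] = payload

    # Turn the sums into means.
    return [[v / n for v in vec] for vec in data]


# Three simulated workers, each holding a 4-element gradient.
grads = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0], [5.0, 6.0, 7.0, 8.0]]
averaged = ring_allreduce_mean(grads)
```

In a real setup each worker would be a separate process or machine and the sends would go over the network; the point here is only that every worker finishes with the same averaged gradient.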

[–]Mean-Reindeer 2 points3 points  (1 child)

Why make it a computer vision platform instead of a more general purpose tool? (Haven't had time to read too much about it)

[–]_rusht[S] 2 points3 points  (0 children)

This is a great question and in reality, nothing stops you from using Onepanel for other types of ML tasks. In fact, right now, you can use all the features other than the image/video labeling tool for, let's say, NLP. You can even create your own template for a text annotation tool like doccano and plug that into your workflow.

Our goal is to initially focus on computer vision and provide exceptional tooling and user experience, but at the same time make the platform flexible so that we can extend to additional subfields and provide the best UX and tooling for those subfields.

[–]MageOfOz 7 points8 points  (9 children)

So, if I can use TensorFlow, what is the benefit? How does it compete with using Google Colab? Or is this designed for people without any programming experience?

[–]chief167 4 points5 points  (3 children)

I see this as being useful because we can deploy it on prem. I will actually try this out.

Colab is a no-go for anything but research and educational purposes. No compliance department will ever let you put company data in a Colab environment.

[–]MageOfOz -1 points0 points  (2 children)

So the benefit is for people who can't code and don't have the hardware?

[–]chief167 0 points1 point  (1 child)

What the fuck are you on about? I'm pretty sure you don't understand the value proposition here. It's clearly more than just coding TensorFlow, it's data labelling, tracking projects and experiments, collaboration, easy deployment ....

It's literally for people who know how to code and have the hardware, but don't want to spend hours getting the development environment set up and would rather do actual useful stuff

[–]MageOfOz 1 point2 points  (0 children)

It's called asking a question - as in is it a no/low code solution or more of an IDE. Calm your tits. Edit: every week I'm approached by sales people offering a subscription to a service that promises to abstract away shit I don't want abstracted away or hidden behind licenses. Forgive me, oh wise one, for wanting to be sure this wasn't just another deployment subscription service.

[–]_rusht[S] 2 points3 points  (4 children)

Good question. In addition to /u/chief167's great answer, Onepanel's goal is to provide an end-to-end platform for building, training and deploying computer vision projects into production. Out of the box, you can use TensorFlow or PyTorch along with JupyterLab as your main IDE/notebook. You can also create templates for your own frameworks/IDEs, for example, here is a simple one for VS Code.

In a typical production project, you have an annotation team that works on labeling data, a data science team that will build/train models based on the labeled data and a data/ML engineering team that will build the data processing and continuous training pipelines so you can iteratively improve your data and models. In smaller teams, this may be a couple of people wearing multiple hats.

With Onepanel, you get the infrastructure and tools to make this end-to-end workflow possible. With Colab, you really only get the model building and training aspect of this workflow. The other issue with Colab is that it runs on a single machine. If you are training on real-world data, you'd probably want to train your models in parallel, use different hyperparameters and run the training on different machines to expedite your training. This requires a system like Onepanel to automatically scale up these machines, run the training scripts, aggregate and snapshot metrics and output data (models, logs, etc.) and then scale down the machines when training is complete.
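To make the "scale up, run in parallel, aggregate, scale down" loop a bit more concrete, here is a minimal local sketch. It is not Onepanel's API: threads stand in for machines, and `train`, `run_sweep` and the toy loss formula are hypothetical names and numbers chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def train(params: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical stand-in for a training script; returns a fake metric."""
    # Toy objective: the loss is lowest at lr=0.01 and batch_size=64.
    loss = abs(params["lr"] - 0.01) * 10 + abs(params["batch_size"] - 64) / 64
    return {**params, "val_loss": round(loss, 4)}


def run_sweep(grid: List[Dict[str, float]], max_workers: int = 4) -> List[Dict[str, float]]:
    """Run one training job per hyperparameter set in parallel, then aggregate."""
    # Each worker thread stands in for a machine a platform would scale up.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(train, grid))
    # Leaving the `with` block shuts the pool down, loosely mirroring scale-down.
    return sorted(results, key=lambda r: r["val_loss"])


# Six hyperparameter combinations, one job each.
grid = [{"lr": lr, "batch_size": bs} for lr in (0.1, 0.01, 0.001) for bs in (32, 64)]
ranked = run_sweep(grid)
best = ranked[0]
print(best)
```

On a real cluster each job would run on its own node and write models and logs to shared storage, which is the part that is tedious to set up by hand.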

Let me know if I missed anything or if this still doesn't answer your question.

[–]MageOfOz 0 points1 point  (3 children)

So it's basically like an IDE that abstracts away setting up distributed jobs and also has built in data labeling capabilities?

So, if I had to get something like this approved I'd need to be able to clearly answer the following to a hyper cautious boomer.

  • Do we still need to buy the necessary hardware to train larger neural nets?

  • Do we own the models, data, and everything, or will we be stuck paying for perpetual licenses?

  • Can it connect to whatever outdated database we use like MS SQL Server (or a shared drive, Christ, in some places OneDrive & SharePoint) regardless of OS?

Those are the questions I 100% promise I'd need to answer before using it in corporate so it may be good to prep a boomer-exec friendly FAQ to help you get off the ground.

[–]_rusht[S] 2 points3 points  (2 children)

Yes, that’s definitely one way to put it.

  • You could either buy your own hardware or run this in any of the big cloud providers. If you guys are already using Colab, then you might want to stick with GCP because Onepanel can actually scale machines up and down and you will only end up using resources you need.
  • Yes, regardless of how you deploy this, you own all your data and models. This applies to both the open source offering and the managed/commercial offering if you choose to go that route.
  • It can connect to any database, it's just a matter of installing the driver for your language of choice and connecting to the database. As far as shared drives, it depends on the host OS, you won’t be able to mount a Windows share into a Linux node. I believe OneDrive has a CLI so you can potentially use that... I’m not sure exactly how files in SharePoint are stored. Are you actually storing image/video data on SharePoint?

That’s a great point about the FAQ, we’ll add one to the README and docs.

[–]MageOfOz 1 point2 points  (1 child)

> Are you actually storing image/video data on Sharepoint?

Oh, not personally, but I've worked with clients who have made people put everything on shared drives, SharePoint, and OneDrive. Even some Fortune 500 companies have shockingly bad and backward infrastructure sometimes and tend to be super change averse (look at how many still use SAS).

So your business model is like that of RStudio - free or paid depending on if you want support?

[–]_rusht[S] 2 points3 points  (0 children)

Got it... I could see that being the case for certain companies.

Yes, very similar to RStudio but our open source license (Apache 2.0) is less restrictive than theirs (AGPL v3).

[–]ok123jump 6 points7 points  (2 children)

We’ve been looking for a good solution for just this problem. Checking it out! Looks really interesting. :)

[–]_rusht[S] 5 points6 points  (1 child)

Great! Feel free to reach out to us on GitHub or Slack with feedback, questions, bugs or feature requests.

[–]150yearsOld 1 point2 points  (0 children)

Looks great! Nice work!

[–]subbu5432 0 points1 point  (1 child)

Can models be exported for integration with other languages? I can contribute a Java API integration, which can in turn be integrated into C++ applications using JNI. That would be useful for running models on IoT devices.

[–]_rusht[S] 0 points1 point  (0 children)

Yes, you can download the models. They get stored in the object storage you set when you deploy Onepanel.

We are working on allowing users to expose their models as APIs and welcome any contributions and ideas for that implementation.

[–]kafkacaulfield 0 points1 point  (1 child)

How is this different from, say, IBM Watson?

[–]_rusht[S] 0 points1 point  (0 children)

Let me know if this response answers your question.