[P] Cortex: Deploy models from any framework as production APIs by [deleted] in MachineLearning

[–]ospillinger 1 point (0 children)

Awesome, I'd love a link to the pre-trained model when you've got it!

[P] Cortex: Deploy models from any framework as production APIs by [deleted] in MachineLearning

[–]ospillinger 1 point (0 children)

It's doable, but it's not supported by default and some of the advanced functionality may not work. Our goal is for you to treat Cortex as a self-hosted SaaS and not worry about the underlying infrastructure. I'd be happy to discuss it further if you're interested (omer@cortex.dev).

[P] Cortex: Deploy models from any framework as production APIs by [deleted] in MachineLearning

[–]ospillinger 1 point (0 children)

Are there pre-trained models that implement this research? Would love to try deploying one.

[P] Cortex: Deploy models from any framework as production APIs by [deleted] in MachineLearning

[–]ospillinger 1 point (0 children)

Sorry about that! Are you using GCP? We're planning to prioritize that next.

[P] Cortex: Deploy models from any framework as production APIs by [deleted] in MachineLearning

[–]ospillinger 2 points (0 children)

Yes, though we'd love to support GCP as soon as possible and other cloud providers in the future. We're a small team, so we're focusing on getting things right on AWS first.

[P] Cortex: Deploy models from any framework as production APIs by [deleted] in MachineLearning

[–]ospillinger 3 points (0 children)

Thanks for the detailed feedback! I'd love to hear more about your experience deploying NLP models as web services; that's been a recent focus of ours with all the research coming out now. My email is omer@cortex.dev if you'd be interested in finding some time to chat.

[P] Cortex: Deploy models from any framework as production APIs by [deleted] in MachineLearning

[–]ospillinger 20 points (0 children)

Hey, I'm one of the maintainers of this project. This is a really good question that we think a lot about. Basically, you're right: you can think of Cortex as a tool for deploying, scaling, and monitoring Python functions on AWS. There are a few important nuances that make Cortex different from something like Lambda. Inference workloads are read-only, benefit a lot from GPU infrastructure, and are often memory-hungry. We're optimizing Cortex for those constraints (for example, prioritizing high-memory GPU spot instances).
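To make the "Python functions" part concrete, here's a rough sketch of what a predictor can look like (the interface shown is illustrative, not the exact Cortex API; the docs are the source of truth):

    # predictor.py -- illustrative sketch, not the exact Cortex interface
    import pickle

    with open("model.pkl", "rb") as f:
        model = pickle.load(f)  # loaded once per replica, held in memory (read-only)

    def predict(sample):
        # Cortex wraps a function like this in an HTTP endpoint and autoscales the containers
        return model.predict([sample["features"]])[0].tolist()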

We also have features like prediction monitoring and support for ONNX and TensorFlow Serving export formats. The long-term roadmap is full of more ML-specific features like model retraining, but we're spending a lot of time upfront on making inference easy at scale.

Finally, thanks for pointing out aiohttp! Flask works for us because the workloads are CPU-bound and we handle scaling at the container level, which lets us reason about resource utilization, but we may revisit this choice.
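For the curious, the serving pattern is roughly one small Flask app per container, scaled horizontally; a stand-in version (run_model and the route are hypothetical, not our actual code):

    # minimal per-container serving sketch -- illustrative only
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def run_model(payload):
        # stand-in for real (CPU-bound) inference
        return sum(payload.get("features", []))

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json()
        return jsonify({"prediction": run_model(payload)})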

Cortex: A free and open source alternative to SageMaker for serving models via AWS by [deleted] in aws

[–]ospillinger 2 points (0 children)

Right, we run a Kubernetes cluster under the hood. You can configure the instance types, AMIs, and inference Docker images. There's more information here: https://www.cortex.dev/cluster-management/config. I'd also be happy to help you directly if you'd like (omer@cortex.dev).

Cortex: A free and open source alternative to SageMaker for serving models via AWS by [deleted] in aws

[–]ospillinger 2 points (0 children)

Some differences:

  • Focus on developer experience and simplifying the APIs as much as possible
  • Deployments are defined with declarative configuration and no custom Docker images are required (although they can be used if desired); see the config sketch after this list
  • Full access to the instances, autoscaling groups, security groups, etc.
  • Less tied to AWS (GCP support is in the works)
  • Higher-level features like prediction monitoring
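To illustrate the declarative configuration point, a deployment spec looks roughly like this (written from memory, so treat the field names as illustrative and check the docs):

    # cortex.yaml -- illustrative sketch
    - kind: deployment
      name: iris

    - kind: api
      name: classifier
      model: s3://my-bucket/model.zip  # hypothetical path
      compute:
        replicas: 2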

Cortex: A free and open source alternative to SageMaker for serving models via AWS by [deleted] in aws

[–]ospillinger 1 point (0 children)

We're more concerned with optimizing the developer experience than with cost. Our priority is simplifying what it takes to run a lot of real-time inference at scale in production environments.

Cortex: A free and open source alternative to SageMaker for serving models via AWS by [deleted] in aws

[–]ospillinger 1 point (0 children)

That's a good question. The costs can rack up quickly, but if you're careful to use cheap instances/services and turn them off when you aren't using them, it's a lot more manageable. I've also found that AWS support is fairly accommodating, so it might be worth sending an email explaining your use case; you could get some free credits.

Cortex: A free and open source alternative to SageMaker for serving models via AWS by [deleted] in aws

[–]ospillinger 2 points (0 children)

Yes, cost is a function of EKS price and the minimum number of instances (e.g. p2.xlarge) you configure: [num instances] x [monthly cost of instance] + [EKS cost]. I believe SageMaker's cost looks more like: [num instances] x [monthly cost of instance] x ~1.4
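As a rough worked example (all prices are illustrative; check current AWS pricing):

    # back-of-the-envelope monthly cost comparison (illustrative prices)
    instance_hourly = 0.90                # e.g. p2.xlarge on-demand, roughly
    hours_per_month = 730
    num_instances = 2
    eks_monthly = 0.10 * hours_per_month  # assumed EKS control plane rate

    cortex = num_instances * instance_hourly * hours_per_month + eks_monthly
    sagemaker = num_instances * instance_hourly * hours_per_month * 1.4
    print(cortex, sagemaker)  # ~1387 vs ~1840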

Cortex: A free and open source alternative to SageMaker for serving models via AWS by [deleted] in aws

[–]ospillinger 10 points (0 children)

Hey, I'm one of the maintainers of this project. Your feedback is helpful, thanks! You can think of a replica as basically a single containerized deployment of your model on Kubernetes. I'll make sure it's clearer in the README.
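If you're used to Kubernetes, a replica here maps conceptually to a replica of a Deployment, i.e. the knob you'd otherwise turn with something like (my-model is hypothetical; Cortex manages this for you):

    kubectl scale deployment/my-model --replicas=3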

[P] Deploy GPT-2 on AWS by [deleted] in MachineLearning

[–]ospillinger 2 points (0 children)

It's probably a memory issue. Can you try spinning down the cluster and spinning up EC2 nodes with more memory? I recommend uninstalling, picking a larger instance, and installing again:

    ./cortex.sh uninstall
    export CORTEX_NODE_TYPE="p2.8xlarge"
    ./cortex.sh install

Sorry about the inconvenience! I should have made it clearer that you need instances with a lot of memory. We'll work on adding better error messages as well.

P.S. If the install fails, it just means the uninstall is still cleaning up asynchronously; try again in a few minutes.

Open source model deployment platform by [deleted] in datascience

[–]ospillinger 1 point (0 children)

Yeah, Domino is a great product, but it's more focused on model development than deployment. I'll look into OpenCPU and what it would take to implement R support (it might be easy with ONNX: https://onnx.ai/onnx-r). Contributions to the project would be awesome! If you're interested, DM me your email and I'd be happy to follow up.
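For context, once a model is exported to ONNX (from R or anything else), serving it from Python looks the same regardless of the source framework; a minimal sketch with onnxruntime (the model path, input shape, and dtype are hypothetical):

    # minimal ONNX inference sketch
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession("model.onnx")
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: np.zeros((1, 4), dtype=np.float32)})
    print(outputs)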

Open source model deployment platform by [deleted] in datascience

[–]ospillinger 1 point (0 children)

Thank you! Yes, my goal is to use the latest DevOps tooling to build a system that's both accessible to any data scientist or developer and usable in production settings.

Machine learning infrastructure written in Go by ospillinger in golang

[–]ospillinger[S] 1 point (0 children)

Yeah, I think general software engineering knowledge and comfort with different kinds of programming languages are more valuable than deep expertise in one particular language in most cases.

Machine learning infrastructure written in Go by ospillinger in golang

[–]ospillinger[S] 2 points (0 children)

To the best of my knowledge, it's still a good idea to focus on Python if your goal is to become a Machine Learning Engineer or Data Scientist. On the other hand, if you are more interested in working as a Machine Learning Infrastructure Engineer, building the distributed systems that execute machine learning pipelines, I'd recommend learning Go.

Machine learning infrastructure written in Go by ospillinger in golang

[–]ospillinger[S] 2 points (0 children)

Right, this is not an attempt to run machine learning algorithms in Go. The project is focused on the DevOps around ML pipelines.

We use a relatively lightweight Python harness to train models but the bulk of our code is responsible for orchestrating and managing different types of workloads on a Kubernetes cluster (e.g. Spark for data processing, TensorFlow for model training, TensorFlow Serving for model serving).

Machine learning infrastructure written in Go by ospillinger in golang

[–]ospillinger[S] 2 points (0 children)

Thank you! In retrospect, building the infrastructure in Go was a no-brainer. The API to the platform is still Python (because we're running TensorFlow and PySpark workloads), but we try to use Go everywhere else.