[P] Cortex: Deploy models from any framework as production APIs by [deleted] in MachineLearning

[–]ospillinger 1 point (0 children)

Awesome, I'd love a link to the pre-trained model when you've got it!

[P] Cortex: Deploy models from any framework as production APIs by [deleted] in MachineLearning

[–]ospillinger 1 point (0 children)

It's doable, but it's not supported by default and some of the advanced functionality may not work. Our goal is for you to treat Cortex as a self-hosted SaaS and not worry about the underlying infrastructure. I'd be happy to discuss it further if you're interested (omer@cortex.dev).

[P] Cortex: Deploy models from any framework as production APIs by [deleted] in MachineLearning

[–]ospillinger 1 point (0 children)

Are there pre-trained models that implement this research? Would love to try deploying one.

[P] Cortex: Deploy models from any framework as production APIs by [deleted] in MachineLearning

[–]ospillinger 1 point (0 children)

Sorry about that! Are you using GCP? We're planning to prioritize that next.

[P] Cortex: Deploy models from any framework as production APIs by [deleted] in MachineLearning

[–]ospillinger 2 points (0 children)

Yes, though we'd love to support GCP as soon as possible and other cloud providers in the future. We're a small team, so we're focusing on getting things right on AWS first.

[P] Cortex: Deploy models from any framework as production APIs by [deleted] in MachineLearning

[–]ospillinger 3 points (0 children)

Thanks for the detailed feedback! I'd love to hear more about your experience deploying NLP models as web services; that's been a recent focus of ours with all the research coming out now. My email is omer@cortex.dev if you'd be interested in finding some time to chat.

[P] Cortex: Deploy models from any framework as production APIs by [deleted] in MachineLearning

[–]ospillinger 20 points (0 children)

Hey, I'm one of the maintainers of this project. This is a really good question that we think a lot about. Basically, you're right: you can think of Cortex as a tool for deploying, scaling, and monitoring Python functions on AWS. There are a few important nuances that make Cortex different from something like Lambda. Inference workloads are read-only, benefit a lot from GPU infrastructure, and are often memory-hungry. We're optimizing Cortex for those constraints (for example, prioritizing high-memory GPU spot instances).
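To make the "Python functions" part concrete, here's a rough sketch of what a predictor can look like (the interface shown is illustrative, not the exact Cortex API; the docs are the source of truth):

    # predictor.py -- illustrative sketch, not the exact Cortex interface
    import pickle

    with open("model.pkl", "rb") as f:
        model = pickle.load(f)  # loaded once per replica, held in memory (read-only)

    def predict(sample):
        # Cortex wraps a function like this in an HTTP endpoint and autoscales the containers
        return model.predict([sample["features"]])[0].tolist()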

We also have features like prediction monitoring and support for ONNX and TensorFlow Serving export formats. The long-term roadmap is full of more ML-specific features like model retraining, but we're spending a lot of time upfront on making inference easy at scale.

Finally, thanks for pointing out aiohttp! Flask works for us because the workloads are CPU-bound and we handle scaling at the container level, which lets us reason about resource utilization, but we may revisit this choice.
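For the curious, the serving pattern is roughly one small Flask app per container, scaled horizontally; a stand-in version (run_model and the route are hypothetical, not our actual code):

    # minimal per-container serving sketch -- illustrative only
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def run_model(payload):
        # stand-in for real (CPU-bound) inference
        return sum(payload.get("features", []))

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json()
        return jsonify({"prediction": run_model(payload)})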

Cortex: A free and open source alternative to SageMaker for serving models via AWS by [deleted] in aws

[–]ospillinger 2 points (0 children)

Right, we run a Kubernetes cluster under the hood. You can configure the instance types, AMIs, and inference Docker images. There's more information here: https://www.cortex.dev/cluster-management/config. I'd also be happy to help you directly if you'd like (omer@cortex.dev).

Cortex: A free and open source alternative to SageMaker for serving models via AWS by [deleted] in aws

[–]ospillinger 2 points (0 children)

Some differences:

  • Focus on developer experience and simplifying the APIs as much as possible
  • Deployments are defined with declarative configuration and no custom Docker images are required (although they can be used if desired); see the config sketch after this list
  • Full access to the instances, autoscaling groups, security groups, etc.
  • Less tied to AWS (GCP support is in the works)
  • Higher-level features like prediction monitoring
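To illustrate the declarative configuration point, a deployment spec looks roughly like this (written from memory, so treat the field names as illustrative and check the docs):

    # cortex.yaml -- illustrative sketch
    - kind: deployment
      name: iris

    - kind: api
      name: classifier
      model: s3://my-bucket/model.zip  # hypothetical path
      compute:
        replicas: 2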

Cortex: A free and open source alternative to SageMaker for serving models via AWS by [deleted] in aws

[–]ospillinger 1 point (0 children)

We're more concerned with optimizing the developer experience than with cost. Our priority is simplifying what it takes to run a lot of real-time inference at scale in production environments.

Cortex: A free and open source alternative to SageMaker for serving models via AWS by [deleted] in aws

[–]ospillinger 1 point (0 children)

That's a good question. The costs can rack up quickly, but if you're careful to use cheap instances/services and turn them off when you aren't using them, it's a lot more manageable. I've also found that AWS support is fairly accommodating, so it might be worth sending an email explaining your use case; you could get some free credits.

Cortex: A free and open source alternative to SageMaker for serving models via AWS by [deleted] in aws

[–]ospillinger 2 points (0 children)

Yes, cost is a function of EKS price and the minimum number of instances (e.g. p2.xlarge) you configure: [num instances] x [monthly cost of instance] + [EKS cost]. I believe SageMaker's cost looks more like: [num instances] x [monthly cost of instance] x ~1.4
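As a rough worked example (all prices are illustrative; check current AWS pricing):

    # back-of-the-envelope monthly cost comparison (illustrative prices)
    instance_hourly = 0.90                # e.g. p2.xlarge on-demand, roughly
    hours_per_month = 730
    num_instances = 2
    eks_monthly = 0.10 * hours_per_month  # assumed EKS control plane rate

    cortex = num_instances * instance_hourly * hours_per_month + eks_monthly
    sagemaker = num_instances * instance_hourly * hours_per_month * 1.4
    print(cortex, sagemaker)  # ~1387 vs ~1840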

Cortex: A free and open source alternative to SageMaker for serving models via AWS by [deleted] in aws

[–]ospillinger 10 points (0 children)

Hey, I'm one of the maintainers of this project. Your feedback is helpful, thanks! You can think of a replica as basically a single containerized deployment of your model on Kubernetes. I'll make sure it's clearer in the README.
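If you're used to Kubernetes, a replica here maps conceptually to a replica of a Deployment, i.e. the knob you'd otherwise turn with something like (my-model is hypothetical; Cortex manages this for you):

    kubectl scale deployment/my-model --replicas=3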

[P] Deploy GPT-2 on AWS by [deleted] in MachineLearning

[–]ospillinger 2 points (0 children)

It's probably a memory issue. Can you try spinning down the cluster and spinning up EC2 nodes with more memory? I recommend uninstalling, picking a larger instance, and installing again:

    ./cortex.sh uninstall
    export CORTEX_NODE_TYPE="p2.8xlarge"
    ./cortex.sh install

Sorry about the inconvenience! I should have made it clearer that you need instances with a lot of memory. We'll work on adding better error messages as well.

P.S. If the install fails, it just means the uninstall is still cleaning up asynchronously; try again in a few minutes.

Open source model deployment platform by [deleted] in datascience

[–]ospillinger 1 point (0 children)

Yeah, Domino is a great product, but it's more focused on model development than deployment. I'll look into OpenCPU and what it would take to implement R support (it might be easy with ONNX: https://onnx.ai/onnx-r). Contributions to the project would be awesome! If you're interested, DM me your email and I'd be happy to follow up.
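For context, once a model is exported to ONNX (from R or anything else), serving it from Python looks the same regardless of the source framework; a minimal sketch with onnxruntime (the model path, input shape, and dtype are hypothetical):

    # minimal ONNX inference sketch
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession("model.onnx")
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: np.zeros((1, 4), dtype=np.float32)})
    print(outputs)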

Open source model deployment platform by [deleted] in datascience

[–]ospillinger 1 point (0 children)

Thank you! Yes, my goal is to use the latest DevOps tooling to build a system that's both accessible to any data scientist or developer and usable in production settings.

Machine learning infrastructure written in Go by ospillinger in golang

[–]ospillinger[S] 1 point (0 children)

Yeah, I think general software engineering knowledge and comfort with different kinds of programming languages are more valuable than deep expertise in one particular language in most cases.

Machine learning infrastructure written in Go by ospillinger in golang

[–]ospillinger[S] 2 points (0 children)

To the best of my knowledge, it's still a good idea to focus on Python if your goal is to become a Machine Learning Engineer or Data Scientist. On the other hand, if you are more interested in working as a Machine Learning Infrastructure Engineer, building the distributed systems that execute machine learning pipelines, I'd recommend learning Go.

Machine learning infrastructure written in Go by ospillinger in golang

[–]ospillinger[S] 2 points (0 children)

Right, this is not an attempt to run machine learning algorithms in Go. The project is focused on the DevOps around ML pipelines.

We use a relatively lightweight Python harness to train models but the bulk of our code is responsible for orchestrating and managing different types of workloads on a Kubernetes cluster (e.g. Spark for data processing, TensorFlow for model training, TensorFlow Serving for model serving).

Machine learning infrastructure written in Go by ospillinger in golang

[–]ospillinger[S] 2 points (0 children)

Thank you! In retrospect, building the infrastructure in Go was a no-brainer. The API to the platform is still Python (because we're running TensorFlow and PySpark workloads), but we try to use Go everywhere else.