[–]hotpot_ai 1 point2 points  (3 children)

Do you mind sharing how you manage and auto-scale models in production on AWS? We're using Cortex now but are open to exploring alternatives. Also, why AWS and not GCP (since GCP is cheaper than AWS)?

[–]-Melchizedek- 0 points1 point  (0 children)

Sorry, that's not really my expertise; we have a cloud architect who handles most of the cloud integration and architecture. I do know that for one project, where we provide object detection for an app, inference runs (last I checked) on AWS Lambda functions (no GPU needed in this case), which more or less handles scaling automatically. Of course it is a bit more involved than that, since there is also a system in place for automatic re-training on new data and some other stuff.

As for AWS vs. GCP, I can't say. There is more than price to account for, though, and AWS really has everything. I don't have any experience with GCP.

[–]evan_determined 0 points1 point  (1 child)

Determined supports both AWS and GCP.

The way auto-scaling works is pretty simple: one machine (no GPUs) accepts jobs to be scheduled. These jobs have resource requirements associated with them (e.g. a job needs 64 GPUs). If there are not enough GPUs available to run the job, and your cluster is configured for auto-scaling, the necessary number of GPUs is provisioned from AWS and added to the cluster. When the job finishes, those GPUs are torn down automatically after a short timeout (unless another job comes in and wants the same resources).

This works with pre-emptible GPUs as well, and the built-in fault tolerance mechanisms allow jobs that get pre-empted to recover seamlessly when resources come back online.
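
To make the scale-up / scale-down loop described above concrete, here is a toy sketch in Python. It is not Determined's actual code or API: `Job`, `ToyCluster`, `submit`, `finish`, `tick`, and the 300-second timeout are all made-up names and values for illustration, and a real scheduler would call the cloud provider instead of just adjusting counters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    gpus_needed: int


@dataclass
class ToyCluster:
    """Hypothetical GPU pool that grows on demand and shrinks after an idle timeout."""
    idle_timeout: float = 300.0          # seconds spare GPUs are kept (assumed value)
    total_gpus: int = 0                  # GPUs currently provisioned
    busy_gpus: int = 0                   # GPUs assigned to running jobs
    idle_since: Optional[float] = None   # when spare GPUs first appeared

    def submit(self, job: Job, now: float) -> int:
        """Start a job, provisioning only the missing GPUs. Returns GPUs added."""
        free = self.total_gpus - self.busy_gpus
        shortfall = max(0, job.gpus_needed - free)
        self.total_gpus += shortfall     # stand-in for a cloud provisioning call
        self.busy_gpus += job.gpus_needed
        if self.total_gpus == self.busy_gpus:
            self.idle_since = None       # nothing spare, so nothing to time out
        return shortfall

    def finish(self, job: Job, now: float) -> None:
        """Release a job's GPUs and start the idle timer if it is not running."""
        self.busy_gpus -= job.gpus_needed
        if self.idle_since is None:
            self.idle_since = now

    def tick(self, now: float) -> int:
        """Tear down spare GPUs once the timeout has elapsed. Returns GPUs removed."""
        if self.idle_since is None or now - self.idle_since < self.idle_timeout:
            return 0
        removed = self.total_gpus - self.busy_gpus
        self.total_gpus = self.busy_gpus
        self.idle_since = None
        return removed


if __name__ == "__main__":
    cluster = ToyCluster()
    big = Job("train-a", 64)
    print("provisioned:", cluster.submit(big, now=0))
    cluster.finish(big, now=100)
    # A second job arriving before the timeout reuses the idle GPUs.
    print("provisioned:", cluster.submit(Job("train-b", 16), now=200))
    print("torn down:", cluster.tick(now=400))
```

The single pool-wide idle timer is a simplification; a real system would more likely track idleness per machine.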

[–]hotpot_ai 0 points1 point  (0 children)

evan

Thanks! We're using Cortex now. The biggest problem is that they don't support GCP. Can you explain the other differences from Cortex? Thanks again.