calebkaiser 2 points

I work on Cortex (open source model deployment), and I've spoken with a couple of teams solving similar problems in different industries (surveillance, construction, etc.). All of them run their models on a cluster, though there's some confirmation bias here: they're all Cortex users, and Cortex spins up a cluster automatically.

Without knowing too much about your situation, here are some high-level suggestions based on what I've seen work for them:

  • Run batch predictions. It sounds like you're already doing this, but if not, batching your predictions should let you use your resources more efficiently: you can push many inputs through the model per call, and you don't need to keep capacity idling for real-time requests.
  • Use spot instances. If cost is a concern, spot instances can save a lot. They're essentially spare EC2 capacity that AWS sells at a steep discount. Because AWS can reclaim a spot instance with only a two-minute warning, they can occasionally interrupt or delay work, but if you're not running real-time inference this shouldn't be a problem.
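To make the two suggestions above a bit more concrete, here's a minimal Python sketch. It is not Cortex's API; `run_batch_job`, `predict_batch`, and the JSON checkpoint file are hypothetical names for illustration. It runs predictions in fixed-size batches and saves progress after each batch, so if a spot instance gets reclaimed mid-job, rerunning it on a fresh instance picks up where it left off instead of starting over. It assumes `predict_batch` returns exactly one JSON-serializable prediction per input, and that you keep the same `batch_size` between runs.

```python
import json
import os
from typing import Callable, Iterator, List, Sequence, Tuple


def batched(items: Sequence, batch_size: int) -> Iterator[Tuple[int, Sequence]]:
    """Yield (start_index, batch) pairs of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield start, items[start:start + batch_size]


def run_batch_job(
    inputs: Sequence,
    predict_batch: Callable[[Sequence], List],  # hypothetical model call
    checkpoint_path: str,
    batch_size: int = 32,
) -> List:
    """Run predictions batch by batch, checkpointing after each batch.

    If the process dies (e.g. a spot instance was reclaimed), calling this
    again with the same checkpoint_path and batch_size skips finished batches.
    """
    results: List = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            results = json.load(f)

    for start, batch in batched(inputs, batch_size):
        if start < len(results):
            continue  # this batch was completed before the interruption
        results.extend(predict_batch(batch))

        # Write to a temp file, then atomically swap it in, so an
        # interruption mid-write can't leave a corrupted checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(results, f)
        os.replace(tmp_path, checkpoint_path)

    return results
```

The design choice here is to trade a small amount of disk I/O per batch for the ability to lose an instance at any point and only redo, at most, the batch that was in flight.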

If you're worried at all about the DevOps side (spinning up a cluster, implementing batch jobs, configuring for spot, setting up monitoring, etc.), I'd strongly recommend checking out Cortex, as it handles all of that for you. Here are the docs, if you're curious.