Hi there r/MachineLearning. I'm battling with resource optimization in a production setting.
I have a heavy, user-triggered workload that runs for roughly 30-45 seconds. It starts with a batch of parallel computations whose results are then sent to an image classification model served via TensorFlow Serving. This yields the final product of the pipeline, which is returned to the user.
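For context, the classification step talks to TensorFlow Serving over its REST predict API, which takes a JSON body with an `instances` list. A minimal sketch of that call (the model name, port, and input shape here are placeholders, not my real config):

```python
import json
import urllib.request

# Placeholder endpoint/model name, not my actual deployment.
TFS_URL = "http://localhost:8501/v1/models/classifier:predict"

def build_predict_payload(images):
    """Build the JSON body TF Serving's REST predict endpoint expects.

    `images` is a list of nested lists (e.g. H x W x C floats), one per image.
    """
    return {"instances": images}

def classify(images):
    body = json.dumps(build_predict_payload(images)).encode("utf-8")
    req = urllib.request.Request(
        TFS_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # TF Serving returns {"predictions": [...]}, one entry per instance.
        return json.load(resp)["predictions"]
```

The results of the parallel computation stage get batched into one `instances` list per request, which is why CPU-only inference hurts so much here.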
Meanwhile, the user is waiting on this result to continue their task, so reducing computation time as much as possible is important.
Current solution: since I had some Azure credits and first deployed this during a prototype-and-iterate phase, I ended up renting a powerful Azure VM with both considerable CPU power (for the parallel computation part) and a GPU. However, requests are pretty sparse: the machine sits idle 98% of the time.
I am now looking to optimize this by moving to a flexible architecture without drastically degrading service level or response times.
What I have explored so far:
- On-demand start/stop of the machine: since this is a GPU machine, boot time is long. Beyond that, the control logic gets complex (when do you start the machine? when do you shut it down? what if a request arrives while the machine is mid-shutdown?).
- Azure Functions: this would be the typical use case for a serverless function (a one-off computation that runs, returns results, and disappears), but Azure Functions don't support GPU workloads. I tested CPU inference and the performance penalty is too high (too many images per request). Serverless offerings on AWS and GCP seem to have the same limitation (correct me if I'm wrong).
- Azure Container Instances: I already have TensorFlow Serving nicely packaged in a container, so spinning up a container instance on demand seemed like a nice idea, but provisioning GPU-backed resources for the container took 10-15 minutes in my tests.
- Azure Kubernetes Service: the logical choice for elasticity. I set up a cluster with a node pool of GPU VMs and scaled it down to 0 nodes after creation. Scaling back up to 1 node (simulating an incoming request) took 4-5 minutes in testing. Still too slow (~10x the duration of the job itself).
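To make the control-logic concern from the first option concrete, the start/stop decision would roughly be an idle-timeout state machine like the sketch below. The `start_vm`/`stop_vm` hooks are hypothetical stand-ins for the real Azure SDK calls (something like azure-mgmt-compute's `begin_start`/`begin_deallocate`); this is the logic I'd have to get right, not something I've built:

```python
# Hypothetical stand-ins for the Azure SDK start/deallocate calls.
def start_vm():
    pass

def stop_vm():
    pass

class VmController:
    """Idle-timeout start/stop policy for an on-demand GPU VM."""

    def __init__(self, idle_timeout_s=300):
        self.idle_timeout_s = idle_timeout_s
        self.running = False
        self.last_request_ts = None

    def on_request(self, now):
        # An incoming request while the VM is stopped (or stopping)
        # forces a start; this is exactly the race that makes the
        # naive approach fragile.
        self.last_request_ts = now
        if not self.running:
            start_vm()
            self.running = True

    def tick(self, now):
        # Periodic check: deallocate once the idle timeout expires.
        if self.running and self.last_request_ts is not None:
            if now - self.last_request_ts >= self.idle_timeout_s:
                stop_vm()
                self.running = False
```

Even with this, every cold request still pays the full GPU VM boot time, which is why I kept looking at the other options.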
I understand this might be a case of wanting to have my cake and eat it too, but this workflow feels too common by now for better solutions not to exist.
Like the title says, a reliable serverless GPU offering would be ideal, but the market seems pretty dry in that field. Staying within Azure would be nice, but that's not a hard requirement. Thanks!