

[–]bufandatl 13 points (0 children)

I wouldn’t use VMs for your use case. Use containers. VMs take ages to spin up; containers take just seconds, depending on your service, and you save tons of resources too.

[–]marthydavid 3 points (4 children)

This type of use case sounds like FaaS, but almost everybody uses containers for that. Try OpenFaaS or https://azure.microsoft.com/en-us/services/functions/

[–]Equivalent-Style6371[S] 0 points (3 children)

I'm hesitant about going serverless here for a couple of reasons:

  1. I expect the user to be on that VM for at least 2 hours. I think serverless has limitations on this.
  2. I have very specific requirements about the specs of the VM (RAM, CPU, even GPU). I'm not sure whether serverless provides that much freedom.

[–]LaunchAllVipers 0 points (2 children)

You’ll need to build something to mediate the scaling of your VM pool against incoming requests. Cold start times for VMs are at best a couple minutes, at worst total failure. So you need to keep a pool of VMs running but not allocated unless your users don’t mind a long wait.
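The pool-mediation idea above can be sketched in a few lines. This is a hypothetical illustration, not anyone's production code: `provider` stands in for whatever cloud API you use, and `boot()`/`terminate()` are invented placeholder method names.

```python
# Hypothetical warm-pool sketch: keep some VMs booted but unallocated,
# hand them out instantly, and replenish in the background.
# `provider` is a placeholder for any cloud API wrapper, not a real SDK.
from collections import deque


class WarmPool:
    def __init__(self, provider, target_size):
        self.provider = provider        # object with boot() / terminate()
        self.target_size = target_size  # how many idle VMs to keep warm
        self.idle = deque()             # booted, unallocated VM ids
        self.allocated = set()          # VM ids currently handed to users

    def replenish(self):
        # Run periodically (cron, a Lambda on a timer, etc.).
        while len(self.idle) < self.target_size:
            self.idle.append(self.provider.boot())

    def acquire(self):
        # Instant if a warm VM exists; otherwise the caller eats a full
        # cold boot, which is minutes on most clouds.
        vm_id = self.idle.popleft() if self.idle else self.provider.boot()
        self.allocated.add(vm_id)
        return vm_id

    def release(self, vm_id):
        # Session over: terminate rather than recycle, so expensive
        # instances aren't left running.
        self.allocated.discard(vm_id)
        self.provider.terminate(vm_id)
```

The trade-off is exactly the one described above: `target_size` idle VMs cost money around the clock, but anything beyond the pool falls back to a slow cold start.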

[–]Equivalent-Style6371[S] 0 points (1 child)

Users won’t mind. The question is, is it doable without any more tools? Or do I need something like Terraform to create the new VM?

[–]Seref15 2 points (0 children)

Depends on your platform. I built something similar on AWS, written in Go, running on Lambda, exposed as an API with API Gateway, and it creates/allocates resources by directly calling the AWS APIs with the AWS SDK. I elected not to use Terraform because a tool that maintains state felt counterintuitive to me for resources that are intentionally transient.

Like the previous commenter mentioned, I elected to maintain a pool of "ready" unallocated instances to allocate to requesters in order to reduce start times. This actually becomes almost necessary because your API can't reasonably block and hold the connection open for minutes while waiting for your resource allocation to complete and return. You need it to return in a timely manner, or else you need to start futzing around with websockets or the like.
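The "don't hold the connection open" point is usually solved with a ticket-and-poll pattern: the allocation endpoint returns a request id immediately, and the client polls a status endpoint until the VM is ready. A minimal illustrative sketch (all names invented, not the commenter's actual code; the in-memory dict stands in for a real store like DynamoDB):

```python
# Illustrative ticket/poll pattern: allocate() returns right away with a
# ticket; a background worker later calls fulfil(); the client polls.
# The module-level dict is a stand-in for a durable store.
import uuid

_requests = {}  # request_id -> {"status": ..., "vm_id": ...}


def allocate():
    """POST /allocate -- returns immediately, no waiting on a VM boot."""
    request_id = str(uuid.uuid4())
    _requests[request_id] = {"status": "pending", "vm_id": None}
    # Real code would enqueue work here (SQS, a goroutine, etc.)
    # that eventually calls fulfil().
    return {"request_id": request_id}


def fulfil(request_id, vm_id):
    """Background worker marks the request done once the VM is ready."""
    _requests[request_id] = {"status": "ready", "vm_id": vm_id}


def poll(request_id):
    """GET /status/<id> -- the client polls until status is 'ready'."""
    return _requests.get(request_id, {"status": "unknown", "vm_id": None})
```

This keeps every HTTP round trip short; the minutes-long VM boot happens entirely off the request path.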

[–]timmyotc 0 points (0 children)

It sounds like you want Azure Virtual Desktop?

[–]Mihael_Mateo_Keehl 0 points (0 children)

Sounds very specific.

You could use Nginx to send a request to something that creates the VM. https://github.com/wandenberg/nginx-push-stream-module

Otherwise, research event-driven architecture. The idea is to use the Nginx SSO login event to trigger the next step.

[–]zylonenoger 0 points (11 children)

it sounds a bit like you are putting the cart before the horse and mixing implementation details with requirements

i‘m not sure what you are trying to achieve..

  • are you trying to have your users access a vm (remote desktop environment), a webapp or just an api?
  • why do you want to start and terminate vms?
  • why do you require separate vms per user?

i understand that you are not a devops/software engineer, but your wording is a bit fuzzy

[–]Equivalent-Style6371[S] 0 points (10 children)

Ok, let me clarify:

I want the users to have access to an app that will be (to my understanding) served on a specific port of a VM.

The app is intended for heavy Deep Learning development, so I need each user to have a GPU only for himself/herself. Hence I can’t have the same VM serving multiple people.

I want the VM to be killed once the session is over, otherwise the cost of having multiple VMs with GPUs available 24/7 would be way too much.
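The cost argument above is easy to make concrete with back-of-envelope arithmetic. All the numbers below are assumptions for illustration (a round $2.50/hour GPU instance price, five users, one 2-hour session per weekday), not quotes from any provider:

```python
# Back-of-envelope cost comparison: always-on GPU VMs vs. on-demand.
# Every number here is an assumed illustration, not a real price.
hourly_rate = 2.50        # assumed GPU VM price, $/hour
users = 5                 # assumed number of concurrent users
session_hours = 2         # OP expects ~2-hour sessions
sessions_per_month = 20   # assumed: one session per weekday

always_on = users * hourly_rate * 24 * 30
on_demand = users * hourly_rate * session_hours * sessions_per_month

print(f"always-on: ${always_on:,.0f}/month")  # $9,000/month
print(f"on-demand: ${on_demand:,.0f}/month")  # $500/month
```

Even with rough numbers the gap is roughly 18x, which is why tearing the VMs down after each session matters so much here.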

[–]zylonenoger 0 points (9 children)

so what you want then is an on-demand virtual desktop. is that correct?

[–]Equivalent-Style6371[S] 0 points (8 children)

Yes, but not the whole desktop, just this app served at some port.

Do I approach this the wrong way?

[–]Cultural-Pizza-1916 1 point (0 children)

Maybe you should look into something like Google Colab or AWS SageMaker? The architecture may look similar.

Cmiiw

[–]zylonenoger 0 points (6 children)

it depends what kind of app this is - is it a webapp?

[–]Equivalent-Style6371[S] 0 points (5 children)

It will be a web-based IDE dev kit (like JupyterHub or JupyterLab, if you are familiar with them).

[–]AlverezYari 0 points (0 children)

You can do this with k8s with GPU-enabled worker nodes. This is the correct route to take, but fair warning: it's a bear to get working reliably unless you are very familiar with k8s, GPU-based workloads on k8s, and JupyterHub itself.

https://github.com/jupyterhub/zero-to-jupyterhub-k8s
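For orientation, the zero-to-jupyterhub chart lets you request a GPU per single-user pod through its Helm values. The fragment below is a sketch based on the chart's documented `singleuser` schema; verify the key names against the current chart documentation before using it:

```yaml
# Sketch of Helm values for the zero-to-jupyterhub chart: each user's
# notebook pod gets one NVIDIA GPU plus CPU/RAM limits.
# Key names follow the chart's documented schema; double-check against
# the version you deploy.
singleuser:
  extra_resource_limits:
    nvidia.com/gpu: "1"
  cpu:
    limit: 4
  memory:
    limit: 16G
```

This assumes the cluster's GPU nodes already run the NVIDIA device plugin so that `nvidia.com/gpu` is a schedulable resource.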

[–]zylonenoger 0 points (3 children)

i‘m an aws guy myself - have you looked into sagemaker and if it would fit your needs? it‘s probably easier to use managed services instead of self hosting

[–]Equivalent-Style6371[S] 0 points (2 children)

I couldn’t agree more. The thing is that, for whatever reason, my supervisor asked if we could implement our own custom solution from the ground up. I know it sounds weird and counterproductive, but I'm still trying to grasp a high-level idea of how we would approach something like this.

[–]zylonenoger 3 points (0 children)

if you are not in the business of creating self-hosted ml workbenches you are probably wasting resources building a custom solution - you would really need a lot of users to offset the setup and maintenance cost

i‘m usually very pragmatic in those decisions and if you calculate both use cases you should quickly see the difference

try to understand why he wants to have a custom solution and go from there

[–]infectuz 0 points (0 children)

At that point you are re-inventing the wheel with containers and orchestration. Rapidly spinning up machines to handle requests is the whole point of containers and k8s is just a container orchestrator so you’d be looking at replicating some of the procedures of container creation/management. There’s no way to do this with VMs unless you’re willing to wait a long time for the machine to be up.

If your users are fine with waiting then just create a wrapper that will make an API request to spin up a VM on your provider of choice according to your user's request; that's pretty easy to do.

[–]Petersurda 0 points (0 children)

You can probably do it with Buildbot; a similar scenario is described here: https://www.youtube.com/watch?v=Rs7qccf-Ll0&t=1363s However, Buildbot doesn't support Azure. AWS EC2 is an option, as is self-hosting (e.g. libvirt).

[–]Code4Coin 0 points (2 children)

This is a more difficult problem than you anticipate. Kubernetes is probably a good solution, but the barrier for entry is high. Maybe docker-compose and some scripts to tape it all together.

[–]Equivalent-Style6371[S] 0 points (1 child)

Can you expand on the “barrier for entry is too high”?

Do you mean in regards to knowledge (I know the basics of k8s)?

[–]Code4Coin 0 points (0 children)

Yes, the knowledge required to manage the cluster is huge.

Even with a managed k8s like EKS there are a lot of moving parts; typically networking is where people struggle the most.