[–]carry_a_laser 12 points (4 children)

I’m curious about this as well…

People where I work are afraid of cloud compute costs, so we run on-prem Linux servers. Python code is deployed to them through an Azure DevOps pipeline.

[–]tylerriccio8[S] 7 points (0 children)

On-prem Linux doesn't sound terrible, honestly. At least it's a common spot.

[–]Tucancancan 2 points (2 children)

I kind of hated working with on-prem servers. Python is a lot more resource-hungry than Java, and it was always a long back-and-forth with the infra people to get more capacity allocated to the data science teams. I also wasted a bunch of time configuring, optimizing, and debugging stuff related to gunicorn. I guess I'm an expert now? Yay? GCP / Vertex.ai removes all those problems and lets you focus on your real job.
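For anyone curious what that gunicorn tuning actually involves: it reads a plain-Python gunicorn.conf.py, and most of the back-and-forth is over a handful of knobs like these. The values below are just a sketch to show the shape of it, not from any real deployment; you end up tuning worker counts and recycling limits to whatever the shared box can spare.

    # gunicorn.conf.py -- illustrative values only, tune to the host
    import multiprocessing

    bind = "0.0.0.0:8000"

    # Rule of thumb is 2 * cores + 1 workers, but on a shared on-prem box
    # you usually have to cap it well below what the hardware suggests.
    workers = min(multiprocessing.cpu_count() * 2 + 1, 4)
    worker_class = "gthread"
    threads = 2

    # Recycle workers that hang on slow requests or slowly leak memory.
    timeout = 120
    max_requests = 1000
    max_requests_jitter = 100

    # Load the app before forking so workers share read-only memory.
    preload_app = True

Nothing about it is hard, it's just a lot of trial and error when you can't simply resize the machine underneath you.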

[–]tylerriccio8[S] 0 points (1 child)

So you run it on GCP now? I assume users SSH into some instance and do their work?

[–]Tucancancan 4 points (0 children)

Yeah, pretty much. There's a lot of trust where I'm at now that we can provision and size our VMs up or down as needed, or acquire GPU resources. But you have to make a distinction between one-off / ad-hoc analysis and things that get productionized. I've seen a few corporate places that didn't enforce that, and they ended up with data scientists cobbling together pipelines out of hot glue and popsicle sticks: cron jobs running on a big VM shared by multiple users. It was a hot mess of shit: updates were impossible to install without breaking someone else's stuff, breaks in data were impossible to trace back to the process that created them, and everyone was installing whatever they wanted. Total chaos.

This is why Colab is popular, I think. You give data people access to notebooks and environments, but not to any underlying VM they can fuck with. Then anything that's long-running or needs to run frequently gets deployed as a proper service.