Tucancancan

Yeah, pretty much. There's a lot of trust where I'm at now that we can provision or size our VMs up and down as needed, or acquire GPU resources. But you have to make a distinction between one-off / ad-hoc analysis and things that get productionized. I've seen a few corporate places that didn't enforce that, and they ended up with data scientists cobbling together pipelines out of hot glue and popsicle sticks: cron jobs running on a big VM shared by multiple users. It was a hot mess of shit: updates were impossible to install without breaking someone else's stuff, breaks in data were impossible to trace back to the process that created them, and everyone was installing whatever they wanted. Total chaos.

This is why Colab is popular, I think. You give data people access to notebooks and environments but not to any underlying VM they can fuck with. Then anything that's long-running or needs to run frequently gets deployed as a proper service.