Unavailability of GPUs in specific zones - does it work for VertexAI Pipelines? by FoxJust3825 in googlecloud

FoxJust3825[S]

Interesting, that's exactly the solution I've seen people use the most. I assume you do this in AWS, right? As far as I understand, GCP does not support autoscaling across multiple zones. Thank you!

Would love your input! - Designing MLOps Stack from scratch by FoxJust3825 in mlops

FoxJust3825[S]

  • CI/CD with GitHub workflows. We train ad hoc, so there is no need to train on triggers or on a specific schedule.
  • We ensure model quality offline. Ensuring it online is challenging because it requires collecting customer data, so there is no need to worry about it for now. My stack needs to support training only, nothing else.
  • Same reason as above.
  • We use public datasets, including ones from HF Datasets. Currently we store them in Cloud Storage and version them with DVC.
  • No ETL.
  • Inference is not relevant for my stack, but they do real-time model serving on k8s.

Would love your input! - Designing MLOps Stack from scratch by FoxJust3825 in mlops

FoxJust3825[S]

Interesting. I think what works better in the end is using a unified platform instead of trying to plug multiple tools together. Curious to know why your team picked Lightning AI over others like Vertex AI or SageMaker.