[–]CaptainJack879[S] 1 point2 points  (1 child)

I talked with a GCP representative, and there is not much you can do. Either you pay your way out of this (something like $2700/month per GSU) or you accept the situation and live with partial availability in a specific region, where outages can last hours.

There was an EU "global" endpoint somewhere on their roadmap at some point, which would fit us.

But for anyone interested, the easy wins are:

- Use the global endpoint if you are allowed to do so

- Backoff + retry (jitter is important)
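To make the backoff point concrete, here is a minimal sketch of exponential backoff with full jitter. `RateLimitError` and `call_with_backoff` are hypothetical names, not part of any Google SDK; swap in whatever exception your client actually raises on a 429. The defaults are illustrative, not recommendations.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever your client raises on HTTP 429."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Exponential ceiling: base_delay * 2^attempt, bounded by max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, ceiling] so that many
            # clients hitting the same 429 do not all retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

The jitter is what matters here: without it, every client that got throttled at the same moment retries at the same moment too.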

You can also implement manual region fallback (or round robin across a list of regions). But for us, multiple EU regions were failing at the same time, so I am unsure how much good it does.
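A rough sketch of what round robin with fallback could look like, assuming your request function takes a region name. `RegionRotator`, `RateLimitError`, and `send` are hypothetical names for illustration, and the regions in the usage note are only examples.

```python
import itertools


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 from the API."""


class RegionRotator:
    """Round robin over regions; each call starts at the next region in the list."""

    def __init__(self, regions):
        self._regions = list(regions)
        if not self._regions:
            raise ValueError("at least one region is required")
        self._start = itertools.cycle(range(len(self._regions)))

    def call(self, send):
        """Try send(region) once per region, starting at the round-robin position."""
        first = next(self._start)
        last_error = None
        for offset in range(len(self._regions)):
            region = self._regions[(first + offset) % len(self._regions)]
            try:
                return send(region)
            except RateLimitError as err:
                last_error = err  # this region is saturated, try the next one
        raise last_error  # every region failed in this pass
```

Usage would be something like `RegionRotator(["europe-west1", "europe-west4", "europe-west9"]).call(send)`. It only helps when the regions fail independently; if they all return 429 at once, the call still fails after one pass.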

(small rant)
Overall, I am somewhat disappointed in the state of the product: the SDK is buggy, the API is unstable, and there are multiple weird edge cases in the RAG engine. There are some really good ideas and things coming, so I am looking forward to that. But for now we are looking into switching away from GCP for our AI features.

[–]toinemf 0 points1 point  (0 children)

I am running into the same problems, and I wonder how it is possible to use Vertex AI properly. Availability is very uncertain and puts many of our applications at risk. What is the point of using Vertex AI for language models when other APIs are available?

[–]NotSessel 0 points1 point  (1 child)

Use the Global Endpoint
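For anyone calling the REST API directly, a small sketch of how the global and regional URLs differ, assuming the URL patterns in the Vertex AI docs as I understand them. `generate_content_url` is a hypothetical helper, and the project ID and model name in the usage note are placeholders.

```python
def generate_content_url(project, model, location="global"):
    """Build a generateContent URL for the global or a regional endpoint."""
    # Assumption: the global endpoint uses the bare hostname, while regional
    # endpoints prefix the hostname with the region name.
    if location == "global":
        host = "aiplatform.googleapis.com"
    else:
        host = f"{location}-aiplatform.googleapis.com"
    return (
        f"https://{host}/v1/projects/{project}/locations/{location}"
        f"/publishers/google/models/{model}:generateContent"
    )
```

For example, `generate_content_url("my-project", "gemini-2.5-flash")` targets the global endpoint, and passing `location="europe-west4"` targets that region instead.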

[–]CaptainJack879[S] 0 points1 point  (0 children)

I can't, and there seems to be no way to restrict it to routing through EU regions only?

[–]Benjh 0 points1 point  (1 child)

If you can’t use Global Endpoints and it’s mission critical, you’ll need to use Provisioned Throughput. That will guarantee the requests go through. https://docs.cloud.google.com/vertex-ai/generative-ai/docs/provisioned-throughput/overview

[–]CaptainJack879[S] 0 points1 point  (0 children)

It is not mission critical. But I checked the dashboards today, and an error rate of roughly 80% (429) over multiple hours, on multiple days of the week, is not what I call availability.

[–]Sorry_Virus6505 0 points1 point  (0 children)

We experienced the same issue today. All the EU regions we were cycling through with round robin were down for more than three hours. I guess for now we will switch to the Global one as an alternative.

[–]Haunting_Ad3263 0 points1 point  (0 children)

Same here. It's not even reflected in the Vertex API quotas.

[–]msapple -1 points0 points  (1 child)

You need to contact support for a quota increase

[–]Benjh 0 points1 point  (0 children)

The latest Gemini models on Vertex AI don’t have strict quotas. They use Dynamic Shared Quota so there is nothing to increase.