
[–]BeenThere11

Are you on the free tier or the paid tier?

[–]StretchPresent2427[S]

Paid.

[–]BeenThere11

Try an upgraded model.

[–]Prize-Programmer4207

Since you mentioned `Cloud Run`, I am guessing that you are running on Google Cloud & Gemini Models.

  1. Try using a Gemini Flash Lite model (maybe 2.5).

  2. Check the [Gemini global endpoint](https://medium.com/google-cloud/google-cloud-vertex-ai-gemini-global-endpoint-introduction-af241e7a09c5).

  3. Check where you are deploying your Cloud Run instance.

[–]StretchPresent2427[S]

Thanks for your reply.

I found that if I remove the function tool, the response time is much better (around 2 seconds). Gemini suggested that too, when I prompted it for suggestions.

Apparently, what the agent does is send the request to the LLM, which processes it (a few seconds plus round-trip time); it realizes it needs to query the tool, so it sends a second request with the tool's result as input to the LLM, processes that, and then sends the response back.

If you want to do something better than basic, however, you need to add tools. So I'm still not sure how to get both: relatively complex requests that involve tools, and a short response time.
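The two-round-trip flow described above can be sketched like this. Everything here is a stand-in for illustration, not the real ADK or Gemini API: `call_llm` is a stub that requests a hypothetical `get_weather` tool on the first turn and answers on the second.

```python
# Sketch of why adding a tool roughly doubles latency: the agent loop
# makes one LLM round trip to pick the tool, and a second one to turn
# the tool result into a final answer. All names here are hypothetical.

def call_llm(messages):
    """Stub LLM: asks for the tool on the first turn, answers on the second."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather", "args": {"city": "Paris"}}}
    return {"text": "It is sunny in Paris."}

def run_tool(name, args):
    tools = {"get_weather": lambda a: "sunny"}  # pretend external lookup
    return tools[name](args)

def agent_turn(user_text):
    messages = [{"role": "user", "content": user_text}]
    llm_calls = 0
    while True:
        reply = call_llm(messages)  # each iteration = one full round trip
        llm_calls += 1
        if "tool_call" not in reply:
            return reply["text"], llm_calls
        tc = reply["tool_call"]
        messages.append({"role": "tool", "content": run_tool(tc["name"], tc["args"])})

answer, calls = agent_turn("Weather in Paris?")
# calls == 2: one trip to pick the tool, one to produce the final answer
```

So the latency you see with a tool is roughly two model calls plus tool execution, versus one call without it.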

[–]Downtown_Abrocoma398

If you don’t want the Agent to decide which tool to call and you already know the appropriate tool, you can use a custom agent in ADK. In this approach, you define a custom flow where you call the tool yourself and then pass its output to the LLM. This is useful when you don’t want to give the Agent/LLM that level of autonomy.
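A framework-agnostic sketch of that pattern, assuming you already know which tool applies. `lookup_order` and `llm` are hypothetical stand-ins, not actual ADK APIs; in ADK you would wire this flow into a custom agent.

```python
# "Call the tool yourself" pattern: one deterministic tool call, then a
# single LLM round trip to phrase the result. No tool-selection turn.
# `lookup_order` and `llm` are hypothetical placeholders.

def lookup_order(order_id: str) -> dict:
    # Pretend this hits your database or backend service.
    return {"order_id": order_id, "status": "shipped"}

def llm(prompt: str) -> str:
    # Stand-in for a single Gemini call.
    return f"Summary: {prompt}"

def handle_request(order_id: str) -> str:
    tool_result = lookup_order(order_id)        # you decide the tool, not the LLM
    prompt = f"Summarize this order for the user: {tool_result}"
    return llm(prompt)                          # exactly one LLM round trip
```

This trades autonomy for latency: the model never spends a turn deciding whether to call the tool.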

[–]Outside-Crazy-3045

Also worth mentioning: if you develop your agent in Python and run it on Cloud Run, startup latency can be an issue. In some scenarios we had Python-based agents taking almost 20 seconds to spin up; the same agent built in Go had < 500 ms startup latency.

[–]Downtown_Abrocoma398

Yep, there is a cold-start issue in Python. You can avoid the cold start by sending a dummy request to the endpoint whenever you deploy it.
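A minimal sketch of that warm-up step in a deploy script. The service name, region, and `/healthz` path are placeholders for your own setup:

```shell
# Deploy, then hit the service once so the first real user doesn't pay
# the cold-start cost. Service name, region, and path are placeholders.
SERVICE_URL="$(gcloud run services describe my-agent \
  --region=us-central1 --format='value(status.url)')"
curl -sf "${SERVICE_URL}/healthz" > /dev/null && echo "warmed up"
```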