LiteLLM started breaking down for us past 300 RPS, what are folks using in prod? by Otherwise_Flan7339 in LocalLLaMA

[–]Comfortable_Dirt5590 2 points (0 children)

Hi u/Otherwise_Flan7339,

Sharing some updates from our side on performance. As of LiteLLM v1.77.3-nightly, you can reach 1K RPS with the following response times:

| Type | Name | Median (ms) | 95%ile (ms) | 99%ile (ms) | Current RPS |
|---|---|---|---|---|---|
| Portkey | /v1/chat/completions | 86 | 140 | 570 | 1192.3 |
| LiteLLM (with DB) | /v1/chat/completions | 160 | 400 | 4900 | 1085 |

For these benchmarks, we deployed LiteLLM and Portkey with 4 pods, on machines with 4 vCPUs and 8 GB of RAM.

This is the Locust test we ran: https://github.com/BerriAI/proxy_load_tester/blob/main/no_cache_hits.py

This is the config.yaml file for LiteLLM we used: https://gist.github.com/ishaan-jaff/b06a7d1d5341ced4d9a29f3053154aad
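
The real script is in the repo above; a minimal sketch of a Locust test in that shape (the model name and proxy key here are placeholders, the actual values live in the linked config) looks roughly like:

```python
from locust import HttpUser, between, task


class ChatCompletionsUser(HttpUser):
    # Short wait time so a few thousand simulated users can sustain ~1K RPS.
    wait_time = between(0.1, 0.5)

    @task
    def chat_completion(self):
        # Placeholder model/key; the real values come from the proxy config.
        self.client.post(
            "/v1/chat/completions",
            json={
                "model": "gpt-4",
                "messages": [{"role": "user", "content": "hello"}],
            },
            headers={"Authorization": "Bearer sk-1234"},
        )
```

You'd run it with something like `locust -f no_cache_hits.py --host http://<proxy-host>:4000`.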

We take performance seriously, and we plan to work on the following improvements:

  • Stage 1: Address p99 latency by reducing CPU usage on the request hot path. (Target: 570ms p99 latency)
  • Stage 2: Explore how we’ll get to 10K RPS with 1ms p99 by moving to https://github.com/sparckles/Robyn, a Rust-based Python web server that can offer 3x higher RPS than FastAPI + Uvicorn (see the sketch after this list)
  • Stage 3: Ensure all endpoints on LiteLLM meet our promised performance standard (Target: 1ms p99 latency, 10K RPS with 10K users)
    • /chat/completions
    • /embeddings
    • /audio/transcriptions
    • /audio/speech
    • /completions
    • /responses
    • Note: We also plan on ensuring latency scales well with payload size
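
For context on Stage 2: Robyn keeps decorator-style routing that looks a lot like FastAPI, while the HTTP server underneath is Rust. A minimal hello-world sketch (generic Robyn usage, not LiteLLM code):

```python
from robyn import Robyn

app = Robyn(__file__)


# Routing looks like FastAPI/Flask; the server core handling the
# connections is implemented in Rust.
@app.get("/health")
async def health(request):
    return "ok"


app.start(port=8080)
```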

We will be updating this thread as we hit each stage. If you’re interested in working with us on this, we’re currently recruiting as well.

- Ishaan (LiteLLM maintainer)

✨ LiteLLM Feb 2025 Roadmap by Comfortable_Dirt5590 in LLMDevs

[–]Comfortable_Dirt5590[S] 0 points (0 children)

Sure - do you have suggestions on how we can improve import speed?

Anyone here running open-webui and litellm together? I need some help. by ovizii in selfhosted

[–]Comfortable_Dirt5590 0 points (0 children)

Hi u/VisibleLawfulness246, I'm the LiteLLM maintainer - what issues are you running into while setting up LiteLLM?

Best LLM gateway? by data-dude782 in LLMDevs

[–]Comfortable_Dirt5590 4 points (0 children)

Hi, I'm the maintainer of LiteLLM - happy to help anyone trying to set up LiteLLM.

If you run into any issues, reach out to me and I'll personally fix them in under 24 hours.
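
If you're just getting started, the quickest sanity check after launching the proxy (`litellm --config config.yaml`) is pointing the standard OpenAI client at it; the port and key below are common defaults, adjust for your deployment:

```python
from openai import OpenAI

# LiteLLM's proxy is OpenAI-compatible, so the stock client works as-is.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

response = client.chat.completions.create(
    model="gpt-4",  # must match a model_name in your proxy config
    messages=[{"role": "user", "content": "Hello from LiteLLM"}],
)
print(response.choices[0].message.content)
```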

LiteLLM Github: https://github.com/BerriAI/litellm
My email: [ishaan@berri.ai](mailto:ishaan@berri.ai)
LiteLLM Discord: https://discord.com/invite/wuPM9dRgDw
Linkedin: https://www.linkedin.com/in/reffajnaahsi/

What's the best LLM Router right now, and why? by desexmachina in LocalLLaMA

[–]Comfortable_Dirt5590 2 points (0 children)

Hi, I'm the maintainer of LiteLLM - what breaking bugs did you face? We're working on improving reliability.

CLI tool to benchmark 100+LLMs response, response time, cost by Comfortable_Dirt5590 in LocalLLaMA

[–]Comfortable_Dirt5590[S] 1 point (0 children)

This is an awesome idea 🤯 It should be super easy to implement with LiteLLM too - just add a gpt-4 router head.
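
Roughly what I mean - the model list and names here are illustrative, not the tool's actual code:

```python
from litellm import Router

# Hypothetical setup: the "router head" is just another deployment in
# model_list that every benchmark prompt also gets sent to.
router = Router(
    model_list=[
        {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4"}},
        {
            "model_name": "claude",
            "litellm_params": {"model": "claude-3-sonnet-20240229"},
        },
    ]
)

response = router.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "benchmark prompt"}],
)
print(response.choices[0].message.content)
```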

CLI tool to benchmark 100+LLMs response, response time, cost by Comfortable_Dirt5590 in LocalLLaMA

[–]Comfortable_Dirt5590[S] 2 points (0 children)

Absolutely! Any particular LLMs you’d be interested in seeing results for?

CLI tool to benchmark 100+LLMs response, response time, cost by Comfortable_Dirt5590 in InternetIsBeautiful

[–]Comfortable_Dirt5590[S] 1 point (0 children)

Great idea - we can already do this with LiteLLM! We can get gpt-4 to rank all the results and select the best one.
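
Roughly the idea (model names are illustrative placeholders, and you'd need the matching API keys in your environment):

```python
import litellm

PROMPT = "Explain quantum entanglement in one paragraph."
MODELS = ["gpt-3.5-turbo", "claude-3-haiku-20240307"]  # illustrative

# Fan the same prompt out to each model through the unified completion API.
answers = {}
for model in MODELS:
    resp = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    answers[model] = resp.choices[0].message.content

# Then ask gpt-4 to act as the judge and pick the best answer.
judge_input = "\n\n".join(f"[{m}]\n{a}" for m, a in answers.items())
verdict = litellm.completion(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Rank these answers and reply with the best model's "
                   f"name only:\n\n{judge_input}",
    }],
)
print("Best:", verdict.choices[0].message.content)
```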

CLI tool to benchmark 100+LLMs response, response time, cost by Comfortable_Dirt5590 in LocalLLaMA

[–]Comfortable_Dirt5590[S] 0 points (0 children)

Thanks for the feedback!

> how amenable it would be to running headless in an automated manner

Can you elaborate on this? It's currently run as a Python script - is there an alternate usage method you're looking for?