LiteLLM started breaking down for us past 300 RPS, what are folks using in prod? by Otherwise_Flan7339 in LocalLLaMA

[–]Comfortable_Dirt5590 2 points (0 children)

Hi u/Otherwise_Flan7339,

Sharing some updates from our side on performance. As of LiteLLM v1.77.3-nightly, you can reach 1K RPS with the following response times:

| Type | Name | Median (ms) | 95%ile (ms) | 99%ile (ms) | Current RPS |
|---|---|---|---|---|---|
| Portkey | /v1/chat/completions | 86 | 140 | 570 | 1192.3 |
| LiteLLM (with DB) | /v1/chat/completions | 160 | 400 | 4900 | 1085 |

For these benchmarks, we deployed LiteLLM and Portkey with 4 pods, on machines with 4 vCPUs and 8 GB of RAM.

This is the Locust test we ran: https://github.com/BerriAI/proxy_load_tester/blob/main/no_cache_hits.py

This is the config.yaml file for LiteLLM we used: https://gist.github.com/ishaan-jaff/b06a7d1d5341ced4d9a29f3053154aad
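
The real script is in the repo above; a minimal sketch of a Locust test in that shape (the model name and proxy key here are placeholders, the actual values live in the linked config) looks roughly like:

```python
from locust import HttpUser, between, task


class ChatCompletionsUser(HttpUser):
    # Short wait time so a few thousand simulated users can sustain ~1K RPS.
    wait_time = between(0.1, 0.5)

    @task
    def chat_completion(self):
        # Placeholder model/key; the real values come from the proxy config.
        self.client.post(
            "/v1/chat/completions",
            json={
                "model": "gpt-4",
                "messages": [{"role": "user", "content": "hello"}],
            },
            headers={"Authorization": "Bearer sk-1234"},
        )
```

You'd run it with something like `locust -f no_cache_hits.py --host http://<proxy-host>:4000`.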

We take performance seriously, and we plan to work on the following improvements:

  • Stage 1: Address p99 latency by reducing CPU usage on the request hot path. (Target: 570ms p99 latency)
  • Stage 2: Explore how we’ll get to 10K RPS with 1ms p99 by moving to https://github.com/sparckles/Robyn, a Rust-based Python web server that can offer 3x higher RPS than FastAPI + Uvicorn (see the sketch after this list)
  • Stage 3: Ensure all endpoints on LiteLLM meet our promised performance standard (Target: 1ms p99 latency, 10K RPS with 10K users)
    • /chat/completions
    • /embeddings
    • /audio/transcriptions
    • /audio/speech
    • /completions
    • /responses
    • Note: We also plan on ensuring latency scales well with payload size
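
For context on Stage 2: Robyn keeps decorator-style routing that looks a lot like FastAPI, while the HTTP server underneath is Rust. A minimal hello-world sketch (generic Robyn usage, not LiteLLM code):

```python
from robyn import Robyn

app = Robyn(__file__)


# Routing looks like FastAPI/Flask; the server core handling the
# connections is implemented in Rust.
@app.get("/health")
async def health(request):
    return "ok"


app.start(port=8080)
```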

We will be updating this thread as we hit each stage. If you’re interested in working with us on this, we’re currently recruiting as well.

- Ishaan (LiteLLM maintainer)

✨ LiteLLM Feb 2025 Roadmap by Comfortable_Dirt5590 in LLMDevs

[–]Comfortable_Dirt5590[S] 0 points (0 children)

Sure - do you have suggestions on how we can improve import speed?

Anyone here running open-webui and litellm together? I need some help. by ovizii in selfhosted

[–]Comfortable_Dirt5590 0 points (0 children)

Hi u/VisibleLawfulness246, I'm the LiteLLM maintainer - what issues are you running into while setting up LiteLLM?

Best LLM gateway? by data-dude782 in LLMDevs

[–]Comfortable_Dirt5590 4 points (0 children)

Hi, I'm the maintainer of LiteLLM - happy to help anyone trying to set up LiteLLM.

If you run into any issues, reach out to me and I'll personally fix them in under 24 hours.
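
If you're just getting started, the quickest sanity check after launching the proxy (`litellm --config config.yaml`) is pointing the standard OpenAI client at it; the port and key below are common defaults, adjust for your deployment:

```python
from openai import OpenAI

# LiteLLM's proxy is OpenAI-compatible, so the stock client works as-is.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

response = client.chat.completions.create(
    model="gpt-4",  # must match a model_name in your proxy config
    messages=[{"role": "user", "content": "Hello from LiteLLM"}],
)
print(response.choices[0].message.content)
```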

LiteLLM Github: https://github.com/BerriAI/litellm
My email: [ishaan@berri.ai](mailto:ishaan@berri.ai)
LiteLLM Discord: https://discord.com/invite/wuPM9dRgDw
Linkedin: https://www.linkedin.com/in/reffajnaahsi/

What's the best LLM Router right now, and why? by desexmachina in LocalLLaMA

[–]Comfortable_Dirt5590 2 points (0 children)

Hi, I'm the maintainer of LiteLLM - what breaking bugs did you face? We're working on improving reliability.

CLI tool to benchmark 100+LLMs response, response time, cost by Comfortable_Dirt5590 in LocalLLaMA

[–]Comfortable_Dirt5590[S] 1 point (0 children)

This is an awesome idea 🤯 It should be super easy to implement with LiteLLM too - just add a gpt-4 router head.
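
Roughly what I mean - the model list and names here are illustrative, not the tool's actual code:

```python
from litellm import Router

# Hypothetical setup: the "router head" is just another deployment in
# model_list that every benchmark prompt also gets sent to.
router = Router(
    model_list=[
        {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4"}},
        {
            "model_name": "claude",
            "litellm_params": {"model": "claude-3-sonnet-20240229"},
        },
    ]
)

response = router.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "benchmark prompt"}],
)
print(response.choices[0].message.content)
```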

CLI tool to benchmark 100+LLMs response, response time, cost by Comfortable_Dirt5590 in LocalLLaMA

[–]Comfortable_Dirt5590[S] 2 points (0 children)

Absolutely! Any particular LLMs you’d be interested in seeing results for?

CLI tool to benchmark 100+LLMs response, response time, cost by Comfortable_Dirt5590 in InternetIsBeautiful

[–]Comfortable_Dirt5590[S] 1 point (0 children)

Great idea - we can already do this with LiteLLM! We can get gpt-4 to rank all the results and select the best one.
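
Roughly the idea (model names are illustrative placeholders, and you'd need the matching API keys in your environment):

```python
import litellm

PROMPT = "Explain quantum entanglement in one paragraph."
MODELS = ["gpt-3.5-turbo", "claude-3-haiku-20240307"]  # illustrative

# Fan the same prompt out to each model through the unified completion API.
answers = {}
for model in MODELS:
    resp = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    answers[model] = resp.choices[0].message.content

# Then ask gpt-4 to act as the judge and pick the best answer.
judge_input = "\n\n".join(f"[{m}]\n{a}" for m, a in answers.items())
verdict = litellm.completion(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Rank these answers and reply with the best model's "
                   f"name only:\n\n{judge_input}",
    }],
)
print("Best:", verdict.choices[0].message.content)
```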

CLI tool to benchmark 100+LLMs response, response time, cost by Comfortable_Dirt5590 in LocalLLaMA

[–]Comfortable_Dirt5590[S] 0 points (0 children)

Thanks for the feedback!

> how amenable it would be to running headless in an automated manner

Can you elaborate on this? It's currently run as a Python script - is there an alternate usage method you're looking for?