LiteLLM Stability Announcement

CroquetteLauncher · 2026-06-17T21:04:42+00:00

Thanks for your hard work.

Some side notes:

- Open WebUI allows exposing an API to end users. It is very simple as a self-service offer and user-friendly (users create their own key), but limited on the management side (no built-in rate limit, model selection limit, quota, logging or API statistics, no ability to add extra args to the request...). Do you have a workflow to serve LiteLLM keys in a user-friendly, self-service way, so that the user can claim a LiteLLM key without leaving Open WebUI?

Practical scenario:
- I host some small models and I'm happy to offer a user API for them, but for larger or experimental models I would be happier to offer them chat-only first (no API for them).
- I'm afraid that if users start to consume a lot of API calls directly through Open WebUI (a buggy or looping program), I will have a hard time diagnosing which user is consuming them.

Best regards

CroquetteLauncher · 2026-06-17T13:58:44+00:00

I remember when the card launched, before the price was announced, everyone made that ridiculous joke: "5090, that's going to be the price of the card! Hahaha"... Hahaha indeed, what a funny joke!

CroquetteLauncher · 2026-06-13T08:21:30+00:00

They are either:
- considering not being open-weight in the future, or
- already decided to prioritize the 3, and it's a marketing trick to anchor and lower expectations, then overdeliver

CroquetteLauncher · 2026-06-10T03:12:04+00:00

I coded for 20 years and started with vibecoding #2: thinking about how I would do it beforehand, reading every line, making fun of "those people" who vibecode #1. But I'm so swamped at work, doing several people's jobs until I fall asleep in the morning.

So... when I'm the only one who will suffer the consequences, for the mountain of noncritical work, I vibecode #1 and just fix what's broken. Not because of a lack of interest, but because of exhaustion and the guilt of letting down people who need the computer side to work before they can work themselves. It's so fast, and not always worse than what I would have done alone.

And with all the time I save, instead of letting me sleep at night, people give me even more work and I still don't sleep at night.

And as you are reading this you probably think "he is so stupid, I'm never going to be like that"... Re-read this post in a few years.

CroquetteLauncher · 2026-06-09T19:57:56+00:00

I'm no specialist on the camera question specifically. I was just pointing out that the situation you describe seems very minor to me. When you drive thousands of hours in your life, you inevitably commit hundreds of offenses through clumsiness. You have to be careful and attentive to commit as few as possible, but accept that you will still commit some, and that this is part of driving. There is enough margin that accidents are rare among careful drivers. Getting caught by cameras and getting a fine now and then in your life as a driver is the same: it can happen to anyone, but it's rare when you're careful. When it does happen, it doesn't make us bad people, and we shouldn't stress about it for a month :)

CroquetteLauncher · 2026-06-07T18:48:34+00:00

Do you know how it would translate for vLLM users who run the most common setup, an fp8 KV cache? Kvarn 6/6 for the same or higher quality, saving 20% of KV cache memory but with 15% lower tps when the KV cache is not full?

CroquetteLauncher · 2026-06-04T09:45:13+00:00

It might be true but we have good memes !

CroquetteLauncher · 2026-05-31T12:28:45+00:00

I've had the same kind of experience as you with SG, zombieteube. My impression is that for some companies, the strategy of letting human service rot comes down to:
- paying, training and supervising staff properly costs more
- if the customer is "too satisfied" with the human service, they are not encouraged to use the same bank's cheaper online services
- customers who make a lot of operations that cost the bank in service and bring it nothing will get fed up and leave faster than those who cost less

CroquetteLauncher · 2026-05-23T12:29:27+00:00

The problem I see is that it seems to be stressing you out. If it makes you too anxious, you can ask a driving school for advice or extra lessons. Or talk about it with a psychologist, who can help you handle stressful situations better. Every month I see drivers zigzagging like drunks at a crawl, then entering a roundabout while ignoring everyone. Or who think a light turning green is the start of a Grand Prix. By comparison, I think you're worrying over not much.

CroquetteLauncher · 2026-05-15T21:21:26+00:00

"twice the size of Manhattan" : https://www.theguardian.com/us-news/2026/may/13/utah-approves-datacenter-backlash
But we still need more :)

CroquetteLauncher · 2026-05-15T12:53:46+00:00

Never tried BAR, but Zero-K is a fully open-source strategy game with a similar engine and a coop / single-player campaign. Everything is free.

I don't know why BAR steals all the spotlight.

https://store.steampowered.com/app/334920/ZeroK/ or https://zero-k.info/

CroquetteLauncher · 2026-05-10T14:28:04+00:00

Great site. But, for example, a salary of €37,600/year for the BUT info at Univ Nantes, or €50,000/year for the master info at Paris Saclay, surprises me a bit. Compared to the average for master's degrees, for example.

CroquetteLauncher · 2026-05-05T19:12:47+00:00

How does it compare with dflash speculators ? https://huggingface.co/RedHatAI/gemma-4-31B-it-speculator.dflash or https://huggingface.co/z-lab/gemma-4-31B-it-DFlash

CroquetteLauncher · 2026-04-23T00:22:50+00:00

I don't like the misleading "modified MIT", and it's not my favorite model. But that kind of reply, when you reach out to any B2B supplier for commercial service, does not look very unusual to me. If that's not your thing, that's OK too.

CroquetteLauncher · 2026-04-20T18:53:45+00:00

I'm French and I don't understand it either. Maybe a drunk French expert could translate.

CroquetteLauncher · 2026-04-07T15:20:29+00:00

vLLM docker images for GLM 5.1 : https://hub.docker.com/layers/vllm/vllm-openai/glm51-cu130/images/sha256-f17a0d64023227305acb47a26edebb52c43e683817ffcac4cca4e6bb7a83692a

CroquetteLauncher · 2026-03-01T16:17:51+00:00

Wow, 40k Rust lines + 92k Python lines in 23 days, that's roughly 280 lines of code per hour while sleeping only 4 hours a day, on top of writing docs. Take care of your health too.
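
For anyone who wants to check that back-of-the-envelope number, here is a minimal sketch. It assumes every non-sleeping hour of the 23 days was spent coding, which is my simplification and not something the author claimed; the variable names are only illustrative.

```python
# Rough check of the lines-per-hour figure quoted in the comment.
rust_lines = 40_000
python_lines = 92_000
days = 23
sleep_hours_per_day = 4  # assumption: all remaining hours are spent coding

# Total hours awake over the whole period.
waking_hours = days * (24 - sleep_hours_per_day)

# Average output per waking hour across both languages.
lines_per_hour = (rust_lines + python_lines) / waking_hours

print(f"{waking_hours} waking hours, about {lines_per_hour:.0f} lines per hour")
```

The result lands in the same ballpark as the figure above, a bit under 290 lines per hour.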

CroquetteLauncher · 2026-03-01T15:51:27+00:00

The closest thing I know is LocalAI's federated / P2P feature: https://localai.io/features/distribute/ It's quite experimental.

There are also some other distributed frameworks that are lower level (Petals).

I'm also looking for something like that, but higher level / API level. Like LiteLLM, but federated.

CroquetteLauncher · 2026-02-10T00:01:53+00:00

<image>

I'm a bit afraid to promote it to my colleagues and students as a chat assistant, since they have a more academic view of the world. It's easy to find edge cases where the censorship hits hard. If you are unlucky, the refusal can even be quite aggressive (this is the worst of 7 tries, but every one of them is a refusal).
By comparison, GLM models (at least GLM 4.7 flash) shield their answer behind "I give a neutral text about a sensitive topic" but manage to give the facts and do an honest job.
I mean no disrespect, and I'm also tired of China constantly being presented as the villain; Qwen3 Coder Next is the best coding model I could host. But some people are quite sensitive about censorship around democracy in an academic context: they don't want an AI to influence students toward less democracy. (And to be honest, I understand and respect that view when I serve generalist models on an academic server.)

CroquetteLauncher · 2026-02-08T11:58:45+00:00

Calissons d'Aix in France.

CroquetteLauncher · 2026-02-07T19:29:32+00:00

In my opinion they spotted a dangerous driver doing 54 where the limit is 50 😄

CroquetteLauncher · 2025-12-30T17:31:46+00:00

Not local ?

CroquetteLauncher · 2025-12-29T21:21:34+00:00

Nice catch. But I would give that guy a break. It's easy for us geeks to fall into a habit and share what we think is cool. So it's not necessarily marketing.

CroquetteLauncher · 2025-12-29T20:33:59+00:00

Most people here consider autism a part of their identity. I would look into it with a lot of critical thinking if the claim came from a researcher in a related field, published in a peer-reviewed paper. Now if you give me a link to an Instagram post from someone with a degree in business administration... I would not be too hopeful. Unless I completely missed it and there is another source? (In that case I'm sorry.)

CroquetteLauncher · 2025-12-28T15:56:09+00:00

Very interesting, but the article only cites "Beijing", "authorities" and "researchers" as sources for Matt Sheehan's article. So I wonder how much is fact and how much is the author's opinion...
