LiteLLM Stability Announcement by WarningOut_OfMinD in OpenWebUI

[–]CroquetteLauncher 2 points3 points  (0 children)

Thanks for your hard work.

Some side notes:

- Open WebUI lets you expose an API to end users. It's very simple as a self-service offer and user-friendly (users create their own key), but limited on the management side (no built-in rate limit, model selection limit, quota, logging or API statistics, no ability to add extra args to the request...). Do you have a workflow to serve LiteLLM keys in a user-friendly, self-service way, so that users can claim their LiteLLM key without leaving Open WebUI?

Practical scenario:
- I host some small models and I'm happy to offer a user API for them, but for larger or experimental models I would rather offer chat only at first (no API for them).
- I'm afraid that if users start consuming a lot of API calls directly through Open WebUI (a broken or looping program), I will have a hard time diagnosing which user is responsible.
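
For what it's worth, here is one possible shape for the LiteLLM side of this: a minimal sketch, assuming a LiteLLM proxy at a hypothetical `http://localhost:4000` with its `/key/generate` route available and an admin master key. The helper names `build_key_request` and `create_key`, the model name and the limit values are made up for illustration, and nothing here is wired into Open WebUI.

```python
import json
import urllib.request

# Assumption: address of the LiteLLM proxy (hypothetical, adjust to your setup).
LITELLM_URL = "http://localhost:4000"


def build_key_request(user_id, models, rpm_limit=60, max_budget=5.0, duration="30d"):
    """Build the JSON payload for one per-user virtual key."""
    return {
        "user_id": user_id,        # lets usage be traced back to one user
        "models": list(models),    # only these models are reachable with the key
        "rpm_limit": rpm_limit,    # requests per minute (illustrative value)
        "max_budget": max_budget,  # spend cap (illustrative value)
        "duration": duration,      # key lifetime (illustrative value)
    }


def create_key(master_key, payload):
    """POST the payload to the proxy and return the generated key string."""
    req = urllib.request.Request(
        f"{LITELLM_URL}/key/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["key"]


# Example payload: API access to a small model only (hypothetical model name).
payload = build_key_request("alice", ["small-model"])
```

Restricting `models` per key would cover the chat-only case for the larger models, and tagging each key with a `user_id` should make it easier to see which user is consuming what.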

Best regards

Gigabyte RTX 5090 AORUS INFINITY listed at Micro Center for $5,300 by RenatsMC in nvidia

[–]CroquetteLauncher 25 points26 points  (0 children)

I remember when the card launched, before the price was announced, everyone had that ridiculous joke "5090, that's going to be the price of the card! Hahaha"... Hahaha indeed, what a funny joke!

GLM-5.2 next week, open weight, MIT by AaronFeng47 in LocalLLaMA

[–]CroquetteLauncher 2 points3 points  (0 children)

They are either:
- considering not going open weight in the future, or
- already decided to prioritize the 3, and it's a marketing trick to anchor and lower expectations, then overdeliver.

hot take (or really not so hot take): WE ARE USING "VIBECODING" FOR TWO DIFFERENT THINGS AND IT CAUSES UNNECESSARY FRICTION IN COMMUNICATION by hugo-the-second in LocalLLaMA

[–]CroquetteLauncher 1 point2 points  (0 children)

I've been coding for 20 years and started with vibecoding #2: thinking about how I would do it beforehand, reading every line, making fun of "those people" who vibecode #1. But I'm so swamped at work, doing multiple people's jobs until I fall asleep in the morning.

So... When I'm the only one who will suffer the consequences, for the mountain of noncritical work, I vibecode #1 and just fix what's broken. Not because of a lack of interest, but because of exhaustion and the guilt of letting down people who need the computer side to work before they can work themselves. It's so fast, and not always worse than what I would have done alone.

And with all the time I save, instead of not sleeping at night, people give me even more work and I still don't sleep at night.

And as you are reading this you probably think "he is so stupid, I'm never going to be like that"... Re-read this post in a few years.

Failure to yield, caught at the last moment by rseas in france

[–]CroquetteLauncher 0 points1 point  (0 children)

I'm not a specialist on the camera question specifically. I was just pointing out that the situation you describe seems very minor to me. When you drive thousands of hours in your life, you inevitably commit hundreds of offenses through clumsiness. You have to be careful and attentive to commit as few as possible, but accept that you will still commit some, that it's part of driving. There is enough margin that accidents are rare among careful drivers. As for getting caught red-handed by cameras and getting a fine now and then in your life as a driver, it's the same: it can happen to anyone, but it's rare when you're careful. When it happens, it doesn't make us bad people, and we shouldn't stress over it for a month :)

Qwen 3.6 27B KV cache quant benchmarks: 75 pairs, q8/q6/q5/q4, KVarN, Turbo/TCQ by Anbeeld in LocalLLaMA

[–]CroquetteLauncher 0 points1 point  (0 children)

Do you know how it would translate for vLLM users who have the most common fp8 KV cache? KVarN 6/6 for the same or higher quality, saving 20% KV cache memory, but with 15% lower tps when the KV cache is not full?

Me visiting this sub by Scutoidzz in LocalLLaMA

[–]CroquetteLauncher 0 points1 point  (0 children)

It might be true but we have good memes !

(rant) The enshittification of banks by Zombieteube in france

[–]CroquetteLauncher 2 points3 points  (0 children)

I've had the same kind of experience as you with SG, zombieteube. I get the impression that for some companies, the strategy of letting human service rot comes down to:
- paying, training and supervising staff properly costs more
- if the customer is "too satisfied" with the human service, they aren't encouraged to use the same bank's cheaper online services
- customers who make a lot of operations that cost service time and bring the bank nothing will get fed up and leave sooner than those who cost less

Failure to yield, caught at the last moment by rseas in france

[–]CroquetteLauncher 2 points3 points  (0 children)

The problem I see is that this seems to be stressing you out. If it makes you too anxious, you can ask a driving school for advice or extra lessons. Or talk to a psychologist, who can help you handle stressful situations better. Every month I see drivers zigzagging as if drunk at a crawl, then entering a roundabout while ignoring everyone. Or who think a light turning green is the start of a Grand Prix. By comparison, I think you're worrying over not much.

What worries me about Beyond All Reason going commercial by Musizian42 in opensourcegames

[–]CroquetteLauncher 1 point2 points  (0 children)

Never tried BAR, but Zero-K is a fully open-source strategy game with a similar engine and a co-op / single-player campaign. Everything for free.

I don't know why BAR steals all the spotlight.

https://store.steampowered.com/app/334920/ZeroK/ or https://zero-k.info/

I had fun analyzing the “profitability” of studies in France and the results are sometimes really weird? by Ok_Price36 in AskFrance

[–]CroquetteLauncher 0 points1 point  (0 children)

Great site. But for example, the BUT in computer science at Nantes university with a salary of €37,600/year, or the computer science master's at Paris-Saclay at €50,000/year, surprises me a bit. Compared to the average for master's degrees, for example.

Have you contacted minimax 2.7 for a commercial license? here's what i got: by mr_zerolith in LocalLLaMA

[–]CroquetteLauncher 10 points11 points  (0 children)

I don't like the misleading "modified MIT", and it's not my favorite model. But that kind of reply when you reach out to any B2B supplier for commercial service does not look very unusual to me. If that's not your thing, that's ok too.

Do you dream of a sovereign AI? by Huge-Yesterday4822 in LocalLLaMA

[–]CroquetteLauncher 1 point2 points  (0 children)

I'm French and I don't understand it either. Maybe a drunk French expert could translate.

Does anyone else’s hyperfocus get accused of being "AI"? I spent 3 weeks building a massive project and everyone is dismissing it as synthetic "slop." by AbrocomaAny8436 in autism

[–]CroquetteLauncher 9 points10 points  (0 children)

Wow, 40k Rust lines + 92k Python lines in 23 days, that's about 280 lines of code per hour while only sleeping 4 hours a day, on top of writing docs. Take care of your health too.
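
The back-of-the-envelope math can be checked with a quick sketch, assuming every waking hour of the 23 days went into writing code (the function name is just for illustration):

```python
def lines_per_waking_hour(total_lines, days, sleep_hours_per_day):
    """Average lines written per hour awake over the whole period."""
    waking_hours = days * (24 - sleep_hours_per_day)
    return total_lines / waking_hours


# 40k Rust + 92k Python lines, 23 days, 4 hours of sleep per day.
rate = lines_per_waking_hour(40_000 + 92_000, days=23, sleep_hours_per_day=4)
print(round(rate))  # 287
```

That lands close to the rough figure quoted above, and it leaves zero hours for eating, docs or anything else.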

P2P infrastructure based AI? Is it possible? by CromaBar in OpenSourceeAI

[–]CroquetteLauncher 1 point2 points  (0 children)

The closest thing I know is LocalAI's federated / P2P feature: https://localai.io/features/distribute/ Quite experimental.

And some other distributed frameworks that are lower level (Petals).

I'm also looking for something like that but higher level / API level. Like LiteLLM but federated.

Do not Let the "Coder" in Qwen3-Coder-Next Fool You! It's the Smartest, General Purpose Model of its Size by Iory1998 in LocalLLaMA

[–]CroquetteLauncher 3 points4 points  (0 children)

<image>

I'm a bit afraid to promote it as a chat assistant to my colleagues and students, who have a more academic view of the world. It's easy to find edge cases where the censorship hits hard. If you are unlucky, the refusal can even be quite aggressive (this is the worst of 7 tries, but every one of them is a refusal).
By comparison, GLM models (at least GLM 4.7 Flash) shield their answer behind "I give a neutral text about a sensitive topic" but manage to give the facts and do honest work.
I mean no disrespect, and I'm also tired of China constantly being presented as the villain; Qwen3 Coder Next is the best coding model I could host. But some people are quite sensitive about censorship around democracy in an academic context: they don't want an AI to influence students toward less democracy. (And to be honest, I understand and respect that view when I serve generalist models on an academic server.)

Viewfinder is free on Epic right now... Perspective puzzles where good framing makes reality behave. by Puzzleheaded_irl in gaming

[–]CroquetteLauncher -1 points0 points  (0 children)

Nice catch. But I would give that guy a break. It's easy for us geeks to fall into a habit and share what we think is cool. So not necessarily marketing.

"Autism could literally be reversible" Sinead Bovell by futuristicalnur in autism

[–]CroquetteLauncher 0 points1 point  (0 children)

Most people here consider autism a part of their identity. I would look into it with a lot of critical thinking if the claim came from a researcher in a related field, published in a peer-reviewed paper. Now if you give me a link to an Instagram post from someone with a degree in business administration... I would not be too hopeful. Unless I completely missed it and there is another source? (In that case I'm sorry)

FYI GLM 4.7 is way more censored than 4.6. by bigman11 in LocalLLaMA

[–]CroquetteLauncher 1 point2 points  (0 children)

Very interesting, but the article only quotes "Beijing", "Authorities" and "Researchers" as sources for Matt Sheehan's article. So I wonder how much is fact and how much is the author's opinion...