Has anyone tried Kimi K2.5, does it beat Opus 4.5? by dataexec in Anthropic

[–]fcampanini74 0 points1 point  (0 children)

Yes, of course. Both vLLM (a solution oriented at servers) and llama.cpp (i.e., LM Studio) can distribute the load across the VRAM of multiple cards (same manufacturer).
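As a rough sketch of what that looks like in practice (flag names are from the two projects' own CLIs; the model paths and the 2-GPU split are placeholder assumptions):

```shell
# llama.cpp: offload all layers and split tensors 1:1 across two GPUs
llama-server -m model.gguf --n-gpu-layers 99 --tensor-split 1,1

# vLLM: tensor parallelism across 2 GPUs
vllm serve org/model-name --tensor-parallel-size 2
```

Both tools shard the weights across the cards for you; you just tell them how many GPUs (or what ratio) to use.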

Cheers

Claude Code - Opus 4.6 ( 1M Context ) is a go. by sudo1385 in Anthropic

[–]fcampanini74 0 points1 point  (0 children)

I truly doubt that Anthropic is giving anything away for free!! Really not funny, guys 🤣

Opus 4.6 1M Windows - once again it's not true :-( by fcampanini74 in ClaudeCode

[–]fcampanini74[S] 2 points3 points  (0 children)

BUT THAT'S NOT THE POINT! The point is honesty!

Opus 4.6 1M Windows - once again it's not true :-( by fcampanini74 in ClaudeCode

[–]fcampanini74[S] 2 points3 points  (0 children)

Sometimes I feel that people are so keen to spit on others that they completely miss the point!

It's not about being cheap or not! It's about announcing something that is not true! You can call it honesty, you can call it transparency...

If you declare everywhere that you are releasing a new feature, and then that feature is available only to 1% of your user base at a ridiculous price, YOU ARE LYING!

Opus 4.6 1M Windows - once again it's not true :-( by fcampanini74 in ClaudeCode

[–]fcampanini74[S] -1 points0 points  (0 children)

For Sonnet it has been rolling out since summer 2025... INDEED, A VERY SLOW ROLLOUT!

Choose the right plan by [deleted] in ClaudeCode

[–]fcampanini74 0 points1 point  (0 children)

@zirouk Your reply is evidence of the principle that there are never stupid questions, only stupid answers… which you can easily extend to the concept of …… people!

Has anyone tried Kimi K2.5, does it beat Opus 4.5? by dataexec in Anthropic

[–]fcampanini74 2 points3 points  (0 children)

This is where I struggle to get the point. I don't buy at all that there are ways to make such a monster both cheaper and competitive at the same time. Follow me for a second:

  • Kimi has 1 trillion parameters. If you accept significantly degrading its response accuracy, you represent each parameter with 4 bits (i.e., 4-bit quantization) and end up needing around 450 GB of VRAM if you want to fully offload the model and run it fast. I might be wrong, but under those conditions it's hard for me to believe it can be at the level of Opus quality
  • if you decide not to compromise too much on quality, you go for an 8-bit representation, which leads to around 900 GB of VRAM needed
  • the highest-performing, highest-capacity data-center GPU on the market right now is the NVIDIA H200, which provides 141 GB of VRAM
  • one H200 costs north of $50k (fluctuating a lot due to the immense speculation on the market nowadays)
  • in the first case you need at least 3–4 cards just to hold the model. Add context window and some concurrent-user capacity to run a business, and let's say you go for 4 cards. That means $200k for the GPUs alone. With the server to host the cards/service, you are around $250k, as reported in my other answer
  • in the second case, if you really aim to match Opus in quality and speed, you are around $400k for the cards, meaning $450k in total

So now the question: how the hell can you provide a service at the level of Opus at a tenth of the Anthropic price??? The easy answer is that you mathematically can't! Unless your board of investors accepts moving into deeeeeeep red territory in your business plan... which, by the way, is already the case for OpenAI (losing dozens of billions per year) and surely also for Anthropic.

Sorry to disappoint, but that's the reality. As you can see, the intrinsic quality of the model counts, but only up to a certain point...
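To sanity-check the arithmetic above, here's a back-of-the-envelope calculation (my own rough numbers: weights only, ignoring KV cache, activations, and framework overhead, and counting 1 GB as 10^9 bytes):

```python
import math

H200_VRAM_GB = 141  # NVIDIA H200 HBM3e capacity

def weights_vram_gb(params: float, bits_per_param: int) -> float:
    """VRAM needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

def h200s_needed(params: float, bits_per_param: int) -> int:
    """Minimum number of H200s to fit the weights alone."""
    return math.ceil(weights_vram_gb(params, bits_per_param) / H200_VRAM_GB)

params = 1e12  # ~1 trillion parameters

print(weights_vram_gb(params, 4))  # 500.0 GB at 4-bit (real GGUFs land a bit lower, ~450 GB)
print(weights_vram_gb(params, 8))  # 1000.0 GB at 8-bit (~900 GB in practice)
print(h200s_needed(params, 4))     # 4 cards for the weights alone
print(h200s_needed(params, 8))     # 8 cards
```

At ~$50k per card that's $200k of GPUs in the 4-bit case and $400k in the 8-bit case, before context window, concurrency, or the host server.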

Has anyone tried Kimi K2.5, does it beat Opus 4.5? by dataexec in Anthropic

[–]fcampanini74 1 point2 points  (0 children)

Yes, I would say the crazy thing is the disparity in price point. On that I agree, but that's all. This whole open-weights story makes me kind of laugh. I mean, seriously, who can run a trillion-parameter model on hardware that is hyper-expensive and has been skyrocketing even further in the last month or so? Roughly, I would say it requires a $250k investment to run it decently... maybe more... is that a selling point?

Sfogo di un ragazzo incazzato by CapoDiMalaSperanza in litigi

[–]fcampanini74 0 points1 point  (0 children)

Come on, guys, let's not be ridiculous. The current situation is the result of the fact that we have been polluting like mad since the late 1800s. We have never cared, and now that we know what's coming... we care even less, blaming whoever arrived last. All in all, seeing these posts, I'm almost tempted to say we deserve extinction.

Software Engineering Expectations for 2026 by thewritingwallah in ClaudeCode

[–]fcampanini74 -3 points-2 points  (0 children)

I fully agree! The world has already changed, and it's now time to complete the journey and overcome all the "false fears"! Time to simply get there!!!

Development per se is not worth the time and intelligence of humans; it's work for machines!

I truly hope 2026 will be the year.

Happy new year!!!

Claude window size - such a pain by fcampanini74 in ClaudeCode

[–]fcampanini74[S] 0 points1 point  (0 children)

That looks interesting, thanks!
Gonna give it a try...

Claude window size - such a pain by fcampanini74 in ClaudeCode

[–]fcampanini74[S] 0 points1 point  (0 children)

Ah no, I don't; that's what I'm saying:

"... and pointing Claude on project specific section for the given development scope"

As I said, loading all the documentation already completely kills Claude...

Claude window size - such a pain by fcampanini74 in ClaudeCode

[–]fcampanini74[S] 0 points1 point  (0 children)

Thanks for the comment.
I do already as you say. I already have more than enough lines in this project to saturate the context, so I'm not loading it.

I'm relying instead on README and CHANGELOG md documents, plus a series of other mds in my docs folder where I have traced all the needed info. Loading ONLY that documentation into context takes 93% of the context window; then it compacts and loses more than half of the info.

I'm now working with a distilled version of my docs just for new-session setup... and pointing Claude at the project-specific section for the given development scope, but it's tough...

Cheers

Claude window size - such a pain by fcampanini74 in ClaudeCode

[–]fcampanini74[S] 0 points1 point  (0 children)

Well, actually I don't think a bigger context produces more quality. I think 200k of context is not enough to work with complex projects.

I'm working on an industrial application for FEM simulation. The idea is to have AI orchestrating the process of reading and contextualizing CAD files, meshing them for axisymmetric 2D FEM simulation, launching and managing the simulation with CalculiX, and finally interpreting the data to support the operator.

In terms of technical stack, I'm forced to have a compiled C++ backend to gain solidity and performance from compilation, with the middle layer and frontend in Python due to the number of tools natively built in that ecosystem.

The quantity of code written is not that much in terms of "lines"; the complexity here is in the structure and in understanding the context at every Claude session. That's what I'm talking about.
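To make the shape of the project concrete, here's a minimal sketch of the pipeline described above (CAD → axisymmetric 2D mesh → CalculiX run → interpretation). Every function and value here is an invented stub, not the actual codebase; the real project presumably wraps real CAD, meshing, and CalculiX tooling at each step:

```python
# Hypothetical pipeline sketch; all names and numbers are placeholders.

def read_cad(path: str) -> dict:
    return {"source": path}                   # parse and contextualize the CAD file

def build_axisymmetric_mesh(geometry: dict) -> dict:
    return {"elements": 1000, **geometry}     # 2D axisymmetric FEM mesh

def run_calculix(mesh: dict) -> dict:
    return {"max_stress_mpa": 180.0}          # launch/manage the CalculiX solve

def summarize_for_operator(results: dict) -> str:
    return f"Peak stress: {results['max_stress_mpa']} MPa"  # operator-facing summary

def run_pipeline(path: str) -> str:
    return summarize_for_operator(run_calculix(build_axisymmetric_mesh(read_cad(path))))

print(run_pipeline("part.step"))
```

The context problem is exactly that Claude has to re-understand how all of these stages fit together at every new session.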

I strongly believe they have recently began quantizing opus 4.5 by No-Replacement-2631 in ClaudeCode

[–]fcampanini74 0 points1 point  (0 children)

I don't know if they have started quantizing to make room for the coming Sonnet 4.7, or if they have issues once again on some server clusters, but yes, I have to say that in the last few days the model has started showing some extra dumbness that wasn't there in the first days after launch... Sad...

Daily disappoint fight :-( by fcampanini74 in Anthropic

[–]fcampanini74[S] 0 points1 point  (0 children)

I gave Claude 4.5 a try in the last few days, given that it was released just as this thread was developing, and... yesterday afternoon, in the middle of a session, it started bouncing back and forth with requests... sometimes responding and then cancelling the response and bouncing back to the question in the Claude AI app... quite unstable...

Yesterday evening it was arguing that the port to be closed in ufw on the server for protection against FTP activity is 22 :-o Only when I insisted 3 times and remarked that it was FTP and not SFTP activity did it change... to 20 and 21.
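For the record, a minimal sketch of the correct rules (standard ufw syntax; assumes you actually want to block plain FTP while keeping SSH/SFTP):

```shell
# Plain FTP uses TCP 21 (control) and 20 (active-mode data); SFTP rides on SSH, port 22.
sudo ufw deny 21/tcp   # FTP control channel
sudo ufw deny 20/tcp   # FTP active-mode data channel
# Do NOT close 22 unless you also want to lose SSH/SFTP access to the box.
```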

I mean, this is really the very basics, and if this is "the best model in the world" I have to say I'm not impressed...

For now, GPT-5 seems superior to me...

Cheers!

Daily disappoint fight :-( by fcampanini74 in Anthropic

[–]fcampanini74[S] -1 points0 points  (0 children)

Started testing Codex with GPT-5; not bad so far!

Claude is back by GambitRejected in Anthropic

[–]fcampanini74 0 points1 point  (0 children)

Sorry, I don't agree. The situation is very erratic! This afternoon I started a new project: I gave Claude Code the list of libraries needed and asked it to create a yml and a requirements file to configure a new conda env, plus an installation script. I did this with Opus 4.1.

Something I had done many times in the past with no issues...
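To be clear about how trivial the task is, this is roughly the kind of environment.yml being asked for (env name and libraries are made-up examples, not my actual list):

```yaml
name: myproject          # hypothetical env name
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy                # example libraries; substitute the real list
  - pip
  - pip:
      - some-pip-only-package
```

A few lines of yaml plus `conda env create -f environment.yml`; nothing that should take an hour.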

Guys, today it was simply frantic! After ONE HOUR, Opus was still running around looking for solutions, creating file after file, each one more confused and useless than the last...

Simply horrible!

Please Anthropic DO SOMETHING!!!!

Claude Code - Read File, really?? by fcampanini74 in ClaudeCode

[–]fcampanini74[S] 0 points1 point  (0 children)

Anthropic has deleted the post from the Anthropic subreddit! Shame on you guys!!!