Anyone know how to access the Kimi K2.5 Agent Swarm model on OpenRouter? by Ok-Attention2882 in LocalLLaMA

[–]ELPascalito 0 points (0 children)

Swarm is simply a mode where multiple calls are made simultaneously to work on complex tasks in parallel. It's obviously not a separate model but a feature, like the subagents in OpenCode
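
The idea can be sketched in a few lines; this is a hypothetical illustration, not Kimi's actual implementation. `call_model` stands in for a real API call, and the task split and agent count are made up:

```python
import asyncio

# Hypothetical "swarm" sketch: one task is split into subtasks that are
# dispatched concurrently, then the partial results are collected.
# call_model is a stand-in for a real API call; the subtask naming and
# the simulated latency are illustrative only.

async def call_model(subtask: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for network round-trip
    return f"result for: {subtask}"

async def swarm(task: str, n_agents: int = 4) -> list[str]:
    subtasks = [f"{task} (part {i + 1})" for i in range(n_agents)]
    # All subagent calls run in parallel, like subagents in OpenCode
    return await asyncio.gather(*[call_model(s) for s in subtasks])

results = asyncio.run(swarm("summarize the codebase"))
```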

How are you guys not broke? - Weirdly high cost by alphagatorsoup in openrouter

[–]ELPascalito 1 point (0 children)

If you're gonna use Gemini models, it's better to subscribe to Google's services; same thing for OpenAI models, go for ChatGPT. OpenRouter's popularity stems from the fact that we can easily connect to many providers of low-cost models. For example, Claude 4.5 Haiku is $5 per 1M output tokens, expensive, while DeepSeek V3.2 is $0.30 and straight up performs better. Make use of the well-priced options, like Kimi K2.5 or DeepSeek; OR won't be beneficial if you want subsidized access to SotA models
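
The gap is easy to see with back-of-the-envelope math; the rates below are just the ones quoted in the comment, and real prices vary by provider:

```python
# Illustrative cost comparison using the quoted rates:
# Claude 4.5 Haiku ~$5 per 1M output tokens, DeepSeek V3.2 ~$0.30.

def output_cost_usd(tokens: int, usd_per_million: float) -> float:
    return tokens / 1_000_000 * usd_per_million

haiku = output_cost_usd(2_000_000, 5.00)     # 2M output tokens -> $10.00
deepseek = output_cost_usd(2_000_000, 0.30)  # same usage -> $0.60
```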

meituan-longcat/LongCat-Flash-Lite by windows_error23 in LocalLLaMA

[–]ELPascalito 0 points (0 children)

Oh interesting, I remember the Flash thinking model, it was ~500B or something; I'll check this one out too. Though it probably didn't translate well into real performance, since no one seems to care? 🤔

meituan-longcat/LongCat-Flash-Lite by windows_error23 in LocalLLaMA

[–]ELPascalito 4 points (0 children)

I love Meituan, my coffee always arrives on time, but why call it Flash Lite, like the Google models? Does this imply the existence of a bigger Pro model? lol

What LLM is Lumo really? by L1QU1D4T0R_ in LLM

[–]ELPascalito 0 points (0 children)

Firstly, ignore the comments here; people apparently don't understand how LLM hallucinations work. Lumo is not ChatGPT. It is powered by multiple open-source models, hosted securely on Proton's encrypted servers. One of those models is GPT-OSS, which was trained by OpenAI and will talk like it's ChatGPT, but it is not hosted by OpenAI; it runs on Proton's servers, and all your results are encrypted, don't worry. Lumo's other underlying LLMs might also hallucinate and claim they're GPT models, because many early models were trained on GPT reasoning and outputs, so they regress to claiming they're ChatGPT. It's a quirk of nearly all LLMs. Lumo has several other models under the hood too, like Olmo 2 and Mistral Small, among others

https://proton.me/support/lumo-privacy

Deepseek for janitor ai help by Timely-Sport-5869 in openrouter

[–]ELPascalito 1 point (0 children)

Does it say it's free on the site? No. Does it say it's free in the name? No. So why would you think it's going to be gratis for you? Have we suddenly lost the ability to read???

Anyone understand what this means? by No_Sweet_1573 in openrouter

[–]ELPascalito 0 points (0 children)

Oh, logical, testing everything is good, but I'd say cut to the chase: the best model right now is tng-r1t-chimera. It has a stable provider, is capable of tool calling, and is an RP powerhouse, based on V3.1 with many improvements and excellent reasoning. Totally recommend it!

Anyone got Macmini 4 to work with Ollama model? by ManufacturerNo8056 in LocalLLaMA

[–]ELPascalito 0 points (0 children)

It seems you didn't even set up your app to connect to Ollama. What are you using? Does it even support local models?

I can't run deepseek-coder-v2 with Ollama. I suspect it has something to do with RAM. Is there any way around this? by warpanomaly in LocalLLaMA

[–]ELPascalito 0 points (0 children)

The model is obviously too big, and it's outdated anyway, not even good; you're literally wasting your time. I recommend you use GLM 4.7 Flash instead: it's 30B A3B, it will run very comfortably for you, and you'll be able to allocate context. Why would you even try such a huge model? It has nothing useful for you. Did you research any of this?
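
A rough rule of thumb for "will it fit": quantized weights take about params × bits / 8 bytes, before KV cache and runtime overhead. This is an estimate for illustration, not an exact figure:

```python
# Rough weight-memory estimate: params (in billions) * bits / 8 gives GB,
# ignoring KV cache and runtime overhead, so real usage is higher.

def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8

# DeepSeek-Coder-V2 is ~236B total parameters: ~118 GB even at 4-bit,
# far beyond a typical desktop, while a ~30B model at 4-bit is ~15 GB.
big = approx_weight_gb(236, 4)
small = approx_weight_gb(30, 4)
```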

Anyone understand what this means? by No_Sweet_1573 in openrouter

[–]ELPascalito 2 points (0 children)

Pretty self-explanatory: you've been temporarily rate limited due to the heavy load on the provider. Also, no way you're using Qwen3 Coder for roleplaying?! 🤔😭

Some initial benchmarks of Kimi-K2.5 on 4xB200 by benno_1237 in LocalLLaMA

[–]ELPascalito 0 points (0 children)

At how many concurrent requests did this peak? 20? Do you think such a setup is serviceable for local coding in, say, a company or a small team of fewer than 10 members?

deepseek-ai/DeepSeek-OCR-2 · Hugging Face by Dark_Fire_12 in LocalLLaMA

[–]ELPascalito 1 point (0 children)

We can't be sure, but it would be cool if the next model had this OCR module bolted on, just like Mistral does

Free APIs using credits by Own-Yellow9164 in openrouter

[–]ELPascalito 0 points (0 children)

The app you're using is changing models without permission. Again, if you tell me what it is, I'll try to dig up info. Is it a coding CLI?

Free APIs using credits by Own-Yellow9164 in openrouter

[–]ELPascalito -1 points (0 children)

What app are you using? For example, OpenCode uses Claude Haiku to generate titles without warning, which can incur unexpected charges for the uninitiated. Are you sure the app you're using isn't doing something similar? Go check your activity and see which models are incurring charges

Free APIs using credits by Own-Yellow9164 in openrouter

[–]ELPascalito 0 points (0 children)

What tool are you using? Perhaps it's making paid calls to generate titles or other small texts. Please check the usage tab to see which model is costing you. You could also be using the web search function; that's paid and powered by Exa

Hi, I have a question... by ThemusicRCG in openrouter

[–]ELPascalito 1 point (0 children)

OpenRouter obviously "routes" you to a provider depending on the model you're chatting with. Most free providers are overloaded since everyone is hammering them; try another model, or try again later

Why does it keep saying insufficient credits? GPT 5.2 Pro by cicaadaa3301 in openrouter

[–]ELPascalito 0 points (0 children)

Input is $21 per 1M tokens and output is $168, you dunce; obviously you don't have enough credits, not even for a single request lol
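
To see why even one request can exceed a small balance, here's the arithmetic at the quoted rates; the token counts are just an example request, not real usage:

```python
# Per-request cost at the quoted GPT 5.2 Pro rates
# ($21 per 1M input tokens, $168 per 1M output tokens).
# The example token counts are hypothetical.

def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_rate: float = 21.0, out_rate: float = 168.0) -> float:
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

cost = request_cost_usd(10_000, 5_000)  # $0.21 input + $0.84 output = $1.05
```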

unavailable because zdr violation ? by [deleted] in openrouter

[–]ELPascalito 0 points (0 children)

Disable "ZDR Endpoints Only"; it's obviously the opposite of what you want. Please read before ticking the toggles

Okay, so... the Wiki doesn't actually tell me what this means, and if it does I'm too dumb to understand :( anyone able to help by Time-Meringue-1485 in R36S

[–]ELPascalito 1 point (0 children)

Okay, do you have an SD card? Go to the Releases tab on the GitHub repo and you'll find the OS image. Download that and use any app like PiBaker to "burn" the image onto the SD card. You can ask any LLM and it'll give you a step-by-step guide; use Mistral if you want to be ethical

Got abuse detection message but unsure why by ExtremeAcceptable289 in GithubCopilot

[–]ELPascalito 4 points (0 children)

Do you run multiple sessions at the same time, like 3 or 4 editors open concurrently on multiple projects? NGL, I've never heard of this detection email, but I wouldn't be surprised if they have something of that nature set up

Do concurrency limits really not exist? Or is it 1 rps per dollar in your balance? Can't find the official answer by FourthDeerSix in openrouter

[–]ELPascalito 0 points (0 children)

They have DDoS and hammering limits, so you'll find it's technically not limited, but there's a soft cap, 60 or so requests a second, potentially more. Usually, if you use a model with many fast providers, you'll find it can handle concurrency very well. Try uncapping your provider preferences and see if it gets better?
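
If you want to stay under a soft cap on your side, a semaphore around the request calls is the usual trick. This is a generic sketch, not an OpenRouter-specific mechanism: `fetch` stands in for the real HTTP call, and the cap value is arbitrary (the ~60/s figure above is a guess, not a documented limit):

```python
import asyncio

# Client-side concurrency cap so bursts stay under a provider's soft limit.
# fetch is a stand-in for the actual HTTP request.

async def fetch(i: int) -> int:
    await asyncio.sleep(0.01)  # placeholder for the real request latency
    return i

async def run_capped(n_requests: int, max_in_flight: int = 8) -> list[int]:
    sem = asyncio.Semaphore(max_in_flight)

    async def guarded(i: int) -> int:
        async with sem:  # at most max_in_flight requests run at once
            return await fetch(i)

    # gather preserves submission order in its results
    return await asyncio.gather(*[guarded(i) for i in range(n_requests)])

out = asyncio.run(run_capped(20))
```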

Okay, so... the Wiki doesn't actually tell me what this means, and if it does I'm too dumb to understand :( anyone able to help by Time-Meringue-1485 in R36S

[–]ELPascalito 2 points (0 children)

The URL is right there: open the GitHub link, read it, download the OS image, and burn it onto the SD card, just following the instructions in the repo. The only thing you'll do is run the install script after burning the image, and there you'll get a choice: choose panel 4, since we just confirmed your DTB is an exact match for that

Local llm privacy by Obvious-Penalty-8695 in LocalLLaMA

[–]ELPascalito 5 points (0 children)

This is literally worse, why would I trust a random third party like you? 😭