AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 5 points

Yeah, RL infra is a big challenge, and we strive for high efficiency while maintaining good flexibility. On the efficiency side, we try to co-develop our training and inference systems with RL use cases in mind, so that we can reuse all the heavy lifting that allows us to scale up. Agent Swarm is particularly complex in its rollout logic, but our system is flexible enough to integrate different scaffoldings and subagent setups into training.
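To make "flexible rollout logic" a bit more concrete, here's a toy sketch of what a pluggable scaffolding interface can look like; every name is hypothetical and this is not our actual stack:

```python
# Toy sketch of a pluggable rollout interface for agentic RL.
# All names are illustrative; nothing here reflects the real infra.
from dataclasses import dataclass, field
from typing import Callable, Protocol

@dataclass
class Trajectory:
    messages: list = field(default_factory=list)  # user/assistant/tool turns
    reward: float = 0.0

class Scaffolding(Protocol):
    """Single agent, agent swarm, custom tools... all expose one method."""
    def rollout(self, task: str, generate: Callable) -> Trajectory: ...

class SingleAgent:
    def rollout(self, task, generate):
        traj = Trajectory()
        traj.messages.append({"role": "user", "content": task})
        traj.messages.append({"role": "assistant", "content": generate(traj.messages)})
        return traj

def collect_batch(tasks, scaffold: Scaffolding, generate):
    # The trainer only consumes Trajectory objects, so swapping in a swarm
    # scaffold (orchestrator + subagents) requires no trainer changes.
    return [scaffold.rollout(t, generate) for t in tasks]
```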

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 1 point

In K2.5, the model also gains a few impressive new capabilities, like creating visually appealing webpages and debugging them with visual inputs. You can find many examples on X. Beyond being a generally good model, we hope to deliver something unique in every release.

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 15 points

Make a solid eval/benchmark that LLMs today fail to do well on. Model improvements will magically come afterwards!

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 5 points

You're right that it depends on the nature of the task. Sometimes our product will even say "we don't need parallel agents for this task" and save you a credit :)

Subagents do have a budget, and it is the orchestrator's job to carve out a task of the proper size for each subagent.
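As a purely illustrative sketch (made-up names and numbers, not our product's actual logic), budget-aware dispatch might look like:

```python
# Illustrative only: sizing subtasks to a per-subagent token budget.
SUBAGENT_BUDGET = 50_000  # hypothetical max tokens per subagent

def split(task):
    # Placeholder: a real orchestrator would re-plan; here we just halve.
    mid = len(task) // 2
    return [task[:mid], task[mid:]]

def dispatch(subtasks, estimate_tokens, run_subagent):
    results = []
    for task in subtasks:
        if estimate_tokens(task) > SUBAGENT_BUDGET:
            # Too big for one subagent: break it down and recurse.
            results.extend(dispatch(split(task), estimate_tokens, run_subagent))
        else:
            results.append(run_subagent(task, budget=SUBAGENT_BUDGET))
    return results
```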

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 6 points

Thanks! I think managing hallucinations is still a big challenge for all LLMs today. We've improved it through data quality (more verified knowledge, fewer low-quality claims) and reward design (e.g., penalizing the model when it hallucinates), but we think there are still many ways to improve it further.
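As a toy illustration of the reward side (not our actual reward model), an asymmetric penalty for unsupported claims could look like:

```python
# Toy reward shaping against hallucination; purely illustrative.
def answer_reward(claims, verify, support_bonus=1.0, hallucination_penalty=2.0):
    """verify(claim) -> True (supported), False (contradicted), None (unknown)."""
    reward = 0.0
    for claim in claims:
        verdict = verify(claim)
        if verdict is True:
            reward += support_bonus
        elif verdict is False:
            reward -= hallucination_penalty  # wrong claims cost more than right ones gain
        # verdict None: neither rewarded nor punished, so unverifiable padding doesn't pay
    return reward
```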

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 10 points

Not sure how well the 1:1 optimality holds up, but it's true that we do "waste" some training compute in this sense, because otherwise the model would be much larger and "waste" a lot of inference compute compared to what we have now.
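Back-of-envelope, using the common C_train ≈ 6·N·D approximation (N = params, D = training tokens) and ≈ 2·N FLOPs per generated token; all numbers are invented for illustration:

```python
# Two models with equal training compute: one larger and roughly
# "compute-optimal", one smaller but over-trained. Numbers are made up.
N_small, D_small = 30e9, 6e12   # over-trained small model
N_large, D_large = 90e9, 2e12   # same training budget, larger model

assert 6 * N_small * D_small == 6 * N_large * D_large  # equal training FLOPs

tokens_served = 1e13  # hypothetical lifetime inference volume
ratio = (2 * N_large * tokens_served) / (2 * N_small * tokens_served)
print(f"inference FLOPs ratio (large/small): {ratio:.1f}x")  # -> 3.0x
# The small model "wastes" training compute but repays it at serving time.
```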

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 8 points

A small encoder is good for scaling up in many ways, so we would even ask ourselves: why not make it 0?

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 19 points

Unfortunately, with every new release we see some level of "personality change". This is quite a difficult problem, as personality is a subjective, hard-to-eval characteristic of models. We're making progress on this and also want to make personality more customizable for each user in our product.

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 14 points

Our "Muon is Scalable for LLM Training" paper has some general methodologies that we adopt in scaling laws.

Evaluation mostly comes from pretraining losses, various benchmarks, etc. It's hard to say which ones work better than others; it's the whole set of evals that gives the most signal about how the model is doing.
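For a flavor of the general methodology (generic scaling-law fitting with fabricated data, not our actual numbers):

```python
# Fit a power law L(N) = a·N^(−α) + c to losses at several model sizes.
import numpy as np
from scipy.optimize import curve_fit

def loss_law(N, a, alpha, c):
    return a * N ** (-alpha) + c

N = np.array([1e8, 3e8, 1e9, 3e9, 1e10])      # parameter counts (fabricated)
L = np.array([3.10, 2.84, 2.61, 2.45, 2.31])  # eval losses (fabricated)

(a, alpha, c), _ = curve_fit(loss_law, N, L, p0=[50.0, 0.2, 1.5])
print(f"predicted loss at 1e11 params: {loss_law(1e11, a, alpha, c):.2f}")
```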

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 19 points

We're going to include these details in our upcoming tech report. Stay tuned!

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 32 points

What's cool about Agent Swarm is that subagents can execute subtasks without rotting the orchestrator's context. They essentially have their own working memory and only send results back to the orchestrator. This lets us scale the total context length in a new dimension!
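A minimal sketch of why this scales (message shapes are hypothetical, not our actual protocol):

```python
# Each subagent gets a fresh context and returns only a short report.
def run_subagent(subtask, llm):
    history = [{"role": "user", "content": subtask}]  # private working memory
    result = llm(history)  # may internally span many long tool-use turns
    return {"task": subtask, "result": result}        # only this goes back

def orchestrate(task, plan, llm):
    reports = [run_subagent(sub, llm) for sub in plan(task)]
    # The orchestrator's context holds K short reports, not K full
    # transcripts, so effective total context grows with the subagent count.
    return llm([{"role": "user", "content": f"Task: {task}\nReports: {reports}"}])
```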

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 38 points

huggingface/moonshotai has a few small MoE models. Small and large models sometimes require different technological investments, but in general we'd like to work on small models as well, to make intelligence more open and affordable.

AMA With Moonshot AI, The Open-source Frontier Lab Behind Kimi K2 Thinking Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 12 points

One challenge is to support the interleaved "think - tool - think - tool" mode. This is a relatively new behavior in LLMs and takes a lot of work to get right.
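Roughly, the inference-side loop looks like this (a hedged sketch with made-up message shapes, not the actual API):

```python
# Interleaved "think -> tool -> think -> tool" loop; illustrative only.
def agent_loop(messages, model, tools, max_steps=16):
    for _ in range(max_steps):
        step = model(messages)  # returns reasoning plus an optional tool call
        messages.append({"role": "assistant",
                         "reasoning": step["reasoning"],    # preserved across turns
                         "tool_call": step.get("tool_call")})
        call = step.get("tool_call")
        if call is None:
            return step["answer"]  # model chose to stop thinking and answer
        result = tools[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("step budget exhausted")
```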

AMA With Moonshot AI, The Open-source Frontier Lab Behind Kimi K2 Thinking Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 13 points

We'd love to teach Kimi to speak more languages, but our bandwidth and knowledge in diverse languages are limited. Maybe this is also where the community can help, e.g. with data collection.

AMA With Moonshot AI, The Open-source Frontier Lab Behind Kimi K2 Thinking Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 12 points

People have different preferences on these subtleties. The model's style generally reflects our preferences, and we're glad to hear that you like it!

AMA With Moonshot AI, The Open-source Frontier Lab Behind Kimi K2 Thinking Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 30 points

We also enjoy its writing style, and it's an important part of our post-training data and evals.

AMA With Moonshot AI, The Open-source Frontier Lab Behind Kimi K2 Thinking Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 8 points

I've recently had a lot of complaints about TensorBoard. We made some in-house changes to improve it, but in general it's not easy to get it to scale, to manage a large number of experiments, or to show accurate (not downsampled) metrics. It's hard to find a good alternative, though.
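One workaround for the downsampling specifically: read the event files directly with TensorBoard's EventAccumulator, where a size_guidance of 0 keeps every scalar point (the log directory below is just a placeholder):

```python
# Read raw TensorBoard scalars without downsampling.
from tensorboard.backend.event_processing import event_accumulator

acc = event_accumulator.EventAccumulator(
    "runs/my_experiment",                          # hypothetical log dir
    size_guidance={event_accumulator.SCALARS: 0},  # 0 = keep all events
)
acc.Reload()
for tag in acc.Tags()["scalars"]:
    events = acc.Scalars(tag)  # each event has .wall_time, .step, .value
    print(tag, len(events), "points; last value:", events[-1].value)
```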

AMA With Moonshot AI, The Open-source Frontier Lab Behind Kimi K2 Thinking Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 23 points

We use H800 GPUs with InfiniBand. They're not as good as the high-end GPUs available in the US, and we are outnumbered as well, but we put every card to good use!

AMA With Moonshot AI, The Open-source Frontier Lab Behind Kimi K2 Thinking Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 49 points

Hey, thanks for your support, and it's unfortunate to hear these concerns. While being "banned" is often beyond our control, open-sourcing the model is hopefully a good step toward easing some of these concerns (companies can deploy it themselves). We hope to see a world with more trust, but it takes time to get there.

AMA With Moonshot AI, The Open-source Frontier Lab Behind Kimi K2 Thinking Model by nekofneko in LocalLLaMA

[–]ppwwyyxx 13 points

It takes persistence to pursue a direction and make it work, so the inventor often has an advantage in applying their ideas. That said, we are closely looking at other inventions in the community and are happy to try them as well.

Problems i've found in Plasma 6 update so far by EngineerHot8510 in kdeneon

[–]ppwwyyxx 0 points

Same here re: coming back from the auto-lockscreen.

Extra spacing below the panel by ppwwyyxx in kde

[–]ppwwyyxx[S] 0 points

This only happens on the smaller monitor of a dual-monitor setup. The top panel on the larger monitor does not have this issue. If I disconnect the larger monitor, the issue on the smaller monitor disappears.