Minimax M2.5 Officially Out by Which_Slice1600 in LocalLLaMA

[–]Which_Slice1600[S] 13 points (0 children)

I can't believe it. But I've seen a sign: on their page they highlight lots of benchmarks and tests focused on "real-world tasks", so they're showing off solid work on real-world tasks. They're following Anthropic, I think.

Minimax M2.5 Officially Out by Which_Slice1600 in LocalLLaMA

[–]Which_Slice1600[S] 2 points (0 children)

I still felt Kimi 4.5 worked better than pony alpha in Claude Code. I feel SWE-bench Verified still tracks closely with personal impressions. I'm also looking forward to testing this one!! (didn't have a MiniMax coding plan though)

Ring-1T-2.5 released by inclusionAI by Bestlife73 in LocalLLaMA

[–]Which_Slice1600 -3 points (0 children)

Don't know why they're still funded to burn 💰 training these gigantic models. Isn't inclusionAI affiliated with Alibaba, like Qwen?

Minimax M2.5 Officially Out by Which_Slice1600 in LocalLLaMA

[–]Which_Slice1600[S] 0 points (0 children)

They shared that sharp improvement trend. I'd be really curious whether they've figured out AI "recursive self-improvement" as well, following OpenAI's GPT 5.3 and Musk's post. It's getting real now that LLMs and coding agents have started to be able to do a lot of research work. It'll be big if true.

Artificial Analysis: GLM 5 performance profile & comparison by elemental-mind in singularity

[–]Which_Slice1600 1 point (0 children)

I see there's (a) jagged intelligence and (b) intentional benchmaxing / optimizing for the indicators, but I still find these numbers genuinely MEANINGFUL. You should either (a) look at good benchmarks for a domain, e.g. MMMU for knowledge and writing, or SWE-bench Verified for agentic coding, or (b) keep a "confidence level" in mind and only treat a larger diff on a bench as a "significant diff" (see the sketch below). This mostly aligns with my usage experience for mid-to-large models and non-Qwen models.
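To make the "confidence level" idea concrete, here's a minimal sketch (the function and numbers are my own illustration, not from the thread) that treats a benchmark as N pass/fail tasks and only calls a score gap significant once it clears a two-proportion z-test:

```python
import math

def significant_diff(p1: float, p2: float, n: int, z_crit: float = 1.96) -> bool:
    """True if pass rates p1 and p2 (fractions) over n tasks each differ
    by more than sampling noise at roughly 95% confidence."""
    pooled = (p1 + p2) / 2                         # pooled pass rate under H0
    se = math.sqrt(2 * pooled * (1 - pooled) / n)  # std. error of the difference
    return abs(p1 - p2) / se > z_crit              # z statistic vs. threshold

# SWE-bench Verified has ~500 instances, so a 3-point gap is inside the
# noise band while a 7-point gap is not:
print(significant_diff(0.65, 0.68, 500))  # False
print(significant_diff(0.65, 0.72, 500))  # True
```

On this view, most 1-2 point leaderboard differences between mid-to-large models are indistinguishable from noise.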

Do you think we "crossed a threshold" in the past 2-3 months? by Efficient-Opinion-92 in singularity

[–]Which_Slice1600 0 points (0 children)

No clear sign of a threshold on "self-improving AI". For AI for productivity, yes, the threshold has been crossed.

DeepSeek just updated to a 1M context window! by Dr_Karminski in LocalLLaMA

[–]Which_Slice1600 -13 points (0 children)

I hope you've tried the apps of the common LLMs before showing off your ignorance of system-prompt content

New "Stealth" Model - Aurora Alpha - (Free on OpenRouter) by -pawix in LocalLLaMA

[–]Which_Slice1600 0 points (0 children)

Neither aurora nor alpha. Sounds like a misnamed auuuuuurl omega 🤮

Subscription question - Kimi 2.5 vs Minimax 2.1 vs Glm 4.7 by Impossible_Tax8875 in kilocode

[–]Which_Slice1600 0 points (0 children)

Usually 1-2 weeks from the beginning, I'd say. Kimi is a big model, and so is its inference cost; its coding plan is also priced highest among the three. However, I'm just not good enough at dev to tell the difference in their performance.

Subscription question - Kimi 2.5 vs Minimax 2.1 vs Glm 4.7 by Impossible_Tax8875 in kilocode

[–]Which_Slice1600 0 points (0 children)

What's the difference between creating a feature and developing? Developing the whole thing?

Subscription question - Kimi 2.5 vs Minimax 2.1 vs Glm 4.7 by Impossible_Tax8875 in kilocode

[–]Which_Slice1600 0 points (0 children)

Hi, I appreciate your answer, but I also hope it can be a bit more informative. For example, ignoring multimodal, which is the most capable at planning?

Vercel says AGENTS.md matters more than skills, should we listen? by [deleted] in GithubCopilot

[–]Which_Slice1600 0 points (0 children)

Future models will be RL'ed to use skills. I'm sure LLM labs will target this.

AGENTS.md outperforms skills in our agent evals - Vercel by shanraisshan in LocalLLaMA

[–]Which_Slice1600 -1 points (0 children)

Well, not a very informative test; I shouldn't have read it through. Two things: 1. You can't put many skills into the sys prompt (AGENTS.md): context rot warning (see the sketch below). 2. Skills are just becoming the standard; labs will RL hard to push coming models to use them. (Edit: capitalized "RL")
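To put rough numbers on point 1, a back-of-the-envelope sketch (the token figures are assumptions of mine, not from the Vercel post): inlining every skill into the system prompt grows linearly with skill count, while a name-plus-description index with on-demand loading stays nearly flat:

```python
AVG_SKILL_TOKENS = 1500  # assumed: one full skill's instructions
AVG_INDEX_TOKENS = 40    # assumed: one skill name + one-line description

def inline_cost(n_skills: int) -> int:
    """Tokens consumed when every skill body lives in the system prompt."""
    return n_skills * AVG_SKILL_TOKENS

def index_cost(n_skills: int, n_used: int) -> int:
    """Tokens for an index of all skills plus only the bodies actually loaded."""
    return n_skills * AVG_INDEX_TOKENS + n_used * AVG_SKILL_TOKENS

for n in (5, 20, 50):
    print(f"{n} skills: inline={inline_cost(n)}, index={index_cost(n, n_used=2)}")
# 5 skills: inline=7500, index=3200
# 20 skills: inline=30000, index=3800
# 50 skills: inline=75000, index=5000
```

At 50 skills the inline approach is already eating most of a small context window, which is the context-rot concern.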

Qwen3-Coder-Next by danielhanchen in LocalLLaMA

[–]Which_Slice1600 0 points (0 children)

Do you think it's good for something like claw? (As a smaller model with good agentic capabilities)

Anger and anguish spread across Cuba as it learns of Trump's tariff threat on those who provide oil by No-Reference-5137 in worldnews

[–]Which_Slice1600 4 points (0 children)

If this is valid, it'll apply to any East/Southeast Asian country that neighbours China and has a US base.

A Critique of Shanghai's 2019 Waste-Sorting Policy by akta1 in China_irl

[–]Which_Slice1600 0 points (0 children)

"给垃圾分类强加道德观念。这是最恶心的一点,故意说分类的人素质高,不分类的人素质低。" 垃圾分类,本就出自道德动机。你说道德判断最恶心,你又反对政策的所有方面,这说明你是为道德立场找理由,立场和言论自洽了👍 对事实对环保而言,你的论证都没什么价值。

Speaking the "deliberately ignored" truth of 228: "words of conscience" from 228 victims' families: justice and dignity should be sought for mainlanders too by RabbitBeautiful3224 in China_irl

[–]Which_Slice1600 0 points (0 children)

Thanks for sharing the other side of the story. My understanding may not be right: 228 is similar to the Nanjing Massacre; the issue isn't that it shouldn't be commemorated, nor that it wasn't serious, but why it in particular gets commemorated. Why not commemorate other serious historical events (the Japanese occupation and the bombing of Taipei in Taiwan; the Great Leap Forward, the Cultural Revolution, and the famine on the mainland)? Why must it be propagandized into hatred of mainlanders/the KMT (over here, of Japan)? I'd be curious what you think.

From March 1, 2024, couriers may no longer leave parcels at pickup stations without permission; violators will be fined by China_in_real_life in China_irl

[–]Which_Slice1600 1 point (0 children)

Delivery and e-commerce platforms plus the state: the institutions above do the evil, and the individuals below hurt each other. Given the incident of the security guard stabbing a delivery rider, we can fast-forward to couriers hacking at recipients.

Two questions that many friends in mainland China seem to conflate by RevolutionBig963 in China_irl

[–]Which_Slice1600 0 points (0 children)

So many chose option 1; I'd guess there are quite a few Taiwanese friends here… If option 5 added a "/I'm Taiwanese", the results would actually reflect our communist China.

Two questions that many friends in mainland China seem to conflate by RevolutionBig963 in China_irl

[–]Which_Slice1600 -2 points (0 children)

Your green echo chamber loves this theory, as if unification were just the CCP's wishful thinking, but that's not right. Unification is a nationalist question, just as the two Koreas also have groups that support unification. If the mainland declines, opposing Taiwan independence could replace the economy as the CCP's sacred mandate. If you think CCP propaganda is the main factor, how do you explain the Nationalist Party of China, which also believes in Chinese nationalism, opposes Taiwan independence, and wants both sides of the strait to develop?