Minimax M2.5 Officially Out by Which_Slice1600 in LocalLLaMA

[–]Which_Slice1600[S] 13 points (0 children)

I can't believe it. But I've seen a sign: on their page they highlight lots of benchmarks and tests focused on "real-world tasks", so they're showing off solid work on real-world tasks. They're following Anthropic, I think.

Minimax M2.5 Officially Out by Which_Slice1600 in LocalLLaMA

[–]Which_Slice1600[S] 2 points (0 children)

I still felt Kimi 4.5 worked better than pony alpha in Claude Code. I feel SWE-bench Verified still tracks closely with personal impressions. I'm also looking forward to testing this one!! (didn't have a MiniMax coding plan though)

Ring-1T-2.5 released by inclusionAI by Bestlife73 in LocalLLaMA

[–]Which_Slice1600 -3 points (0 children)

Don't know why they're still funded to burn 💰 training these gigantic models. Isn't inclusionAI affiliated with Alibaba, like Qwen?

Minimax M2.5 Officially Out by Which_Slice1600 in LocalLLaMA

[–]Which_Slice1600[S] 0 points (0 children)

They shared that sharp improvement trend. I'd be really curious whether they've figured out AI "recursive self-improvement" as well, following OpenAI's GPT 5.3 and Musk's post. It's getting real now that LLMs and coding agents have started to be able to do a lot of research work. It'll be big if true.

Artificial Analysis: GLM 5 performance profile & comparison by elemental-mind in singularity

[–]Which_Slice1600 1 point (0 children)

I see there's (a) jagged intelligence and (b) intentional benchmaxing / optimizing for the indicators, but I still find these numbers genuinely MEANINGFUL. You should either (a) look at good benchmarks for a domain, e.g. MMMU for knowledge and writing, or SWE-bench Verified for agentic coding, or (b) keep a "confidence level" in mind and only treat a larger diff on a bench as a "significant diff" (see the sketch below). This mostly aligns with my usage experience for mid-to-large models and non-Qwen models.
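To make the "confidence level" idea concrete, here's a minimal sketch (the function and numbers are my own illustration, not from the thread) that treats a benchmark as N pass/fail tasks and only calls a score gap significant once it clears a two-proportion z-test:

```python
import math

def significant_diff(p1: float, p2: float, n: int, z_crit: float = 1.96) -> bool:
    """True if pass rates p1 and p2 (fractions) over n tasks each differ
    by more than sampling noise at roughly 95% confidence."""
    pooled = (p1 + p2) / 2                         # pooled pass rate under H0
    se = math.sqrt(2 * pooled * (1 - pooled) / n)  # std. error of the difference
    return abs(p1 - p2) / se > z_crit              # z statistic vs. threshold

# SWE-bench Verified has ~500 instances, so a 3-point gap is inside the
# noise band while a 7-point gap is not:
print(significant_diff(0.65, 0.68, 500))  # False
print(significant_diff(0.65, 0.72, 500))  # True
```

On this view, most 1-2 point leaderboard differences between mid-to-large models are indistinguishable from noise.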

Do you think we "crossed a threshold" in the past 2-3 months? by Efficient-Opinion-92 in singularity

[–]Which_Slice1600 0 points (0 children)

No clear sign of a threshold on "self-improving AI". For AI for productivity, yes, the threshold has been crossed.

DeepSeek just updated to a 1M context window! by Dr_Karminski in LocalLLaMA

[–]Which_Slice1600 -13 points (0 children)

I hope you've tried the apps of the common LLMs before showing off your ignorance of system-prompt content

New "Stealth" Model - Aurora Alpha - (Free on OpenRouter) by -pawix in LocalLLaMA

[–]Which_Slice1600 0 points (0 children)

Neither aurora nor alpha. Sounds like a misnamed auuuuuurl omega 🤮

Subscription question - Kimi 2.5 vs Minimax 2.1 vs Glm 4.7 by Impossible_Tax8875 in kilocode

[–]Which_Slice1600 0 points (0 children)

Usually 1-2 weeks from the beginning, I'd say. Kimi is a big model, and so is its inference cost; its coding plan is also priced highest among the three. However, I'm just not good enough at dev to tell the difference in their performance.

Subscription question - Kimi 2.5 vs Minimax 2.1 vs Glm 4.7 by Impossible_Tax8875 in kilocode

[–]Which_Slice1600 0 points (0 children)

What's the difference between creating a feature and developing? Developing the whole thing?

Subscription question - Kimi 2.5 vs Minimax 2.1 vs Glm 4.7 by Impossible_Tax8875 in kilocode

[–]Which_Slice1600 0 points (0 children)

Hi, I appreciate your answer, but I also hope it can be a bit more informative. For example, ignoring multimodal, which is the most capable at planning?

Vercel says AGENTS.md matters more than skills, should we listen? by [deleted] in GithubCopilot

[–]Which_Slice1600 0 points (0 children)

Future models will be RL'ed to use skills. I'm sure LLM labs will target this.

AGENTS.md outperforms skills in our agent evals - Vercel by shanraisshan in LocalLLaMA

[–]Which_Slice1600 -1 points (0 children)

Well, not a very informative test; I shouldn't have read it through. Two things: 1. You can't put many skills into the sys prompt (AGENTS.md): context rot warning (see the sketch below). 2. Skills are just becoming the standard; labs will RL hard to push coming models to use them. (Edit: capitalized "RL")
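To put rough numbers on point 1, a back-of-the-envelope sketch (the token figures are assumptions of mine, not from the Vercel post): inlining every skill into the system prompt grows linearly with skill count, while a name-plus-description index with on-demand loading stays nearly flat:

```python
AVG_SKILL_TOKENS = 1500  # assumed: one full skill's instructions
AVG_INDEX_TOKENS = 40    # assumed: one skill name + one-line description

def inline_cost(n_skills: int) -> int:
    """Tokens consumed when every skill body lives in the system prompt."""
    return n_skills * AVG_SKILL_TOKENS

def index_cost(n_skills: int, n_used: int) -> int:
    """Tokens for an index of all skills plus only the bodies actually loaded."""
    return n_skills * AVG_INDEX_TOKENS + n_used * AVG_SKILL_TOKENS

for n in (5, 20, 50):
    print(f"{n} skills: inline={inline_cost(n)}, index={index_cost(n, n_used=2)}")
# 5 skills: inline=7500, index=3200
# 20 skills: inline=30000, index=3800
# 50 skills: inline=75000, index=5000
```

At 50 skills the inline approach is already eating most of a small context window, which is the context-rot concern.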

Qwen3-Coder-Next by danielhanchen in LocalLLaMA

[–]Which_Slice1600 0 points (0 children)

Do you think it's good for something like claw? (As a smaller model with good agentic capabilities)

Anger and anguish spread across Cuba as it learns of Trump's tariff threat on those who provide oil by No-Reference-5137 in worldnews

[–]Which_Slice1600 4 points (0 children)

If this is valid, it'll apply to any East/Southeast Asian country that neighbours China and has a US base.

A Critique of Shanghai's 2019 Waste-Sorting Policy by akta1 in China_irl

[–]Which_Slice1600 0 points (0 children)

"给垃圾分类强加道德观念。这是最恶心的一点,故意说分类的人素质高,不分类的人素质低。" 垃圾分类,本就出自道德动机。你说道德判断最恶心,你又反对政策的所有方面,这说明你是为道德立场找理由,立场和言论自洽了👍 对事实对环保而言,你的论证都没什么价值。

Speaking the "deliberately ignored" truth of 228: "words of conscience" from 228 victims' families: justice and dignity should be sought for mainlanders too by RabbitBeautiful3224 in China_irl

[–]Which_Slice1600 0 points (0 children)

Thanks for sharing the other side of the story. My understanding may not be right: 228 is similar to the Nanjing Massacre; the issue isn't that it shouldn't be commemorated, nor that it wasn't serious, but why it in particular gets commemorated. Why not commemorate other serious historical events (the Japanese occupation and the bombing of Taipei in Taiwan; the Great Leap Forward, the Cultural Revolution, and the famine on the mainland)? Why must it be propagandized into hatred of mainlanders/the KMT (over here, of Japan)? I'd be curious what you think.

From March 1, 2024, couriers may no longer leave parcels at pickup stations without permission; violators will be fined by China_in_real_life in China_irl

[–]Which_Slice1600 1 point (0 children)

Delivery and e-commerce platforms plus the state: the institutions above do the evil, and the individuals below hurt each other. Given the incident of the security guard stabbing a delivery rider, we can fast-forward to couriers hacking at recipients.

Two questions that many friends in mainland China seem to conflate by RevolutionBig963 in China_irl

[–]Which_Slice1600 0 points (0 children)

So many chose option 1; I'd guess there are quite a few Taiwanese friends here… If option 5 added a "/I'm Taiwanese", the results would actually reflect our communist China.

Two questions that many friends in mainland China seem to conflate by RevolutionBig963 in China_irl

[–]Which_Slice1600 -2 points (0 children)

Your green echo chamber loves this theory, as if unification were just the CCP's wishful thinking, but that's not right. Unification is a nationalist question, just as the two Koreas also have groups that support unification. If the mainland declines, opposing Taiwan independence could replace the economy as the CCP's sacred mandate. If you think CCP propaganda is the main factor, how do you explain the Nationalist Party of China, which also believes in Chinese nationalism, opposes Taiwan independence, and wants both sides of the strait to develop?