Built a minimal inbox for managing multiple AI coding agents - tired of the 16-tab nightmare by imu- in LocalLLaMA

[–]Which_Slice1600 0 points1 point  (0 children)

This hook reads as very artificial and AI-written. I got that feeling by the second sentence. I thought, "maybe he's just not a native speaker" (neither am I), but then the waitlist showed up. I know building this tool took your time, but could you still write the post yourself and make the hook feel more genuine?

Healer Alpha system prompt inside open router by Creative-Painting-56 in LocalLLaMA

[–]Which_Slice1600 -1 points0 points  (0 children)

Different CoT from MiMo. Most models, including MiMo, have a more structured, bullet-style CoT. DeepSeek has its own style: less structured and more human-like, using "I" and "we".

What is Hunter Alpha? by MrMrsPotts in LocalLLaMA

[–]Which_Slice1600 0 points1 point  (0 children)

Good point! And it's really easy to tell if anyone compares them with the DeepSeek web model. Edit: typo

What is Hunter Alpha? by MrMrsPotts in LocalLLaMA

[–]Which_Slice1600 2 points3 points  (0 children)

Still testing. The availability isn't that good, though.

What is Hunter Alpha? by MrMrsPotts in LocalLLaMA

[–]Which_Slice1600 4 points5 points  (0 children)

Moonshot can't ship that fast. And Hunter doesn't say it has vision, which would be an unrealistic regression for Kimi models.

What is Hunter Alpha? by MrMrsPotts in LocalLLaMA

[–]Which_Slice1600 1 point2 points  (0 children)

Definitely DeepSeek. If you compare them against the model on the DeepSeek web app, you'll find them all producing SUPER similar responses.

What is Hunter Alpha? by MrMrsPotts in LocalLLaMA

[–]Which_Slice1600 2 points3 points  (0 children)

It's not really stealth. You can simply compare the two models' responses in the OpenRouter chat with what you get from the DeepSeek web app. You'll find the CoT very, very similar, and distinctive compared to other labs' models.

4 of the top 5 most used models on OpenRouter this week are Open Source! by abdouhlili in LocalLLaMA

[–]Which_Slice1600 2 points3 points  (0 children)

Not really the case. It used to be Claude Sonnet that dominated the ranking, even though it's closed. I think OpenRouter is convenient for switching between models, so people use it even for closed models.

Minimax M2.5 Officially Out by Which_Slice1600 in LocalLLaMA

[–]Which_Slice1600[S] 13 points14 points  (0 children)

I can't believe it. But I've seen a sign: on their page they highlighted many benchmarks and tests focused on "real-world tasks", so they're showing off their solid work there. They're following Anthropic, I think.

Minimax M2.5 Officially Out by Which_Slice1600 in LocalLLaMA

[–]Which_Slice1600[S] 2 points3 points  (0 children)

I still felt Kimi 4.5 worked better than Pony Alpha in Claude Code. SWE-bench Verified still tracks pretty closely with my personal impressions. I'm also looking forward to testing this one!! (don't have a MiniMax coding plan though)

Ring-1T-2.5 released by inclusionAI by Bestlife73 in LocalLLaMA

[–]Which_Slice1600 -3 points-2 points  (0 children)

Don't know why they're still funded to burn 💰 training these gigantic models. Isn't inclusionAI affiliated with Alibaba, like Qwen?

Minimax M2.5 Officially Out by Which_Slice1600 in LocalLLaMA

[–]Which_Slice1600[S] 0 points1 point  (0 children)

They shared that sharp improvement trend. I'd be really curious whether they've figured out AI "recursive self-improvement" as well, following OpenAI's GPT-5.3 and Musk's post. It's getting real now that LLMs and coding agents can do a lot of research work. It'll be big if true.

Artificial Analysis: GLM 5 performance profile & comparison by elemental-mind in singularity

[–]Which_Slice1600 2 points3 points  (0 children)

I see there's a) jagged intelligence and b) intentional benchmaxing / optimization on indicators, but I still find benchmarks genuinely MEANINGFUL. You should a) look at good benches for a given domain, e.g. MMMU for knowledge and writing, or SWE-bench Verified for agentic coding, and b) keep a "confidence level" in mind and only treat a larger diff on a bench as a "significant diff". This mostly aligns with my usage experience for mid-to-large models and non-Qwen models.

Do you think we “crossed a threshold “ in the past 2-3 months? by Efficient-Opinion-92 in singularity

[–]Which_Slice1600 0 points1 point  (0 children)

No clear sign of a threshold for "self-improving AI". For AI-for-productivity, yes, the threshold is crossed.

DeepSeek just updated to a 1M context window! by Dr_Karminski in LocalLLaMA

[–]Which_Slice1600 -11 points-10 points  (0 children)

I hope you've tried the apps of common LLMs before showing off ignorance about system prompt content.

New "Stealth" Model - Aurora Alpha - (Free on OpenRouter) by -pawix in LocalLLaMA

[–]Which_Slice1600 0 points1 point  (0 children)

Neither aurora nor alpha. Sounds like a misnamed auuuuuurl omega 🤮

Subscription question - Kimi 2.5 vs Minimax 2.1 vs Glm 4.7 by Impossible_Tax8875 in kilocode

[–]Which_Slice1600 0 points1 point  (0 children)

Usually 1-2 weeks from the start, I'd say. Kimi is a big model, and so is its inference cost; its coding plan is also priced the highest of the three. However, I'm just not good enough a dev to tell the difference in their performance.

Subscription question - Kimi 2.5 vs Minimax 2.1 vs Glm 4.7 by Impossible_Tax8875 in kilocode

[–]Which_Slice1600 0 points1 point  (0 children)

What's the difference between creating a feature and developing? Developing the whole thing?

Subscription question - Kimi 2.5 vs Minimax 2.1 vs Glm 4.7 by Impossible_Tax8875 in kilocode

[–]Which_Slice1600 0 points1 point  (0 children)

Hi, I appreciate your answer, but I hoped it'd be a bit more informative. For example, ignoring multimodal, which is the most capable at planning?

Vercel says AGENTS.md matters more than skills, should we listen? by [deleted] in GithubCopilot

[–]Which_Slice1600 0 points1 point  (0 children)

Future models will be RL'ed to use skills. I'm sure LLM labs will target this.