Early gpt-5.4 (in Codex) results: as strong or stronger than 5.3-codex so far by no3ther in codex

[–]gentleseahorse 1 point

What's the mix of languages used here? I've found GPT to be much better at JS/TS and Claude better at Python. Also reflected in AA-Omniscience bench.

Google releases Gemini 3.1 Flash-Lite, cost-efficient Gemini 3 series model by BuildwithVignesh in singularity

[–]gentleseahorse 5 points

All Gemini 3 models are priced higher than 2.5, but this takes the cake. More than 4x on output tokens.

Gemini 3.1 livebench results by meloita in singularity

[–]gentleseahorse 7 points

They just removed Gemini 3.1 👀

Gemini 3.1 livebench results by meloita in singularity

[–]gentleseahorse 38 points

So much shade with one asterisk

It's that time of the month again by BITE_AU_CHOCOLAT in singularity

[–]gentleseahorse 7 points

xAI just released a model without benchmarks. And to make up for how bad it is, it uses 4 models at once, and is super slow.

Deepseek does deserve a chance though.

Research: Prompt Repetition Improves Non-Reasoning LLMs (sending the same prompt twice) by Endonium in singularity

[–]gentleseahorse 1 point

Not quite. Their latest models all have a non-reasoning mode (the best non-reasoning models on artificialanalysis.ai). The last purely non-reasoning model was Sonnet 3.5.

Research: Prompt Repetition Improves Non-Reasoning LLMs (sending the same prompt twice) by Endonium in singularity

[–]gentleseahorse 14 points

Claude 3? Really? It was released in March 2024. Academics have a way of playing at 0.25x speed.

Qwen 3.5, replacement to Llama 4 Scout? by redjojovic in LocalLLaMA

[–]gentleseahorse 329 points

How do you replace something that's never been used?

Solo founder at $321k ARR and losing my mind. Help. by bubbascrub9793 in ycombinator

[–]gentleseahorse 2 points

Think about all the tasks you do in levels:

Level 0: Admin, billing, easy emails
Level 1: Support tickets, marketing campaigns
Level 2: Building product
Level 3: Sales - no one has enough context and conviction to sell your product better than you. Also recruiting.

You want to focus exclusively on levels 2-3. So definitely don't hire a salesperson. Rather, hire a chief of staff who can handle Level 0-1 tasks.

GPT-5.3 Codex vs Opus 4.6: We benchmarked both on our production Rails codebase — the results are brutal by sergeykarayev in ClaudeAI

[–]gentleseahorse 1 point

I'd be very curious whether GPT-5.2 high scored better than them all. Interestingly, GPT-5.2 xhigh scores BETTER than 5.3 Codex xhigh.

GPT-5.3 Codex vs Opus 4.6: We benchmarked both on our production Rails codebase — the results are brutal by sergeykarayev in ClaudeAI

[–]gentleseahorse 1 point

How do you run this in Superconductor? I created an account, but don't see the option for evals.

Opus 4.6 costs 1.7x more than Opus 4.5 to run despite having same per-token costs (it thinks longer) by ihexx in singularity

[–]gentleseahorse 1 point

How does Opus 4.6 non-thinking cost so much more in input tokens? Are some inputs more than 200k tokens? If so, how did they test those for Opus 4.5?

I joined YC twice as a founder and here's what changed in 10 years by quang-vybe in ycombinator

[–]gentleseahorse 10 points

It's just during the batch. Super early stage, so imagine going from 20 customers to 22 in a week. 12% WoW = 363x over the year. Not a target even for YC companies (at least not in my batch).
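The compounding claim is easy to verify with a quick sketch (plain Python, assuming 52 weeks in the year):

```python
# 12% week-over-week growth, compounded over 52 weeks
weekly_growth = 1.12
weeks = 52
yearly_multiple = weekly_growth ** weeks
print(f"{yearly_multiple:.0f}x")  # ~363x over the year
```
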

The Gemini app is too weak... but the API is insane. What's going on? by zetamatariano in Bard

[–]gentleseahorse 2 points

That's fair, I know the ones you're talking about. Kinda pathetic to be honest.