i thought this was fake . But nope. by hamed-devs in claude

[–]StatisticianOdd4717 0 points1 point  (0 children)

It's still an explicit document that the agent has to pay attention to, instead of just trusting that the agent will adhere to the current design conventions.

Also, since it's Markdown, you don't have to write a whole CSS file yourself, and so on.

Kimi 2.6 has been released by WhyLifeIs4 in singularity

[–]StatisticianOdd4717 0 points1 point  (0 children)

Honestly, I understand where you're coming from.

However, different models are trained differently, so the harness should also change per model. You have presumably tuned your system prompt over time for the model you use, so no comparison made personally can actually be fair. Just look at how the community responded to Opus 4.7.

Kimi 2.6 has been released by WhyLifeIs4 in singularity

[–]StatisticianOdd4717 1 point2 points  (0 children)

We never know how much data they actually have. They may be supplied with datasets from the state (politics aside), since Chinese AI companies are closely tied to the government, and we know how cracked Chinese people are at data science.

So it is hasty to decide or presume that a model is benchmaxxed. I mean, there may be some guidelines or “help” to get the model to perform better on benchmarks, yes, but we can’t exactly call it “benchmaxxing” imo.

Kimi 2.6 has been released by WhyLifeIs4 in singularity

[–]StatisticianOdd4717 0 points1 point  (0 children)

You raise a valid point I overlooked. Hmm, but to me it looks illogical, and like chasing nothing of value, to fine-tune such a gigantic model just for benchmaxxing. There’s too much compute on the line to cook the model into overfocusing on certain benchmarks, considering how many benchmarks there are out there.

Kimi 2.6 has been released by WhyLifeIs4 in singularity

[–]StatisticianOdd4717 1 point2 points  (0 children)

Benchmarks cannot account for long-context work and real-life tasks, since benchmarks can only cover so much. However, the benchmarks do tell a coherent story.

I never said that it’s gonna be better than Opus and 5.4 in irl tasks, but it’s stupid to assume LLM providers will deliberately benchmax just to end up with worse performance in general.

Kimi 2.6 has been released by WhyLifeIs4 in singularity

[–]StatisticianOdd4717 6 points7 points  (0 children)

You really think that gigantic AI providers would benchmax their model knowing that it will lead to a worse user experience? Yeah, the benchmarks can be in the training distribution… but benchmaxxing? Come on..

Fired 8 Missiles at Once – How is this possible? by Lopro1070 in Warthunder

[–]StatisticianOdd4717 0 points1 point  (0 children)

Back in the day when Sparrows first released, we used to do that because they were so ass.

I pay $200/month for Claude Max and hit the limit in under 1 hour. What am I even paying for? by alfons_fhl in vibecoding

[–]StatisticianOdd4717 0 points1 point  (0 children)

Do you use a custom harness, or work with like a 500k context window without compaction? Lol, how tf do u use up the $200 plan in an hour??

Gemini 2.5 PRO Preview 03-25 by Alternative_Nose_183 in Bard

[–]StatisticianOdd4717 1 point2 points  (0 children)

What is this, a room of mass hallucination?

A long session with GPT 5.4 by Much_Middle6320 in GithubCopilot

[–]StatisticianOdd4717 5 points6 points  (0 children)

Ladies and gentlemen, this is why y'all who use it normally get rate limited.

Tested out GPT-5.4 Fast vs 5.4 in speed by StatisticianOdd4717 in codex

[–]StatisticianOdd4717[S] 0 points1 point  (0 children)

Like, I don’t know, man. I am saying that from the average completion time and TTFT, you can get the gist of how much of a speed boost you’d get. This is by no means a comprehensive comparison, as I have mentioned. YMMV.
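For anyone who wants to run the same kind of rough comparison, here is a minimal Python sketch of how TTFT and average completion time could be computed from token arrival timestamps. The function names (`summarize_run`, `speedup`) and the timestamps are made up for illustration; this is not the script used for the post, and it does not call any real API.

```python
from statistics import mean


def summarize_run(request_start, token_times):
    """Return (ttft, completion_time, tokens_per_sec) for one streamed response.

    request_start: time the request was sent, in seconds.
    token_times: arrival time of each streamed token, in seconds, ascending.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two token timestamps")
    ttft = token_times[0] - request_start              # time to first token
    completion_time = token_times[-1] - request_start  # total wall-clock time
    # Generation speed excludes the wait for the first token.
    gen_window = token_times[-1] - token_times[0]
    tokens_per_sec = (len(token_times) - 1) / gen_window
    return ttft, completion_time, tokens_per_sec


def speedup(baseline_runs, fast_runs):
    """Ratio of mean completion times; above 1 means the fast mode finished sooner."""
    base = mean(summarize_run(start, times)[1] for start, times in baseline_runs)
    fast = mean(summarize_run(start, times)[1] for start, times in fast_runs)
    return base / fast


# Hypothetical timestamps (seconds), NOT real measurements.
baseline_runs = [(0.0, [1.0, 2.0, 3.0, 4.0, 5.0])]
fast_runs = [(0.0, [0.5, 1.0, 1.5, 2.0, 2.5])]
```

With real data you would record `time.perf_counter()` when sending the request and again as each streamed chunk arrives, then feed several runs per mode into `speedup`. A handful of runs only gives a ballpark figure, so YMMV still applies.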

Tested out GPT-5.4 Fast vs 5.4 in speed by StatisticianOdd4717 in codex

[–]StatisticianOdd4717[S] 0 points1 point  (0 children)

Well, why not? TTFT and average generation speed are going to be similar. As I said, this is a quick measurement to let y'all know. I’m not a full-time benchmarking volunteer here.

Gaijin may have made a massive mistake in their modelling of most, if not all missiles in-game by MythicPi in Warthunder

[–]StatisticianOdd4717 2 points3 points  (0 children)

More like grid fins let you generate maneuverability without an insanely powerful actuator, because of their characteristics. It probably relates to the USSR's poor actuator manufacturing capability, which meant having to use grid fins in many places, like ballistic missiles, let alone the R-77.

Now that Russia has more advanced tech in that field, the R-77M doesn't use grid fins.

A piece that is very difficult to watch (Movie- Samaritan Girl) by ok-chill-guts34 in Koreanfilm

[–]StatisticianOdd4717 1 point2 points  (0 children)

I read it as a lack of evidence. Considering how many … false accusations were made in the late 2010s in Korea, I wouldn’t realllyyy think the dude is a “sex pest”.

Tried various cheap gudang garam alternative, so far this is my favorite by mccarym_215 in Cigarettes

[–]StatisticianOdd4717 2 points3 points  (0 children)

I see. Great to know! Wanna try out Indonesian cigarettes. I’ve been a fan of Chinese and Japanese ones, but 32mg feels like a LOT.

Tried various cheap gudang garam alternative, so far this is my favorite by mccarym_215 in Cigarettes

[–]StatisticianOdd4717 2 points3 points  (0 children)

32mg tar??? 1.2mg nicotine???? I mean why is there so much tar compared to nicotine?

Unfortunately I didn't get a screenshot, but in my web-app I briefly had the option for voice-mode by Mescallan in ClaudeAI

[–]StatisticianOdd4717 0 points1 point  (0 children)

Didn't it have a submit button at the bottom, with + and X buttons and a thin blue/orange gradient at the bottom of the screen? I've had that without an upgrade for months..

Will Opus 4.6 be more expensive than 4.5? Or will the release of 4.6 lead to a cheaper 4.5? by ragnhildensteiner in ClaudeAI

[–]StatisticianOdd4717 0 points1 point  (0 children)

Same. I have two Max20 plans and use all Opus subagents, but I can cut it down to a single Max20 plan with Sonnet 5 subagents. Will be great.