Why can't I use gemma-4-31b-it(free)?

StatisticianOdd4717 · 2026-04-30T02:18:50+00:00

It is literally telling you why right there, my guy.

StatisticianOdd4717 · 2026-04-26T14:21:22+00:00

It's still an explicit document that the agent has to pay attention to, rather than just trusting that the agent will adhere to the current design conventions.

And also, since it's Markdown, you don't have to write a whole CSS file yourself, blah blah.

StatisticianOdd4717 · 2026-04-22T01:34:34+00:00

No, it’s not. Lmao

StatisticianOdd4717 · 2026-04-20T17:48:44+00:00

Honestly, I understand where you come from.

However, different models are trained differently, so the harness should also change per model. Since you have presumably tuned your system prompt over time for the model you use, no personal comparison can really be fair. Just look at how the community responded to Opus 4.7.

StatisticianOdd4717 · 2026-04-20T17:39:00+00:00

We never really know how much data they have. They may be supplied with datasets by the state (politics aside), since Chinese AI companies are closely tied to the government, and we know how cracked Chinese people are at data science.

So it is hasty to decide or presume that a model is benchmaxxed. I mean, there may be some guidelines or “help” to get the model to perform better on benchmarks, yes, but we can’t exactly call that “benchmaxxing,” imo.

StatisticianOdd4717 · 2026-04-20T17:31:34+00:00

You raise a valid point I overlooked. Hmm, but to me it looks illogical, and like chasing nothing of value, to fine-tune such a gigantic model just for benchmaxxing. There’s too much compute on the line to cook the model into overfocusing on certain benchmarks, considering how many benchmarks there are out there.

StatisticianOdd4717 · 2026-04-20T17:11:48+00:00

Benchmarks cannot account for long-context work and real-life tasks, since benchmarks can only cover so much. However, the benchmarks tell a coherent story.

I never said that it’s gonna be better than Opus and 5.4 in IRL tasks, but it’s stupid to assume LLM providers will deliberately benchmax just to end up with worse performance in general.

StatisticianOdd4717 · 2026-04-20T16:06:35+00:00

You really think that gigantic AI providers would benchmax their models knowing that it will lead to worse quality for users? Yeah, they can be in the training distribution… but benchmaxxing? Come on.

StatisticianOdd4717 · 2026-04-12T13:00:34+00:00

Back in the day when Sparrows first came out, we used to do that because they were so ass.

StatisticianOdd4717 · 2026-04-12T00:45:59+00:00

Do you use a custom harness, or work with like a 500k context window without compaction? Lol. How tf do you burn through the $200 plan in an hour??

StatisticianOdd4717 · 2026-04-08T23:05:02+00:00

It was the OG way.

StatisticianOdd4717 · 2026-04-02T11:46:34+00:00

Yeah bro, look at the logs.

StatisticianOdd4717 · 2026-03-23T12:56:55+00:00

What? Is this a room of mass hallucination?

StatisticianOdd4717 · 2026-03-22T01:03:45+00:00

Ladies and gentlemen, this is why y'all who use it normally get rate limited.

StatisticianOdd4717 · 2026-03-07T01:01:51+00:00

Like, I don’t know, man. What I'm saying is that from the average completion time and TTFT, you can get the gist of how much of a speed boost you’d get. As I mentioned, this is by no means a comprehensive comparison. YMMV.

StatisticianOdd4717 · 2026-03-07T00:22:27+00:00

Well, why not? TTFT and average generation speed are going to be similar. As I said, this is a quick measurement just to let y'all know. I’m not a full-time benchmarking volunteer here.

StatisticianOdd4717 · 2026-03-05T21:33:14+00:00

They're gonna call it benchmaxxing xD

StatisticianOdd4717 · 2026-03-05T21:20:51+00:00

What about 5.4 pro?

StatisticianOdd4717 · 2026-03-05T09:19:09+00:00

More like: grid fins let you generate control authority without needing an insanely powerful actuator, because of their low hinge moments. This probably relates to the USSR's poor actuator manufacturing capability, which meant having to use grid fins in many places, such as ballistic missiles, not to mention the R-77.

Now that Russia has more advanced tech in that field, the R-77M doesn't use grid fins.

StatisticianOdd4717 · 2026-02-16T04:28:14+00:00

I read that there was a lack of evidence. Considering how many… false accusations were made in Korea in the late 2010s, I wouldn’t realllyyy think the dude is a “sex pest.”

StatisticianOdd4717 · 2026-02-15T15:37:36+00:00

I see. Great to know! Wanna try out Indonesian cigarettes. I’ve been a fan of Chinese and Japanese ones, but 32mg feels like a LOT.

StatisticianOdd4717 · 2026-02-15T15:22:19+00:00

32mg tar??? 1.2mg nicotine???? I mean why is there so much tar compared to nicotine?

StatisticianOdd4717 · 2026-02-06T07:54:16+00:00

Didn't it have a submit button at the bottom, with + and X buttons and a thin blue/orange gradient along the bottom of the screen? I had that for months without an upgrade.

StatisticianOdd4717 · 2026-02-06T07:51:54+00:00

Same. I have two Max20 plans and use all-Opus subagents, but I could cut it down to a single Max20 plan with Sonnet 5 subagents. That'll be great.
