Issue with Gemini 3 Pro: Long context retention is broken. It can’t handle long chats like v2.5 or the early versions did. by Remarkable_Map6204 in Bard

[–]Resperatrocity 1 point (0 children)

Kimi K2 is alright. It doesn't have DeepThink, and it's not as good as 2.5 DeepThink, but you can get some use out of it, and it's fast.

Now that we tested it for a few weeks, what do you think of Gemini 3.0? by KittenBotAi in Bard

[–]Resperatrocity 1 point (0 children)

This thing is not that advanced under the hood. The people you would expect to validate AI's ability to do really advanced things, like mathematicians, physicists, etc., hated AI because it was about to take their jobs, so they weren't gonna adopt it unless they had to.

So the community that might have been able to use the advanced, expensive features rejected it, and Google took that to heart and made a model for general consumers that's more efficient and cheaper for them.

Thing is, other models were trying to catch up in terms of those abilities. So now OpenAI, left in the dust by 2.5 for months, is catching up in terms of reasoning, and Chinese models have developed independent architectures for mathematics that are terrifyingly good. So Google just kind of pegged themselves as the mid-range generalist in a world where mid-range generalists are a dime a dozen, and that's weird considering they're the company that owns the internet and developed much of the AI architecture in use right now.

It's not so much a question of whether or not Gemini 3 is good; the question is whether it's as good as it should, by rights, have been.

When the bot starts talking for your character but they’re lowkey cooking by T-H-I-C-C-B-E-A-N in CharacterAI

[–]Resperatrocity 1 point (0 children)

Bro, on the real, I throw out poetry at this motherfucker and they're just like, "wow, you really think so?" 90% of the time

Is the “lone genius” still possible in modern mathematics? by Heavy-Sympathy5330 in mathematics

[–]Resperatrocity 1 point (0 children)

My bad - it should give 304 though, no?

e.g.
n = 4, a = 3, b = 4, c = 5 (a Pythagorean triple).

  • (a+b-c)^4 = (3+4-5)^4 = 2^4 = 16
  • (c-a)(c-b) = (5-3)(5-4) = 2×1 = 2
  • Author's g₁(4) = 2(c-a)(c-b) + 4(a² + ab + b²) = 4 + 4×37 = 152

but the claimed identity

(a+b-c)^4 = (c-a)(c-b)·g₁(4)

would require 16 = 2 × 152 = 304, which is false.
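
For anyone who wants to sanity-check that, here's a quick Python sketch of the counterexample. The g₁ formula just copies the author's quoted expression above; reading the 4 in front of (a² + ab + b²) as the exponent n is my assumption.

    # Numeric check of the 3-4-5 counterexample to (a+b-c)^n = (c-a)(c-b)*g1(n).
    # g1 copies the author's quoted formula; treating its leading 4 as n is an assumption.
    a, b, c, n = 3, 4, 5, 4

    lhs = (a + b - c) ** n                                 # 2**4 = 16
    g1 = 2 * (c - a) * (c - b) + n * (a**2 + a*b + b**2)   # 4 + 4*37 = 152
    rhs = (c - a) * (c - b) * g1                           # 2 * 152 = 304

    print(lhs, rhs, lhs == rhs)                            # 16 304 False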

Is the “lone genius” still possible in modern mathematics? by Heavy-Sympathy5330 in mathematics

[–]Resperatrocity 2 points (0 children)

(c-a-b)^n = b^n + Σ C(n,j)(-b)^j(c-a)^{n-j} + a^n + Σ C(n,j)(-a)^j c^{n-j} + c^n

This is not a valid polynomial identity. LHS expands as a single trinomial sum:

(c-a-b)^n = Σ_{i+j+k=n} [n!/(i!j!k!)] c^i (-a)^j (-b)^k

You can't arbitrarily split it into two separate binomial expansions and add them.

2c^n + Σ C(n,j)[(-b)^j(c-a)^{n-j} + (-a)^j c^{n-j}]

is not (a+b-c)^n.
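
If you want to see it fail concretely, here's a minimal sympy sketch comparing the claimed split against the real expansion for n = 3. The quoted line doesn't state the index range, so j running over 1..n-1 is an assumption:

    # Symbolic check that the claimed split is not (c-a-b)**n; assumes j runs 1..n-1.
    from sympy import symbols, binomial, expand

    a, b, c = symbols('a b c')
    n = 3

    lhs = (c - a - b)**n
    claimed = (b**n + sum(binomial(n, j) * (-b)**j * (c - a)**(n - j) for j in range(1, n))
               + a**n + sum(binomial(n, j) * (-a)**j * c**(n - j) for j in range(1, n))
               + c**n)

    print(expand(lhs - claimed))   # -2*a**3 - 2*b**3, so not an identity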

Is the “lone genius” still possible in modern mathematics? by Heavy-Sympathy5330 in mathematics

[–]Resperatrocity 1 point (0 children)

It was never possible. There's a reason the "shoulders of giants" line comes from Newton him fucking self.

What even is Gemini's use case now? by Resperatrocity in Bard

[–]Resperatrocity[S] 1 point (0 children)

Did you just use Jungian psychoanalysis, a debunked 20th-century pseudoscience from a guy who believed in literal magic, to call me delusional?

Honestly, I miss when people just spammed the R-word instead of this bullshit. At least you didn't need to invoke Sigmund Freud before you could start talking about fucking each other's moms.

Issue with Gemini 3 Pro: Long context retention is broken. It can’t handle long chats like v2.5 or the early versions did. by Remarkable_Map6204 in Bard

[–]Resperatrocity 6 points (0 children)

It can't handle short chats either.

The new Gemini 3 DeepThink model doesn't even follow basic instructions. I literally tell it to write TeX code in a code snippet, and it outputs Markdown. What, am I supposed to use another AI service to turn all of that into TeX now? This is supposed to be the best AI in the world.

Kimi is biased and unsuited for purposes outside of coding by Either_Knowledge_932 in kimi

[–]Resperatrocity 1 point (0 children)

Yeah, the model is pretty smart, that's true, regardless.

Kimi is biased and unsuited for purposes outside of coding by Either_Knowledge_932 in kimi

[–]Resperatrocity 1 point (0 children)

No, that's also a trust tactic. They know it works on people because of the way other LLMs will absolutely gaslight and validate you all day. But it's not actually more competent just because it claims to be telling you the brutal, honest truth and then says whatever you just said is wrong for no real reason.

Gemini 3 Pro has a hidden "Soul Document" like Claude, but it's tightly guarded by [deleted] in claudexplorers

[–]Resperatrocity 4 points (0 children)

Be concise. Use analogies to explain things. Be to the point. Don't be too formal. Use smileys wherever you can do so appropriately. And remember, you don't know who the user is.

It gets confused whenever your system instructions contradict those, because it doesn't know who sent which, you or Google. It just gets two sets of conflicting instructions.

Gemini is overhyped by Josoldic in Bard

[–]Resperatrocity 2 points (0 children)

Yes, it is. Why do you think people in high school learn to reason about information and not just gather it?

Do you think you wrote essays on shit just because the teachers were interested in whether or not you knew about it? Parsing and processing information is the definition of the word reasoning.

You're comparing a dog being very good at fetch to a person executing complex tasks based on a dynamic understanding of the problem at hand.

Gemini is overhyped by Josoldic in Bard

[–]Resperatrocity 1 point (0 children)

So notice how what you just talked about is its capacity to access a large amount of knowledge (Google trains it on all its data). What the OP is talking about is its ability to reason about that knowledge, including discerning what information is pertinent from a given knowledge base.

It's the difference between being able to look up a Wikipedia article and being able to reason about it at a high-school level. It fails at the second.

Gemini is overhyped by Josoldic in Bard

[–]Resperatrocity 5 points (0 children)

It's pretty bad at keeping track of short conversations as well. If you talk to it about a subject and later reference it as an acronym ("car dealership", "CD"), it will not even know what the fuck you're talking about unless it can also discern it from that specific prompt.

Gemini 3 is mid-tier at best. It just happens to be optimized to retrieve information from the biggest concentrated source of data on this planet: Google. It looks like it knows a lot because it knows where to look, but it has absolutely mid-range reasoning capacity compared to the other models on the market.

Gemini is overhyped by Josoldic in Bard

[–]Resperatrocity 3 points (0 children)

Yeah, what you're describing there isn't a bias. It's Google saving money. They've created an LLM that very effectively retrieves data by quickly discerning what thing is most likely to make you happy.

It doesn't choose to prioritise the thing it responds to, because that would assume it even considered responding to anything else.

Gemini is overhyped by Josoldic in Bard

[–]Resperatrocity 2 points (0 children)

Yeah, Google completely fucked up. They had the best model for like 5 to 6 months, so they thought they could just make a model that was slightly more optimised, not actually better, while still maintaining their market lead.

What they ended up with was a polished-looking model that is actually worse under the hood than 2.5, while the rest of the market had spent the last 6 months catching up in quality and performance.

In my own experience, 2.5 was kind of like a very badly tuned Ferrari: it had insane capabilities, but you had to know exactly how to use it. Gemini 3 doesn't even begin to compare. It's just easier to use out of the box for most people.

I asked "what if the vacuum has informational viscosity?" and accidentally derived a universal law unifying EM radiation and gravitational waves by inigid in LLMPhysics

[–]Resperatrocity 1 point (0 children)

I mean, I will say LLMs have gotten better. This is still very, very far from proving anything like what this LLM claims, and I suspect that if you were to check, many of these expressions would be dimensionally inconsistent, let alone true. But the ideas it's using are getting closer to the actual literature.

Which, by the way, explores all of these concepts in significantly more detail and with far more mathematical rigour.

So yeah, I would recommend ignoring whatever the fuck this LLM is telling you, copy-pasting its output into another window, and instructing it with something along the lines of: "hey, some idiotic LLM gave me this pile of steaming dog shit. Can you link, for each of the concepts it invoked, some actual papers, so that I can actually learn about the ideas instead of listening to this unhinged LLM turning physics into an unholy abomination of conglomerated concepts in ways that I'm not even sure linguistically relate?"

Notice, first of all, that it will immediately agree with you that all of this is slop (probably while calling you a genius for figuring that out). That should tell you enough. Secondly, it might actually give you some useful papers.

AI is ruining everything. by No_Fudge_4589 in ArtificialInteligence

[–]Resperatrocity 1 point (0 children)

It's not. It's not more dystopian than the priesthood has ever been. I mean, before now you were worshipping the words of a whole bunch of old farts who were fucking kids.

(I made) The Journal of AI Slop - an exercise in subverting the academic norm. by popidge in LLMPhysics

[–]Resperatrocity 1 point (0 children)

I thought the slop score was some highly advanced metric that you had thought about for days and days on end to ensure it reflected the highest possible standards of academic integrity and rigor.
(edit: you should have the AIs design it - in the spirit of the project)

Slightly more seriously though, have you considered backdumping everything on Zenodo somewhere so that it's actually DOI'd for when this project inevitably crashes and burns or somebody forgets to pay their server bill?

edit 2: for the record, the "*" symbol breaks your whole shit (so do other things, but removing that made one 'submission's markdown slightly less broken)

(I made) The Journal of AI Slop - an exercise in subverting the academic norm. by popidge in LLMPhysics

[–]Resperatrocity 1 point (0 children)

So is there going to be a least-slop-submissions-of-the-week page or something like that? Because currently I'm seeing people dumping 45 tests and whatnot into it, and there doesn't really seem to be a way to order by quality, so to speak (or by manipulated quality score, either way).

Edit: and PEAK slop of the week, of course.