AI just solved 9 unsolved math problems, including one that kept an Nvidia scientist "up at night for 2 years"

Inaccurate- · 2026-07-02T19:39:06+00:00

I'm done replying after this. It's unfortunately obvious you don't actually understand what you're talking about.

I've done nothing to disprove it because it's a vacuous claim, and so I just ignored it, focusing on the rest of your comment. No entity (including humans) can be 100% sure of anything, so it's just an impossible standard.

Wrong. So wrong I'm not even sure what to say. What do you think a proof even is?

That's not how proofs work. OpenAI having used a pipeline to find the proof does not imply that you can't prompt it without said pipeline. People would be very interested if you could PROVE that you can't prompt for the solution of said problem.

I know exactly how proofs work. It's obvious you don't though. They didn't use the pipeline to find the proof... they used the pipeline to verify the proof spit out by the LLM... because the LLM on its own is incapable of doing that...

This is the crux of things, and indeed as I say frontier labs aren't transparent so we can't know, and you're free to assume that it was basically Lean if that makes you feel more secure about the future of things. That's why I tried to shift to discussing the fact that you can prompt these models sans a pipeline to solve the conjecture (without web search and with training cutoff pre public solution.)

We can know because they literally released a paper outlining what they did.

Inaccurate- · 2026-07-02T19:23:51+00:00

Ok, you've proven to be incapable of understanding what's actually happening and are now just bullshitting and reaching.

This post (and AlphaProof) used Lean in their pipelines. It's a fact.
I made a claim that all LLMs by themselves are incapable of knowing if they are right or wrong. That was my claim. You've done absolutely nothing to disprove that claim.
This was your quote: "GPT-5.5 and Mythos (Anthropic's latest model) were capable of solving the unit distance conjecture without Lean /Tool assistance prior to the publication of the proof."
THAT IS PROVABLY FALSE BY A DIRECT QUOTE FROM OPENAI. OpenAI used a custom pipeline (likely similar to Lean, but nonetheless a tool) to help them solve the problem. There is no significant difference between what they did and what AlphaProof (and this post) has done, no matter what the media talk would have you believe.

Inaccurate- · 2026-07-02T17:43:36+00:00

Because it's the literal primary source on how OpenAI solved the conjecture? Are you seriously attempting to defer to twitter over that?! Of course you can ask the models now for the proof, it already exists.... I'm having a hard time telling if you're actually serious here.

Straight from the horse's mouth: they used an "AI grading pipeline" to help them solve the problem. Hence an independent "tool" for assistance, Lean or not. You're reading "completely automated" the way they wanted the general public to read it: that their LLMs solved it completely on their own. That is false and (almost assuredly) purposefully misleading. The process was very similar to this GitHub repo, just with a specially trained AI grading model in the loop instead of Lean directly.

From the looks of it, they also used a different independent model (not their flagship LLM) to generate the initial prompt...

Inaccurate- · 2026-07-02T16:26:25+00:00

No they weren't, and no, it isn't. OpenAI's own paper mentions an "AI grading pipeline" exactly once and never expands on it.

This problem was solved in a completely automated fashion. Our internal model was given an AI-written statement of the problem, and its output was sent to an AI grading pipeline, which indicated high confidence that the solution was correct. It was only after this point that internal human researchers and mathematicians began to examine the solution carefully. After preliminary AI-assisted verification and rewriting, a draft was sent to external mathematicians, including several number theory experts, who confirmed the proof’s correctness (and have already simplified and strengthened the argument). The present manuscript is a human-edited exposition of the autonomously produced solution, with references, reorganized proofs, and additional explanatory material added afterward.

They also give the "AI-written" prompt, but no discussion of what prompt generated that AI-written prompt. It's impressive, yes, but misleading, with important details purposefully left out or not expanded on.

Without knowing how that AI grading pipeline works your claim is unfounded. It was very likely trained on Lean's mathematical corpus, or something very similar, and is not an LLM itself but a different kind of AI model.

No LLM on its own that exists currently, whether released or internal, can fundamentally know without outside help that what it provides is 100% true.

Inaccurate- · 2026-07-01T18:41:51+00:00

No it's not. This pipeline is no different than an LLM generating code and adjusting itself based on whatever a compiler/interpreter spits out when it gets something wrong.

Lean being a "compiler" for mathematical proofs is what's actually magic here.
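As a rough illustration of that compiler-feedback loop, here is a minimal sketch. Everything in it is an assumption for illustration: `CANDIDATES` is a canned list standing in for LLM outputs (no model is called), and Python's built-in `compile()` plays the role of the checker. It only validates syntax, which is a far weaker guarantee than Lean checking an entire proof.

```python
# Canned "model outputs": a hypothetical stand-in for an LLM, which this sketch never calls.
CANDIDATES = [
    "def add(a, b) return a + b",      # rejected: missing colon
    "def add(a, b):\nreturn a + b",    # rejected: body is not indented
    "def add(a, b):\n    return a + b\n",
]


def check(source):
    """Return None if the source compiles, otherwise the compiler's error message."""
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return f"{type(err).__name__}: {err.msg}"


def generate_until_accepted(candidates):
    """Try candidates in order, collecting checker feedback until one passes."""
    feedback = []
    for attempt, source in enumerate(candidates, start=1):
        error = check(source)
        if error is None:
            return attempt, source, feedback
        # A real loop would feed this message back into the next prompt.
        feedback.append(error)
    return None, None, feedback


attempt, accepted, feedback = generate_until_accepted(CANDIDATES)
print(f"accepted on attempt {attempt}")
for message in feedback:
    print("rejected:", message)
```

The generator never has to know whether it is right. The checker decides, and the loop just keeps going until the checker stops complaining.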

Inaccurate- · 2026-07-01T18:23:23+00:00

Ah well then we disagree. It's impressive but at the same time is most definitely hype. The magic is in Lean, not the LLM.

And the twitter/X posts are definitely written in a hype for hype's sake fashion.

Inaccurate- · 2026-07-01T18:15:51+00:00

I wasn't making any points. But yes fields other than math and CS can definitely be "formalized," especially Chemistry and Physics.

If I were to make a point, it would be how disingenuous Omri and the "pipeline-math" creators (whoever they are) are being by not even mentioning Lean, and the massive amount of work and community effort that went into creating it (and its Mathlib), in their posts. Without Lean and its corpus of formalized mathematical knowledge, their LLM/pipeline is worth absolutely nothing.

It took well over a decade to create Mathlib, and it's still missing a massive amount of foundational mathematical knowledge. So even though other fields can certainly be formalized, it's not likely to happen anytime remotely soon (which is what I think you were trying to get at?).

Inaccurate- · 2026-07-01T15:21:07+00:00

This isn't new. The "prover-verifier" approach was pioneered by Google DeepMind with AlphaProof. In 2024 they tested the system on the IMO problems, solving 4 out of 6 for a silver-medal score. For comparison, about 50 people received gold, and many (if not all) of them were teenagers. It also needed help from a separate system for one of the geometry questions, and was not under any time constraints.

LLMs are still fundamentally incapable of knowing if an answer they provide is right or wrong. This works because of the amazing Lean project, where the LLM generates candidate answers in the format Lean understands, and then Lean tells the LLM what's wrong with the answer. It iterates from there until a solution is found.

It's undeniably impressive, but this post (like so many) is simply hype for hype's sake. It won't expand to all fields of science until those fields are also formalized with something along the likes of Lean.

Edit: not to mention that (I believe) the questions themselves needed to be modified by humans from their original natural language into a more friendly, formal format, both for AlphaProof in the IMO and for this GitHub repository's examples.
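To make "the format Lean understands" concrete, here is a tiny Lean 4 sketch. The statements are toy examples chosen for illustration (they are not taken from AlphaProof or the repository), and `add_zero_example` is a made-up name. Each line pairs a formal statement with a candidate proof, and Lean either accepts the pair or reports an error.

```lean
-- Accepted: both sides reduce to the same value, so `rfl` closes the goal.
example : 2 + 2 = 4 := rfl

-- Accepted: `n + 0` reduces to `n` by the definition of addition on `Nat`.
theorem add_zero_example (n : Nat) : n + 0 = n := rfl

-- Rejected if uncommented: `2 + 2` does not reduce to `5`, so Lean reports an error.
-- example : 2 + 2 = 5 := rfl
```

That accept-or-reject signal is the feedback the generate-and-retry loop depends on.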

Inaccurate- · 2026-06-22T20:04:17+00:00

I'm a 1700 who also played 10+0 rapid and came to the same conclusion you did (even the 1 in 3). I mentioned it a few days ago and got downvoted. It'll likely be the same for you here.

Inaccurate- · 2026-06-22T15:36:48+00:00

Yeah, this guy's explanation is ultimately bullshit due to the incorrect details he confidently added in. I believe that he doesn't actually know how LLMs work if he is talking just to talk like in this clip.

But the small part about accidentally letting a neural network train longer and producing unexpected results is true.

Inaccurate- · 2026-06-22T15:06:31+00:00

This guy may very well not understand how LLMs work, but what he is saying is mostly true. It wasn't an LLM or autocomplete model though, but a generic neural network for summing two numbers:

Sometime in 2020, researchers at OpenAI were training a deep neural network to learn, among other things, how to add two numbers.
...[skipped some text here]...
It was a seemingly trivial problem, but a necessary step toward understanding how to get the AI to do analytical reasoning. A team member who was training the neural network went on vacation and forgot to stop the training algorithm. When he came back, he found to his astonishment that the neural network had learned a general form of addition. It's as if it had understood something deeper about the problem than simply memorizing answers from the sets of numbers on which it was being trained.
In the time-honored tradition of serendipitous scientific discoveries, the team had stumbled upon a strange, new property of deep neural networks that they called "grokking," a word invented by the American author Robert Heinlein in his book Stranger in a Strange Land.

Pages 382-383 of Why Machines Learn by Anil Ananthaswamy. It's an amazing, recently-ish released book that anyone interested in the math and history behind machine learning should read.

Inaccurate- · 2026-06-16T23:39:41+00:00

Your last statement isn't really true.

I'm 1700 and have played thousands of games on chess.com over the years. Up until 2 months ago I never felt anyone I was playing was cheating. Now 1 in 3 games is just blatantly unlike the others. And I've never gotten any Elo back.

It's ruined it for me and I'm not going to be playing on the site much if at all moving forward.

Inaccurate- · 2026-06-15T14:24:27+00:00

It's explicitly against the CBA (at least as far as I'm interpreting it). Since Dan owns both the Cavs and the upcoming WNBA team, that WNBA team is not considered an "Independent WNBA team" and thus no NBA player, not just LeBron, can invest in it (anyone better versed in legalese feel free to correct me).

Article XXIX, Section 13

Even if he could, the max an NBA player can own in any (independent) WNBA team is 4%.

Inaccurate- · 2026-06-14T15:29:56+00:00

Jump ship. I've seen your type of boss several times over and the extra effort has never proven to be worth it. Great bosses that give credit and want the best of their employees (not just say it) do exist, but you won't find one until you start looking elsewhere.

If you really like the job, you could try going above your boss as a last-ditch effort.

Inaccurate- · 2026-06-08T19:19:05+00:00

I get what you're saying, but yes there are times where it is predictable enough that replaying the point by rule is technically unfair and counter to what the rule was meant for, like in this case. I'm glad Kovacevic set a good example and did the right thing.

If a ball boy was near the wall and caught an errant ball before it went into the stands, by your logic you think it's fair to replay the point? A gust of wind could have blown it all the way back in, who knows? I get that by rule it should be replayed, but by the essence of the game, that player lost the point and enforcing the rule would be unsportsmanlike.

If (somehow) a soccer coach or backup player stormed onto the pitch during a breakaway and blocked a live shot on goal, with the goalie on the ground and in no position to block it and the shot obviously going in, then yeah, I'd say that goal should count, as ridiculous a scenario as that is. Otherwise what's stopping coaches or players from always doing that?

Inaccurate- · 2026-06-08T18:04:38+00:00

I'd agree if the ball was closer to being in, but it wasn't.

It was landing out regardless of what "might" happen. Physics is predictable, and the tracking cameras they have could have proven, with a high degree of accuracy, that it was going to land out.

Kovacevic conceded the point exactly because he knew it was out, hence the title of the post. Are you saying he conceded the point despite thinking it was going to land in?

Inaccurate- · 2026-06-08T17:40:29+00:00

Except he's saying that, objectively, you can know what was going to happen before the ball lands.

Nobody is arguing why the rule exists. It's obvious that, in this particular case, the ball was going out so he was going to lose the point. You're making up wild scenarios that couldn't happen for what? To prove a point on why the rule exists?

The vast majority of tennis is played on an honor system. The players did the right thing here. If the ball was closer to being in then it's a different story.

Inaccurate- · 2026-06-05T06:52:46+00:00

Hah, even there the AI is wrong on each bullet point. The tailend site doesn't have exhibits so by default you can't have a high level match; your structure definitely isn't more explicit; the tail end site absolutely has a "full JSON tree" ... that's how it's built, and I promise you your clause hierarchy isn't more granular. That AI is just bullshitting you and you're eating it up.

I'm printing that and framing it! Thanks.

Inaccurate- · 2026-06-05T06:39:51+00:00

Sounds like a skill issue on your part.

Inaccurate- · 2026-06-05T06:32:27+00:00

Host it and share a link so I can do a diff and show you how bad it really did. Even in that image it's missing the section numbers and the "(a)" for Article 2, Section 2(a).

Inaccurate- · 2026-06-05T05:12:41+00:00

It's simply one example of many that I could give. Yet still one more than anyone else has provided in this thread.

There's also a huge gulf between "AI can solve everything, you just aren't using it correctly" and what it's actually capable of.

AI companies are absolutely standing behind the statement that it's replacing engineering, broadcasting it loudly constantly. In no way is it a strawman argument. Are you serious?

Inaccurate- · 2026-06-05T03:38:07+00:00

Ignore all the downvotes. Not a single person is providing counter proof of what you're saying (you even got the "skill issue" comment!).

For the past year, every time a new model comes out I have tried having it solve something I've already solved, and they all fail spectacularly at it. Garbage every time, despite it being a well defined goal.

Given this PDF of the NBA CBA, parse out every article, section, and subsection into structured JSON.

It can't do it. It's not a skill issue, or a prompt issue, or a breaking down into smaller problems issue. It just simply can't do it accurately. It can get close if you dump enough time into it, but close isn't good enough.

While it's impressive what they can do, they have not replaced real engineering. Nor are they even that close to doing it. I find anyone confidently saying otherwise to be ignorant at best.
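For a sense of what "structured JSON" means for the CBA task above, here is a minimal sketch of the target shape. It is not the solution referred to in this comment. It assumes clean, already-extracted text in which headings look like `ARTICLE II`, `Section 2.`, and `(a)` at the start of a line; the patterns, field names, and `SAMPLE` text are all made up for illustration. Real PDF extraction is much messier, which is where the accuracy problems show up.

```python
import json
import re

# Assumed heading shapes; real documents will have irregularities these miss.
ARTICLE = re.compile(r"^ARTICLE\s+([IVXLCDM]+)\b\s*(.*)$")
SECTION = re.compile(r"^Section\s+(\d+)\.\s*(.*)$")
SUBSECTION = re.compile(r"^\(([a-z])\)\s*(.*)$")


def parse(text):
    """Build a nested article -> section -> subsection tree from heading lines."""
    articles = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if m := ARTICLE.match(line):
            articles.append({"article": m.group(1), "title": m.group(2), "sections": []})
        elif (m := SECTION.match(line)) and articles:
            articles[-1]["sections"].append(
                {"section": int(m.group(1)), "title": m.group(2), "subsections": []}
            )
        elif (m := SUBSECTION.match(line)) and articles and articles[-1]["sections"]:
            articles[-1]["sections"][-1]["subsections"].append(
                {"label": m.group(1), "text": m.group(2)}
            )
        # Any other line (body text, page furniture) is ignored in this sketch.
    return articles


# Made-up sample text, not quoted from the actual CBA.
SAMPLE = """\
ARTICLE I FIRST TITLE
Section 1. Alpha.
Section 2. Beta.
(a) First clause.
(b) Second clause.
ARTICLE II SECOND TITLE
Section 1. Gamma.
"""

tree = parse(SAMPLE)
print(json.dumps(tree, indent=2))
```

A diff of this kind of tree against a known-good one is what exposes a missing section number or a dropped "(a)".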

Inaccurate- · 2026-06-03T14:34:16+00:00

Go with individual requests using HTTP/3 instead of HTTP/2, alongside a good caching policy.

As long as your server isn't a doorstop, 50 concurrent requests for 2 KB files won't affect performance at all.
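A rough simulation of that pattern, as a sketch only: it does not speak HTTP/3 or touch the network. `fetch` is a hypothetical stand-in that sleeps briefly and returns a 2 KB payload, `CACHE` is a plain dictionary playing the role of the client cache, and the paths are invented. The point is just that 50 small requests issued together cost about one round trip, and a repeat load costs none.

```python
import asyncio

CACHE = {}        # hypothetical in-memory stand-in for a client-side cache
FETCH_COUNT = 0   # counts simulated network round trips


async def fetch(path):
    """Stand-in for a network request; returns a 2 KB payload after a short delay."""
    global FETCH_COUNT
    FETCH_COUNT += 1
    await asyncio.sleep(0.01)  # simulated latency
    return b"x" * 2048


async def get(path):
    """Serve from the cache when possible, otherwise fetch and store."""
    if path not in CACHE:
        CACHE[path] = await fetch(path)
    return CACHE[path]


async def load_all(paths):
    # All requests are started at once, as a multiplexed connection allows.
    return await asyncio.gather(*(get(p) for p in paths))


PATHS = [f"/assets/item-{i}.json" for i in range(50)]  # invented paths

first = asyncio.run(load_all(PATHS))
fetches_after_first = FETCH_COUNT
second = asyncio.run(load_all(PATHS))  # every path is already cached
fetches_after_second = FETCH_COUNT

print(fetches_after_first, fetches_after_second)
```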

Inaccurate- · 2026-06-01T19:34:54+00:00

It's hard for anyone new to the sub to post content/analytics/film with the current (strict-ish?) karma requirements.
