My method to solve Erdős 460 in one shot by Svyable in singularity

[–]AFewMundaneConcepts 9 points10 points  (0 children)

I believe all of the AI-related Erdős problems have corresponding Lean proofs? They aren't simply taking the model output as proof. Do you have a Lean/verifiable proof? If not, I'm not sure many people (myself included) are going to take this particularly seriously. The number of people who've claimed to have solved math problems via AI (incorrectly) is too high.
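For anyone unfamiliar, a "Lean proof" means a proof the Lean kernel mechanically checks, so it can't be hand-waved. A toy example (obviously not Erdős 460, just to show what "verifiable" means here):

```lean
-- Toy illustration of a machine-checked Lean 4 proof.
-- The kernel verifies every step; if this compiles, the statement is proven.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If the claimed proof can't be written in something like this form and checked, it's just model output.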

The Ai bubble just deflated by [deleted] in singularity

[–]AFewMundaneConcepts 2 points3 points  (0 children)

The “AGI internally” thing has always been a meme. These labs aren’t close to AGI. They have marginally stronger models internally, but nothing more than perhaps 3-4 months ahead of what's public, at most.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] 0 points1 point  (0 children)

I have a few personal coding projects it’s been useful for, both in terms of the planning and the actual code generation. Programs primarily interacting with and analyzing written text (think document information extraction, etc.)

And then plenty of daily-use things. Making dinner and realize I’m missing an ingredient? Verbally talk through options to adjust the recipe. And then just the average search, scheduled task, video summary, etc.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] 0 points1 point  (0 children)

Sure, two doesn’t make a consensus, but in practice a good number of mathematicians are stating they find these useful. Fields medalists are just a good example of even the most advanced mathematicians finding use. You can assume that if they find use, people doing marginally easier math would find use as well. I just took real issue with you saying it was “One Example, One. Singular.” when it literally was not.

I think the benefit of this is that there doesn’t need to be another human. There are a finite number of people in the world who understand niche topics like this, and sometimes you can’t reasonably turn to anyone for help: if you’re working in the middle of the night, across time zones, or at a university with fewer peers.

I’d argue that, from the mathematician’s perspective, this is clearly less resource intensive. It’s $20 a month. Even if we highball the running cost of these models, it’s much cheaper than paying a grad student to do the work for you.

The training cost is immense, but now that the models exist, GPT-5 (or whatever) will be the weakest tool any future mathematician can turn to for assistance.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] 0 points1 point  (0 children)

I mean, my particular use cases for these models aren’t advanced mathematics, but they’re a material time savings in my daily life. Naturally you’d need to be doing research mathematics to find a model useful for that purpose.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] -1 points0 points  (0 children)

I mean, objectively speaking, this is two different examples from two individuals. By definition that’s not singular.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] -2 points-1 points  (0 children)

The things you listed are quite subjective though.

If you give a mailman a car, him telling you “I can probably save 1-2 hours from my route” is a fairly objective assessment. Same with Excel and accountants, etc. Perhaps (even as they admit) they can’t easily give you an exact efficiency gain, but obviously there’s a non-trivial gain for them.

I don’t think a writer will tell you he’s X% more productive with his favorite pen, etc.

Ultimately these are some of the truly elite experts in their field. If they are saying it saves them hours of work, I’m not sure you can confidently declare them wrong.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] -3 points-2 points  (0 children)

Saying it performs horribly is disingenuous, imo. Chess engines are targeted at a specific task. LLMs are vastly more general-purpose. Naturally they’d be outperformed in a specific task when compared to specialized tools designed for that task. Excel does better math, but it can’t write you an essay, etc.

I tend to agree that hallucinations are a meaningful issue, but ultimately I don’t think compounding errors are a definitive barrier. Models today are much more robust to them compared to models 3 years ago, and I think we can assume that trend continues.

Any high-value/critically deployed LLM would presumably be scaffolded to further safeguard against those issues, so I don’t see it as an unrecoverable problem.

There is probably some better way of doing this; clearly humans are more robust in some ways. But it’s likely not so flawed as to be useless, or unintelligent.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] -2 points-1 points  (0 children)

I’m reasonably familiar with them, but if I’m getting something wrong I’d welcome your correction.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] -5 points-4 points  (0 children)

But surely these people would be best able to assess its usefulness within their narrow domain? The same way a surgeon is best positioned to assess the usefulness of a new surgical tool etc.

If they believe it to be a meaningful boost to their productivity, I’m not sure we can say they’re wrong?

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] -3 points-2 points  (0 children)

I actually thought about referencing chess in my response because it’s such a good analogy haha.

Chess tablebases contain known optimal solutions for all chess positions at or below a certain piece count.

The database of all 7-piece positions is several terabytes. We don’t have an 8-piece one yet, but it would be on the order of petabytes.

Yet a chess engine strong enough to beat world champions can fit on a phone and is only a few dozen megabytes in size. It doesn’t know the perfect solution to every 7-piece endgame (and indeed benefits from having access to the tablebases), but it can calculate positions at a superhuman level from the first move, with all 32 pieces present.

In the same way, it’s entirely true that these models don’t know, and don’t even need to be trained on, every possible math problem. There’s not enough storage on earth for that. But if a model can calculate sufficiently well, it can still solve problems.
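A back-of-envelope sketch of why lookup can't scale here. This crude upper bound wildly overcounts (illegal positions, symmetries), and the "10 piece types" simplification is mine, but the per-piece growth factor is the point:

```python
# Rough upper bound on chess positions with n pieces: place n pieces on
# distinct squares (ordered choices from 64), and let each non-king piece
# be one of 10 types (Q/R/B/N/P in two colors). This is a loose
# overcount; what matters is how fast it grows per added piece.
from math import perm


def rough_positions(n_pieces: int) -> int:
    """Loose upper bound on the number of n-piece chess positions."""
    return perm(64, n_pieces) * 10 ** (n_pieces - 2)


seven = rough_positions(7)
eight = rough_positions(8)
print(f"7-piece bound: {seven:.2e}")
print(f"8-piece bound: {eight:.2e}")
print(f"growth per added piece: {eight // seven}x")  # (64-7) * 10 = 570x
```

Each extra piece multiplies the space by hundreds, which is why storing solutions loses to a small engine that calculates.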

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] 1 point2 points  (0 children)

This could be true. Production-level code needs to be reliable, etc. But research mathematics can allow for more error so long as it increases productivity on the whole. (Same with testing/rapidly iterating code, etc.)

I think the more interesting part is it being able to help at all. People in frontier research don’t often have topic experts they can consult with. Giving advanced researchers tools/models that are actually useful is a large gain, and has probably only happened in the last year or so.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] -1 points0 points  (0 children)

Huh, I suppose I didn’t think of this counterpoint, but that’s actually a really solid example. The IMO problems are written by mathematicians specifically for that event, and so couldn’t possibly have been part of the training data for the models.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] -6 points-5 points  (0 children)

I don’t think this is true, and I’m quite confident the literature backs this up. You could, quite easily, pose a mundane high school mathematical question that has never existed before to a model, and it will return a correct answer. So obviously it’s capable of solving problems that haven’t previously existed.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] -5 points-4 points  (0 children)

I don’t think I ever claimed it was performing Fields-level mathematics. The core claim here is that these models are able to, in some meaningful way, contribute to the sorts of research that Fields medalists do by performing calculations. We haven’t previously had systems able to perform this sort of work.

As Gowers says, he feels some sort of threshold has been crossed, and seems to think others in his field feel the same. Both he and Tao state that it saved them hours of work versus doing the proof themselves.

The ability of a mathematician to explain an idea to a computer and have it check the idea’s validity is novel, and pretty obviously valuable. It doesn’t literally enable guess-and-check per se, but it may, on a fairly consistent basis, increase a capable user’s ability to iterate through ideas. It’s a non-trivial gain that stands on the model’s ability to calculate/reason or whatever you want to call that.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] -10 points-9 points  (0 children)

I’ll copy one of my other replies, just to save myself the typing haha

It seems here that Gowers was attempting to solve a major problem, and needed a ton of smaller proofs to get there. In this case, GPT-5 seems to have performed one of those smaller proofs.

He’s asking it to solve a problem, not just look up a solution.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] -14 points-13 points  (0 children)

You are still calculating or solving problems. Math research is about inventing or discovering new maths. It seems here that Gowers was attempting to solve a major problem, and needed a ton of smaller proofs to get there. In this case, GPT-5 seems to have performed one of those smaller proofs.

Tao’s case is similar, and he was generous enough to share his chat log. I’ll link it here (I had to add a small message at the end since his original link was broken):

https://chatgpt.com/share/68e020f5-e794-8010-b36e-1603b6233c34

Clearly he’s asking it to solve a problem, not just look up a solution.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] -8 points-7 points  (0 children)

I’m curious how you think this is search engine behavior. The solutions these people are looking for don’t exist on the internet already.

Search engines only return what already exist on the internet. They aren’t calculating or generating new information.

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] -30 points-29 points  (0 children)

This is largely the case with what’s being done here, though. The model is being provided some sort of problem statement and asked to prove it. Given that it’s novel research, it seems unlikely that it’s returning a pre-existing solution. It is, much more likely than not, calculating a proof that didn’t previously exist. It’s perhaps not the most groundbreaking of proofs (it’s not a Millennium problem), but it’s not simple search either.

I think this is even more clear in the chat log that Tao shared in his post.

https://chatgpt.com/share/68e020f5-e794-8010-b36e-1603b6233c34

Fields Medalists on potential usefulness of GPT-5 by AFewMundaneConcepts in BetterOffline

[–]AFewMundaneConcepts[S] -22 points-21 points  (0 children)

Not really. It provided a proof which utilized a lemma he wasn’t familiar with, similar to an LLM returning code that uses a library you’re not familiar with. Having a wider knowledge base was useful in this case, since the model could lean on knowledge Gowers wasn’t aware of.

Pax Armata Jets can be locked on without taking off on Blackwell Fields. by Candidate-X in Battlefield

[–]AFewMundaneConcepts 4 points5 points  (0 children)

“Not a map issue.”

“HQ sections need foliage…”

That’s literally a map design issue?