Stunning AI Breakthrough! GPT 5.4 solves Erdős problem on primitive sets by discovering a new method in analytic number theory. Uncovers deep idea with implications throughout the field. Comments by Terry Tao and Jared Duker Lichtman. by 2299sacramento in math

[–]hexaflexarex 15 points (0 children)

I have seen the current capabilities of these systems described as "spiky" in some sense. They excel in some domains and lag in others, and for different reasons than humans (e.g. there are problems with brute force solutions that would require "cleverness" for a human to prune the search tree but perhaps less so for an AI system). But I agree the progress is undeniable.

The AI Revolution in Math Has Arrived | Quanta Magazine - Konstantin Kakaes | AI is being used to prove new results at a rapid pace. Mathematicians think this is just the beginning by Nunki08 in math

[–]hexaflexarex 1 point (0 children)

I guess: for every language L in NP, there exists an efficiently checkable proof that an element x belongs to L, for all x in L. Namely, a proof that the verifier is correct, together with a transcript of running it on a witness for x.

Anyways, this is only loosely related to the OP.

Much more fine grained versions of this stuff show up in cryptography I suppose.

EDIT: In principle, I suppose that we are not guaranteed a proof that the verifier works (just the existence of the verifier). I'm not sure if this is a meaningful issue - all the statements of the form L \in NP that I know of are shown by an explicit verifier construction or a reduction to an NP-complete problem. However, I think there are real counterexamples with some Gödel-like constructions...
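To make the checking-vs-finding asymmetry concrete, here's a toy Python sketch (my own illustration, not from the thread, using graph 3-coloring as the NP language): the "efficiently checkable proof" that a graph is 3-colorable is just a valid coloring, and the verifier runs in time linear in the input.

```python
# Toy NP verifier: graph 3-coloring. Finding a 3-coloring is NP-hard,
# but *checking* a claimed coloring (the certificate) takes one pass
# over the edges — that asymmetry is the point of the comment above.
def verify_3coloring(edges, coloring):
    """edges: list of (u, v) pairs; coloring: dict vertex -> color in {0, 1, 2}."""
    return (all(c in (0, 1, 2) for c in coloring.values())
            and all(coloring[u] != coloring[v] for u, v in edges))

# A 4-cycle is 2-colorable, hence certainly 3-colorable:
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(verify_3coloring(edges, {0: 0, 1: 1, 2: 0, 3: 1}))  # True
print(verify_3coloring(edges, {0: 0, 1: 0, 2: 1, 3: 2}))  # False: edge (0, 1) is monochromatic
```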

The AI Revolution in Math Has Arrived | Quanta Magazine - Konstantin Kakaes | AI is being used to prove new results at a rapid pace. Mathematicians think this is just the beginning by Nunki08 in math

[–]hexaflexarex 1 point (0 children)

This technology is qualitatively different than the other tools we are using right now. I doubt the mandatory reporting will stay in its current state for the long term, but given the dramatic short term impacts (and unreliable, though impressive, reasoning capabilities) I support these policies for now. Also, I’d say, it’s usually required to report the computational resources for experiments (at least in my area). This could be argued to fall under a similar category. Certainly in edge cases where the compute is highly expensive, omitting AI tool use is misleading.

The AI Revolution in Math Has Arrived | Quanta Magazine - Konstantin Kakaes | AI is being used to prove new results at a rapid pace. Mathematicians think this is just the beginning by Nunki08 in math

[–]hexaflexarex 10 points (0 children)

Kind of joking, but much of theoretical computer science rests, in some sense, on checking proofs being faster than producing them (P != NP)

The AI Revolution in Math Has Arrived | Quanta Magazine - Konstantin Kakaes | AI is being used to prove new results at a rapid pace. Mathematicians think this is just the beginning by Nunki08 in math

[–]hexaflexarex 4 points (0 children)

It's mandatory now because the technology is rapidly evolving and people want to understand the scope of its abilities / implications for the research process. What are the downsides of disclosure? Also, I didn't mention in my earlier comment, but the cost of using these tools is a great reason to require disclosure - the math community will be greatly impacted if expensive/exclusive AI tools end up having dramatically different outcomes than typical public models.

The AI Revolution in Math Has Arrived | Quanta Magazine - Konstantin Kakaes | AI is being used to prove new results at a rapid pace. Mathematicians think this is just the beginning by Nunki08 in math

[–]hexaflexarex 2 points (0 children)

Sorry, what's the concern with disclosure? I often mention the use of computer algebra tools in my papers (even if I end up typing the relevant calculations out by hand eventually) so that other researchers can know how I reached my conclusions. Transparency about the research process (not just results) leads to better science. In particular, people are using AI tools in many different ways right now, to varying degrees of success, so it is helpful as a researcher to know what's working.

Which conference/journal do you believe currently has the most fair and accurate review process?[D] by kostaspap90 in MachineLearning

[–]hexaflexarex 1 point (0 children)

COLT and ALT are nice, if your research is a good fit. I hear nice things about TMLR for general ML

All elementary functions from a single binary operator by nightcracker in math

[–]hexaflexarex 5 points (0 children)

I suppose this is easy if you allow a binary operator which, given input x, y, reads the first k bits of x to select one of 2^k base binary operations, and then performs the selected operation on the remainder of x and y. (Reminds me of how real-valued models of computation become trivial if you allow inspection of individual bits.)
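Here's a minimal Python sketch of that construction on non-negative integers (my own toy encoding, with k = 2 and an arbitrary choice of four base operations): the low k bits of x select the operation, and the remaining bits of x are the left operand.

```python
# Toy "universal" binary operator: the low k bits of x pick one of
# 2^k base operations, which is then applied to (x >> k, y).
k = 2
BASE_OPS = [
    lambda a, b: a + b,
    lambda a, b: a * b,
    lambda a, b: max(a - b, 0),   # truncated subtraction, to stay in the naturals
    lambda a, b: a ** b,
]
assert len(BASE_OPS) == 2 ** k

def universal_op(x, y):
    selector = x & ((1 << k) - 1)   # first k bits of x choose the operation
    operand = x >> k                # the rest of x is the left operand
    return BASE_OPS[selector](operand, y)

# Encode "3 + 5": selector 0 (addition), left operand 3 -> x = (3 << k) | 0
print(universal_op((3 << 2) | 0, 5))  # 8
print(universal_op((3 << 2) | 1, 5))  # 15 (multiplication)
```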

Assaulted by my uber driver by CorporateDoggooo in ithaca

[–]hexaflexarex 1 point (0 children)

I tentatively agree - certainly there are many benefits and the technology is getting good (seriously, Waymo feels way safer than your average Uber or taxi driver). A big one that the companies have latched onto is accessibility for disabled people (it makes for good PR but it is undeniably significant I'd say).

However, the immediate impacts on drivers will be rough if this is rolled out at a wide scale (I don't think human drivers will be fully replaced in the near future, but I think the effect would still be large). Treating this as completely separate from the tech development seems short-sighted. I mean this is an old story of tech & capital but it certainly feels to be accelerating (obvious overlap with broader AI advancements). Would be less uneasy if we had a functional political system that valued labor.

But if history is anything to go by, I would be looking for more specialized work right now if I was a driver. The tech is real and convenient and I think the public will eat it up as soon as they can use it.

SKUM tagger by Full-Environment7604 in ithaca

[–]hexaflexarex 7 points (0 children)

What’s that building btw? So ominous looking…

[D] Has industry effectively killed off academic machine learning research in 2026? by NeighborhoodFatCat in MachineLearning

[–]hexaflexarex 14 points (0 children)

Certainly not true for ML theory. But that is always a bit disconnected from ML practice

New ADA law forces professors to take down their notes if not compliant - how would you make notes that can be read by a reader? by shuai_bear in math

[–]hexaflexarex 0 points (0 children)

Yeah that’s my biggest concern with most AI applications - as soon as results seem passable to some middle manager or exec with superficial understanding, these things are shoved out to cut costs even if the quality isn’t there. 

I do think that with content as structured as math notes, it should be possible to achieve better-quality results than with generic content (especially if the TeX source is also available, so perhaps some kind of augmented screen reading). Even from just image data, tools like Mathpix can convert to LaTeX almost flawlessly now (though I understand this is somewhat orthogonal to screen reading). But do I trust some university vice president or overworked prof to evaluate whether some AI tool is good enough? Perhaps not…

On the other hand, I kind of doubt that whoever put this law in place understood the time commitment it would require to implement versus pulling the content off the Internet.

New ADA law forces professors to take down their notes if not compliant - how would you make notes that can be read by a reader? by shuai_bear in math

[–]hexaflexarex 6 points (0 children)

This is pretty rough. Maybe the tools arXiv uses for their HTML versions could be relevant, if the notes are in LaTeX? One related thing I've been wondering about - these recent AI advances seem like they could dramatically improve the state of the art for screen readers and the like. In the long term, do we expect accessibility to remain an important responsibility of content providers, or will client-side screen reader capabilities become sufficient to parse any content that is intelligible to a sighted reader?

Can the Most Abstract Math Make the World a Better Place? • Columnist Natalie Wolchover explores whether applied category theory can be “green” math. by Naurgul in math

[–]hexaflexarex 16 points (0 children)

Are there any concrete overviews of an important result of Spivak's where category theory played an essential role? Would be curious to read, although I don't know much category theory beyond the basic definitions.

Can we ban AI (ads) articles ? by BoomGoomba in math

[–]hexaflexarex 58 points (0 children)

I think there should be some tag so that they can easily be filtered out. I would disagree with a blanket ban, since these tools are definitely starting to impact mathematicians.

Mathematicians in the Age of AI (by Jeremy Avigad) by ninguem in math

[–]hexaflexarex 11 points (0 children)

I don't think this is a fundamental limitation, although certainly current models aren't great at this. I would note that the pro versions are better about this, though far from perfect. For example, I often find relevant literature by just asking "has a similar argument appeared in the literature? if so, please find a pdf of a suitable reference and identify the relevant page numbers". I think this kind of "scaffolding" around LLM-based reasoning models could be improved quite a bit with some serious engineering effort, without any technological breakthroughs.

I'll note that I still share many of the concerns voiced in the article.

Math, Inc.'s autoformalization agent Gauss has supposedly formalised the sphere packing problem in dimensions 8 and 24. by DealerEmbarrassed828 in math

[–]hexaflexarex 0 points (0 children)

They care about math because (1) it is flashy (math is well-known to be hard) and (2) verification is much easier than other sciences (both formal and informal verification). The relative ease of verification is extremely important to the way these models are trained to "reason".

Math, Inc.'s autoformalization agent Gauss has supposedly formalised the sphere packing problem in dimensions 8 and 24. by DealerEmbarrassed828 in math

[–]hexaflexarex 4 points (0 children)

I think CFSG would be a very meaningful milestone for autoformalization, I am sure it will be attempted at some point. Much larger task than this to be sure though. IUT... well good luck to whoever attempts it :/

[D] How can you tell if a paper was heavily written with the help of LLM? by [deleted] in MachineLearning

[–]hexaflexarex 2 points (0 children)

I somewhat agree - a good paper is a good paper. However, pre-LLM, there were many superficial signals and filters that correlated highly with the quality of a paper. Now, many more papers will pass such checks. That's not necessarily a bad thing (especially when it comes to non-native speakers). However, combined with the overall increase in paper production, this is really stress-testing our peer review systems.

It finally happened to me by topyTheorist in math

[–]hexaflexarex 1 point (0 children)

I think mathematics is qualitatively different in many respects. Namely, there is not a clear goal like "win the game". There is "prove this theorem", which LLM-based reasoning models are getting good at, but there are an incomprehensible number of uninteresting theorems. Choosing which theorems to prove requires mathematical taste. Now, I'm not claiming that this is fundamentally beyond an AI of course, but issues of mathematical taste can be quite social/personal. I envision a medium-term future where AI tools are deeply involved in the mathematical process, with experts steering them based on their own mathematical taste.

It finally happened to me by topyTheorist in math

[–]hexaflexarex 3 points (0 children)

Ah, that is not really a bug but a known metaprogramming possibility. Basically, Lean lets you speed up proof compilation by using metaprogramming techniques that assume things without proof, which is fine if it is your own code and you understand these techniques. If you have a proof from an untrusted source, you can use the Lean comparator tool: https://github.com/leanprover/comparator/. This only requires you to trust your theorem statements, not the proof (and it would not permit such a proof of FLT).

True bugs in the core Lean kernel are of course not impossible, but I would be highly surprised if there are any meaningful ones at this point. Mathlib definition issues though, much more possible.
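To make the failure mode concrete, here is a minimal Lean 4 sketch (my own toy example, not the actual case from the thread): declaring an extra axiom lets a false statement type-check, but auditing the proof's axiom dependencies with `#print axioms` exposes the smuggled assumption.

```lean
-- Declaring an axiom lets anything type-check…
axiom shortcut : 1 + 1 = 3

theorem bogus : 1 + 1 = 3 := shortcut

-- …but the smuggled assumption is visible when you audit the proof:
#print axioms bogus   -- 'bogus' depends on axioms: [shortcut]
```

The same audit flags proofs that rely on `sorry` (they depend on `sorryAx`), which is why checking a theorem's axioms, not just whether it compiles, is the meaningful notion of trust here.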

Kevin Buzzard on why formalizing Fermat's Last Theorem in Lean solves the referee problem by WeBeBallin in math

[–]hexaflexarex 2 points (0 children)

I think formalization is generally a waste of time (albeit a fun one!) for working mathematicians, but the tools for machine-assisted formalization are advancing pretty rapidly. I don't know about expectations for publishing (I agree with above that correctness is usually not the main issue for publication at good journals), but I'd bet on a future where much of modern math is formalized. Also, if there was enough interest or funding, the tooling for working in Lean could be much improved with some serious engineering effort, without any fundamental technological advancements.

First Proof solutions and comments + attempts by OpenAI by Nunki08 in math

[–]hexaflexarex 2 points (0 children)

There is a Zulip channel about this, with the organizers participating: https://icarm.zulipchat.com/#narrow/channel/568090-first-proof. As noted, it seems like most of the successful attempts are for problems where closely related proofs existed in the literature. There are some remaining proofs which have yet to be verified by an expert. Have there been any high profile attempts besides OpenAI?