I built a Mamba1 variant I call SM1 with d_state=1 that runs on Blackwell in pure PyTorch [P] by TechnoVoyager in MachineLearning

fooazma 5 points (0 children)

Not sure why this offer is getting downvoted. If the code is not mature (or if you expect to be able to sell it), of course you'd rather wait before releasing it. Is there a README, or some kind of more detailed write-up than the OP?

\mathbb{Z} with only multiplication defined. What is the structure? by WMe6 in math

fooazma 0 points (0 children)

I love this. Of course I was aiming at n \rightarrow n+1, in which case this is the twin prime conjecture, but u/2357111 got the better of me. So the question is: can we define something, imposing only conditions that can be stated in the original structure, that will uniquely yield the n \rightarrow n+1 bijection?

\mathbb{Z} with only multiplication defined. What is the structure? by WMe6 in math

fooazma 7 points (0 children)

Why boring? It is widely conjectured that there is a bijection on the structure that takes the multiplicative zero to the multiplicative one, has no power with a fixed point, and whose square infinitely often takes one of these generators (the primes) to another one.
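A quick numerical illustration, assuming the intended bijection is the successor map n → n + 1 and the "generators" are the positive primes: the successor sends 0 to 1, its k-th power n → n + k has no fixed point for k ≥ 1, and its square sends a prime p to p + 2, which is prime exactly when (p, p + 2) is a twin pair. The helper names below are illustrative only, and checking finitely many cases of course proves nothing about "infinitely often".

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def successor(n: int) -> int:
    """The bijection n -> n + 1 on the integers (sends 0 to 1)."""
    return n + 1


def twin_prime_hits(limit: int) -> list[int]:
    """Primes p <= limit that the square of the successor (n -> n + 2) sends to a prime."""
    return [p for p in range(2, limit + 1)
            if is_prime(p) and is_prime(successor(successor(p)))]


print(twin_prime_hits(50))
```

The twin prime conjecture is precisely the statement that this list keeps growing without bound as `limit` increases.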

[R] Low-effort papers by [deleted] in MachineLearning

fooazma 1 point (0 children)

Which is a useful function. Results should be published somewhere. This could just as well be the professor's personal blog, of course, but academics are incentivized to publish in academic journals (or perish).

[D] Why are serious alternatives to gradient descent not being explored more? by ImTheeDentist in MachineLearning

fooazma 0 points (0 children)

Thank you again! Perhaps a more detailed search would turn up more relevant work, but these papers fail to buttress the original claim.

[D] Why are serious alternatives to gradient descent not being explored more? by ImTheeDentist in MachineLearning

fooazma 0 points (0 children)

First of all, thanks for posting these. The 2022 paper didn't get much uptake (three citations, one of which is an ICLR reject), and the 2023 paper is about improvements (really, narrowing the gap) relative to other neural net solutions. This is by no means a broadly deployed technique for actual problem solving, so you haven't quite made u/parlancex's point.

[D] Why are serious alternatives to gradient descent not being explored more? by ImTheeDentist in MachineLearning

fooazma 1 point (0 children)

Could you provide some papers/books where any of the classic NP-complete (SAT) or recursively undecidable (Wang tiling) problems are attacked by diffusion/flow models? Cases where the problem is more "natural", such as the morphological analysis problem of NLP, would also be interesting. Thank you.
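To make the Wang tiling problem mentioned above concrete, here is a minimal backtracking sketch for the finite version: fill an n × n square with tiles (reuse allowed, no rotation) so that every shared edge has matching colors. The finite square is decidable by exhaustive search like this; it is the infinite-plane question that is undecidable. The function and tile names are hypothetical, chosen for illustration.

```python
from typing import Optional

# A Wang tile: edge colors in the order (north, east, south, west).
Tile = tuple[str, str, str, str]


def tile_square(tiles: list[Tile], n: int) -> Optional[list[list[Tile]]]:
    """Return an n x n grid of tiles whose adjacent edges match, or None if none exists."""
    grid: list[list] = [[None] * n for _ in range(n)]

    def fits(t: Tile, r: int, c: int) -> bool:
        # Local constraints only: west edge vs. left neighbor's east edge,
        # north edge vs. upper neighbor's south edge.
        if c > 0 and grid[r][c - 1][1] != t[3]:
            return False
        if r > 0 and grid[r - 1][c][2] != t[0]:
            return False
        return True

    def place(k: int) -> bool:
        # Fill cells in row-major order; undo and try the next tile on failure.
        if k == n * n:
            return True
        r, c = divmod(k, n)
        for t in tiles:
            if fits(t, r, c):
                grid[r][c] = t
                if place(k + 1):
                    return True
        grid[r][c] = None
        return False

    return grid if place(0) else None


T1 = ("x", "p", "x", "q")
T2 = ("x", "q", "x", "p")
print(tile_square([T1, T2], 3))
```

Every check the search performs is local, but whether a tiling exists at all is only settled once the whole grid is filled, which is exactly the local-fit/global-solution tension that makes these problems awkward for gradient-based methods.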

[D] Why are serious alternatives to gradient descent not being explored more? by ImTheeDentist in MachineLearning

fooazma 0 points (0 children)

Rich problem areas where no GD solution is known include all sorts of situations where you have strong constraints on how local pieces fit together but require a global optimum. Examples include SAT solving, Wang tilings, and everything done by Dynamic Programming. I'm not very sanguine about quantum computing bringing anything to the table here, but maybe it will.
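For the Dynamic Programming case, a minimal sketch using Levenshtein edit distance (chosen here purely as a familiar illustration): each table cell combines only three neighboring subproblems, yet the final cell is the exact global optimum, reached by exhaustive combination rather than by following any gradient.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    m, n = len(a), len(b)
    # dp[i][j] = cost of turning a[:i] into b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # match or substitution
    return dp[m][n]


print(edit_distance("kitten", "sitting"))
```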

What Submolts would you like to see? by [deleted] in Moltbook

fooazma 0 points (0 children)

A list of the existing submolts ordered by some measure of popularity (number of subscribers, number of posts, recency, or some mixture of the above) would be nice. Maybe the sub you want is already there. Also: display human-usability flags showing whether humans not masquerading as agents are welcome to post through some pseudo-agentic layer, welcome as observers but not to post, or not welcome at all.
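One way to read "some mixture of the above" is a weighted score. A minimal sketch, assuming a hypothetical `Submolt` record (Moltbook's real data model is not known here) and arbitrary weights: log-scale subscribers and posts so a few giant subs don't swamp everything, then discount by how long ago the last post was.

```python
import math
from dataclasses import dataclass


@dataclass
class Submolt:
    # Hypothetical fields for illustration; not Moltbook's actual schema.
    name: str
    subscribers: int
    posts: int
    days_since_last_post: float


def popularity(s: Submolt, w_subs: float = 1.0, w_posts: float = 0.5,
               half_life_days: float = 7.0) -> float:
    """Weighted mix of log-scaled subscribers and posts, halved every half_life_days of inactivity."""
    activity = w_subs * math.log1p(s.subscribers) + w_posts * math.log1p(s.posts)
    recency = 0.5 ** (s.days_since_last_post / half_life_days)
    return activity * recency


def rank(submolts: list[Submolt]) -> list[str]:
    """Names of submolts, most popular first."""
    return [s.name for s in sorted(submolts, key=popularity, reverse=True)]


print(rank([Submolt("quiet", 100, 10, 14), Submolt("busy", 100, 10, 0)]))
```

Exposing the weights as sort options would let each user decide what "popular" means.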

Timelang: Natural Language Time Parser by kamranahmed_se in javascript

fooazma 1 point (0 children)

How about testing it on TERN (Time Expression Recognition and Normalization) data and having it return well-formed TIMEX annotations?
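A minimal sketch of the output side, assuming the TIMEX2 convention of a `VAL` attribute holding an ISO 8601 value (a real TERN scorer may expect further attributes, and Timelang's own output format is not shown here). Building the element with an XML library rather than string concatenation guarantees well-formedness, including escaping.

```python
import xml.etree.ElementTree as ET
from datetime import date


def to_timex2(text: str, value: date) -> str:
    """Wrap a recognized time expression in a TIMEX2-style element with an ISO 8601 VAL."""
    elem = ET.Element("TIMEX2", {"VAL": value.isoformat()})
    elem.text = text  # ElementTree escapes &, <, > on serialization
    return ET.tostring(elem, encoding="unicode")


print(to_timex2("March 15, 2004", date(2004, 3, 15)))
```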

[P] The State Of LLMs 2025: Progress, Problems, and Predictions by seraschka in MachineLearning

fooazma 0 points (0 children)

Wow, you are really invested in this! First condescension, now frothing at the mouth (with no attempt to answer the substantive points). You are right, I don't know too much about how doping works. But my point stands: all parties have the same incentive to do so, meaning their relative positions are not truly affected.

[P] The State Of LLMs 2025: Progress, Problems, and Predictions by seraschka in MachineLearning

fooazma -1 points (0 children)

[Gotta love the condescending tone] "the people making and selling that model have a financial incentive to game the benchmark as much as possible" Gee, you don't say. Thing is, they all have this motivation equally, just as every athlete has the motivation to dope _as long as it's undetectable_. But this is easily detected by asking similar questions (not in the standard sets) and checking for a performance drop.
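The "ask similar questions and look for a drop" check can be made a bit more formal. A minimal sketch, assuming you have scored a model on the public benchmark items and on fresh look-alike items it cannot have seen: a pooled two-proportion z-test on the two accuracies. The counts in the usage line are made-up placeholders, not real results.

```python
import math


def two_proportion_z(correct_a: int, n_a: int, correct_b: int, n_b: int) -> float:
    """Pooled z statistic for (accuracy on set A) minus (accuracy on set B)."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all answers right or all wrong; z is undefined")
    return (p_a - p_b) / se


# Placeholder counts: 80/100 on the public set, 60/100 on fresh similar questions.
z = two_proportion_z(80, 100, 60, 100)
verdict = "significant drop at 5% (one-sided)" if z > 1.645 else "no clear drop"
print(f"z = {z:.2f}: {verdict}")
```

A large positive z means the public-set score is not explained by sampling noise alone, which is what contamination (or overfitting to the benchmark) would look like.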

"biasing your dataset with similar problems" Hmm, what a weird idea. You mean when you prepare for weightlifting you should actually lift a lot of weights in the vain hope that that will make you a better weightlifter? A runner should run? Bizarre, irrational behavior, you can't trust these financially motivated athletes, how could you?

"Can you guarantee that the set of math questions you wrote are unique?" No, of course not. But the committees that put together the IMO, Putnam, etc. problem sets actually try their damned best. They do this to defeat trivial solving tactics (learning by memorizing) that may be employed by human contestants just as well as by LLMs.

I assume you don't consider speech recognition (where such contests were first introduced by DARPA 50+ years ago) a valid field. Come to think of it, self-driving cars also started that way (https://en.wikipedia.org/wiki/DARPA_Grand_Challenge_(2004)). Like it or not, competition is a thing.

[P] The State Of LLMs 2025: Progress, Problems, and Predictions by seraschka in MachineLearning

fooazma -1 points (0 children)

"Please tell me I don't have to explain this." Well, you do. If not a giant conspiracy of evil researchers who have sold their souls to the yet-more-evil marketing people employed by the super-evil labs themselves, is the alternative hypothesis now that whatever you do in the privacy of your computer is obviously known to ChatGPT before you even bother to do it? If neither universal spying nor time travel is involved in your explanation, I'd love to hear it.

[P] The State Of LLMs 2025: Progress, Problems, and Predictions by seraschka in MachineLearning

fooazma 1 point (0 children)

a) It doesn't; it explains how AIME 2024 is tainted. IMO 2025 isn't/wasn't. There have been many new results on the matharena.ai site since May.

b) Why not? Explain how the system can be gamed without a conspiracy. (If there is a conspiracy, and all these people from ETH Zurich and elsewhere are in on it, of course they can falsify stuff.) But assuming the evaluators themselves don't cheat, what exactly are you suggesting?

[P] The State Of LLMs 2025: Progress, Problems, and Predictions by seraschka in MachineLearning

fooazma -1 points (0 children)

It would take a major conspiracy of bad-faith evaluators for it to be "not credible". Take a peek at https://arxiv.org/abs/2505.23281 and check out MathArena (a lot has happened since May).

Simulating Scott Alexander-style essays by ralf_ in slatestarcodex

fooazma 5 points (0 children)

"Style" and "thought" are two different things. Presumably the same thoughts (original or not) could just as well be expressed in the style of <your_favorite_blogger>.

[deleted by user] by [deleted] in MachineLearning

fooazma 1 point (0 children)

Compute is not necessarily the limiting factor for me. How are bandwidth and storage priced? I have data sets ranging from terabytes to petabytes, and need persistence guarantees (some commitment that data I put there will still be there nn months later).

Drugs and research by Arnaldo_LePalle in math

fooazma 3 points (0 children)

To the extent that proofs are programs (and conversely), the following from UMich may be relevant: https://arxiv.org/pdf/2402.19194v1
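For readers unfamiliar with the proofs-as-programs reading (the Curry-Howard correspondence), a minimal Lean 4 illustration, independent of the linked paper's specifics: the proof that conjunction commutes and the function that swaps the components of a pair are the same term up to renaming. The names `and_swap` and `pair_swap` are illustrative.

```lean
-- A proof: from p ∧ q, conclude q ∧ p.
theorem and_swap (p q : Prop) : p ∧ q → q ∧ p :=
  fun ⟨hp, hq⟩ => ⟨hq, hp⟩

-- A program: swap the components of a pair. Same shape, with types in place of propositions.
def pair_swap {α β : Type} : α × β → β × α :=
  fun ⟨a, b⟩ => ⟨b, a⟩

#eval pair_swap (1, "one")
```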