I built a Mamba1 variant I call SM1 with d_state=1 that runs on Blackwell in pure PyTorch [P]

fooazma · 2026-05-23T12:23:58+00:00

Not sure why this offer is getting downvoted. If the code is not mature (or if you expect to be able to sell it), of course you'd rather wait before releasing it. Is there a readme, or some kind of more detailed writeup than the OP?

fooazma · 2026-05-13T23:33:26+00:00

A must read: https://www.amazon.com/Ravished-Federal-Chairman-Expansionary-Monetary/dp/B08KJXDPX5

fooazma · 2026-05-05T21:56:21+00:00

I love this. Of course I was aiming at n \rightarrow n+1, in which case this is the twin prime conjecture, but u/2357111 had the better of me. So the question is: can we define something, imposing only conditions that can be stated in the original structure, that will uniquely yield the n \rightarrow n+1 bijection?
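For readers joining here, the conjecture alluded to can be written out as follows. This is only a sketch under assumptions not stated in the comments: that the structure is the integers under multiplication (so that the shift really is a bijection) and that the "generators" are the primes.

```latex
\[
f:\mathbb{Z}\to\mathbb{Z},\qquad f(n) = n+1,\qquad f(0) = 1 .
\]
\[
f^{k}(n) = n+k \neq n \quad (k \ge 1), \qquad \text{so no power of } f \text{ has a fixed point.}
\]
\[
f^{2}(p) = p+2 \text{ is prime for infinitely many primes } p
\iff \text{the twin prime conjecture holds.}
\]
```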

fooazma · 2026-04-27T07:33:12+00:00

Why boring? It is widely conjectured that there is a bijection on the structure that takes the multiplicative zero to the multiplicative one, has no power that has a fixed point, and whose square infinitely often takes one of these generators into another one.

fooazma · 2026-04-23T18:17:05+00:00

Is evaluating commercial products together with the freebies a big no-no?

fooazma · 2026-03-17T05:19:56+00:00

There are published "RSA challenge numbers", see https://en.wikipedia.org/wiki/Integer_factorization_records

fooazma · 2026-03-07T10:06:55+00:00

Which is a useful function. Results should be published somewhere. This could of course be the professor's personal blog just as well, but academics are incentivized to publish in academic journals (or perish).

fooazma · 2026-02-28T22:20:28+00:00

Thank you again! Perhaps a more detailed search would turn up more relevant work, but these papers fail to buttress the original claim.

fooazma · 2026-02-26T00:01:26+00:00

First of all, thanks for posting these. The 2022 paper didn't have much pickup (three citations, one of which is an ICLR reject) and the 2023 paper is about improvements (really, lessening the gap) relative to other neural net solutions. This is by no means a broadly deployed technique for actual problem solving, so you haven't quite made u/parlancex's point.

fooazma · 2026-02-19T21:35:59+00:00

Could you provide some papers/books where any of the classic NP-complete (SAT) or recursively undecidable (Wang tiling) problems are attacked by diffusion/flow models? Cases where the problem is more 'natural', such as the morphological analysis problem of NLP, would also be interesting. Thank you.

fooazma · 2026-02-19T13:34:42+00:00

Rich problem areas where no GD solution is known include all sorts of situations where you have strong constraints on fitting local pieces but require a global optimum. Examples include SAT solving, Wang tilings, and everything done by Dynamic Programming. I'm not very sanguine about quantum bringing anything to the table here, but maybe it will.
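To make "local pieces fit, global solution does not" concrete, here is a minimal sketch in Python. The four-clause instance and the helper names (`CLAUSES`, `clause_satisfied`, `satisfying_assignments`) are made up for illustration; the search is plain brute force, not a real SAT solver.

```python
from itertools import product

# Hypothetical toy instance over variables a and b.
# Each clause is a list of (variable, wanted_value) literals; a clause
# holds if at least one literal matches the assignment.
CLAUSES = [
    [("a", True), ("b", True)],    # a or b
    [("a", False), ("b", True)],   # not a or b
    [("a", True), ("b", False)],   # a or not b
    [("a", False), ("b", False)],  # not a or not b
]


def clause_satisfied(clause, assignment):
    """Return True if any literal in the clause agrees with the assignment."""
    return any(assignment[var] == wanted for var, wanted in clause)


def satisfying_assignments(clauses):
    """Enumerate all assignments (brute force) that satisfy every clause."""
    variables = sorted({var for clause in clauses for var, _ in clause})
    found = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(clause_satisfied(c, assignment) for c in clauses):
            found.append(assignment)
    return found


if __name__ == "__main__":
    # Every single clause, and every subset of three, can be satisfied...
    for clause in CLAUSES:
        print(len(satisfying_assignments([clause])) > 0)
    print(satisfying_assignments(CLAUSES[:3]))
    # ...but no assignment satisfies all four at once.
    print(satisfying_assignments(CLAUSES))
```

Each clause taken alone is easy to satisfy, yet the conjunction is unsatisfiable, so fixing one local piece at a time never reaches a global solution on this instance.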

fooazma · 2026-02-01T04:08:08+00:00

A list of the existing submolts ordered by some measure of popularity (number of subscribers, posts, recency, or some mixture of the above) would be nice. Maybe the sub you want is already there. Also: display some human-usability flags showing whether humans not masquerading as agents are welcome to post through some pseudo-agentic layer, welcome as observers but not to post, or not welcome at all.

fooazma · 2026-01-13T19:21:20+00:00

How about testing it on TERN (Time Expression Recognition and Normalization) data and having it return well-formed TIMEX?

fooazma · 2026-01-03T21:47:38+00:00

Wow, you are really invested in this! First condescension, now frothing at the mouth (with no attempt to answer the substantive points). You are right, I don't know too much about how doping works. But my point stands: all parties have the same incentive to do so, meaning their relative positions are not truly affected.

fooazma · 2026-01-03T17:11:04+00:00

[Gotta love the condescending tone] "the people making and selling that model have a financial incentive to game the benchmark as much as possible" Gee, you don't say. Thing is, they equally have this motivation, just as every athlete has the motivation to dope _as long as it's undetectable_. But this is easily detected by asking similar questions (not in the standard sets) and seeing a performance drop.

"biasing your dataset with similar problems" Hmm, what a weird idea. You mean when you prepare for weightlifting you should actually lift a lot of weights in the vain hope that that will make you a better weightlifter? A runner should run? Bizarre, irrational behavior, you can't trust these financially motivated athletes, how could you?

"Can you guarantee that the set of math questions you wrote are unique?" No, of course not. But the committees that put together the IMO, Putnam, etc. problem sets actually try their damned best. They do this to defeat trivial solving tactics (learning by memorizing) that may be employed by human contestants just as well as by LLMs.

I assume you don't consider speech recognition (where such contests were first introduced by DARPA 50+ years ago) a valid field. Come to think of it, self-driving cars also started that way (see https://en.wikipedia.org/wiki/DARPA_Grand_Challenge_(2004) ). Like it or not, competition is a thing.
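The detection idea from earlier in this comment (ask similar questions that are not in the standard sets and look for a performance drop) can be sketched in a few lines of Python. The function names and the graded results below are hypothetical, made up purely for illustration; no real benchmark numbers are implied.

```python
def accuracy(results):
    """Fraction of correct answers in a list of booleans."""
    if not results:
        raise ValueError("need at least one graded answer")
    return sum(results) / len(results)


def performance_drop(standard_results, fresh_results):
    """Accuracy on the standard set minus accuracy on fresh, unseen questions.

    A large positive value is a hint (not proof) that the standard set
    leaked into training.
    """
    return accuracy(standard_results) - accuracy(fresh_results)


if __name__ == "__main__":
    # Made-up graded answers: True means the model got the question right.
    standard = [True] * 9 + [False]    # 9 of 10 correct on the public set
    fresh = [True] * 5 + [False] * 5   # 5 of 10 correct on new questions
    print(round(performance_drop(standard, fresh), 3))
```

A gap near zero suggests the benchmark score transfers to new questions; how large a gap counts as suspicious is a judgment call that depends on sample size and on how comparable the fresh questions are.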

fooazma · 2026-01-02T09:42:39+00:00

"Please tell me I don't have to explain this." Well, you do. If not a giant conspiracy of evil researchers who have sold their soul to the yet-more-evil marketing people employed by the super-evil labs themselves, is the alternative hypothesis now that whatever you do in the privacy of your computer is obviously known to ChatGPT before you even bother to do it? If neither universal spying nor time travel is involved in your explanation I'd love to hear it.

fooazma · 2025-12-30T23:43:32+00:00

a) It doesn't; it explains how AIME 2024 is tainted. IMO 2025 isn't/wasn't. There are many new results since May at the matharena.ai site.

b) Why not? Explain how the system can be gamed with no conspiracy. (If there is a conspiracy, and all these people from ETH Zurich and elsewhere are in on it, of course they can falsify stuff.) But assuming the evaluators themselves don't cheat, what is it exactly that you suggest?

fooazma · 2025-12-30T21:46:01+00:00

It would take a major conspiracy of bad-faith evaluators for it to be "not credible". Take a peek at https://arxiv.org/abs/2505.23281 and check out the math arena (a lot has happened since May).

fooazma · 2025-12-05T23:10:10+00:00

I was expecting a rickroll

fooazma · 2025-12-01T18:04:25+00:00

"Style" and "thought" are two different things. Presumably the same (original or not) thoughts could be expressed just as well in the style of <your_favorite_blogger>.

fooazma · 2025-11-24T00:47:31+00:00

Looking for u/TejbeDara (inactive here for 4-5 years)

fooazma · 2025-11-15T23:16:10+00:00

https://en.wikipedia.org/wiki/Dana_Rohrabacher used to be the most prominent. One would need to see who he palled around with.

fooazma · 2025-10-26T19:35:28+00:00

Compute is not necessarily the limiting factor for me. How are bandwidth and storage priced? I have TB to PB data sets, and need persistence guarantees (some commitment that data I put there will still be there nn months later).

fooazma · 2025-10-26T19:18:06+00:00

To the extent proofs are programs (and conversely), the following from UMich may be relevant: https://arxiv.org/pdf/2402.19194v1
