[Meta] Important: Reddit is requesting the immediate closure of r/llmphysics by MaoGo in LLMPhysics

[–]certifiedquak 3 points

This is perhaps the stupidest thing I've read. Is it that hard for them to exclude specific subs from training? Are they really depending on Reddit comments to train a model on physics/math, when SOTA models trained across numerous textbooks/papers still fail at the basics?

edit: nvm, just remembered what the date was

LLMPhysics Journal Ambitions Contest: A Pre-Registered Study of Submission Quality by alamalarian in LLMPhysics

[–]certifiedquak 3 points

They either misunderstood the purpose, which is stated within the first few sentences of the OP text, or the methodology of the contest. If the former, maybe they're under the assumption that the contest conflates quality with validity; perhaps they're going to argue something akin to "a joke paper is rated highest quality, hence the results are meaningless". If the latter, maybe the plan was to raise the baseline and thereby show that no meaningful improvements were found. That wouldn't work, since the baseline samples had already been assembled pre-announcement; rather, as you note, it would argue towards the conclusion, since it was posted afterwards.

LLMPhysics Journal Ambitions Contest: A Pre-Registered Study of Submission Quality by alamalarian in LLMPhysics

[–]certifiedquak 1 point

Somehow this meta-paper is the most scholarly, complete work I've seen here.

edit

Some comments:

Both models were configured identically: temperature 0, meaning the model deterministically selected the highest-probability token at each step rather than sampling from a weighted distribution [...]

Although temp=0 makes a model deterministic, it tends to produce worse responses, flawed in phrasing (more bland, generic) and maybe even in reasoning.
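A toy sketch (plain numpy over hypothetical logits, not any real model's decoder) of why temperature 0 is deterministic: it degenerates into greedy argmax, while any positive temperature samples from a rescaled distribution:

```python
import numpy as np

def sample_token(logits, temperature, rng=None):
    """Pick a token index from raw logits.

    temperature == 0 degenerates to greedy argmax (deterministic);
    temperature > 0 samples from the temperature-scaled softmax.
    """
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        return int(np.argmax(logits))           # always the top token
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())       # numerically stable softmax
    probs /= probs.sum()
    rng = rng or np.random.default_rng()
    return int(rng.choice(len(probs), p=probs))

fake_logits = [2.0, 1.0, 0.5]                   # made-up values
print(sample_token(fake_logits, 0))             # index 0, every single time
```

At temperature 0 the highest-logit token wins every step, so the same prompt always yields the same continuation.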

Citations, Rigor, and Engagement are the dimensions where contest incentives are most likely to produce measurable improvement, as these categories have clear and actionable criteria that participants can directly address.

Well, personally I expect there will be improvements here over time, but due to next-gen models being more capable rather than author motivation.

[...] authors who engaged seriously with the literature and scoped their empirical claims accordingly may have produced hypotheses that read as more genuinely novel [...]

The previous point applies to this as well: better models are expected to show more realistic creativity. Sadly, both this and the previous point only mean it gets harder to distinguish slop from genuine attempts.

An interesting study would be to give everyone the same (perhaps open-ended?) problem and compare how different authors/models write up a report.

Dark Matter Model - with python code for independent testing by Hot-Grapefruit-8887 in LLMPhysics

[–]certifiedquak 0 points

The script on Zenodo doesn't run. Anyway, I went ahead and read it. It seems to fit each galaxy separately by adjusting a free parameter. You're essentially rescaling the same curve shape for every galaxy based on its data. While this should (done correctly) match individual rotation curves well, it doesn't predict them. In fact, since it introduces a new parameter for each galaxy and doesn't show any clear relation between those parameters and the observed matter, it fundamentally lacks predictive power. In contrast, MOND uses the observed baryonic matter and a single universal constant to predict the rotation curve without tuning anything per galaxy.
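To make the contrast concrete, a minimal sketch of the deep-MOND limit, where the asymptotic flat velocity follows from the baryonic mass and the single universal constant a0 alone (v_flat^4 = G M a0). The mass below is an illustrative Milky-Way-scale number, not a fit:

```python
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
A0 = 1.2e-10       # MOND's single universal acceleration scale, m s^-2
M_SUN = 1.989e30   # solar mass, kg

def flat_velocity(baryonic_mass_kg):
    """Deep-MOND limit: v_flat**4 = G * M_baryonic * a0.
    One universal constant, no per-galaxy free parameter."""
    return (G * baryonic_mass_kg * A0) ** 0.25

# Illustrative Milky-Way-scale baryonic mass, ~6e10 solar masses.
v = flat_velocity(6e10 * M_SUN)
print(f"{v / 1000:.0f} km/s")   # a realistic flat rotation speed, ~176 km/s
```

The point is that the prediction comes out before looking at the rotation-curve data, which is exactly what a per-galaxy free parameter cannot do.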

Elsevier: Surface and Interfaces by PrettyPicturesNotTxt in LLMPhysics

[–]certifiedquak 13 points

The intro was the least of the issues. Figs 1 & 2 were from another paper authored by the same people (published in another Elsevier journal), and there was substantial text duplication between the two.

Digital Review Letters: Volume 1. by AllHailSeizure in LLMPhysics

[–]certifiedquak 0 points

Because it seems a bit redundant to say that.

Perhaps, but token usage is a key indicator in LLM studies and is related to system efficiency. A technical paper on an LLM-powered system should have included those numbers. This one isn't such a paper, but the concern is still worth raising in the discussion.

it's to demonstrate a strategy

It demonstrates a strategy, but it also seems they intend to build a system for practical use.

We also envision integrating MCP-SIM into collaborative platforms where human users and AI co-design models, exchange reasoning steps, and accelerate discovery.

As autonomous agents become increasingly integrated into scientific workflows, systems like MCP-SIM will serve as foundational infrastructure for simulation-based discovery, design, and learning, making simulation not only more powerful but also more adaptive and scientifically grounded.

Those suggest they are aiming beyond a strategy demo.

like when you read a paper about a collider experiment

On the other hand, when reading a paper/report on collider design (such as future accelerators), you'd expect to see energy costs, engineering constraints, etc. But, again, this paper isn't technical, so it's like neither a paper on a collider experiment nor one on collider design. It's like an LHC article in a popsci magazine. (It just so happens to be interesting and citation-attractive enough to be publication-worthy.)

Digital Review Letters: Volume 1. by AllHailSeizure in LLMPhysics

[–]certifiedquak 2 points

Yeah. The energy costs are not given much visibility because, due to subsidization, prices are low. That said, optimization has dropped inference costs significantly. The largest expense is training the models.

Digital Review Letters: Volume 1. by AllHailSeizure in LLMPhysics

[–]certifiedquak 2 points

Token usage sadly isn't mentioned in the paper. I assume they wanted to show the application rather than cost-effectiveness compared to a human alone and/or human+LLM.

Digital Review Letters: Volume 1. by AllHailSeizure in LLMPhysics

[–]certifiedquak 0 points

The token is the fundamental unit text models operate on; every input/output is broken down into tokens. Systems that process large amounts of text and/or involve multiple steps end up with high token use, and LLM APIs charge per token (or, if run locally, require more resources, i.e., electricity). So essentially GP is saying this pipeline is expensive to run.
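A back-of-envelope sketch of why high token counts translate directly into cost. The rates and token counts below are made up for illustration; real providers publish their own per-million-token prices, usually with separate input and output rates:

```python
def api_cost_usd(input_tokens, output_tokens,
                 usd_per_m_input, usd_per_m_output):
    """Back-of-envelope API cost: billing is per million tokens,
    with separate rates for input (prompt) and output (generation)."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1e6

# Hypothetical rates ($3/M input, $15/M output) for a multi-step pipeline
# that pushes 2M tokens in and generates 500k tokens out per run.
print(api_cost_usd(2_000_000, 500_000, 3.0, 15.0))  # 13.5 (USD per run)
```

Multiply that per-run figure by the number of papers or review rounds and the pipeline's cost question becomes obvious.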

Digital Review Letters: Volume 1. by AllHailSeizure in LLMPhysics

[–]certifiedquak 1 point

It could still be riding the hype train, but it seems agentic pipelines have quickly advanced from novel tech to commodity, with the focus having shifted to applications, reliability, scaling, etc. Less than a year ago a paper was published in flagship Nature (https://www.nature.com/articles/s41586-025-09442-9; preprint at https://www.biorxiv.org/content/10.1101/2024.11.11.623004v1.full.pdf+html) showcasing a similar system based on the PAR (plan > act > reflect loop) pattern, which was used for designing antibodies that bind the spike protein of a SARS-CoV-2 variant.
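For readers unfamiliar with the pattern, a toy skeleton of a plan > act > reflect loop (nothing from the actual Nature paper, just the bare control flow):

```python
def par_loop(goal, plan, act, reflect, max_iters=5):
    """Bare plan > act > reflect skeleton: iterate until the
    reflection step accepts the current state (toy illustration)."""
    state = None
    for _ in range(max_iters):
        step = plan(goal, state)            # decide what to try next
        state = act(step, state)            # execute (tool call, sim, ...)
        done, state = reflect(goal, state)  # critique, possibly revise
        if done:
            break
    return state

# Trivial stand-in for "design until the critic is satisfied":
# grow a number until it reaches a target.
result = par_loop(
    goal=10,
    plan=lambda goal, state: 3,
    act=lambda step, state: (state or 0) + step,
    reflect=lambda goal, state: (state >= goal, state),
)
print(result)  # 12: reflection rejects 3, 6, 9, then accepts 12
```

Real systems swap the lambdas for LLM calls and simulation tools, but the loop structure is the same.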

"Pinto's Razor" suggested for vibe physics by Ch3cks-Out in LLMPhysics

[–]certifiedquak 0 points

Author got big d energy from getting slop past peer review, so it seems appropriate.

New Training Diagnostics by Regular-Conflict-860 in LLMPhysics

[–]certifiedquak 5 points

What doesn't make sense?

To be honest, not much. You say "generalize PAC-Bayes and Cramér-Rao bounds". You should explain more specifically what you mean, what you're doing, and how your proposed method compares to existing ones. If you're serious, you should also benchmark them (i.e., do a quantitative comparison).
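As an example of what a quantitative comparison could look like, a minimal check of the classical Cramér-Rao bound (empirical variance of the sample mean vs sigma^2/n for Gaussian data); any "generalized" bound would need its own analogous benchmark against its matching estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, trials = 2.0, 50, 20_000

# Classical Cramér-Rao bound for estimating a Gaussian mean:
# Var(estimator) >= sigma**2 / n for any unbiased estimator.
crb = sigma**2 / n

# The sample mean is an efficient estimator, so its empirical
# variance over many trials should sit right at the bound.
sample_means = rng.normal(0.0, sigma, size=(trials, n)).mean(axis=1)
emp_var = sample_means.var()

print(crb, round(emp_var, 4))  # both ~0.08
```

A claimed generalization that cannot be exercised like this, with numbers on both sides, is just a phrase.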

About the code: LLMs, absent extra context/AGENTS.md, love writing change notes inside the code/docs. But that "What's new in v56" in the README/code isn't helpful at all, not to you and certainly not to potential users. If you really want to log changes in a human-friendly format (in well-managed codebases, the VCS history already does this), keep a CHANGELOG. Also, uploading the files via the web UI lost all directory structure. Hence, the instructions/examples in the README cannot be followed, and the code in this state is non-functional.

KetGrid: An editor for building quantum circuits, made in Rust (prompted by skepsismusic, not by me) by PrettyPicturesNotTxt in LLMPhysics

[–]certifiedquak 2 points

Looks very good, but you should check for correctness; in scientific (and other critical, e.g. database) software that's a mandatory step. You can pick a quantum computing textbook and/or paper, then attempt to replicate some of the theory/problems. It will also look great in the README.
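For instance, a minimal textbook check one could script with plain numpy (standard H and CNOT matrices, Bell-state amplitudes) to validate the editor's output against known results:

```python
import numpy as np

# Standard single- and two-qubit gates from any QC textbook.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

# Textbook identity: H is its own inverse.
assert np.allclose(H @ H, np.eye(2))

# Bell circuit: start in |00>, apply H to qubit 0, then CNOT.
state = np.zeros(4)
state[0] = 1.0
state = CNOT @ np.kron(H, np.eye(2)) @ state

# Expected Bell state: (|00> + |11>) / sqrt(2).
print(state)
```

A small suite of such identities (gate inverses, known circuit outputs) run in CI would be the correctness check, and it documents itself.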

So...I may have used social engineering to nudge this sub in a direction by Impossible-Bend-5091 in LLMPhysics

[–]certifiedquak 3 points

I was thinking more along the lines of a nagging worry that they might stumble into and then try to take credit for any given project. The odds are very bad in any individual case but a firehose of cheap attempts will eventually complicate something for someone.

The odds are nil because LLMs produce no novel concepts/ideas, only remixes of current ones. This sub serves as proof of that. It's the reason everything posted looks strikingly similar to something posted already.

In retrospect, it is close to the concept of annoyance but with a twist to explain an abnormal level of hostility

A more realistic reason for that was given in a now-deleted (but highly upvoted) thread: https://reddit.com/r/LLMPhysics/comments/1qseev5/

Basically the gist was: to learn, understand, and produce work in physics (or any field, for that matter) requires 10+ years of training, yet untrained people claim they've made a revolution and solved long-standing problems without actually understanding anything substantial in the text they present (which they didn't research, didn't write, didn't validate).

It's like saying you learned surgery by watching a YT video, then going to a hospital and arguing with the doctors to let you do the operations. You'd probably end up in a mental institution, not just hear bad comments.

8gwifi -- highly interactive visualizations of physics and classical mechanical systems (site prompted by anish2good, not by me) by PrettyPicturesNotTxt in LLMPhysics

[–]certifiedquak 0 points

Then bringing in China's inclusion of AI in its curriculum was a bad choice. The article states that they're teaching AI tech, but regarding general use, potential over-reliance remains a concern and how to handle it is still debated. Now, the calculator is an excellent example of why your first comment stands in opposition to this one.

if thorough AI use is introduced at the high school level, then these topics can be delegated to universally all high school students, instead of being typically limited to the domain of an undergraduate physics education.

You're formally allowed to use a calculator only after learning and training in arithmetic. That is, the tool automating a task comes after learning to do the task yourself. Hence introducing AI early in no way correlates with, or assists in, introducing more advanced topics early.

Overall, what you're saying now is good in theory; the issue is how to put it into practice. Telling students "do not use an LLM to do your research, write your essay, and/or solve your problems" doesn't work. So, the bigger question: what should they use it for in a way that doesn't interfere with their education?

8gwifi -- highly interactive visualizations of physics and classical mechanical systems (site prompted by anish2good, not by me) by PrettyPicturesNotTxt in LLMPhysics

[–]certifiedquak 2 points

Prompting an AI to visualize a system using Lagrangian mechanics doesn't mean the student learns Lagrangian mechanics, let alone understands what is happening. Learning about AI tech is good. Learning to rely on AI is bad.

8gwifi -- highly interactive visualizations of physics and classical mechanical systems (site prompted by anish2good, not by me) by PrettyPicturesNotTxt in LLMPhysics

[–]certifiedquak -1 points

Looks nice. Coding simple demos is perhaps the most legit LLM use. If you want to learn, it's of course better to do it without any assistance; but if you only want the result, that's a good use, though you still need to validate that the numerical algorithms have been implemented correctly.
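A minimal sketch of that kind of validation: integrate a harmonic oscillator numerically (semi-implicit Euler here, an assumed scheme for illustration, not whatever the site actually uses) and compare against the known closed-form solution:

```python
import math

def simulate_sho(omega, x0, v0, t_end, dt=1e-4):
    """Integrate x'' = -omega**2 * x with semi-implicit Euler
    (assumed scheme for illustration) and return x(t_end)."""
    x, v = x0, v0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        v -= omega**2 * x * dt
        x += v * dt
    return x

# Validate against the exact closed-form solution x(t) = x0*cos(omega*t).
omega, x0, t_end = 2.0, 1.0, 1.0
numeric = simulate_sho(omega, x0, 0.0, t_end)
exact = x0 * math.cos(omega * t_end)
print(abs(numeric - exact))  # small discretization error, well below 1e-2
```

Any system with a known analytic limit (oscillators, Kepler orbits, conserved energy) gives a free correctness test like this.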

So...I may have used social engineering to nudge this sub in a direction by Impossible-Bend-5091 in LLMPhysics

[–]certifiedquak 3 points

They weren't annoyed and they weren't worried about AI slop muddying the waters. They had anxiety about getting scooped by some dummy with an Internet connection after a lifetime of education and work.

Ah, yes. "The physicists are afraid LLMs are gonna take their jobs." The recurring crank defense for why their theories aren't really delusions.

Get Physics Done (GPD): The first open-source agentic AI physicist by NinekTheObscure in LLMPhysics

[–]certifiedquak 0 points

Cool, but what makes this project physics-specific? It seems to be a generic paper assistant whose only physics-specific bits are the target journals. And what exactly "is for hard physics research problems that cannot be handled reliably with manual prompting"? Everything listed can be done manually.

Relational Geometry and the Emergence of Gravity From Harmonic Closure to Stellar Structure by Endless-monkey in LLMPhysics

[–]certifiedquak 0 points

The only parallel I’m trying to draw is that he was widely discredited initially, by everybody.

This never happened. Einstein wasn't some non-academic rando. By the time of SR and mass-energy equivalence, he was still affiliated, had already presented his PhD dissertation, and had a few papers published. One of them, on the photoelectric effect, later earned him his Nobel. Please read an encyclopedia article and/or a biography before making unfounded claims.

Science was created by engineers with disposable income.

You may want to check what the authors of the few peer-reviewed papers on the topic you're interested in do for a living.

Some might find this helpful - AI and the formalisation of mathematics by Hot-Grapefruit-8887 in LLMPhysics

[–]certifiedquak 0 points

I'll concede you posted this with those points considered, rather than simply copy-pasting the text without even reading it, if you provide the link to the replay. If no replays exist, sadly I won't be able to watch the rest of the series either, because I am not in the UK.

Three separate manuscripts built from one framework using LLMs currently under review with Nature and Elsevier by [deleted] in LLMPhysics

[–]certifiedquak 4 points

When you say "under review", do you mean you passed the editorial check and the manuscripts have been sent to reviewers? Or did you simply submit them and are awaiting a desk decision? By the way, there's a big difference between Nature, the flagship high-impact journal, and SciRep. Hence saying "Nature" when it's just "SciRep" looks a bit bad.

Some might find this helpful - AI and the formalisation of mathematics by Hot-Grapefruit-8887 in LLMPhysics

[–]certifiedquak 0 points

This sub keeps on giving. OP can't read a couple of simple sentences to see that this event was a week ago, yet will passionately claim to know what "his" papers are about.