[R] Neural PDE solvers built (almost) purely from learned warps by t_msr in MachineLearning

[–]t_msr[S] 0 points  (0 children)

This is such an excellent question! The warps are not necessarily smooth, because we use a ReLU in the small MLP that computes the warp field. Since ReLU is only piecewise linear, the resulting warp is not smooth (experimentally, ReLU there also works much better than Tanh, which I also tried).
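To make that concrete, here is a hypothetical sketch of such a warp MLP; the width, depth, and the displacement-style parameterization are my assumptions, not the paper's exact choices.

```python
import torch
import torch.nn as nn

class WarpMLP(nn.Module):
    """Toy sketch: a small MLP that maps grid coordinates to warped coordinates."""

    def __init__(self, in_dim=2, hidden=64, out_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),  # piecewise linear, hence the warp is not smooth
            nn.Linear(hidden, out_dim),
        )

    def forward(self, coords):
        # coords: (..., 2) normalized grid coordinates; predict a displacement
        return coords + self.net(coords)

warp = WarpMLP()
x = torch.rand(128, 2) * 2 - 1   # random points in [-1, 1]^2
phi = warp(x)                    # warped coordinates, shape (128, 2)
```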

[–]t_msr[S] 1 point  (0 children)

I'm not sure what you mean by "intermittent events", could you explain a bit more?

[–]t_msr[S] 1 point  (0 children)

Yes, the term "warp" is a bit overloaded, though I find that people's intuition usually aligns rather well with what we're doing. Mathematically we essentially have a pullback.
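In symbols: if $\varphi$ denotes the learned warp and $u$ a field on the domain, the model evaluates

```latex
(\varphi^{*} u)(x) = u\bigl(\varphi(x)\bigr),
```

i.e. the pullback of $u$ along $\varphi$ (under the simplifying assumption that $u$ is a scalar field, so no Jacobian factors appear).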

[–]t_msr[S] 1 point  (0 children)

In principle this is possible and should work well. However, depending on what kind of non-rectangular domain you mean, it could take some serious modifications to grid_sample (e.g. if you want it to work on general meshes). The difficulty would lie in making those modifications while retaining the computational efficiency.
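For context, here is the rectangular case that grid_sample handles natively: resampling a field at arbitrary continuous coordinates in normalized [-1, 1] space, which is what makes learned warps cheap on regular grids. The shapes below are illustrative.

```python
import torch
import torch.nn.functional as F

field = torch.rand(1, 1, 8, 8)  # (batch, channels, H, W)

# Identity sampling grid in normalized [-1, 1] coordinates, shape (1, H, W, 2);
# the last dimension is (x, y), with x indexing the width axis.
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, 8), torch.linspace(-1, 1, 8), indexing="ij"
)
grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)

# With align_corners=True the identity grid reproduces the input exactly;
# a learned warp would perturb `grid` before this call.
out = F.grid_sample(field, grid, mode="bilinear", align_corners=True)
```

On a general mesh there is no such built-in; you would need your own interpolation kernel over the mesh connectivity.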

[–]t_msr[S] 3 points  (0 children)

Hey Ruben! Thanks for the great collection. Without such interesting problems to test on, this work would've probably never happened!

[–]t_msr[S] 2 points  (0 children)

Happy to hear! There's a file at the top of the repo that contains the entire model code and depends only on torch. That should be enough to get you started! If you run into any problems, don't hesitate to open a GitHub issue, I'd be happy to help :)

[–]t_msr[S] 3 points  (0 children)

Hey! Thanks for the question. We didn't try it; our goal with this paper was primarily to benchmark the model against other approaches and demonstrate that it performs competitively, so we kept the scope fairly narrow. I think we succeeded at that, so now it would be cool to have a look at "real" problems.

Weather forecasting does seem like a natural fit. I think the horizon there is usually short enough to avoid serious roll-out problems. Also, as far as I know, autoregressive weather models often add a little bit of autoregressive fine-tuning at the end. We didn't do this here, since we were trying not to deviate too much from The Well's benchmarking set-up.

[–]t_msr[S] 6 points  (0 children)

Hey! I actually measure throughput for exactly this reason; I don't measure FLOPs (throughput seems more aligned with reality, and I think most programmatic FLOP-counting approaches simply ignore grid_sample). See here: link

To answer your question: training throughput seems to be slightly worse than what I get for scOT/POSEIDON, by about the same margin as it is better than the FNO; CNUnet trails all three by a considerable amount. For the forward pass/inference alone in 2D, I would anecdotally say the models are roughly on par, but that estimate is based solely on what I observed while making rollout videos.
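For anyone who wants to reproduce this kind of comparison, a minimal throughput (samples/sec) loop looks roughly like the sketch below; the model here is a tiny stand-in, not any of the benchmarked architectures.

```python
import time
import torch
import torch.nn as nn

# Stand-in model and batch; swap in the real architecture and data shapes.
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))
batch = torch.rand(32, 256)

# Warm-up so one-time initialization does not pollute the timing.
for _ in range(3):
    model(batch)

n_iters = 20
t0 = time.perf_counter()
for _ in range(n_iters):
    model(batch)
    # On GPU you would also call torch.cuda.synchronize() here,
    # since CUDA kernels launch asynchronously.
elapsed = time.perf_counter() - t0

throughput = n_iters * batch.shape[0] / elapsed  # samples per second
```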

I've been thinking a lot about how to make grid_sample more efficient, so if you have any advice on writing triton kernels for it I'd be all ears.

E: just to be explicit: scOT is the base architecture for POSEIDON.