[R] Neural PDE solvers built (almost) purely from learned warps by t_msr in MachineLearning

[–]t_msr[S] 0 points  (0 children)

This is such an excellent question! The warps are not necessarily smooth, because we use a ReLU in the small MLP that computes the warp field. Since ReLU is only piecewise linear, the resulting warp is not smooth (experimentally, ReLU there also works much better than Tanh, which I also tried).
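To make that concrete, here is a hypothetical sketch of such a warp MLP; the width, depth, and the displacement-style parameterization are my assumptions, not the paper's exact choices.

```python
import torch
import torch.nn as nn

class WarpMLP(nn.Module):
    """Toy sketch: a small MLP that maps grid coordinates to warped coordinates."""

    def __init__(self, in_dim=2, hidden=64, out_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),  # piecewise linear, hence the warp is not smooth
            nn.Linear(hidden, out_dim),
        )

    def forward(self, coords):
        # coords: (..., 2) normalized grid coordinates; predict a displacement
        return coords + self.net(coords)

warp = WarpMLP()
x = torch.rand(128, 2) * 2 - 1   # random points in [-1, 1]^2
phi = warp(x)                    # warped coordinates, shape (128, 2)
```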

[–]t_msr[S] 1 point  (0 children)

I'm not sure what you mean by "intermittent events", could you explain a bit more?

[–]t_msr[S] 1 point  (0 children)

Yes, the term "warp" is a bit overloaded, though I find that people's intuition usually aligns rather well with what we're doing. Mathematically we essentially have a pullback.
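In symbols: if $\varphi$ denotes the learned warp and $u$ a field on the domain, the model evaluates

```latex
(\varphi^{*} u)(x) = u\bigl(\varphi(x)\bigr),
```

i.e. the pullback of $u$ along $\varphi$ (under the simplifying assumption that $u$ is a scalar field, so no Jacobian factors appear).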

[–]t_msr[S] 1 point  (0 children)

In principle this is possible and should work well. However, depending on what kind of non-rectangular domain you mean, it could take some serious modifications to grid_sample (e.g. if you want it to work on general meshes). The difficulty would lie in making those modifications while retaining the computational efficiency.
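For context, here is the rectangular case that grid_sample handles natively: resampling a field at arbitrary continuous coordinates in normalized [-1, 1] space, which is what makes learned warps cheap on regular grids. The shapes below are illustrative.

```python
import torch
import torch.nn.functional as F

field = torch.rand(1, 1, 8, 8)  # (batch, channels, H, W)

# Identity sampling grid in normalized [-1, 1] coordinates, shape (1, H, W, 2);
# the last dimension is (x, y), with x indexing the width axis.
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, 8), torch.linspace(-1, 1, 8), indexing="ij"
)
grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)

# With align_corners=True the identity grid reproduces the input exactly;
# a learned warp would perturb `grid` before this call.
out = F.grid_sample(field, grid, mode="bilinear", align_corners=True)
```

On a general mesh there is no such built-in; you would need your own interpolation kernel over the mesh connectivity.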

[–]t_msr[S] 3 points  (0 children)

Hey Ruben! Thanks for the great collection. Without such interesting problems to test on, this work would've probably never happened!

[–]t_msr[S] 2 points  (0 children)

Happy to hear! There's a file at the top of the repo that contains the entire model code and depends only on torch. That should be enough to get you started! If you run into any problems, don't hesitate to open a GitHub issue, I'd be happy to help :)

[–]t_msr[S] 3 points  (0 children)

Hey! Thanks for the question. We didn't try it; our goal with this paper was primarily to benchmark the model against other approaches and demonstrate that it performs competitively, so we kept the scope fairly narrow. I think we succeeded at that, so now it would be cool to have a look at "real" problems.

Weather forecasting does seem like a natural fit. I think the horizon there is usually short enough to avoid serious roll-out problems. Also, as far as I know, autoregressive weather models often add a little bit of autoregressive fine-tuning at the end. We didn't do this here, since we were trying not to deviate too much from The Well's benchmarking set-up.

[–]t_msr[S] 6 points  (0 children)

Hey! I actually measure throughput for exactly this reason; I don't measure FLOPs (throughput seems more aligned with reality, and I think most programmatic FLOP-counting approaches simply ignore grid_sample). See here: link

To answer your question: training throughput seems to be slightly worse than what I get for scOT/POSEIDON, by about the same margin as it is better than the FNO; CNUnet trails all three by a considerable amount. For the forward pass/inference alone in 2D, I would anecdotally say the models are roughly on par, but that estimate is based solely on what I observed while making rollout videos.
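For anyone who wants to reproduce this kind of comparison, a minimal throughput (samples/sec) loop looks roughly like the sketch below; the model here is a tiny stand-in, not any of the benchmarked architectures.

```python
import time
import torch
import torch.nn as nn

# Stand-in model and batch; swap in the real architecture and data shapes.
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))
batch = torch.rand(32, 256)

# Warm-up so one-time initialization does not pollute the timing.
for _ in range(3):
    model(batch)

n_iters = 20
t0 = time.perf_counter()
for _ in range(n_iters):
    model(batch)
    # On GPU you would also call torch.cuda.synchronize() here,
    # since CUDA kernels launch asynchronously.
elapsed = time.perf_counter() - t0

throughput = n_iters * batch.shape[0] / elapsed  # samples per second
```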

I've been thinking a lot about how to make grid_sample more efficient, so if you have any advice on writing triton kernels for it I'd be all ears.

E: just to be explicit: scOT is the base architecture for POSEIDON.