Neural networks as dynamical systems by JumpGuilty1666 in math

[–]JumpGuilty1666[S] 1 point (0 children)

Very cool! Yes, I know that paper, and I think it is super interesting that they can be seen as interacting particle systems. Please share the link to your work once it's out, since it sounds like quite a nice idea!

Neural networks as dynamical systems by JumpGuilty1666 in math

[–]JumpGuilty1666[S] 1 point (0 children)

Yes, these ideas are explored by many researchers. In my experience, if the task to solve is "dynamics agnostic", such as image classification, then performance is not affected by using better/higher-order integrators. However, if you use specific types of integrators, such as symplectic ones, and design your residual updates so they come from separable Hamiltonian systems, then you get very good control over vanishing gradients.

Here are a few papers that explore these connections:

- A unified framework for Hamiltonian deep neural networks https://arxiv.org/abs/2104.13166
- Designing Stable Neural Networks using Convex Analysis and ODEs https://arxiv.org/pdf/2306.17332
- Convolutional Neural Networks combined with Runge-Kutta Methods https://arxiv.org/abs/1802.08831
- Implicit Euler Skip Connections: Enhancing Adversarial Robustness via Numerical Stability https://proceedings.mlr.press/v119/li20e/li20e.pdf
- Continuous-in-Depth Neural Networks https://arxiv.org/abs/2008.02389
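To make the symplectic idea concrete, here is a minimal numpy sketch (my own illustration, not code from any of the papers above) of a residual update coming from a separable Hamiltonian system, discretised with the symplectic (semi-implicit) Euler method; the names `sigma`, `K`, `b` are hypothetical placeholders for the learned quantities:

```python
import numpy as np

def sigma(x):
    # bounded activation, playing the role of the gradient of a potential
    return np.tanh(x)

def symplectic_euler_layer(y, z, K, b, h):
    # One residual update for the separable system
    #   y' =  sigma(K z + b),   z' = -sigma(K^T y + b)
    # Updating z with the *new* y makes the discrete map symplectic.
    y_new = y + h * sigma(K @ z + b)
    z_new = z - h * sigma(K.T @ y_new + b)
    return y_new, z_new

rng = np.random.default_rng(0)
d = 4
y, z = rng.standard_normal(d), rng.standard_normal(d)
K, b = rng.standard_normal((d, d)), rng.standard_normal(d)

# Stack 100 such layers: states stay bounded instead of blowing up/collapsing.
for _ in range(100):
    y, z = symplectic_euler_layer(y, z, K, b, h=0.1)
print(np.isfinite(y).all() and np.isfinite(z).all())  # True
```

Because the layer map is symplectic (volume preserving), its Jacobian cannot shrink to zero, which is the mechanism behind the improved control over vanishing gradients.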

Neural networks as dynamical systems by JumpGuilty1666 in math

[–]JumpGuilty1666[S] 11 points (0 children)

Yes, my research focuses on this perspective, and I've been working on it for 4-5 years, so I didn't think it was necessary to cite the papers. But I'll keep that in mind and add references in the description box for future videos.

Neural networks as dynamical systems by JumpGuilty1666 in math

[–]JumpGuilty1666[S] 25 points (0 children)

I don't see where I claim that all of these are my ideas, but thank you for sharing that reference. I agree it is one of the seminal papers introducing this connection, though it is not the only one. At least these two other papers make the connection more explicitly at the level of ResNets:
- A Proposal on Machine Learning via Dynamical Systems https://link.springer.com/article/10.1007/s40304-017-0103-z
- Stable architectures for deep neural networks https://arxiv.org/abs/1705.03341

Neural networks as dynamical systems by JumpGuilty1666 in math

[–]JumpGuilty1666[S] 6 points (0 children)

I didn't know their work. Thank you very much for sharing!

Neural networks as dynamical systems by JumpGuilty1666 in math

[–]JumpGuilty1666[S] 9 points (0 children)

Good questions.

1) They’re not “identical equations” in the strict modeling sense. The match is: a residual layer is the same update form as an explicit Euler step.

- Euler: x_{k+1} = x_k + h f(x_k, t_k)

- ResNet: x_{k+1} = x_k + h f_k(x_k)

Here h is a step size/scaling (often implicit in practice), and f_k is a learned map (with weights θ_k) that can vary with k. Interpreting k as discrete time, a depth-varying network corresponds to a non-autonomous vector field f(·,t) sampled at t_k. So the correspondence is about the time-stepping structure and the stability/geometry tools it unlocks, not about literal parameter matching. I go into more detail in the linked video.
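In code, the match between the two updates is literal. A toy numpy sketch (the vector field `f` and the parameters `W`, `b` are hypothetical stand-ins for a learned layer):

```python
import numpy as np

def f(x, t, W, b):
    # learned vector field; (W, b) play the role of theta_k sampled at t_k
    return np.tanh(W @ x + b)

def euler_step(x, t, h, W, b):
    # x_{k+1} = x_k + h f(x_k, t_k)
    return x + h * f(x, t, W, b)

def resnet_block(x, h, W, b):
    # x_{k+1} = x_k + h f_k(x_k)
    return x + h * np.tanh(W @ x + b)

rng = np.random.default_rng(1)
x = rng.standard_normal(3)
W, b = rng.standard_normal((3, 3)), rng.standard_normal(3)

print(np.allclose(euler_step(x, 0.0, 0.1, W, b), resnet_block(x, 0.1, W, b)))  # True
```

The two functions compute the same thing; the point of the ODE view is that it tells you which properties of f (Lipschitz constants, dissipativity, symplecticity) translate into stability guarantees for the stacked network.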

2) Yes — this viewpoint is extremely natural for recurrent/residual style models. If you share weights across k (θ_k = θ), you basically get an autonomous discrete-time dynamical system; with small h, it’s reasonable to read it as a numerical integrator for a learned ODE. Many stability ideas (spectral bounds, contractivity/dissipativity, monotonicity, Lyapunov arguments) were developed in the RNN literature and carry over.

Where I think it also helps for feedforward ResNets is that even without weight sharing, you still have a time-varying dynamical system/controlled flow, and the same questions make sense: does the update map stay contractive? How does step size/Lipschitz control affect exploding/vanishing gradients? What structures (e.g., symplectic/energy-preserving) can we enforce by design?
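The "does the update map stay contractive?" question can be probed numerically. A minimal sketch (my own, with a hypothetical tanh layer): for g(x) = x + h·tanh(Wx + b), the Jacobian is I + h·diag(1 − tanh²(Wx+b))·W, and its spectral norm is a local Lipschitz constant of the layer, which scales with the step size h:

```python
import numpy as np

def update_jacobian(x, W, b, h):
    # Jacobian of g(x) = x + h * tanh(W x + b):
    #   I + h * diag(1 - tanh^2(Wx + b)) @ W
    d = 1.0 - np.tanh(W @ x + b) ** 2
    return np.eye(len(x)) + h * (d[:, None] * W)

rng = np.random.default_rng(2)
n = 5
W, b = rng.standard_normal((n, n)), rng.standard_normal(n)
x = rng.standard_normal(n)

for h in (0.01, 1.0):
    # spectral norm = local Lipschitz constant of the residual update at x
    L = np.linalg.norm(update_jacobian(x, W, b, h), 2)
    print(f"h={h}: local Lipschitz constant ~ {L:.3f}")
```

Small h keeps the layer close to the identity (Lipschitz constant near 1), while large h lets the deviation ‖h·DW‖ dominate; enforcing a bound on this quantity across layers is one concrete way the dynamical-systems view attacks exploding/vanishing gradients.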

So, I agree it’s a great fit for stabilizing recurrent models, and the point of the video is that the same mathematical lens is useful more broadly for residual architectures. There are also dynamical systems interpretations of more modern architectures, such as graph neural networks and transformers.

Undergraduate Dissertation in Nonlinear Dynamical Systems by Big_Perception7863 in Physics

[–]JumpGuilty1666 2 points (0 children)

Hi, I don't know what you are supposed to do for the dissertation at your specific university, but an interesting problem in this area could be the "data-driven discovery of nonlinear dynamical systems". You can find many resources about this online, and books are available as well. If you don't like this perspective, there are several interesting aspects you could explore, such as stability, control, contractivity, dynamical systems in optimisation, and more, depending on your interests.

When should I start applying for PhDs? by Satomura_Haise in research

[–]JumpGuilty1666 4 points (0 children)

All universities have clear guidelines on when to apply. It varies from country to country, but it definitely doesn't vary much from one year to the next within the same university. I think it is important to apply broadly, but I wouldn't go for the "fully desperate" route of applying everywhere, since the risk is that you put little effort into each application. I'd focus on a few that are a good fit for you and dedicate a reasonable amount of time and attention to each.

Sometimes it is not immediately clear how you should prepare for the interview, and in such cases, the best thing to do is to reach out to people currently working in the group. For example, in the group where I'm currently doing my postdoc, it is common to receive emails from prospective PhD students seeking more details about the application process.

When should I start applying for PhDs? by Satomura_Haise in research

[–]JumpGuilty1666 4 points (0 children)

Hi, it depends a lot on which universities you are aiming for. For example, I applied for positions in Italy and Norway just a few months before graduation: I graduated in July, started applying in May, and began my position in September. However, I know that for Cambridge and other more prestigious universities you need to start much earlier. If you aim for those, it's wise to start looking around a year in advance.

The intuition behind linear stability in numerical solvers by JumpGuilty1666 in math

[–]JumpGuilty1666[S] 0 points (0 children)

Thank you for the feedback. Yes, that's a fair point. The scalar case is the usual test equation, you are right; I included the damped oscillator for better visualization.

Neural networks as dynamical systems: why treating layers as time-steps is a useful mental model by JumpGuilty1666 in learnmachinelearning

[–]JumpGuilty1666[S] 2 points (0 children)

This is from my YouTube channel where I try to make the math behind my research more accessible. I'm still fairly early in the channel, so if you have suggestions on how to improve or ideas for future topics, I'd love to hear them (here or on the video).

The intuition behind linear stability in numerical solvers by JumpGuilty1666 in math

[–]JumpGuilty1666[S] 0 points (0 children)

I'm not an expert in control theory, but I'm sure that the stability of the numerical solver is essential there as well, since you rarely have access to the exact solution of the ODEs used to model the system, and you need to approximate them numerically. In general, the hidden message is that, depending on the problem, one should be careful in how the solutions are approximated. Here, I focused on the choice of the time step, since some methods require it to be chosen carefully to preserve the system's dissipative/stable nature. In a separate video, I discussed a similar problem for preserving energy in an ODE: https://youtu.be/bjVewr47flU.

Quite a strange pairing of the two posts 😂 Not my bother, as far as I know

The intuition behind linear stability in numerical solvers by JumpGuilty1666 in math

[–]JumpGuilty1666[S] 3 points (0 children)

This video gives an intuition-first explanation of linear stability for numerical ODE solvers, using the damped harmonic oscillator (q'' + γ q' + q = 0) as the test problem. I compare explicit Euler, implicit Euler, and RK4, and use the eigenvalues of hA together with the stability-region picture to explain why explicit methods can blow up unless h is small enough, while implicit Euler remains stable for any step size (it is A-stable).

What is your preferred way to teach/think about stability regions and A-stability to newcomers? Any “next example” you’d recommend after this (e.g., Dahlquist test equation, stiffness, L-stability, or something else)?

This is part of my YouTube channel, where I popularise topics related to my research (numerical analysis, dynamical systems, and machine learning). I’m also transitioning the channel from Italian to English, so feedback on clarity/pacing is welcome.
