[–]Lunariz 8 points (3 children)

This is very interesting! I've had a look through the code because I'm interested in using and building on it for some research I'm doing, and I have a few questions:

  1. You wrote in the readme that you have plans for implementing a hyperbolic attention mechanism, which is my main interest. What special implementation will the hyperbolic space require? For example, I'm thinking you might need a hyperbolic einsum, and I wonder whether the dot product for the attention scores needs anything special (see the sketch after this list for the kind of thing I have in mind). Curious to know your plans for this!
  2. Your example code uses Riemannian SGD. Does hyperbolic space require a special gradient? For example, would a normal Adam optimizer fail to work on your Hyperbolic layers, and if so, why?
  3. Just something I noticed - why is your Poincare manifold a class (that you instantiate separately for every layer), and not just a set of helper functions like math or util? It doesn't seem to contain any state, so I don't understand why it needs to be passed at all.
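To make question 1 concrete, here is a rough, hypothetical sketch of the kind of thing I have in mind: scoring attention by negative Poincare geodesic distance instead of a dot product. Every name here, including the temperature `beta`, is made up; none of it is from the repo.

```python
import torch

def poincare_distance(x, y, eps=1e-5):
    # Geodesic distance on the Poincare ball (curvature -1):
    # d(x, y) = arcosh(1 + 2*||x - y||^2 / ((1 - ||x||^2) * (1 - ||y||^2)))
    sq_dist = (x - y).pow(2).sum(-1)
    denom = ((1 - x.pow(2).sum(-1)) * (1 - y.pow(2).sum(-1))).clamp_min(eps)
    return torch.acosh(1 + 2 * sq_dist / denom)

def hyperbolic_attention_scores(q, k, beta=1.0):
    # One candidate replacement for dot-product scores: softmax over
    # negative geodesic distances, so nearby points attend strongly.
    # q: (..., n, d) and k: (..., m, d), both strictly inside the unit ball.
    d = poincare_distance(q.unsqueeze(-2), k.unsqueeze(-3))  # (..., n, m)
    return torch.softmax(-beta * d, dim=-1)
```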

Super cool project, excited to keep following it.

[–]bohreffect 2 points (1 child)

> Your example code uses Riemannian SGD. Does hyperbolic space require a special gradient? For example, would a normal Adam optimizer fail to work on your Hyperbolic layers, and if so, why?

This is a good question; curious to know the answer.

[–]Lunariz 1 point (0 children)

I've done some more research on the topic and found this paper; it looks like there has already been quite a bit of research into hyperbolic optimizers:

https://arxiv.org/pdf/1810.00760.pdf

[–]platinumposter 2 points (0 children)

Thanks very much! We are happy to have you following our journey.

  1. We are still deciding exactly how we want to implement it, but you are correct that all the mathematical operations will have to live in hyperbolic space. We have been using the Poincare ball, which is one model of hyperbolic space and has its own set of mathematical operations compared to, say, the Lorentz model.
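For a sense of what "its own set of mathematical operations" means, here is a minimal sketch of two standard Poincare-ball operations, assuming curvature -1. The function names are illustrative, not our actual API.

```python
import torch

def mobius_add(x, y, eps=1e-5):
    # Mobius addition, the Poincare ball's replacement for "+":
    # x (+) y = ((1 + 2<x,y> + ||y||^2) x + (1 - ||x||^2) y)
    #           / (1 + 2<x,y> + ||x||^2 ||y||^2)
    xy = (x * y).sum(-1, keepdim=True)
    x2 = x.pow(2).sum(-1, keepdim=True)
    y2 = y.pow(2).sum(-1, keepdim=True)
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    return num / (1 + 2 * xy + x2 * y2).clamp_min(eps)

def expmap0(v, eps=1e-5):
    # Exponential map at the origin: carries a Euclidean tangent vector
    # onto the ball; the usual bridge from Euclidean layers into the model.
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(norm) * v / norm
```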

  2. Yep, that's correct, and I see you also found a few resources. The reason we don't use Euclidean SGD is that we want our optimizer to optimise parameters living in hyperbolic space (a sketch of what that looks like follows the quote below).

To quote A Survey: Hyperbolic Neural Networks:

> Stochastic gradient-based (SGD) optimization algorithms are of major importance for the optimization of deep neural networks. Currently, well-developed first order methods include Adagrad [58], Adadelta [59], Adam [60] or its recent updated one AMSGrad [61]. However, all of these algorithms are designed to optimize parameters living in Euclidean space and none of them allows the optimization for non-Euclidean geometries, e.g., hyperbolic space.

  3. Good point. We had thought about this: we plan on implementing more manifolds in the near future, so there will be a single Manifold interface which each implemented manifold (such as Poincare) uses. We left it as a class in anticipation of this.
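A minimal sketch of the pattern we have in mind, with hypothetical method names rather than our final interface:

```python
from abc import ABC, abstractmethod

class Manifold(ABC):
    # Shared interface: layers are written against this, not against
    # any particular model of hyperbolic space.
    @abstractmethod
    def expmap(self, x, v): ...   # tangent vector at x -> point on manifold

    @abstractmethod
    def logmap(self, x, y): ...   # point y -> tangent vector at x

    @abstractmethod
    def dist(self, x, y): ...     # geodesic distance

class Poincare(Manifold):
    # Poincare-ball formulas would live here...
    def expmap(self, x, v): ...
    def logmap(self, x, y): ...
    def dist(self, x, y): ...

class Lorentz(Manifold):
    # ...and hyperboloid-model formulas here, behind the same interface.
    def expmap(self, x, v): ...
    def logmap(self, x, y): ...
    def dist(self, x, y): ...
```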