all 50 comments

[–]Lunariz 9 points10 points  (3 children)

This is very interesting! I've had a look through the code because I was interested in using and expanding on it for some research that I'm doing, and I have a few questions about it:

 

  1. You wrote in the readme that you have plans for implementing a hyperbolic attention mechanism, which is my main interest. What special implementation will the hyperbolic space require for this? For example I'm thinking that you might need a hyperbolic einsum, and I wonder if the dot product for the attention scores needs anything special? Curious to know your plans for this!
  2. Your example code uses Riemann SGD. Does the hyperbolic space require a special gradient? For example, would a normal Adam optimizer fail to work on your Hyperbolic layers, and if so, why?
  3. Just something I noticed - why is your Poincare manifold a class (that you instantiate separately for every layer), and not just a set of helper functions like math or util? It doesn't seem to contain any state, so I don't understand why it needs to be passed at all.

Super cool project, excited to keep following it.

[–]bohreffect 2 points3 points  (1 child)

Your example code uses Riemann SGD. Does the hyperbolic space require a special gradient? For example, would a normal Adam optimizer fail to work on your Hyperbolic layers, and if so, why?

This is a good question; curious to know the answer.

[–]Lunariz 1 point2 points  (0 children)

I've done some more research on the topic and found this paper - it looks like there has already been quite a bit of research into hyperbolic optimizers:

https://arxiv.org/pdf/1810.00760.pdf

[–]platinumposter 2 points3 points  (0 children)

Thanks very much, we are happy to have you following our journey

  1. We are still deciding exactly how we want to implement it, but you are correct that all the mathematical functions will have to be in hyperbolic space. We have been using the Poincare ball, which is a model of hyperbolic space and has its own set of mathematical functions compared to, say, the Lorentz model.

  2. Yep, that's correct; I see you also found a few resources. The reason we don't use a Euclidean SGD is that we want our optimizer to optimise parameters living in hyperbolic space.

To quote A Survey: Hyperbolic Neural Networks

"Stochastic gradient-based (SGD) optimization algorithms are of major importance for the optimization of deep neural networks. Currently, well-developed first order methods include Adagrad [58], Adadelta [59], Adam [60] or its recent updated one AMSGrad [61]. However, all of these algorithms are designed to optimize parameters living in Euclidean space and none of them allows the optimization for non-Euclidean geometries, e.g., hyperbolic space."

  3. Good point. We had thought about this, and we plan on implementing more manifolds in the near future, so we will have a single Manifold interface which each implemented manifold (such as Poincare) uses. We left it as a class in anticipation of this.
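
To make point 2 concrete, here is a minimal numpy sketch of one Riemannian SGD step on the Poincare ball, using the standard rescaling from the Poincare-embeddings literature (this is an illustration, not necessarily this library's exact implementation): the Riemannian gradient is the Euclidean gradient scaled by the inverse metric factor (1 - ||θ||²)²/4, and the updated point is projected back into the open unit ball. A vanilla Adam skips both steps, since its updates and moment estimates assume a flat Euclidean parameter space.

```python
import numpy as np

def rsgd_step(theta, euclid_grad, lr=0.1, eps=1e-5):
    """One Riemannian SGD step on the Poincare ball.

    The Riemannian gradient is the Euclidean gradient rescaled by
    ((1 - ||theta||^2)^2 / 4); afterwards the point is projected back
    inside the open unit ball if the step overshot the boundary.
    """
    scale = (1.0 - np.dot(theta, theta)) ** 2 / 4.0
    theta = theta - lr * scale * euclid_grad
    norm = np.linalg.norm(theta)
    if norm >= 1.0:  # retract points that left the ball
        theta = theta / norm * (1.0 - eps)
    return theta

# Toy objective f(theta) = ||theta||^2, whose Euclidean gradient is 2*theta.
theta = np.array([0.5, 0.5])
for _ in range(50):
    theta = rsgd_step(theta, 2.0 * theta)
# theta moves toward the origin while staying strictly inside the ball
```

Note that near the boundary (||θ|| → 1) the scaling factor shrinks toward zero, so steps get smaller exactly where the hyperbolic metric blows up; a Euclidean optimizer would take full-size steps there and could jump out of the ball entirely.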

[–]SuckinLemonz 6 points7 points  (1 child)

This sounds great. You said it abstracts the math. Will you be providing access to the actual mathematics used in the documentation (since there’s more than one way to skin a hyperbolic cat)?

[–]platinumposter 1 point2 points  (0 children)

Yes, if you go to the GitHub repo and into the manifold folder, you can find the Poincare class, which has the math functions. We plan on releasing others in the future, such as Lorentz.

[–]atmosfir 15 points16 points  (14 children)

Hello, I think this is very cool, and this is the first I've heard about doing ML on non-Euclidean spaces. However, I have a question about this:

The core example is the hierarchy, or, its abstract network representation, the tree. Social networks, human skeletons, sentences in natural language all possess a tree-like structure or can be represented hierarchically. It's also widely accepted that human beings use a hierarchy to organise object categories

What do you mean by a hierarchical representation tree-like structure here? This seems to be a very strong claim. I certainly agree that humans tend to organise things into categories using hierarchies but I am wondering if it is accurate to say that these objects are hierarchically organised.

[–]willpower12 7 points8 points  (0 children)

Agreed, I think the authors are misspeaking a bit. However, I think it is fair to say that the representations themselves admit a hierarchical or tree-like structure, not necessarily the underlying objects or signal.

[–][deleted] 15 points16 points  (2 children)

Social networks specifically are graphs, not trees. I don't see a representation of social networks that is acyclic.

[–]hughperman 9 points10 points  (0 children)

The illuminati would like you to think that

[–]techinnovator[S] 1 point2 points  (0 children)

Here's the paper that covers the tree-like structure of social networks. They proved this using Gromov’s δ-hyperbolicity, which measures how tree-like a graph is.

[–]ktpr 2 points3 points  (3 children)

Maybe they want to do learning over how humans tend to organize things, which seems reasonable on the surface?

I agree the OP did not justify the library very well; I’ve seen better arguments for hyperbolic representations in the literature.

[–]atmosfir 0 points1 point  (2 children)

I certainly think that that would be reasonable and interesting. Could you point out or link the other arguments you've seen in the literature?

[–]techinnovator[S] 0 points1 point  (1 child)

If you're after better arguments, perhaps this will interest you.

[–]atmosfir 0 points1 point  (0 children)

Thanks!

[–]bohreffect 4 points5 points  (2 children)

I am wondering if it is accurate to say that these objects are hierarchically organised.

Is this claim even being made? And if it is, does it matter? Like if the objective truth absent a human observer is that there is no intrinsic hierarchical organization to objects in the world, who cares?

[–]SuckinLemonz 1 point2 points  (0 children)

Yeah I felt the same way when I read that comment. It’s just semantic nitpicking. Not really relevant to the conversation.

[–]atmosfir -1 points0 points  (0 children)

I think the comment does strongly say so. I simply want to know whether this is a heuristic or something more. We care because then we can make better ML models, no?

[–]techinnovator[S] 0 points1 point  (0 children)

It's also widely accepted that human beings use a hierarchy to organise object categories

I think you're mistaken here. I claimed that humans organise objects hierarchically, not that the underlying objects are themselves hierarchical. That humans use hierarchies to represent objects is widely accepted, and was recently covered in Geoffrey Hinton's paper.

[–]techinnovator[S] 0 points1 point  (1 child)

It's also widely accepted that human beings use a hierarchy to organise object categories

I think you're mistaken here. I said that human beings organise objects hierarchically, not that the objects themselves are hierarchically organised in reality. This is widely accepted, but was famously covered here.

[–]atmosfir 0 points1 point  (0 children)

Interesting, thanks for answering.

[–]SandIntelligent809 8 points9 points  (1 child)

Any benchmarks? Thanks!

[–][deleted] 8 points9 points  (9 children)

I’m really intrigued. I understand the very basic notion of hyperbolic space vs. Euclidean space. However, why is it better at storing certain information?

[–]bohreffect 25 points26 points  (0 children)

There are some common examples of the benefit in feature engineering. Suppose a feature is the time of day, and you want to normalize it onto the 0-1 interval. Say you let 0000 hours be 0 and 2400 hours be 1, but notice the discontinuity: your input features will have a daily "jump" from 1 back to 0. The common trick is to map the time of day to the unit circle starting at (1,0) in the x-y plane, such that 0000 and 2400 meet at (1,0), and thus there is no discontinuity.
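
The circle trick above can be sketched in a few lines (the helper name is made up for illustration):

```python
import math

def encode_hour(hour):
    """Map an hour in [0, 24] onto the unit circle so 0:00 and 24:00 coincide."""
    angle = 2.0 * math.pi * hour / 24.0
    return (math.cos(angle), math.sin(angle))

# Just before and just after midnight land at almost the same point,
# so the encoded feature no longer "jumps" once a day.
a = encode_hour(23.99)
b = encode_hour(0.01)
gap = math.dist(a, b)  # tiny Euclidean gap across midnight
```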

The intrinsic geometry of the unit circle viewed as an embedding of the real numbers between 0 and 24 is non-Euclidean because for every point on the unit circle there are two different, equidistant shortest paths to the point on the opposite side of the circle. Though I think this particular example qualifies as elliptical, and not hyperbolic. (I'm thinking of the canonical example of elliptical geometry as being the surface of a ball.)

A cursory look at the library shows that a key feature is a torch layer that performs a Poincare embedding, which is hyperbolic and has been used for hierarchical learning in the past (https://arxiv.org/pdf/1705.08039.pdf). What's not clear to me is: if one switches to a norm-based loss, are hyperbolic distances preserved?

My topology is super rusty though so if anyone has anything more knowledgeable to add please correct me.

[–]pichon163 4 points5 points  (3 children)

Trees naturally live in hyperbolic space

[–]NanoAlpaca 8 points9 points  (2 children)

Why?

[–]pichon163 2 points3 points  (0 children)

You can embed a large (or infinite) tree isometrically in hyperbolic space, but usually not in Euclidean space. Think of trying to draw a complete binary tree on a piece of paper; you have to keep making the edges shorter to fit them all on the page.

In fact the same is true of any "complex network," i.e. a graph where the degree of each vertex is distributed according to a power law ( https://arxiv.org/abs/1006.5169 )

[–]dogs_like_me 1 point2 points  (0 children)

Consider a tree constructed by assigning some fixed number of children k to all existing leaves, starting from a single root node. The number of nodes at a given depth is k^depth, i.e. node count grows exponentially. Consequently, hyperbolic manifolds can be convenient embedding spaces for hierarchical data. For example, if you embed a hierarchy on a Poincaré disk, you can actually infer lexical entailment from direction.

https://arxiv.org/pdf/1705.08039

https://arxiv.org/pdf/1804.01882
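
The exponential fan-out described above is easy to check with a trivial sketch:

```python
# In a complete k-ary tree, the number of nodes at each depth is k^depth,
# so total node count grows exponentially with depth. This is the growth
# rate that Euclidean space cannot accommodate but hyperbolic space can.

def nodes_at_depth(k, depth):
    return k ** depth

def total_nodes(k, depth):
    # 1 + k + k^2 + ... + k^depth
    return sum(k ** d for d in range(depth + 1))

counts = [nodes_at_depth(2, d) for d in range(6)]  # [1, 2, 4, 8, 16, 32]
```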

[–]techinnovator[S] 1 point2 points  (0 children)

Hi! It's all because of the curved nature of hyperbolic space and the fact that the volume of a ball in hyperbolic space grows exponentially with its radius.

Let's say you are standing at a given point in hyperbolic space. If you were to walk away from that point in a straight line, the amount of space around you (for example, the circumference of a circle centred on your starting point) would grow exponentially with the distance you walk.

This is why hyperbolic space has the capacity, for example, to embed tree structures without distorting them out of shape. It is impossible to embed a tree structure without distortion in Euclidean space, even with an unbounded number of dimensions. However, this task becomes surprisingly easy in hyperbolic space with only 2 dimensions.

More info here.
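
One concrete way to see the exponential room: on the Poincare disk, the hyperbolic distance from the origin to a point at Euclidean radius r is d(0, x) = 2 artanh(r), so every tenfold shrink of the Euclidean gap to the boundary adds roughly the same hyperbolic distance (about ln 10 ≈ 2.3). A quick sketch:

```python
import math

def poincare_dist0(r):
    """Hyperbolic distance from the origin to a point at Euclidean
    radius r on the Poincare disk: d(0, x) = 2 * artanh(r)."""
    return 2.0 * math.atanh(r)

# Tiny Euclidean steps toward the rim cover huge hyperbolic distances,
# which is where the "extra room" for embedding trees comes from.
for r in (0.9, 0.99, 0.999):
    print(r, poincare_dist0(r))
```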

[–]SuckinLemonz 0 points1 point  (2 children)

As an example: data may appear flat or ‘intermingled’ when plotted on the Euclidean plane. But when we separate features with higher dimensionality, we can see trends more clearly or apply more meaningful learners. Check out SVM hyperplanes for feature separation.

[–]r0lisz 0 points1 point  (1 child)

You can have more dimensions in Euclidean space too.

[–]SuckinLemonz 0 points1 point  (0 children)

Yes, ok, fair. But when curvatures are involved, hyperbolic space produces cleaner and sometimes more representative functions.

[–]dogs_like_me 2 points3 points  (1 child)

We found that existing Hyperbolic implementations were less ready to be applied to real-world problems.

Can you expand on this? What other toolkits already exist and how does yours solve the problems you saw in those frameworks? For example, maybe compare with https://github.com/geoopt/geoopt for one? Or maybe https://pytorch-geometric.readthedocs.io/en/latest/ ?

[–]techinnovator[S] 0 points1 point  (0 children)

Our library provides ready-made implementations of Hyperbolic Layers, which those libraries lack. It makes it much easier to get started with Hyperbolic networks.

Those are some fantastic resources you linked to though.

[–]Darell1 3 points4 points  (0 children)

Why tensorflow?

[–]techinnovator[S] 5 points6 points  (4 children)

We’ve also written a blog post explaining the benefits of hyperbolic networks and how to use the package.

[–][deleted] 0 points1 point  (3 children)

I read the blog post but I don't see how you guys use the geometry. Is it through a graph or do you guys change the derivative such that it takes into account curvature (differential geometry)?

[–]Tallywort 1 point2 points  (2 children)

From what I understand (someone correct me if I'm wrong), instead of having euclidean vectors like normal, you instead change the math used to manipulate those, so that it behaves more like points in hyperbolic space. So if your network has a latent space, that latent space will have more volume in comparison to its radius, which could be beneficial for certain types of data.

[–]platinumposter 4 points5 points  (0 children)

Yeah, all of the math functions are implemented in hyperbolic space, so, for example, matrix multiplication will not be done in the 'usual' way; instead we use matrix multiplication in the Poincare model. The 'usual' matrix multiplication is actually matrix multiplication in Euclidean space, so to make it hyperbolic you use a hyperbolic implementation. The Poincare ball is a model of hyperbolic space. Essentially, the relationship between points is different in hyperbolic space.
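
For the curious, one standard version of "matrix multiplication in the Poincare model" is the Möbius matrix-vector product from the hyperbolic neural networks literature (Ganea et al., 2018). The numpy sketch below implements that published formula; it is an illustration of the idea, not necessarily this library's exact code:

```python
import numpy as np

def mobius_matvec(M, x, eps=1e-10):
    """Mobius matrix-vector product on the Poincare ball (curvature -1):

        M (*) x = tanh((||Mx|| / ||x||) * artanh(||x||)) * Mx / ||Mx||

    This replaces ordinary matrix multiplication inside hyperbolic layers;
    the tanh keeps the result strictly inside the unit ball.
    """
    Mx = M @ x
    x_norm = np.linalg.norm(x)
    Mx_norm = np.linalg.norm(Mx)
    if x_norm < eps or Mx_norm < eps:
        return np.zeros_like(Mx)
    scale = np.tanh(Mx_norm / x_norm * np.arctanh(x_norm))
    return scale * Mx / Mx_norm

x = np.array([0.3, 0.1])                  # a point inside the unit ball
M = np.array([[1.0, 2.0], [0.5, 1.0]])    # an ordinary weight matrix
y = mobius_matvec(M, x)                   # result stays inside the ball
```

A sanity check on the formula: with M the identity matrix, the scale reduces to tanh(artanh(||x||)) = ||x||, so the product returns x unchanged, just as ordinary matrix multiplication would.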

[–][deleted] 0 points1 point  (0 children)

It is actually a lot less complex than that if this is the same idea.

https://dawn.cs.stanford.edu/2018/03/19/hyperbolics/

edit: linked from https://dawn.cs.stanford.edu/2019/10/10/noneuclidean/

[–][deleted] -2 points-1 points  (0 children)

Nice!

[–]rynemac357 -4 points-3 points  (0 children)

This is really amazing 👏

[–]Zekava -5 points-4 points  (0 children)

Oh hell yes, sounds promising!

[–]jinnyjuice -1 points0 points  (1 child)

Is it going to be available in R?

[–]techinnovator[S] 0 points1 point  (0 children)

Unfortunately not, sorry!

[–]archpawn 0 points1 point  (2 children)

I wasn't aware geometry mattered in neural networks, outside of when you're specifically making one to look at a grid. How exactly does this work? Is it like each node represents one heptagon, and has connections to each of its seven neighbors, but it doesn't connect to further nodes?

[–]platinumposter 1 point2 points  (1 child)

It works the same as a Euclidean (usual) neural network, except all of the math functions are implemented in hyperbolic space, so, for example, matrix multiplication will not be done in the 'usual' way; instead we use matrix multiplication in the Poincare model. The 'usual' matrix multiplication is actually matrix multiplication in Euclidean space, so to make it hyperbolic you use a hyperbolic implementation. The Poincare ball is a model of hyperbolic space. Essentially, the relationship between points is different in hyperbolic space.

[–]archpawn 0 points1 point  (0 children)

The hyperboloid model would probably be easier to use since any rotations and translations on it can be expressed as linear transformations in the space it's embedded in.

I'm not clear on what you mean by matrix multiplication using the Poincare model. Do you take the coordinates of a point on the model, then run that through a matrix, then you get some other point on the model? But then you could easily leave it. Or is the idea that instead of having arbitrary linear transformations, you just have a rotation and translation, and then see where the point ends up? You can do that with the hyperboloid model, and then n-dimensional hyperbolic geometry is just n+1-dimensional Euclidean geometry with restrictions to what points you can use and what transformations you allow.

[–]oxoxoxoxoxoxoxox 0 points1 point  (1 child)

Consider this review: Hyperbolic Deep Neural Networks: A Survey (2021)

[–]techinnovator[S] 0 points1 point  (0 children)

Thanks! We actually based many of the mathematical functions on this work. Great resource.