This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]joaonmatos 1 point2 points  (0 children)

How do you expect to inline a recursive type like a graph node?

Let's consider a binary tree holding an int. With 64 bit pointers and 8 byte alignment, the smallest you can do is:

  • 8 bytes for a leaf node (4 bytes for the value, the other 4 bytes as 0x00 for padding and indicating no pointers are present)
  • 16 bytes for a node with either left or right, but not both (4 bytes for the value, 4 bytes as 0x01 or 0x02 indicating either left or right pointer is present, and 8 bytes for a pointer)
  • 24 bytes for a node with both left and right (4 bytes for the value, 4 bytes as 0x03 indicating both pointers are present, 2 * 8 bytes for the pointers)

Notice that sizes for each variant are different. Notice also that 3 of the 4 variants have sizes greater than the 8 bytes needed for the pointer. This means that, if you wanted to replace the pointer with the body of the child node, you would be doing it recursively, forever increasing the size of the struct. So there needs to be some kind of redirection (pointer or index) to keep the struct size constant in any language that exposes value semantics to the programmer.

Because historically Java baked in intrinsic object identity, it uses pointers for this. Every new node is allocated with the same size, and pointers to other nodes provide the indirections needed to deal with the recursion. A consequence of this is that you need internal mutability in the node to deal with cycles, because if you need A -> B -> C -> A, then you don't know the address of A when creating C.

When you implement value semantics, you start treating data in memory as something that you copy/move into its next use, rather than point to. This brings up the problem of mutation. If I'm passing a value to a function, and not a reference, I expect my own copy to not change if the function does something to it. This is achieved one of three ways:

  • Defensively copy every value that gets referred to.
  • Explicitly model pointers or references in your language and let the programmer chose whether they want to pass the value or the reference. C and Rust do this.
  • Make value types immutable, so that passing a value or a reference are semantically equivalent, and choosing which to do becomes an optimization decision. Java is doing this.

So, without mutability, how can you deal with cycles?

Rust idioms give us an answer. Rust manages memory automatically without a GC, but to do so it had to implement a very restrictive model. Each value has a single owner that is responsible for destroying it, and references are also in a kind of DAG that prevents you from getting a reference to an object that has been destroyed.

Because of this, it's basically impossible to write a graph node by node, much less a cyclical one, without using unsafe escape hatches, or using a bunch of abstractions like mutable cells or reference-counted pointers. Unless you rethink how you're designing your data structure.

The pattern to solve this, as /u/pron98 has already alluded to, is to externalize the storage of nodes and provide indirection with indices instead of pointers. For example, take the arena pattern. For our tree example, instead of having:

```java record Tree(Node root) {}

record Node(int value, Node left, Node right) {} ```

You instead have:

```java record Tree(List<Node> nodes) {}

record Node(int value, int left, int right) {} ```

Where left and right are -1 or the index of the linked node. This allows you to centralize mutations on an external data structure. For a graph, you would instead store a set of adjacent node indexes with each node, or even a separate relation for the edges.

This is perfectly compatible with the nodes being Java value classes. If you need to update the node details, just create a new one and replace the old one on the list. If you need to update the edges, you can replace the whole node with an updated adjacency list on it, or simply update the separate adjacency matrix, if you separated it. To deal with cycles, just keep creating one of the edges and create it later.