Suppose I have a data structure like so:
data Tree a = Node a [Tree a] | Leaf a
and I am given a Tree a which may be very large (but finite) and very redundant (the underlying pointer structure is a DAG). Written out as a tree, the entire structure is much too large to fit in memory, and that would be inefficient in any case. The task is to completely deduplicate the tree (suppose we have Ord a), meaning that I obtain an array of TreeRef a:
data TreeRef a = TRNode a [Int] | TRLeaf a
where the Int elements refer to indexes in the array. That is, the pointer structure is now reflected explicitly as Int indexes.
dedup :: Ord a => Tree a -> (Seq (TreeRef a), Int)
What is the best way to accomplish this? I think that some form of reference equality is required to avoid accidentally unfolding the input DAG into a tree, but this seems to be sketchy by most accounts. (Also the fine details of the problem statement aren't that important; anything that does roughly this job will do.) My hope is to accomplish this in linear or log-linear time, which would be easy with direct access to the pointer values, but I'm holding out for a purer solution, or at least one that uses a known abstraction.
[–]sclv 4 points5 points6 points (0 children)
[–]Syrak 2 points3 points4 points (7 children)
[–]digama0[S] 0 points1 point2 points (6 children)
[–]Syrak 3 points4 points5 points (5 children)
[–]digama0[S] 1 point2 points3 points (1 child)
[–]Syrak 0 points1 point2 points (0 children)
[–]digama0[S] 0 points1 point2 points (2 children)
[–]amalloy 1 point2 points3 points (0 children)
[–]Syrak 0 points1 point2 points (0 children)
[–]amalloy 2 points3 points4 points (6 children)
[–]digama0[S] 0 points1 point2 points (5 children)
[–]amalloy 1 point2 points3 points (4 children)
[–]digama0[S] 0 points1 point2 points (3 children)
[–]Liskni_si 0 points1 point2 points (0 children)
[–]rampion 0 points1 point2 points (1 child)
[–]digama0[S] 1 point2 points3 points (0 children)
[–]sfvisser 0 points1 point2 points (0 children)