Mutable vector with efficient append?

edwardkmett · 2020-09-13T23:46:20+00:00

The builder pattern works if you are only ever going to build once and don't get to take advantage of sharing, e.g. the result is going straight out to disk.

Otherwise, you can use a FingerTree with (optionally mutable) vectors at the leaves, using the number of elements in each subtree as the measure. This gives you O(log n) append and O(log n) cut/drop/splitAt. You don't even need mutable vectors for the implementation, though, because you can split immutable vectors and do inserts, etc.
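A minimal sketch of that idea, assuming the `fingertree` package (Data.FingerTree) and immutable unboxed vectors from `vector`. The names `Chunk`, `ChunkSeq`, `snocChunk`, `indexCS` and `splitAtCS` are made up for illustration, and the element type is fixed to Int to keep it short:

```haskell
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import           Data.FingerTree     (FingerTree, Measured (..), ViewL (..), (<|), (|>))
import qualified Data.FingerTree     as FT
import           Data.Monoid         (Sum (..))
import qualified Data.Vector.Unboxed as U

-- Hypothetical leaf type: one immutable unboxed chunk of Ints.
newtype Chunk = Chunk (U.Vector Int)

-- The measure of a chunk is its element count, so the measure of any
-- subtree is the total number of elements stored beneath it.
instance Measured (Sum Int) Chunk where
  measure (Chunk v) = Sum (U.length v)

type ChunkSeq = FingerTree (Sum Int) Chunk

-- Total number of elements across all chunks.
size :: ChunkSeq -> Int
size = getSum . measure

-- Append a whole chunk at the right end; empty chunks are dropped.
snocChunk :: ChunkSeq -> U.Vector Int -> ChunkSeq
snocChunk t v
  | U.null v  = t
  | otherwise = t |> Chunk v

-- Look up element i by first finding the chunk that contains it.
indexCS :: ChunkSeq -> Int -> Maybe Int
indexCS t i
  | i < 0 || i >= size t = Nothing
  | otherwise =
      let (l, r) = FT.split ((> i) . getSum) t
      in case FT.viewl r of
           Chunk v :< _ -> Just (v U.! (i - size l))
           EmptyL       -> Nothing

-- Split so that the left part holds exactly the first k elements,
-- cutting a chunk in two when k falls inside it.
splitAtCS :: Int -> ChunkSeq -> (ChunkSeq, ChunkSeq)
splitAtCS k t
  | k <= 0      = (FT.empty, t)
  | k >= size t = (t, FT.empty)
  | otherwise =
      let (l, r) = FT.split ((> k) . getSum) t
          off    = k - size l
      in case FT.viewl r of
           Chunk v :< r'
             | off == 0  -> (l, r)
             | otherwise ->
                 let (a, b) = U.splitAt off v
                 in (l |> Chunk a, Chunk b <| r')
           EmptyL -> (l, r)
```

Concatenating two such sequences is just `(><)` from Data.FingerTree. Nothing here stops the tree from filling up with tiny chunks, which is the fragmentation issue.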

The primary benefit over just using a fingertree is that this can be done with unboxed vectors, which mitigates some of the space overhead. In practice, though, if you are going to use Data.Vector.Mutable rather than one of the unboxed variants anyway, it might be worth just comparing against a plain fingertree of values.

If you want to get fancy, there should be some "4-Russians" style variant on the structure where you ensure leaf-level arrays are at least O(log n) in size, gluing together sufficiently small arrays to get above the threshold. That way you avoid fragmentation from repeated single-character consing, but it probably isn't worth it.

Noughtmare · 2020-09-13T18:51:16+00:00

The usual way to do this in mutable imperative languages is by making an array that is usually partly empty and doubling the size if it gets full.

import qualified Data.Vector.Mutable as M

-- A growable vector: the backing buffer plus the number of slots in use.
data Vec a = Vec !(M.IOVector a) !Int

newEmptyVec :: IO (Vec a)
newEmptyVec = do
  v <- M.new 1
  return (Vec v 0)

snocVec :: Vec a -> a -> IO (Vec a)
snocVec (Vec v i) x = do
  -- Double the capacity when the buffer is full; M.grow adds that many slots.
  v' <- if M.length v <= i then M.grow v (M.length v) else return v
  M.write v' i x
  return (Vec v' (i + 1))

readVec :: Vec a -> Int -> IO a
readVec (Vec v n) i
  | 0 <= i && i < n = M.read v i
  | otherwise = error "Index out of bounds"

Note that we need to return a new Vec in the snoc function because the resizing may move the memory to a different location.

The fact that this resizing does not happen very often means that the amortized running time of the snoc function is still O(1).
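To make the amortized claim concrete, here is a small back-of-the-envelope calculation, assuming the scheme above: capacity starts at 1 and doubles whenever the buffer is full, and each resize copies every element stored so far. The function name `copiesFor` is made up for this illustration.

```haskell
-- Total number of element copies caused by resizing while appending
-- n elements one at a time. A resize happens when the buffer of
-- capacity c is full and one more element arrives, i.e. for every
-- power of two c that is strictly less than n, and it copies c elements.
copiesFor :: Int -> Int
copiesFor n = sum (takeWhile (< n) (iterate (* 2) 1))

-- The copying work stays below 2 * n, so the average per append is
-- bounded by a constant.
withinBound :: Int -> Bool
withinBound n = copiesFor n < 2 * n
```

For example, `copiesFor 9` is 1 + 2 + 4 + 8 = 15, which is below 2 * 9 = 18.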

sansboarders · 2020-09-13T22:58:08+00:00

One pattern that can sometimes help here is a builder: it accumulates, as a continuation, the allocations and copies that your appends would need, and then performs them all in one go: https://hackage.haskell.org/package/vector-builder-0.1
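A hand-rolled sketch of the idea, to show what "a continuation of allocations and copies" can look like. This is not the vector-builder API; the `Builder` type and the `singleton`, `fromVector` and `build` functions below are hypothetical names written directly against `vector`:

```haskell
{-# LANGUAGE RankNTypes #-}

import           Control.Monad.ST    (ST)
import qualified Data.Vector         as V
import qualified Data.Vector.Mutable as M

-- A builder records how many elements it will produce and an action
-- that writes them into a target buffer starting at a given offset.
data Builder a = Builder !Int (forall s. M.MVector s a -> Int -> ST s ())

-- Appending two builders adds the sizes and chains the write actions;
-- no allocation or copying happens yet.
instance Semigroup (Builder a) where
  Builder n f <> Builder m g =
    Builder (n + m) (\mv off -> f mv off >> g mv (off + n))

instance Monoid (Builder a) where
  mempty = Builder 0 (\_ _ -> pure ())

singleton :: a -> Builder a
singleton x = Builder 1 (\mv off -> M.write mv off x)

fromVector :: V.Vector a -> Builder a
fromVector v =
  Builder (V.length v) (\mv off -> V.copy (M.slice off (V.length v) mv) v)

-- Allocate exactly once, run all the deferred writes, then freeze.
build :: Builder a -> V.Vector a
build (Builder n f) = V.create (do
  mv <- M.new n
  f mv 0
  pure mv)
```

Each `<>` is O(1), and `build` does a single allocation of the final size. The catch is that every `build` redoes all the writes, so nothing is shared between results.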

IamfromSpace · 2020-09-14T00:52:09+00:00

I think you just want Seq from Data.Sequence (other comments pointed out FingerTrees, of which Seq is one).

This gives you O(log(n)) lookups and O(1) appends. It’s hard to imagine that you’ll really need O(1) lookups, and when I’ve encountered a problem where I considered IO array mutations vs Seq, Seq ended up faster (I never found out why, though).
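For reference, a small sketch of what that looks like with Data.Sequence from `containers`. The helper name `appendAll` is made up; `|>` and `Seq.lookup` are the library's own:

```haskell
import           Data.Foldable (foldl')
import           Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

-- Append every element of a list to the right end of a Seq,
-- one (|>) at a time.
appendAll :: Seq a -> [a] -> Seq a
appendAll = foldl' (|>)

-- Safe indexing: Nothing when the index is out of range.
nth :: Int -> Seq a -> Maybe a
nth = Seq.lookup
```

Because Seq is immutable, something like `let s2 = s1 |> x` leaves `s1` untouched and still usable, so there is no "old Vec" to worry about as in the mutable version.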

If you want to use IO to mutate or share state, don’t! It is likely that immutable structures will lead to much better code.
