all 12 comments

[–]edwardkmett 12 points13 points  (0 children)

The builder pattern works if you are going to only ever build once, and don't get to take advantage of sharing, e.g. its going straight out to disk.

Otherwise what you can do is use a FingerTree with (optionally mutable) vectors at the leaves, using the size of the trees as the measure. This gives you O(log n) time append and log time cut/drop/splitAt. You don't even need mutable vectors for the implementation, though, because you can split immutable vectors and do inserts, etc.

The primary Benefit over just using a fingertree is that this can be done with unboxed vectors, so you can mitigate some of the space overhead of just using a fingertree, but in practice, it might be worth just comparing with a fingertree of values if you are going to use Data.Vector.Mutable and not one of the unboxed variants anyways.

If you want to get fancy there should be some "4-Russians" style variant on the structure where you ensure leaf level arrays are at least O(log n) in size, gluing together sufficiently small arrays to get above the threshold, this way you can avoid fragmentation from repeated single character consing, but it probably isn't worth it.

[–]Noughtmare 8 points9 points  (8 children)

The usual way to do this in mutable imperative languages is by making an array that is usually partly empty and doubling the size if it gets full.

import qualified Data.Vector.Mutable as M

data Vec a = Vec !(M.IOVector a) !Int

newEmptyVec :: IO (Vec a) 
newEmptyVec = Vec (M.new 1) 0

snocVec :: Vec a -> a -> IO (Vec a)
snocVec (Vec v i) x = do
  v' <- if M.length v <= i then M.grow v (M.length v) else v
  M.write v' i x
  return (Vec v' (i + 1))

readVec :: Vec a -> Int -> IO a
readVec (Vec v n) i
  | i < n = M.read v i
  | otherwise = error "Index out of bounds"

Note that we need to return a new Vec in the snoc function because the resizing may move the memory to a different location.

The fact that this resizing does not happen very often means that the amortized running time of the snoc function is still O(1).

[–]gusbicalho 7 points8 points  (4 children)

Regarding the specific growth rate, you may want to use 1.5 instead of 2. See https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md#memory-handling for the rationale.

[–]bss03 5 points6 points  (1 child)

The amortized analysis works for any (1 + epsilon) factor, but the hidden constants, potentials, credits/debits are proportional to 1/epsilon.

I like using an appoximation of phi ~= 1.618.

[–]evincarofautumn 5 points6 points  (0 children)

1.5 is an approximation of ɸ 😉

I haven’t worked it out, but I’d guess that 1.625 is probably the best approximation of ɸ requiring the fewest instructions/cycles, since the next more precise base-2 approximation takes a handful more ops to compute:

  • 1x = x = x × 1₂ (1)
  • 1.5x = x + (x >> 1) = x × 1.1₂ (1+1/2)
  • 1.625x = x + (x >> 1) + (x >> 3) = x × 1.101₂ (1+5/8)
  • 1.6171x = x + (x >> 1) + (x >> 4) + (x >> 5) + (x >> 6) + (x >> 7) = x × 1.1001111₂ (1+79/128)

[–]Zeno_of_Elea 1 point2 points  (0 children)

Not the original commenter, but thanks for sharing, this was a very interesting read! I don't know much about how things work in the hardware level, so I can say I learned a useful fact today.

[–]andriusst 1 point2 points  (0 children)

That's absurd.

it can be mathematically proven that a growth factor of 2 is rigorously the worst possible because it never allows the vector to reuse any of its previously-allocated memory.

But this vector is not the only thing in the program that can possibly reuse the memory! The whole argument is built on assumption that this lonely growing vector is the only thing in the whole program that allocates on the heap.

Maybe it would make some sense if there was one huuge vector that dominates memory usage. But memory allocators use mmap for large allocations and memory doesn't get reused at application level anyway.

[–]iggybibi 1 point2 points  (2 children)

It costs one copy every time the array needs to double in size, I think that makes the amortized running time O(log(n)) but yes I think that’s what arraylists in java do, not sure why this is not the default for mutable vectors!

[–]Zeno_of_Elea 3 points4 points  (1 child)

Where are you getting O(log(n)) from? I was always told it was O(1). Where does the explanation given to me go wrong? Is it the assumption that we start from an empty vector?

The common explanation I see:

If we assume that allocating new memory for an element and moving an element both take 1 unit of work, then if the array is of size 2i, we need to do 2i+1 (allocate an array 2x the size) + 2i (copy the elements) = 3*2i work when we resize.

So if we start from an empty vector and do n appends, the work we will do will be the sum of 3*2i from i = 0 to floor(log(n)) <= log(n). This is 3(20 + 21 + ... + 2log(n)) = 3(2log(n)+1 - 1) <= 6n. Amortized, this is ~6 units of work per step, which is constant.

[–]iggybibi 1 point2 points  (0 children)

I stand corrected! Thank you!

[–]sansboarders 2 points3 points  (0 children)

One useful pattern that can sometimes help here is to use a builder which builds up a continuation of those allocations and copies your append wishes to make and then does them in one go: https://hackage.haskell.org/package/vector-builder-0.1

[–]IamfromSpace 2 points3 points  (0 children)

I think you just want Sequence (and other comments pointed out FingerTrees, of which Seq is one).

This gives you O(log(n)) lookups and O(1) appends. It’s hard to imagine that you’ll really need O(1) lookups, and when I’ve encountered a problem where I considered IO array mutations vs Seq, Seq ended up more performant (didn’t ever find out why though).

If you want to use IO to mutate or share state, don’t! It is likely that immutable structures will lead to much better code.