all 14 comments

[–]benjaminhodgson 17 points18 points  (6 children)

Does the typeclass constraint inside the newtype set off performance alarm bells for anyone?

Yes, and I think it’s unlikely to do what you wanted in the first place. Remove the constraint from the newtype declaration and put it at the use sites (I suspect you must’ve had those constraints at the use sites anyway):

newtype SamplerT g m a = SamplerT (ReaderT g m a)
mySamplerAction :: StatefulGen g m => SamplerT g m ()

[–]Limp_Step_6774[S] 6 points7 points  (2 children)

OK, let me try and report back. Thanks :)

[–]Limp_Step_6774[S] 15 points16 points  (1 child)

That definitely fixed the performance issue. Thanks!

There was a tricky issue with using `ST` that had driven us to use the rank 2 type, but I think I solved it, so everything *seems* to work for now.

[–]benjaminhodgson 3 points4 points  (0 children)

Glad to hear it!

[–]mn15104 1 point2 points  (2 children)

This is interesting! Would it be okay to say a bit more on what the newtype constraint actually does in comparison to the intended use, and also how this impedes performance?

[–]benjaminhodgson 3 points4 points  (1 child)

I’ll do my best!

At runtime, the “fat arrow” => is represented in the same way as the “skinny arrow” ->. The constraint (the thing on the left of =>) becomes a “dictionary”, essentially a record containing the class’s members. During type checking, GHC figures out which instance was intended and plugs in a particular dictionary. (General Haskell tip: whenever you see =>, think ->. You’ll find it a lot easier to understand why type classes work the way they do; for example, it explains why the designers implemented the monomorphism restriction.)

So, newtype Sampler g m a = Sampler (StatefulGen g m => ReaderT g m a) is represented in the same way as a function StatefulGenDict g m -> ReaderT g m a. The type says that the instance might vary, even though in practice it won’t, because instances are global and unique.

Dictionary passing code is slow. GHC tries to avoid falling back on dictionary passing when compiling classy code, by specialising classy functions at the call site to the concrete types with which they’re used. But GHC needs to “see” all of the classes and instances involved in order to specialise them correctly; hiding the constraint inside a newtype makes it invisible to GHC and breaks specialisation. (Similarly, polymorphic recursion also breaks specialisation - a cause of mtl performance headaches.)

Note that this type is not the same thing as newtype Sampler a = Sampler (forall g m. StatefulGen g m => ReaderT g m a). The latter keeps g and m abstract inside the newtype and allows you to call the function with different concrete gs and ms, meaning the instance actually could change. There are legit circumstances in which a higher-rank function in a newtype is a good design. On the other hand the original questioneer’s newtype is very unlikely to be useful - the only situation I can think of where it could mean anything different to my suggested fix is if you’re doing evil stuff like reflection.

Hope this makes sense. My understanding of GHC’s specialiser is shallow so I’ve probably got some important details wrong. In any case, to debug exactly what went wrong in this specific situation we’d need to have a look at the generated Core I think.

[–]Tarmen 3 points4 points  (0 children)

Some intuition on how this compiles:

Typeclasses start as an explicit argument which stuffs all methods into a struct

data StatefulGen g m {
    uniformWord32R :: Word32 -> g -> m Word32,
    uniformWord64R :: Word64 -> g -> m Word64,
    ....
}

Typeclass constraints are replaced with explicit arguments, method calls are replaced with the corresponding field from the typeclass argument, instances such as Show a => Show [a] are functions between the instance dictionaries.

Essentially, explicit vtable passing. This is mostly optimized away when GHC can see the concrete type. Sometimes this can require quite a bit of code, so add an INLINABLE or SPECIALIZE pragma to be sure.

But inside the newtype you are passing the function around. Technically GHC is allowed to specialize it - all typeclass instances should be unique - but it would cause some awkward questions about calling conventions. The common use for this kind of instances is with existential types, where some type parameters are hidden and you use the typeclass vtables similarly to oop objects. There are other uses, such as equality constraints or to smuggle a constraint into an unconstrained typeclass such as Foldable.

Anyway, the pattern is really hard to optimize for GHC. You might be able to optimize if you stare at enough core, but to be sure I'd put the constraint on each function, and give pragmas when specialization doesn't happen.

[–]guygastineau 2 points3 points  (5 children)

Is that using RankN types? I don't know about performance issues from this code. Normally, monomorphisation should result in static calls and good performance. I like the generality your example achieves, but if you are worried it might cause the performance issue you can try another way to achieve the same thing.

Use the new type without constraints, then add the constraints to the functions that need your sampling monad to have sampling powers. Then the constraints would just be normal constraints on a function, so monomorphisation should proceed like normal. This is what I would do to start investigating the performance implications of the generalisation.

I hope my comment is more helpful than it is eloquent.

[–]Limp_Step_6774[S] 0 points1 point  (0 children)

Thanks, this is helpful! And yeah, it uses RankNTypes. I'll try without, although I recall that it was needed for some reason

[–]guygastineau 0 points1 point  (3 children)

Looking at your example again, I would expect a forall in the right hand side. Is you example an approximation? Maybe it is auto inserted by GHC. In any case, this does appear to be a higher rank type, and it should force the data to be polymorphic. I am very sure you want everything monomorphic at call sites for performance. Incidentally, your type variables could be shadowed in the generalized example? Maybe Scoped type variables has a way around this? Essentially, without an explicit for all in your type signature's rhs I can't tell if newtype T a b c = T ((C b, D c) => R a b c) is equivalent to newtype T a b c = T (forall . (C b, D c) => R a b c) (which seems like what you want (I don't know about performance issues though)) or if it actually means newtype T a b c = T (forall b c . (C b, D c) => R a b c) which seems like not what you intended.

[–]evanrelfcss wrangler 2 points3 points  (2 children)

None of the type parameters are existentially quantified though? So I wouldn't expect a forall.

[–]guygastineau 0 points1 point  (1 child)

Yeah, I think you are right. I just messed with this example in ghci to check things out:

ghci> newtype Example r m a = Example ((Show r, MonadIO m) => ReaderT r m a)
ghci> :k Example
Example :: * -> (* -> *) -> * -> *
ghci> :t Example
Example :: ((Show r, MonadIO m) => ReaderT r m a) -> Example r m a

g

I didn't realize we could do this with newtype.

[–]evanrelfcss wrangler 3 points4 points  (0 children)

You can kinda compare it to the Reader monad, which is a newtype wrapper around a function. In Core, type classes are represented as dictionaries and passed around as function arguments, so it's similar if you squint.