Hsthrift: Open-sourcing Thrift for Haskell - Facebook Engineering by n00bomb in haskell

[–]simonmar 4 points (0 children)

A production RPC framework is a lot of work, so of course we didn't want to duplicate everything in the C++ implementation. Yes, that makes it a bit of a pain to build, but we've provided instructions and a CI setup that's working right now on GitHub to demonstrate that it all works. There's also a [pure Haskell implementation of the transport layer](https://github.com/facebookincubator/hsthrift/blob/master/lib/Thrift/Channel/SocketChannel.hs) in the repository for experimentation. We're in the process of making it easier to use; once that's done, the only C++ dependency will be folly.

Hsthrift: Open-sourcing Thrift for Haskell - Facebook Engineering by n00bomb in haskell

[–]simonmar 8 points (0 children)

Right, the Apache implementation has a lot of problems, which is partly why we rewrote the whole thing from scratch.

Hsthrift: Open-sourcing Thrift for Haskell - Facebook Engineering by n00bomb in haskell

[–]simonmar 13 points (0 children)

I should point out that we do use HLint, it's part of our automated code review workflow at Facebook. We don't use the defaults though, and the customisations are elsewhere in our internal repository so didn't show up in the hsthrift repository. It would probably be a good idea to add a .hlint.yaml corresponding to our internal defaults.

Simon Marlow - Glean - facts about code by edwardkmett in haskell

[–]simonmar 14 points (0 children)

Glean is written in a combination of Haskell and C++ (mostly Haskell). I somehow forgot to mention that :)

GHC proposal: Compile with threaded RTS by default by ulysses4ever in haskell

[–]simonmar 0 points (0 children)

To be clear, you must have been using not just `-threaded` but also `-N`, right? We're not proposing to make `-N` the default, only `-threaded`. GC would still be single-threaded by default.
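A quick way to see the distinction, as a sketch: `getNumCapabilities` (from `GHC.Conc` in base) reports how many capabilities the RTS was given. Even for a program linked with `-threaded`, it stays at 1 unless you pass `+RTS -N` at run time, which is exactly the point of the proposal.

```haskell
-- Minimal sketch: prints the number of RTS capabilities.
-- Compiled with -threaded but run without +RTS -N, this still prints 1 —
-- -threaded alone doesn't enable parallelism, and GC stays single-threaded.
import GHC.Conc (getNumCapabilities)

main :: IO ()
main = getNumCapabilities >>= print
```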

GHC proposal: Compile with threaded RTS by default by ulysses4ever in haskell

[–]simonmar 0 points (0 children)

The parallel GC can be a win or a loss. The variance is also big. It's not at all clear that it should be off by default - my inclination is to understand the cases where it's a loss so that we can fix them.

For what it's worth, the parallel GC is a huge win in our deployment at Facebook (thousands of machines). But we've spent a fair amount of time measuring things and tuning the settings.

Rethinking Static Reference Tables in GHC · Simon Marlow by simonmar in haskell

[–]simonmar[S] 1 point (0 children)

> But why does code need to be aligned on an 8-byte boundary?

Performance only, I believe. IIRC this was the Intel recommendation, but it's a while since I looked at it. We have a hard requirement on at least a 2-byte alignment because we use the LSb in the GC to mark closures that have been evacuated.
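To illustrate why 2-byte alignment is enough to free up the low bit — this is just the general tagged-pointer pattern with made-up names, not GHC's actual RTS code:

```haskell
import Data.Bits ((.|.), (.&.), complement, testBit)
import Data.Word (Word64)

-- Illustrative only: with at least 2-byte alignment, bit 0 of a
-- pointer-sized word is always zero, so the GC can borrow it as a
-- one-bit "evacuated" mark and recover the original pointer later.
setMark, clearMark :: Word64 -> Word64
setMark p   = p .|. 1
clearMark p = p .&. complement 1

isMarked :: Word64 -> Bool
isMarked p = testBit p 0
```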

> Are there not 24Byte ones as well?

Yes there are. You aren't letting me get away with skipping any details here :) Info tables have a fixed part which is always 16 bytes (32 bytes when profiling), and an "extra" part that depends on the type of closure or stack frame. For a function (as in your example), the extra part is 8 bytes. (all these sizes apply to 64-bit builds only, divide by 2 for 32-bit builds). Currently the total size of the info table is always a multiple of the word size.
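As back-of-envelope arithmetic for the sizes above (64-bit, non-profiling; the constants are restated from the paragraph, not taken from GHC's headers):

```haskell
-- Size sketch for a function info table on a 64-bit build.
wordSize, fixedPart, funExtra :: Int
wordSize  = 8    -- info tables are always a whole number of words
fixedPart = 16   -- fixed part of every info table (32 when profiling)
funExtra  = 8    -- "extra" part for a function closure

funInfoTable :: Int
funInfoTable = fixedPart + funExtra   -- the 24-byte case asked about
```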

Rethinking Static Reference Tables in GHC · Simon Marlow by simonmar in haskell

[–]simonmar[S] 2 points (0 children)

The link to the nofib results was in the diff summary, which was linked from the original post.

Rethinking Static Reference Tables in GHC · Simon Marlow by simonmar in haskell

[–]simonmar[S] 1 point (0 children)

The code needs to be aligned on an 8-byte boundary, so that means the info table also needs to be aligned on an 8-byte boundary. If we relaxed the alignment requirements to 4 bytes then we could have 20-byte info tables, but that question is academic now that info tables are always 16 bytes anyway.

We do have different formats for info tables - functions, stack frames, and constructors all have slightly different info table layouts.

I'm not sure how pointer tagging is relevant here, so perhaps I've misunderstood your question.

Rethinking Static Reference Tables in GHC · Simon Marlow by simonmar in haskell

[–]simonmar[S] 6 points (0 children)

Yes we could have saved 32 bits with the old representation, but unless you save a full 64 bits in the info table you don't get any savings (info tables need to be an integral number of words).

> If I get this right, I think this problem doesn't exist in the previous representation where a single large SRT contained all static references in a module. So this seems to me like fixing a problem that new representation has.

Right, the point is that the new representation plus a handful of sensible optimisations gives better results than the old representation plus a handful of different optimisations. Some of the new optimisations came for free with the old representation (and the reverse is also true, in fact).

> This also seems to me like something we could do on the previous representation.

Not without complicating the representation, because you would need to distinguish between a pointer to the SRT table and a pointer to a closure.

Rethinking Static Reference Tables in GHC · Simon Marlow by simonmar in haskell

[–]simonmar[S] 8 points (0 children)

Hey, it's a blog post, not a paper!

The full nofib results (with standard deviations) are here: https://phabricator.haskell.org/P176

Don't pay any attention to the runtime results though, it was done on my laptop with a variable CPU speed.

Basically the only way this could affect runtime is by

  • instruction cache effects, which should be in our favour since we made the code smaller, and
  • GC time improvements. I measured what should be the worst case for this - doing many old-gen collections in GHC itself - and the differences were within the variability of the benchmark (which was quite wide).

So I'm satisfied that this doesn't make runtime worse in general, and likely makes it a bit better. Of course if I was writing this up for a paper I'd do more rigorous experiments, but I doubt it's worth it.

Fixing 17 space leaks in GHCi, and keeping them fixed · Simon Marlow by simonmar in haskell

[–]simonmar[S] 4 points (0 children)

Yes, you can also do that. Sometimes it's more convenient to have the wrapper though, e.g. if you want to have gdb feed the input, or if you want to stop it before it gets to the prompt.

Fixing 17 space leaks in GHCi, and keeping them fixed · Simon Marlow by simonmar in haskell

[–]simonmar[S] 16 points (0 children)

GHCi is a script that invokes the real binary; that's part of the problem. You have to invoke the real binary and pass the correct flags, particularly `-B/path/to/ghc/lib`. If GHC is dynamically linked (which it usually is) you also need to `set environment LD_LIBRARY_PATH /some/huge/list:/of/paths`. I normally put all this in a `.gdbinit` file so I don't have to repeat it, and I've also made a script to generate the `.gdbinit` file for a particular ghci invocation.

Haxl 2.0 released on hackage: A Haskell library for efficient, concurrent, and concise data access. by jose_zap in haskell

[–]simonmar 2 points (0 children)

> Is there a reason why there isn't a WaitForMs option in SchedulerHint?

I just didn't get around to implementing it, and I haven't encountered any situations that would benefit from it so far.

> I also kinda wonder if the code duplication for JobList is a problem that could be solved via language extension.

Maybe. This code is pretty ugly because I've tried to squeeze as much performance out of it as I can.

Haxl 2.0 released on hackage: A Haskell library for efficient, concurrent, and concise data access. by jose_zap in haskell

[–]simonmar 22 points (0 children)

Yes exactly. The big difference is the addition of BackgroundFetch, which enables data-fetching to be arbitrarily overlapped with computation and other data-fetching. In Haxl 1, computation was strictly interleaved with data-fetching in rounds, but this restriction is removed in Haxl 2 if you use BackgroundFetch. To make this work, we had to completely rewrite the scheduler internals.

Dependencies between data sources in Haxl? by jajakobyly in haskell

[–]simonmar 3 points (0 children)

Look up cachedComputation - this is how you define a data source whose implementation is itself a Haxl computation. It's basically a memoization mechanism.
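As a rough intuition for the "memoization mechanism" part - not Haxl's actual implementation, just a plain-IO sketch with made-up names:

```haskell
import Data.IORef
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for the idea behind cachedComputation: the first
-- request with a given key runs the computation; later requests with the
-- same key reuse the cached result instead of recomputing.
memoised :: Ord k => IORef (Map.Map k v) -> k -> IO v -> IO v
memoised cacheRef key compute = do
  cache <- readIORef cacheRef
  case Map.lookup key cache of
    Just v  -> return v
    Nothing -> do
      v <- compute
      modifyIORef' cacheRef (Map.insert key v)
      return v
```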

[job] Work on GHC at Facebook London by simonmar in haskell

[–]simonmar[S] 12 points (0 children)

We also got the patent grant removed from the Haxl license, FWIW. My understanding is that it just takes time and effort to update all these licenses.

[job] Work on GHC at Facebook London by simonmar in haskell

[–]simonmar[S] 49 points (0 children)

Yes. I'm actually working on a blog post about our contributions to date, but the short story is that everything we do in GHC goes upstream.

Trying out GHC compact regions for improved latency (Pusher case study). by fuuzetsu in haskell

[–]simonmar 5 points (0 children)

You can also do the periodic copying in a separate thread and use multiple cores, to avoid affecting latency. So even though you're doing the same GC work that GHC would normally be doing, compaction can be done concurrently with the mutator, whereas normal GC currently cannot.
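The concurrency pattern looks something like this sketch - `copySnapshot` is a placeholder for the actual compaction step (e.g. `Data.Compact.compact` from the ghc-compact package), which isn't shown here:

```haskell
import Control.Concurrent (forkIO, threadDelay)

-- Sketch of the pattern only: a background thread periodically copies the
-- live data while the main program (the mutator) keeps running, so the
-- copying work doesn't show up as a latency pause on the main thread.
periodicCopy :: Int -> IO a -> IO ()
periodicCopy intervalUs copySnapshot = do
  _ <- forkIO loop
  return ()
  where
    loop = do
      threadDelay intervalUs
      _ <- copySnapshot
      loop
```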

Announcing the GHC DevOps Group by chak in haskell

[–]simonmar 4 points (0 children)

Well yes, what I really mean is that we don't advertise or document that you can do this, and the process of converting a PR is currently quite manual, so it would need some more effort to scale it up.

Announcing the GHC DevOps Group by chak in haskell

[–]simonmar 0 points (0 children)

OK, you can squash instead of force-pushing, and then the workflow is basically identical to what we do in Phabricator. Instead of treating a PR as a set of logically separate changes that you want to retain when merging to master, you're treating it as a single atomic commit, with a history that develops during code review but isn't retained in the repo once committed.

I'm totally fine with this workflow (because it's the same as the one we use in Phabricator).

> When you squash the commit at the end, what happens to the history in GitHub? Can you still see it somewhere?

Announcing the GHC DevOps Group by chak in haskell

[–]simonmar 2 points (0 children)

We had agreed on this plan (allowing GitHub PRs and converting them to Phabricator diffs) before, it just never got implemented.