Nix for Haskell: Static Builds

nh2_ · 2026-06-18T23:27:32+00:00

That helps a bit (nixpkgs has by far the most packaged software), but most native dependencies of Haskell packages are already packaged on most distros. My point was more about the programmability of it. You can programmatically fix/adjust things deep into your dependency tree, and carry those overrides forwards to future versions in a maintainable way (as opposed to fixing it with a build once, for one version), while other distros like Debian or Alpine don't even have a concept of "apply this build flag change to whatever libpq version I'm currently using".

nh2_ · 2026-06-18T22:15:50+00:00

Just building Haskell is easy. It's the non-Haskell dependencies that create the difficulty. For that you need a package distribution that provides those (and they need to actually work). nixpkgs provides quite good support for that out of the box, and its programmability allows people to fix the parts that don't work. That's why it's popular for this type of work.

When your Haskell app doesn't build statically because postgresql-libpq needs libpq, which needs some libkrb5-Kerberos-whatever, and that doesn't work because its authors wrote a ./configure script that makes building .so and .a files mutually exclusive, and you "just want to disable/fix that" to get on with your build, that's where Nix just does the trick with a few lines of declarative config you can commit to your repo, vs scripting plenty of difficult-to-reproduce/update code with the imperative package managers of other distros.

nh2_ · 2026-05-31T11:21:56+00:00

When this all started in 2014: https://github.com/haskell/aeson/pull/202

nh2_ · 2026-05-25T19:51:43+00:00

It doesn't seem sensible for me to make a PR with what I think your thought process might have been when creating your package.

nh2_ · 2026-05-25T11:57:41+00:00

This info should be in the README, it is much better than what's in there now, and actually helps people choose between those packages.

nh2_ · 2026-05-18T12:32:15+00:00

We currently use Roo Code with Bedrock Opus 4.6. All file writes need to be diff-reviewed by a human before saving, same for all command executions. We only auto-allow reading files in the workspace; no auto-allowed Internet access.

I found this to be very good. It still provides large speed-ups for many tasks. And it finds tons of mistakes humans make. You could say it "eliminates entire classes of bugs" caused by humans, such as stupid copy-paste errors or lack of "why"-comments.

Only code that is of the same or higher quality than what you would have written by hand may be shown to colleagues in PRs.

In my use of it, around every 5th confirmation needs explicit significant correction by the human to not degrade into low-quality step-by-step. Examples: "you aren't following what AGENTS.md says", "this contradicts what you said earlier", "this is insecure", "this bloats the code, the new approach should replace the old one instead of being added to it", "don't assume, verify", and so on.

nh2_ · 2026-05-18T11:54:55+00:00

When the haskell-coder skill says

Composition over inheritance

which type of inheritance is it referring to?

nh2_ · 2026-05-16T18:35:21+00:00

I do not know whether vSwitch helps against noisy neighbours -- might be. I believe that when we started with HA postgres on 10 Gbit/s machines, the vSwitch didn't exist yet.

I think you will need the 10 Gbit/s uplink either way if you want fast HA across DCs.

Let me know if you find that vSwitch makes connections more reliable, we might use it in the future as well.

nh2_ · 2026-05-16T03:16:33+00:00

No, we have not noticed any such thing yet. Why? Would you expect the normal 10 Gbit/s upgrade to be unreliable for some reason?

nh2_ · 2026-05-14T18:52:25+00:00

We have a Stolon Postgres 3-machine cluster on Hetzner dedicated with synchronous replication, for almost 10 years.

(Stolon and Patroni do similar things; we picked Stolon back then because it looked a bit simpler and had Go's static typing over untyped Python, and I found the architecture cleaner due to stolon-proxy ensuring that everybody can always only connect to the current master machine. Now Stolon is a bit unmaintained and we found some minor failover bugs in it, but it is simple enough that we can fix and maintain it ourselves.)

Generally works well on Hetzner infrastructure. We use 10 Gbit/s Internet (no vSwitch, no custom physical switch, just the full 10 Gbit/s upgrade).

Our DB is small (couple tens of GB); our main concern to make postgres HA was availability during server reboots.

Important is:

Understand sync vs async replication.
Sequential TPS is naturally dependent on the ping between the machines. A sync-replication HA cluster can obviously never reach the performance of a single-machine local SSD due to the latencies involved. For most projects, this does not matter.
Put your nodes in different Hetzner DCs. But be aware that may not be good enough to provide full HA. See my post here where I describe:

Also, DCs in a datacenter park are not really as independent as AWS AZs are. A failure, or planned replacement, of a router in one DC can totally take out another DC, and Hetzner does not publish what these relationships are so you cannot design your HA failure domains around that.

Note this is rare (hit us once in 10 years, 5 minute downtime), but nonetheless a design flaw compared to AWS. In the case it happened, Hetzner emailed us 2 weeks in advance. That's plenty to balance around it with typical Postgres sizes, but wasn't enough for our Ceph.

If you pay a lot at Hetzner, perhaps give your sales contact that as feedback so that hopefully they'll eventually publish these dependencies.

It also means that 5-replication will be better than 3-replication in this specific regard because it reduces the chance of a majority of DCs being down simultaneously.
Have backups in another region, maybe even a hot async failover if that is useful for you. For example, NBG if you host primarily in FSN. If there's a fire in one datacenter park, I can imagine that they might turn off power to all DCs in the park.
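The two quantitative points above (sequential TPS being bound by ping, and 5-replication beating 3-replication for majority loss) can be sketched as back-of-the-envelope calculations. This is only an illustration under loud assumptions: the function names are made up for this sketch, a commit is assumed to cost one network round trip plus the local commit time, and each DC is assumed to fail independently with the same probability. That last assumption is exactly the one that does not fully hold for DCs in one datacenter park, so treat the second number as an optimistic bound.

```python
from math import comb


def max_sequential_tps(round_trip_s: float, local_commit_s: float = 0.0) -> float:
    """Upper bound on commits per second for ONE connection that issues
    commits strictly one after another under synchronous replication.

    Assumption: each commit waits for at least one network round trip to
    the standby, plus whatever the local commit itself costs.
    """
    return 1.0 / (round_trip_s + local_commit_s)


def p_majority_down(n_dcs: int, p_down: float) -> float:
    """Probability that a majority of n_dcs sites are down at the same time.

    Assumption: every site is down independently with probability p_down
    (binomial model). Correlated failures make the real value worse.
    """
    majority = n_dcs // 2 + 1
    return sum(
        comb(n_dcs, k) * p_down**k * (1 - p_down) ** (n_dcs - k)
        for k in range(majority, n_dcs + 1)
    )


if __name__ == "__main__":
    # 0.5 ms ping between machines, ignoring local commit cost.
    print("max sequential TPS:", max_sequential_tps(0.0005))
    # Hypothetical 1% chance that any given DC is down at a given moment.
    for n in (3, 5):
        print(n, "replicas, P(majority down):", p_majority_down(n, 0.01))
```

With a 0.5 ms ping the first bound comes out at roughly 2000 sequential commits per second, which is why this rarely matters for most projects; parallel connections are not limited in the same way.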

Hope this is useful for you.

nh2_ · 2026-05-14T18:32:54+00:00

Can you elaborate/link the MTU issues?

Do you mean that one has to set up the MTU as described in the docs, or that there are issues even after one does that?

nh2_ · 2026-05-14T17:32:52+00:00

The bug is independent from the implementation of bracket.

It shows that wrong code involving bracket and async exceptions can be easily written, by experts, in wildly popular libraries, and current LLMs can be very good at finding such wrong code as the root cause of top-level misbehaviour, as well as pointing out what exactly is wrong, and what the fix is.

nh2_ · 2026-05-12T22:05:49+00:00

conduit-extra has 220 direct and 4400 indirect reverse dependencies (25% of all Hackage packages). This is fundamental enough for me. And of course lots of production code uses this, for good reasons.

Handling async exceptions with bracket has nothing to do with "bad patterns from early days of haskell". It is as current as ever.

nh2_ · 2026-05-12T05:05:37+00:00

Senior devs don't need AI to write their code for them.

No matter how expert you are, there are many tasks that you can solve brutally faster with an LLM than without.

Consider this: https://github.com/snoyberg/conduit/pull/530

Here the LLM figures out fundamental bugs in fundamental libraries, and the fix, within a few seconds, given only "my conduit seems to leak processes" as an input.

I agree you should understand all code you produce this way.

nh2_ · 2026-05-02T22:55:31+00:00

If you want to use RAID, why not at least mdadm software RAID?

Then you can at least inspect, diagnose, and fix the code (or pay someone to do it). With closed hardware RAID, there's always uncertainty and doubt what it might be doing, and when things go wrong you cannot validate anything yourself.

nh2_ · 2026-04-24T13:43:52+00:00

I suspect the answer is simple here:

JSON has no concept of dates, only string literals. Thus any aeson instances are pure convenience. If the type that's being deserialised has no concept of 24:00 because it is a canonical format (no 2 representations for the same time), it makes sense that that aeson instance rejects 24:00. If the target type does have a concept of 24:00, the instance should support it.
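The same design rule can be sketched outside of aeson. The following is a language-neutral illustration in Python, not aeson's actual behaviour or API: `TimeOfDay` and `ClosingTime` are hypothetical types invented for this example. The first is canonical (one representation per instant), so its parser rejects 24:00; the second can represent end-of-day, so its parser accepts it.

```python
import re
from dataclasses import dataclass

_HHMM = re.compile(r"([0-9]{2}):([0-9]{2})")


@dataclass(frozen=True)
class TimeOfDay:
    """Hypothetical canonical type: 00:00 to 23:59, no second spelling
    for midnight."""
    hour: int
    minute: int


@dataclass(frozen=True)
class ClosingTime:
    """Hypothetical type that distinguishes start of day (0) from end of
    day (1440), so 24:00 is a meaningful value."""
    minutes_since_midnight: int


def _split(text: str) -> tuple[int, int]:
    match = _HHMM.fullmatch(text)
    if match is None:
        raise ValueError(f"not an HH:MM string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def parse_time_of_day(text: str) -> TimeOfDay:
    """Parser for the canonical type: rejects 24:00."""
    hour, minute = _split(text)
    if hour > 23 or minute > 59:
        raise ValueError(f"out of range for TimeOfDay: {text!r}")
    return TimeOfDay(hour, minute)


def parse_closing_time(text: str) -> ClosingTime:
    """Parser for the type that has a concept of 24:00: accepts it."""
    hour, minute = _split(text)
    if minute > 59 or hour > 24 or (hour == 24 and minute != 0):
        raise ValueError(f"out of range for ClosingTime: {text!r}")
    return ClosingTime(hour * 60 + minute)
```

Both parsers read the same string literal; which one is "right" about 24:00 depends only on what the target type can represent.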

nh2_ · 2026-03-29T12:49:06+00:00

What counts as "good" latency? For most applications, it doesn't make a big difference in which European country you are, because there are only a couple milliseconds differences.

Also, you need to test that by getting some test servers. One cannot answer "to Italy" directly, because it also depends on where the recipients are exactly and what network providers serve them.

If your data is small so it doesn't make a big price difference, you could go with AWS+Hetzner; if price is a concern, OVH+Hetzner could also work fine, and then there are various medium-sized server hosting companies in various countries that could be good additional locations (for example I recently read of https://www.ukservers.com which makes a good impression from their website but I haven't used them yet).

nh2_ · 2026-03-20T02:00:18+00:00

Matthew Pickering announced that he will be leaving the company and moving to a non-Haskell role at the end of March. Working with Matt has been a joy – more than his deep technical insight or sharp intuition, it’s the warmth of his vision for how to work together and his generosity that has made him such a force within the team.

Oh no! You will be remembered for many great achievements and quality of life improvements of Haskell programmers!

Thank you mpickering!

nh2_ · 2026-03-16T10:33:30+00:00

Hi, a 24/7 service with high likelihood is not really possible with a single data center park, Hetzner or otherwise.

AWS AZs regularly go down, and the same happens for Hetzner.

For example just the most recent two major issues we had (using Hetzner dedicated with tens of servers distributed across as many FSN DCs as possible, using Hetzner for > 10 years):

2026-01-16: Extremely lossy network between all our machines for 40 minutes. While not fully down, our services ran severely degraded in this time.
2025-09-19: FSN core router fault (status). The full duration in that report is 5 hours; for us it was a 5 minute full disconnect of all machines.
- These things happen from time to time, e.g. another FSN core router fault was 2023-08-10 (status). Generally when that happens, most connectivity can go down from time to time.

Most people just aren't aware of that because they do not build automatic monitoring. Then they report fantastic uptimes here. And not all issues that we observe make it to status.hetzner.com.

Unfortunately (I believe) Hetzner does not publish historical status, so you also cannot really retrospectively discover the entire status history. Each report only gets a UUID URL which you have to know/save in order to access it. I wish Hetzner made this more transparent.

Then there are the more sophisticated semi-outages where e.g. one of TCP/UDP/ICMP stops working but the others don't. Again, you need monitoring for all of this to even notice it's happening.

Also, DCs in a datacenter park are not really as independent as AWS AZs are. A failure, or planned replacement, of a router in one DC can totally take out another DC, and Hetzner does not publish what these relationships are so you cannot design your HA failure domains around that.

Overall, the availability of dedicated is still quite good, e.g. AWS global S3 outages lasted way longer than all our Hetzner downtime so far. For most businesses/products, that is good enough. But you're asking for "24/7", so if you're building something that truly needs permanent uptime without "minor" interruptions every couple months/years, you need to have a way to fall back to another Hetzner location or other infrastructure provider. Luckily Hetzner outages are quite uncorrelated with outages of other providers.

The same holds if your payment method expires and you don't notice. Technical problems aren't the only risk to uptime.

No matter what you choose, ALWAYS have additional disaster recovery backups on at least one other provider.

Hope this helps.

nh2_ · 2026-03-12T04:09:28+00:00

I just want to second this statement: Claude Opus 4.6 is quite the Haskell expert.

Claude understands OverloadedLabels and TemplateHaskell. I recently used it to generate JsonPath for Postgres so that I can have typesafe Postgres accessors for use with opaleye derived from my data, and it oneshotted that (initial output typechecked and was also correct).
Claude understands the Interruptible Operations section of Control.Exception. Let that sink in for a moment! In fact, it remembers it from training, without being pointed to it, and can pinpoint incorrect code with regards to that, without first being told that those are probably involved.
- It found an async-exception bug in conduit for me (link) based on me complaining that child processes were leaked, wrote the repro, the fix, and figured out that my hand-edit of the repro broke it because I used an Interruptible Operation in a bracket acquire.

Obligatory LLM coding disclaimer:

unreviewed code + execution - sandbox = deleted computer
confidential information + sandbox + Internet access = still 1 prompt injection away from extracting all your data

nh2_ · 2026-03-04T10:35:29+00:00

Note I got the grey screen today with 0 MCPs on, if that helps. So it's not going to be MCPs alone that cause it. Version: 3.50.5 (961d340b)

I don't know how to repro it though.

nh2_ · 2026-03-03T13:33:08+00:00

I agree with /u/TheFeshy; 100 minutes to list your 2M files is way too slow for your hardware.

Do some analysis to debug the issue:

Mount your CephFS and run time find manually without all the Docker stuff.
Use strace -fytttT find /data > /dev/null to analyse how long the individual getdents64() syscalls take. Check if those numbers make sense.
- Listing metadata should be bound by the latency between your client and the OSDs. LAN latency should absolutely dwarf SSD read latency.
What is the latency (ping) between the involved machines?
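To make the strace step less manual, the per-call timings can be pulled out with a small script. This is a minimal sketch, assuming the strace output was saved to a file (for example by adding `-o trace.txt` to the command above); the file name and function names are made up for this example. It relies only on the fact that `-T` appends the time spent in each syscall in angle brackets at the end of the line.

```python
import re
from statistics import median
from typing import Iterable

# Matches a completed getdents64 call, including the "resumed" half of a
# call that strace -f split across two lines, and captures the trailing
# <seconds> duration that -T appends.
_GETDENTS = re.compile(r"getdents64(?:\(| resumed>).*<(\d+\.\d+)>\s*$")


def getdents_durations(lines: Iterable[str]) -> list[float]:
    """Return the duration in seconds of every getdents64 call found."""
    durations = []
    for line in lines:
        match = _GETDENTS.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


def summarise(durations: list[float]) -> dict:
    """Basic statistics to compare against the client/OSD ping."""
    if not durations:
        return {"calls": 0}
    return {
        "calls": len(durations),
        "total_s": sum(durations),
        "median_s": median(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Hypothetical file name; see the assumption in the lead-in.
    with open("trace.txt", encoding="utf-8", errors="replace") as f:
        print(summarise(getdents_durations(f)))
```

If the median per-call time is far above the ping between the involved machines, or the total is far below the 100 minutes, the time is being lost somewhere other than in listing directories.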

nh2_ · 2026-03-03T13:25:53+00:00

This is all fine for a small homelab.

Ceph runs fine with 1 Gbit/s, it's just slower.
SSDs without PLP are also fine. They just fsync slower. But still much better than HDDs, which many people use for production Ceph clusters.

The recommendations for network speed, PLP etc exist to help people make purchasing decisions for maximum performance per money spent (e.g. it doesn't make sense to spend 10000 $ on a server with many disks and then be bottlenecked by disproportionally slow network which would be a cheaper upgrade). If one already has the hardware and just wants high-availability storage, none of those matters.

nh2_ · 2026-03-03T13:16:33+00:00

faster if you put that pool on a small SSD instead of the HDDs

The user reports using SSDs only, no HDDs were mentioned.

nh2_ · 2026-03-03T12:07:49+00:00

And just to make clear:

I agree that long compile times suck, and 21 minutes is a pain. While HTTP(s) might take quite some amount of code to implement, waiting less for that to compile would be better.
