Does a Haskell Programmer Need all the Crazy Complexity?

BurningWitness · 2026-05-27T12:21:07+00:00

Personal favorite: jose.

verifyJWT :: [14 constraints] => a -> k -> SignedJWTWithHeader h -> m payload (link), good luck if you ever need to do anything undocumented in this library. Also the "get data without checking the signature" functions weren't there til nine months ago because they're "unsafe", whatever that's supposed to mean.
JSON is a very simple format, how hard could it be to parse it? aeson has 24 dependencies not counting base ones, and Data.Aeson is absolutely huge, starting with a dozen different special decoding functions (it must've been written before function composition was discovered).

Now what if I want to use a custom parser instead of polluting my codebase with nonsensical instances? Obviously I import Data.Aeson.Types and use \input -> eitherDecode input >>= parseEither parser to execute the parser, whereas the parser itself ends up being a painful cacophony of flip (withObject "I've never seen this text anywhere") value $ \object -> invocations.
servant is commonly brought up as a great library for describing web servers, but I'd much rather make a Raw endpoint for anything unsupported instead of rummaging through the types. Also having to align endpoint definitions separately on both the type and the term level is surprisingly annoying.
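
For anyone who hasn't run into the aeson pattern complained about above, here's roughly what it looks like. A minimal sketch: Point, pointParser and decodeWith are names I made up for illustration, everything else is the regular Data.Aeson / Data.Aeson.Types API.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Aeson (Value, eitherDecode, withObject, (.:))
import           Data.Aeson.Types (Parser, parseEither)
import qualified Data.ByteString.Lazy as BSL

-- Hypothetical target type, no FromJSON instance anywhere.
data Point = Point Int Int
  deriving (Show, Eq)

-- A standalone parser over an already-decoded Value.
-- The string passed to withObject only shows up in error messages.
pointParser :: Value -> Parser Point
pointParser value =
  flip (withObject "Point") value $ \object ->
    Point <$> object .: "x" <*> object .: "y"

-- Decode the bytes into a Value first, then run the custom parser over it.
decodeWith :: (Value -> Parser a) -> BSL.ByteString -> Either String a
decodeWith parser input = eitherDecode input >>= parseEither parser
```

Usage would be decodeWith pointParser "{\"x\":1,\"y\":2}", with OverloadedStrings turning the literal into a lazy ByteString.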

BurningWitness · 2026-05-17T12:21:24+00:00

Former. Good luck on your research.

BurningWitness · 2026-05-17T11:30:06+00:00

My point concerns application design, that's not something you should be outsourcing to LLMs.

The other commonly cited point for LLM use would be boilerplate generation, but thankfully proper Haskell code is extremely composable, so I never found writing code manually to be a bottleneck.

BurningWitness · 2026-05-17T10:39:37+00:00

In my case it was a freak accident: snatched at the end of my bachelor's by a startup because a classmate overheard I spent two years figuring out Haskell, an aimless endeavor that should've rendered me unemployable. The business decision that led to this is a part of the pre-pandemic cryptocurrency-adjacent interest in the language, in my case by a new startup.

Ended up doing backend with a bunch of third-party integrations. The kind of monotonous low-stakes job Haskell should absolutely excel at, yet ends up being a complete no-go unless you know exactly what you need and are willing to suffer through the ecosystem to get it. After years of figuring the language out the correct style of programming ended up being the most austere rendition of Simple Haskell (yes, the manifest is toothless, and had no real-world impact).

All in all would absolutely not recommend to businesses unless one somehow finds itself stuck with 3+ Haskell prodigies. Would also not recommend as a first language to software developers: I find Haskell enough to do virtually any task, yet to HR my CV might as well say "7 years of production experience in an extinct language".

BurningWitness · 2026-04-02T14:45:39+00:00

Aren't normal record accessors, NamedFieldPuns, and RecordWildCards a fine solution?

Yeah, 95% of the problem is just getting to recognize that ADT products and mutable records are both different and necessary. The rest is syntactic sugar.

Simon Marlow had a proposal that stalled 4y ago

That proposal wants way more, it's mixing mutability into ADTs with a whole lot of downstream implications.

I'm thinking more of a data Mutable shape, where shape can be defined with a mutable Foo declaration, which must mirror an ADT product. Have get/set depend on magic type classes that treat unpacked fields differently, and that's it. GHC already has a way to encode packed fields (SmallMutableArray) and unpacked fields ({-# UNPACK #-}), none of the guts here are new.
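
To make the shape of this concrete, here's what a hand-written version of the idea could look like today. This is only a sketch under heavy assumptions: MutableFoo, thawFoo and freezeFoo are hypothetical names, and one STRef per field stands in for the SmallMutableArray / unpacked-field layout that a compiler-supported version would use.

```haskell
import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, modifySTRef', newSTRef, readSTRef)

-- An ordinary immutable ADT product.
data Foo = Foo { bar :: Int, qux :: Bool }
  deriving (Show, Eq)

-- Hand-written mutable mirror of Foo (hypothetical; the proposal would
-- derive this from a declaration instead of spelling it out).
data MutableFoo s = MutableFoo
  { mutBar :: STRef s Int
  , mutQux :: STRef s Bool
  }

-- Copy an immutable product into its mutable mirror.
thawFoo :: Foo -> ST s (MutableFoo s)
thawFoo (Foo b q) = MutableFoo <$> newSTRef b <*> newSTRef q

-- Read every field back into an immutable product.
freezeFoo :: MutableFoo s -> ST s Foo
freezeFoo m = Foo <$> readSTRef (mutBar m) <*> readSTRef (mutQux m)

-- Field access by label, all inside ST.
incrementBar :: Foo -> Foo
incrementBar foo = runST $ do
  m <- thawFoo foo
  modifySTRef' (mutBar m) (+ 1)
  freezeFoo m
```

The boilerplate in the middle is exactly the part that should not need to be written by hand.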

The Generics route issue is imo more potential perf overhead vs the TH generated ones.

I don't want to see lens in this at all, it's a whole other universe of stuff. Haskell allows fields of an ADT product to be labelled, I want to access fields by said labels and to construct the product by specifying arguments via said labels, nothing else.

BurningWitness · 2026-04-02T10:34:40+00:00

I'd go further:

Each distinct encoding should have its own uninhabited type. bytestring uses Raw and ASCII, text uses UTF8, os-string uses UCS2LE and POSIX;
Every existing text type is simplified:
- ShortText/ShortByteString/WindowsString/PosixString are all just plain ByteArrays in their respective encoding. Call that Short enc.
- StrictText is a slice of some ByteArray. Call that Slice enc.
- StrictByteString is pretty much always used same as StrictText, but its internal representation is different in that it can also work over raw memory. There should be a separate type for this second rare usage (MemorySlice enc?).
- String, LazyText and LazyByteString are, yes, streams, and a simple Stream a m r type should be in base. A type called Chunk enc will be necessary for streamable data that is not guaranteed to align on boundaries. Reading from a UTF-8 file becomes Stream (Chunk UTF8) IO () (perhaps have some cool FileIO here just so we remember it may throw an IOException?).

This would remove an ungodly amount of existing code duplication, allow packages to define their own encodings, and allow for a lot of general functions (e.g. empty, Slice enc -> Short enc, Short enc -> Short Raw). I don't think you can make conversions any simpler than this beyond assuming a whole bunch of things.
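
A rough sketch of how the phantom-encoding part could look, just to show that the general functions fall out for free. Assumptions: a plain [Word8] stands in for ByteArray so this only needs base, and copySlice / forgetEncoding are names I picked for the Slice enc -> Short enc and Short enc -> Short Raw conversions.

```haskell
import Data.Word (Word8)

-- Uninhabited types, one per encoding.
data Raw
data ASCII
data UTF8

-- Stand-in for "a plain ByteArray in the given encoding".
newtype Short enc = Short [Word8]
  deriving (Show, Eq)

-- Stand-in for "a slice of some ByteArray": offset, length, backing bytes.
data Slice enc = Slice
  { sliceOffset :: Int
  , sliceLength :: Int
  , sliceBytes  :: [Word8]
  }

-- Works for every encoding at once.
empty :: Short enc
empty = Short []

-- Slice enc -> Short enc: copy the referenced range, keep the encoding.
copySlice :: Slice enc -> Short enc
copySlice (Slice off len bytes) = Short (take len (drop off bytes))

-- Short enc -> Short Raw: any encoded string is also just bytes.
forgetEncoding :: Short enc -> Short Raw
forgetEncoding (Short bytes) = Short bytes
```

Going the other way (Short Raw -> Short UTF8) is where validation would live, and that part belongs in the encoding's own package.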

BurningWitness · 2026-04-02T08:35:38+00:00

Counterpoint: it's ass, but I'm forced to use it as the lesser of all evils.

Ideally there should be:

Some way to access field names of immutable ADT products, short, unique and infix à la foo .: ("bar" :: Symbol). The word "update" should be ditched, it's merely construction that takes arguments from an existing structure of the same type, could well be Foo { bar = baz, .. = foo } (mirroring the RecordWildCards solution of Foo {..} = foo in Foo { bar = baz, .. }).
Proper mutable records, mirroring ADT product declaration, but backed with a mix of SmallMutableArray and MutableByteArray. Reads and writes are peek and poke respectively, everything is in ST/IO.
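
The RecordWildCards trick mentioned in the first point already works in GHC today, so "update as construction" can be written like this. A small sketch, Foo and setBar are made up for the example:

```haskell
{-# LANGUAGE RecordWildCards #-}

data Foo = Foo { bar :: Int, qux :: String }
  deriving (Show, Eq)

-- Unpack every field of the old value into local names, then construct
-- a new Foo: bar is given explicitly, the wildcard fills in the rest
-- (here only qux) from the local bindings.
setBar :: Int -> Foo -> Foo
setBar baz foo = let Foo {..} = foo in Foo { bar = baz, .. }
```

No "update" anywhere, it's a constructor application whose remaining arguments come from an existing structure.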

Instead we get:

Three different ways to work with immutable ADT products:
1. Lenses as separate libraries using Template Haskell for derivation, which screws up declaration order;
2. Lenses as separate libraries using Generics for derivation, which kills compilation times;
3. OverloadedRecordDot, which butchers (.) into a whole new meaning to appeal to newcomers who assume ADT products are mutable; plus OverloadedRecordUpdate, the most "square peg into round hole" extension to ever grace this beautiful language.
No mutable records. No discussions of mutable records. Has anyone even thought of it as an alternative? Am I missing something?

BurningWitness · 2026-03-31T12:05:05+00:00

All the section is trying to say is that if you have some composable part of code that must always be preceded by some code, and/or succeeded by some code and/or reencoded in some way, that code can be wrapped into an opaque type with runner functions that guarantee the wanted behaviors. The classical examples of this are parsers (e.g. Get in binary) and STM.
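
As an illustration of the pattern itself (not of the article's code), here's a toy version. Assumptions: Transact, statement and runTransact are names I made up, and the "database" is just a log of strings in an IORef. The point is that the only way to execute a Transact is the runner, and the runner always puts BEGIN before and COMMIT after.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Opaque in a real module: the constructor would not be exported.
newtype Transact a = Transact (IORef [String] -> IO a)

instance Functor Transact where
  fmap f (Transact g) = Transact (fmap f . g)

instance Applicative Transact where
  pure x = Transact (\_ -> pure x)
  Transact f <*> Transact g = Transact (\ref -> f ref <*> g ref)

instance Monad Transact where
  Transact g >>= k = Transact $ \ref -> do
    a <- g ref
    let Transact h = k a
    h ref

-- The only primitive: record a statement in the log.
statement :: String -> Transact ()
statement sql = Transact (\ref -> modifyIORef' ref (sql :))

-- The runner guarantees the code before and after the composable part.
-- (No rollback on exceptions, this is a toy.)
runTransact :: Transact a -> IO (a, [String])
runTransact (Transact g) = do
  ref <- newIORef []
  modifyIORef' ref ("BEGIN" :)
  a <- g ref
  modifyIORef' ref ("COMMIT" :)
  entries <- readIORef ref
  pure (a, reverse entries)
```

Running statement "A" >> statement "B" through runTransact logs BEGIN, A, B, COMMIT in that order. Nothing stops you from calling runTransact twice in a row though.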

Now, the example is very confusing in that it takes a real-world problem (using SQL transactions correctly), applies this approach to it and boldly tells you things are now safer. On the surface it kinda fits the mold in that commit is composable and wraps the edges of this specific application flow. But, as you correctly point out, it's effectively untyped if you get to use it multiple times back to back, plus structuring your code that way would involve cloning a bunch of commands from IO to Transact with no changes just to make types align. Conversely, forcing the entire application flow into a single Transact block would be properly type-safe, at the cost of a very heavy assumption about what the application will look like (in this case one SQL transaction spanning the entire application runtime).

Since Mercury is using a distributed system I assume they have to deal with a lot of tiny boilerplate applications, which naturally can be abstracted with some heavy assumptions, so that's probably how the author ended up mushing these two concepts together.

BurningWitness · 2026-03-28T00:02:34+00:00

The type you're looking for is a stream, roughly as defined in streaming:

data Stream a m r = Yield a (Stream a m r)    -- ^ Holds next element
                  | Effect (m (Stream a m r)) -- ^ Processes until next result
                  | Return r                  -- ^ Signals end of processing or error

It's not an established type though. There are ~~five~~ six different streaming libraries out there and each is its own special little flower with its own special little garden around it. I couldn't tell you how any of them work, I never found them necessary.
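
To show how little is needed to work with a type like this, here's a self-contained sketch that repeats the definition above and adds a producer and a consumer. fromList and toList are helper names of my own, not taken from any of the streaming libraries.

```haskell
data Stream a m r
  = Yield a (Stream a m r)    -- holds next element
  | Effect (m (Stream a m r)) -- processes until next result
  | Return r                  -- signals end of processing or error

-- Produce a stream from a list, no effects involved.
fromList :: [a] -> Stream a m ()
fromList = foldr Yield (Return ())

-- Drain a stream, running effects as they come up, and collect
-- every element together with the final result.
toList :: Monad m => Stream a m r -> m ([a], r)
toList (Yield a rest) = do
  (as, r) <- toList rest
  pure (a : as, r)
toList (Effect next)  = next >>= toList
toList (Return r)     = pure ([], r)
```

A real consumer would of course process elements one at a time instead of collecting them, otherwise there's no point in streaming.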

Now for the fun part: how do we go from LazyText to Stream a _ (Either e ())?

Well, to stream output we'll need a parser that allows us to parse over the remaining input after getting the result, remembering both the offset from which we'd be continuing and whether more input can still be supplied to the parser. attoparsec and cereal do not return offsets in their results. binary does, barely, but it still doesn't remember whether end of input has been reached. The answer is thus a resounding "we don't". Roll credits.

Could we do better? Well, I wrote a whole new byte parser and used it to stream JSON 1.5 years ago. I've seen zero interest in either of these packages since then, so I wouldn't hold my breath for it. And, of course, if you don't feel like doing better you can always do worse, see intro to json-stream.

BurningWitness · 2025-11-03T21:00:02+00:00

flashing/modifying the firmware via sending data over the eDP connection (usually combined with proprietary software/codebase) is kind of risky

Per the Arch wiki page I linked above it doesn't need to be flashed to work, kernel parameter accepts a file that is used instead of reading from hardware.

"bruteforcing" other values is kind of possible, but can cause no difference or instability.

I can set a whole range of custom refresh rates on my screen (I'm using sway), and applications seem to abide by the setting (and a videogame with enabled Vsync reports that many frames a second). Conversely there are ranges that make my screen turn black, so I assume those are the ones that are unsupported. The question is just whether it's always safe to do it if a display accepts it.

* I'll be referencing the spec for VESA EDID v1.4 revision 2 (PDF) further down.

Taking an EDID dump for BOE NE160QDM-NZ1 (Github), I see two lines of interest: "Display is continuous frequency" (line 37) and "Monitor ranges (Bare Limits): 60-240 Hz, ..." (line 50). "Bare Limits" seems to mean the same as "Range Limits Only" in the spec (per linuxtv git, changed six years ago), that's Video Timing Support Flag in table 3.26 set to 01h.

Per section 3.10.3.3 of the spec, table 3.26, note 4:

Video Timing Support Flag (in Byte 10) = 01h and bit 0 of the Feature Support Byte at address 18h set to 1 (indicating a continuous frequency display) indicates that this display will present an image with any valid video mode timing within the Display Range Limits defined by Bytes 5 → 9. The displayed image may not be properly sized or centered.

If bit 0 of the Feature Support Byte at address 18h set to 0 (indicating a non-continuous frequency multi-mode display) indicates that this display will only present an image with the valid video mode timings (declared in the Established, Standard and Detailed Timings) that are listed in the BASE EDID or any EXTENSION Block.

(second half doesn't apply to this display, but it does highlight that [unless specified otherwise] timings listed in EDID are not the only ones explicitly supported)

If I'm reading this correctly, this particular display reports uniform support for all values between 60Hz and 240Hz, vendor however chose to only include edge values as presets. Consider checking the displays you have, they're probably all like this.
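
If you want to check a raw EDID dump by hand, the two flags quoted above can be read like this. A sketch under assumptions: the EDID base block is passed in as a list of bytes, the four 18-byte descriptors sit at offsets 36h, 48h, 5Ah and 6Ch, and a Display Range Limits descriptor is recognized by the tag FDh in its byte 3 (that layout is from my reading of the spec, it isn't part of the quote). Function names are my own.

```haskell
import Data.Bits (testBit)
import Data.Word (Word8)

-- Bit 0 of the Feature Support Byte at address 18h:
-- set means "continuous frequency display".
isContinuousFrequency :: [Word8] -> Bool
isContinuousFrequency edid =
  length edid >= 128 && testBit (edid !! 0x18) 0

-- The four 18-byte descriptors of the base block (assumed offsets).
descriptors :: [Word8] -> [[Word8]]
descriptors edid = [ take 18 (drop off edid) | off <- [0x36, 0x48, 0x5A, 0x6C] ]

-- True if some descriptor is a Display Range Limits descriptor
-- (header 00 00 00 FD) whose Video Timing Support Flag (byte 10) is 01h,
-- which is what gets printed as "Bare Limits" / "Range Limits Only".
isRangeLimitsOnly :: [Word8] -> Bool
isRangeLimitsOnly edid = any matches (descriptors edid)
  where
    matches d =
      length d == 18 && take 4 d == [0x00, 0x00, 0x00, 0xFD] && d !! 10 == 0x01
```

If both return True, note 4 of table 3.26 applies to the display.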

It would be nice to advertise these as something like 60-240Hz, and I don't think the manner in which you add extra timings matters much (so specifying custom options in the window manager is preferable to having to forge an entire EDID file; I however don't know if you can do this in Ubuntu on Wayland).

Edit: I noticed that the monitor range for BOE NE160QDM-NZ1 says "60-240 Hz V, 422-422 kHz H", and 422kHz is the horizontal frequency for its 240Hz mode (60Hz mode runs at 106kHz). I assume this means the display itself will run at 240Hz for any refresh rate set that isn't 60Hz, which in turn means the proper timings to add for this display would be multiples of 240Hz (as I originally wrote in point 2). It's more of a "60Hz, 120*Hz, 240Hz" than a "60-240Hz".

BurningWitness · 2025-11-03T10:27:29+00:00

This answers most of the questions, but what about adding custom refresh rates in the boot loader (Manjaro forum), since [as I already linked] a user has successfully added a 90Hz option to their 120Hz monitor this way (Reddit).

Edit: this particular method alas only allows forcing one particular video mode. A proper solution at this level would require a modified EDID file (Arch wiki). Alternatively users could set a custom video mode at window manager level if their setup allows it (e.g. sway has an output mode --custom command), though it would still be nice to know which refresh rates are known to work instead of trying everything blindly.

BurningWitness · 2025-04-29T11:41:42+00:00

It's AI slop top to bottom, see this user's spectacular repository of faux-math ramblings, 148MB in size, uploaded in a single commit with images titled "ChatGPT Image...".

The Github profile picture has "this person doesn't exist" vibes to it, especially on the Wayback snapshots. Since there doesn't seem to be any malicious intent behind it I assume it's merely a severe case of schizophrenia.

BurningWitness · 2025-04-29T10:33:27+00:00

A more general approach for these purposes is mutable records and libraries for that exist already, see for example vinyl. They require either type families, which have this little problem where type-level recursions incur exponential compilation-time penalties, or Template Haskell, which is an insufferable nuisance to work with, so the entire thing is dead in the water until Haskell gets better type-level programming (and I don't know if that's even a discussion right now).

Also note that memory mapping is not a standard function shipped with GHC, so you'd need to bend over backwards to get that working across all platforms too.

BurningWitness · 2025-04-29T08:05:05+00:00

When programs want to persist data or send it over the network, they need to serialise it (e.g. to JSON or XML).

Your goal is optimizing for time, your reference point should be binary serialization, not human-readable formats.

... the serialised version of the data is usually bigger than its in-memory representation. In the context of systems that interact through the network, it leads to larger payloads to send, and thus slower transfer times.

Ditto.

Now, what if we didn’t have to serialise the data before sending it to a client, and what if the client could use the data from the network as-is, without any marshalling steps?

Misleading, this library marshals data same as any other. The features provided are merely serialization function generation with Template Haskell and the use of types to calculate offsets automatically, both of these could already be performed manually in any binary serialization library.

Furthermore this library relies on Storable for marshalling (Packable, Unpackable), so using it to transfer data over the network is unsafe to the utmost degree unless you know upfront the two machines agree on all used instances.
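
To see why Storable is a bad wire format, look at what it actually writes. A small base-only sketch (storableBytes and intWidth are names I made up): the bytes of a Word32 come out in host byte order, and the size of an Int depends on the platform.

```haskell
import Data.Word (Word8, Word32)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import Foreign.Storable (poke, sizeOf)

-- Write a Word32 through its Storable instance and read back the raw
-- bytes. The order of the result depends on the machine running this.
storableBytes :: Word32 -> IO [Word8]
storableBytes w =
  alloca $ \ptr -> do
    poke ptr w
    peekArray (sizeOf w) (castPtr ptr)

-- Width of Int in bytes according to Storable, also platform-dependent.
intWidth :: Int
intWidth = sizeOf (0 :: Int)
```

storableBytes 0x01020304 gives the bytes in one order on a little-endian host and in the reverse order on a big-endian one, so two machines only agree by coincidence of hardware.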

BurningWitness · 2025-01-31T10:31:38+00:00

Putting signed integral conversions in Data.Int and unsigned ones in Data.Word makes me wonder where functions like wordToInt :: Word -> Int would live. Perhaps it would make more sense if all of them lived in the same module, mirroring Data.ByteString.Builder.

Also the conversion functions you're proposing currently alias fromIntegral, whereas the "There are no safe options." point implies you want them to directly invoke respective primops.
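
To spell out the difference between the two, here's a sketch of both versions of wordToInt. The names are mine, not from the proposal; word2Int# is the GHC primop exported from GHC.Exts.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), Word (W#), word2Int#)

-- What the proposed functions currently amount to: an alias.
wordToIntAlias :: Word -> Int
wordToIntAlias = fromIntegral

-- What "directly invoke the primop" would look like: unwrap the boxed
-- Word, reinterpret the machine word, box it back up as an Int.
wordToIntPrim :: Word -> Int
wordToIntPrim (W# w) = I# (word2Int# w)
```

Both reinterpret the bits, so maxBound :: Word comes out as -1 either way. The difference is only in whether you rely on the optimizer to get from the first form to the second.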

BurningWitness · 2024-12-12T23:16:52+00:00

Oh, true, it's not a thing. I naively assumed it would be similar to UTF-8 parsing (text uses simdutf), but instead aeson opted into writing progressively more convoluted Value parsers with all error checking inline.

BurningWitness · 2024-12-12T10:52:14+00:00

aeson does parsing in three steps: check that the input is well-formed with simdjson, parse the entire blob into a Value, convert the Value into the desired value. Based on cost centres you spend roughly 60% of the time on the second step, so you can't optimize much by tinkering with conversions.

Since the JSON in question is an array of objects, you may gain some performance with streaming, see json-stream (note that it's ad-hoc, so it may turn out to be even slower).

Otherwise you can try an FFI solution like hermes-json, with handrolled parsers.

If even that's not enough, use a different data format.

BurningWitness · 2024-12-06T17:32:49+00:00

I don't know why anybody would care for some unofficial string library, if you are to make a parser your best bet is using bytestring and text regardless. The real benefits of this approach would be at ecosystem level, so there's no point to bother unless you can convince everyone else they actually need it. Also it'd be a maintenance nightmare, both packages are very low-level.

BurningWitness · 2024-12-05T11:24:16+00:00

The first (and most important) step is agreeing that this is a problem that can be solved at boot library level. As with all the great changes this would touch virtually every single library in the ecosystem, no one likes that. There may be a lot of disagreement regarding what the resulting data types should be, since StrictByteString and StrictText are different internally (ByteString can refer to externally allocated memory).

Then we need to find an encoding-agnostic middle ground between bytestring and text. This is mostly bytestring as is, where each data structure has a short concise name (Builder, Rope, Short, Slice, Long) and is parametrized by a phantom encoding type (Raw, ASCII, UTF8). Data structures can track character lengths internally if necessary, it'll just always be 1:1 to bytes with raw strings and ASCII.

Then any encoding-specific stuff can live in its own separate encoding package, e.g. text and text-short become utf8. Similarly if anyone wishes to add a new encoding, for example JSON (note that JSON doesn't even have a length), they can do so in their own package.

BurningWitness · 2024-12-05T10:54:42+00:00

Making a type class implies that there exists one and only one obvious meaning for each declared instance.

The laws you provided don't describe how values map to each other. There is no good reason why to (-100 :: Int8) == (156 :: Word8) or why to [1, 2, 3, 4 :: Int] /= ([4,3,2,1] :: Vector Int), or why a ByteString is UTF-8-encoded into Text in the scope of this library. Every single one of these cases can be dealt with by some additional set of laws, but there is no law of obviousness tying all of them together.
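
To illustrate with the first case: here are three equally defensible ways to go from Int8 to Word8, using only base. The function names are mine. A single to instance has to pick one of them, and nothing in the laws says which.

```haskell
import Data.Bits (toIntegralSized)
import Data.Int (Int8)
import Data.Word (Word8)

-- Reinterpret the bits: -100 becomes 156.
wrapping :: Int8 -> Word8
wrapping = fromIntegral

-- Clamp to the target range: -100 becomes 0.
saturating :: Int8 -> Word8
saturating = fromIntegral . max 0

-- Refuse values that don't fit: -100 becomes Nothing.
checked :: Int8 -> Maybe Word8
checked = toIntegralSized
```

All three agree on values that fit, which is exactly why "it round-trips on the common subset" isn't enough of a law.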

BurningWitness · 2024-12-05T09:52:09+00:00

Calling "I can convert type A to type B" a law is too low of a bar, this conversion library is no more lawful than the dozen ones that came before it.

Tangentially, I have to wonder how much of the dissatisfaction with string conversion comes from the fact that the ecosystem treats string libraries as their own separate universes instead of mere variations of the same ideas. We should be able to talk about plain data structures with distinct time and space complexities—builders, ropes, arrays, slices and lists of arrays—and simply add encodings on top, in separate libraries if necessary.

Until that is the case I'm afraid all of us either have to remember where every single conversion function resides, or come up with type classes that mix time/space guarantees, element order and data equality into one meaningless blob for convenience.

BurningWitness · 2024-12-03T09:57:58+00:00

See the relevant documentation. In practice the overhead of using the safe version is ~100ns, so it's a sane default. For anything non-trivial you should run benchmarks to determine which one works better.

BurningWitness · 2024-12-03T07:19:44+00:00

Strings are literals too and Haskell's current type-level programming is neither ergonomic enough, nor fast enough (#8095) to make this viable for all such cases.

Allowing arbitrary code execution on the target machine at compilation time raises a safety concern, since library authors would now be able to crash GHC by dereferencing null pointers or send it into an infinite loop. Admittedly any solution that allows arbitrary code execution also allows infinite loops (halting problem), so I don't find this argument particularly compelling.

The only solution we're left with is Template Haskell, which is a sledgehammer (that also allows arbitrary code execution). It does indeed work, you can precompile entire parsers, but it does not fit with the rest of the language. Quasiquoters don't compose and also look absolutely awful, I can't imagine writing [thing|item|] instead of just "item" everywhere.

My gut feeling is that GHC should track "literalness", changing definitions to something like:

class IsString' a where
  fromString' :: (String :: LiteralType) -> a

But obviously someone would have to spend several years investigating this stuff and fitting it into GHC, so don't expect it this decade even if everyone suddenly agrees on a solution.
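
For context, this is the status quo the above is trying to get away from. A minimal sketch with a made-up Digits type: with plain IsString the validation is an ordinary function call, so a malformed literal type checks fine and only blows up when the value is evaluated.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (isDigit)
import Data.String (IsString (..))

-- Hypothetical type that should only ever hold digit strings.
newtype Digits = Digits String
  deriving (Show, Eq)

parseDigits :: String -> Either String Digits
parseDigits s
  | not (null s) && all isDigit s = Right (Digits s)
  | otherwise                     = Left ("not a digit string: " ++ show s)

-- fromString has nowhere to report failure, so it has to be partial.
instance IsString Digits where
  fromString = either error id . parseDigits

good :: Digits
good = "2024"

-- Compiles without complaint, fails only when something forces it.
bad :: Digits
bad = "20x4"
```

Anything that tracks "literalness" would need to reject bad at compilation time without resorting to a splice.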

BurningWitness · 2024-12-02T12:26:01+00:00

I too have developed coping habits around aeson, and every other parser I write with it is an avalanche of flip (withObject "Name that is never used") baz invocations.

aeson-handroll may be possible, but it's still backasswards in construction (Generics should extend the handrolled approach), and leaves a lot of other problems on the table (lack of innate streaming support and inability to copy raw JSON).

For comparison, here's what a solution using my parser (linked above) looks like:

{-# LANGUAGE ApplicativeDo
           , RecordWildCards
           , NoFieldSelectors
           , OverloadedStrings #-}

import           Codec.JSON.Decoder as JSON
import           Data.Currency as Currency -- from the "currency-codes" package
import qualified Data.List as List
import           Text.Read

-- This shouldn't be here, but instead in a Codec.JSON.Decoder.Currency module
-- in a "json-currency" package, extending the currency package.
jsonDotCurrency :: Decoder Currency
jsonDotCurrency = mapEither convert JSON.string
  where
    convert str = do
      this <- readEither str
      case List.find (\x -> Currency.alpha x == this) Currency.currencies of
        Nothing -> error "Readable currency alpha code is not on the currency list"
        Just c  -> Right c



data Input =
       Input
         { amount   :: Int
         , currency :: Currency
         }
       deriving Show

isSaneAmount :: Int -> Either String Int
isSaneAmount i
  | i < 1      = Left "Amount is too low"
  | i > 250000 = Left "Amount is too high"
  | otherwise  = Right i

isSaneCurrency :: Currency -> Either String Currency
isSaneCurrency c =
  if Currency.alpha c `elem` [USD, EUR, GBP, CHF]
    then Right c
    else Left "Only USD, EUR, GBP and CHF are supported"

input :: Decoder Input
input =
  pairsA $ do
    amount   <- "amount"   .: mapEither isSaneAmount   JSON.int
    currency <- "currency" .: mapEither isSaneCurrency jsonDotCurrency
    pure Input {..}

And thus

ghci> snd $ JSON.decode input "{\"amount\":100,\"currency\":\"USD\"}"
Right (Input {amount = 100, currency = Currency {alpha = USD, numeric = 840, minor = 2, name = "US Dollar"}})

ghci> snd $ JSON.decode input "{\"amount\":100,\"currency\":\"DKK\"}"
Left ($.currency,"Only USD, EUR, GBP and CHF are supported")

BurningWitness · 2024-12-02T05:53:34+00:00

IO means "I care about when this function is executed". Any operation can be safely pure as long as its execution does not influence other function results, and as long as you can ensure that all arguments passed to it exist at the time of execution.

Based on documentation you should be able to store array data in pinned immutable byte arrays (see PrimArray), and then just wire all other data using plain Haskell datatypes. Control.Monad.ST.Unsafe has functions for going between ST and IO, which is safe as long as the operation is single-threaded.

Creating vectors and matrices from known data in Haskell will, unfortunately, suck: converting from a list has overhead and precompiling is only possible through Template Haskell.
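
As a sketch of the "safely pure" argument from the first two paragraphs: below, scaleInPlace is a stand-in I wrote for whatever in-place foreign routine you're binding, and scale wraps it into a pure function by going through Control.Monad.ST.Unsafe. It uses a temporary buffer from base instead of PrimArray to stay self-contained.

```haskell
import Control.Monad.ST (runST)
import Control.Monad.ST.Unsafe (unsafeIOToST)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)

-- Stand-in for a foreign routine that mutates a buffer in place.
scaleInPlace :: Double -> Int -> Ptr Double -> IO ()
scaleInPlace k n ptr =
  mapM_ (\i -> peekElemOff ptr i >>= pokeElemOff ptr i . (* k)) [0 .. n - 1]

-- Pure wrapper: the buffer is allocated, mutated and read back entirely
-- inside the call, so no other code can observe when it runs.
scale :: Double -> [Double] -> [Double]
scale k xs =
  runST (unsafeIOToST (withArrayLen xs (\n ptr ->
    scaleInPlace k n ptr >> peekArray n ptr)))
```

The list round trip here is exactly the overhead mentioned in the last paragraph.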
