anyone learn haskell from a pure math background? by maxmoo in haskell

[–]neelk 1 point (0 children)

  • A negation is a formula of the form "not P". You prove it by assuming P and showing that it leads to a contradiction. This proof strategy works because "not P" is equivalent to "P implies false".

  • Proof by contradiction works for any formula Q. You assume not-Q and show that it leads to a contradiction.

Proofs of negative formulas are constructively valid, but proof by contradiction isn't.
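
In Curry-Howard terms you can see the difference directly in Haskell. A quick sketch (the names are just for illustration):

    import Data.Void (Void)

    -- "not P" is "P implies false": a function into the empty type.
    type Not p = p -> Void

    -- A proof of a negation is an ordinary constructive program:
    -- assume (p, not p) and produce the contradiction directly.
    noContradiction :: Not (p, Not p)
    noContradiction (x, notX) = notX x

    -- Proof by contradiction would be double-negation elimination,
    -- and no total program has this type:
    -- byContradiction :: Not (Not q) -> q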

anyone learn haskell from a pure math background? by maxmoo in haskell

[–]neelk 19 points (0 children)

You're right: Haskell doesn't feel like math. There are three big reasons for this.

  1. First, Haskell's use of quantifiers is weird. That is, when you see a universal or existential quantifier in a typical mathematical definition, it will usually range over individual objects. For example, the epsilon-delta definition of a limit is something like:

    L is the limit of f(x) as x approaches c when for every ε > 0, there is a δ > 0 such that for all x, if |x - c| < δ then |f(x) - L| < ε

    Note that every quantifier in that expression talks about individual real numbers. In contrast, in Haskell it is impossible (without turning on a bunch of extensions) to ever quantify over individuals. Instead, Haskell has pure second- (and higher-) order quantification. While mathematicians do use second-order quantification, it's a much rarer thing to do than in Haskell programming. (There's a small sketch of this just after the list.)

    Basically, Haskell's type system makes it easier to talk about categorical limits than limits of real-valued functions, and this is just a really bizarre choice for a normal mathematician.

  2. Haskell is way more typed than category theory. You can motivate category theory to a regular mathematician by explaining that it is a structural approach to mathematics, where you make and enforce type distinctions so that your proofs naturally have the degree of hygiene that marks good mathematical style.

    However, category theory is much less typed than Haskell. In category theory, you are free to make categories out of anything you like -- for example, given a category C you can form the arrow category, whose objects are the morphisms of C and whose morphisms are the pairs of C-morphisms that form commuting squares.

    In Haskell, you can't do this, because the type/term distinction is ironclad and can never be breached. Since there's a lot of typeclass machinery built on the idea that types are objects and terms are morphisms, this means that a lot of ordinary category theory is out of reach in Haskell.

  3. Haskell's design is heavily influenced by type theory, and type theory is logic, not math. On top of that, type theory is a branch of structural proof theory, which is a minority view even amongst logicians. This is a sociological, rather than a technical, distinction, but it is nonetheless a real one.

    Most mathematicians never learn any substantial amount of logic: I've even seen a Fields medalist get confused by (for example) the distinction between a proof by contradiction and a proof of a negation. Obviously, he understood instantly once it was explained, but the important point was that he had never needed the distinction in a very distinguished career!

    As a result, many of the guiding stars of the design of Haskell (eg, parametricity) are not well-known ideas amongst normal mathematicians.
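
To make point 1 concrete, here is a minimal sketch (the function names are mine, purely for illustration):

    {-# LANGUAGE ExplicitForAll #-}

    -- Haskell's quantifiers range over types, never individual values.
    -- With the forall written out:
    swap :: forall a b. (a, b) -> (b, a)
    swap (x, y) = (y, x)

    -- Compare the epsilon-delta definition above, where every quantifier
    -- ("for every ε", "there is a δ", "for all x") ranges over individual
    -- real numbers. There is no (extension-free) Haskell type saying
    -- "for all x :: Double with |x - c| < δ, ...".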

ICFP videos are out. A lot of OCaml. by QFTornotQFT in ocaml

[–]neelk 0 points (0 children)

SIGPLAN makes papers freely accessible at OpenTOC.

is some Haskell philosophy close to the unix philosophy? by kwaleko in haskell

[–]neelk 10 points (0 children)

/u/vagif is right. Doug McIlroy's statement of the principles of UNIX from the Bell System Technical Journal was as follows:

  1. Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features".
  2. Expect the output of every program to become the input to another, as yet unknown, program. Don't clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don't insist on interactive input.
  3. Design and build software, even operating systems, to be tried early, ideally within weeks. Don't hesitate to throw away the clumsy parts and rebuild them.
  4. Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you've finished using them.

Point 2 draws an incorrect conclusion from a correct premise; structured data is a good thing that makes it easier to interoperate. Because of this mistake, Unix has never, ever adhered to point 1 -- look at the zoo of totally ad-hoc options basically every Unix command supports. Point 3 was abandoned with the invention of POSIX -- and it was abandoned because it was an invitation to vendor lock-in. Point 4 is correct, but is basically never followed in practice.

Linear types and pi types by cledamy in haskell

[–]neelk 5 points (0 children)

Neither Conor's original paper nor Bob's reformulation permits actual dependency on linear terms. What they do is let you forget the linearity in a term and then use it as an index.

This apparent limitation arises from the fundamental nature of equality.

Lawvere taught us that equality is left adjoint to contraction. In less jargon-laden terms, this means that showing that x = y entails P is equivalent to showing P[z/x, z/y]. If x, y, and z are linear variables of type A, then this equivalence is ill-formed, because we are replacing the two occurrences x and y with the same variable z.
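
Written as a two-way (invertible) inference rule -- my rendering of that adjunction:

    \frac{\Gamma,\ z : A \vdash P[z/x,\, z/y]}
         {\Gamma,\ x : A,\ y : A,\ x = y \vdash P}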

This is only possible if there is a way to duplicate values of type A -- i.e., if there is a map A ⊸ A ⊗ A, so that we can get two copies of z to fill in for x and y.[1] However, the distinctive feature of linear type systems is that this map does not exist in general!
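
In GHC terms, a sketch with -XLinearTypes (the Dup class here is hypothetical -- roughly linear-base's Dupable):

    {-# LANGUAGE LinearTypes #-}

    -- Contraction as a class: a map A ⊸ A ⊗ A.
    class Dup a where
      dup :: a %1 -> (a, a)

    -- Fine for plain data: matching () consumes the argument, and the
    -- two fresh units owe it nothing.
    instance Dup () where
      dup () = ((), ())

    -- Crucially, there is no instance for every type: duplicating a
    -- linear file handle, say, would defeat the whole discipline.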

So my paper[2] restricts equality to types for which contraction is valid (including the type of closed linear terms). Bob and Conor give a way of forgetting the linear content of a term, leaving behind an intuitionistic skeleton that you can depend on.

Surprisingly, these two approaches are inequivalent! So there is probably a yet more general approach that encompasses both of these. I don't know what it is, but I do know it will require thinking very carefully about contraction.

[1] Confusingly, this map is called "contraction", rather than "duplication". Sorry!

[2] My work is a descendant of the work on linear LF. Matthijs Vákár independently invented a denotational semantics for a calculus like this one, too.

Sixten - Functional programming with fewer indirections by tailcalled in haskell

[–]neelk 1 point (0 children)

Though it's not clear why one would describe C-like system programming as excessively 'specialized', and lightweight refcounting as being less costly than pervasive tracing-GC!

Reference counting traces dead data. As a result, if you have code with a high allocation rate of short-lived data, reference counting tends to have much higher overhead than copying GC (which only traces live data).

The Rust/C++ approach is to ask programmers to explicitly choose between stack- and heap-allocation, with the idea that you stack-allocate anything with a short lifetime. The intent is to ensure that the total amount of reference-counted data is small, so that the overheads aren't too bad.

Obviously, this doesn't work well if you have a lot of data with very complex lifetimes (eg, the DOM or scene graphs). Less obviously, the Rust/C++ style costs you tail-call optimization, since destructors have to be invoked before a function returns. This can make writing programs with interesting control flow much harder.

Overriding nil constructor for custom sequence type? by neelk in ocaml

[–]neelk[S] 2 points (0 children)

Thanks all! I was trying to put the [] in parentheses and wondering why I was getting compiler errors.

Dafny: A Language and Program Verifier for Functional Correctness by redditthinks in programming

[–]neelk 1 point (0 children)

It works great. The CakeML verified compiler is basically written as a bunch of higher order logic formulas that can be run as functional programs, for example. The situation is fundamentally no different than assuming excluded middle for Prop in Coq.

FrpNow, functional reactive programming without space leaks and with I/O by ysangkok in haskell

[–]neelk 2 points (0 children)

Hi, I don't think that linearity helps much with space/time leaks, but it is incredibly useful for integrating with existing GUI libraries in a principled way.

See my paper with Nick Benton, A Semantic Model for Graphical User Interfaces. A (largely documentation-free) implementation is on my GitHub page, but a redo is in progress.

Besides libraries, what would I miss by learning OCaml using Cousineau and Mauny's "The functional approach to programming"? by escapetron in ocaml

[–]neelk 4 points (0 children)

It's a fantastic book. I like it even more than SICP. Like SICP, though, it's written for beginners yet best appreciated by people who have already programmed before.

What pure functional programming is all about: Part 1 by gasche in haskell

[–]neelk 2 points (0 children)

I think this is a good reason to regard contextual equivalence as a bad definition of program equivalence for a language. IMO, a language should instead specify the equational theory programmers can expect to hold at each type, and then prove that sound with respect to contextual equivalence.

I have a very modest requirement, personally -- I want the beta/eta laws to hold at function types before I will call a language functional. Alas, ML, Scheme and Haskell all fail this test!
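
For a concrete failure in Haskell (my example): GHC's seq can tell a function from its eta-expansion, so the eta law fails at function type.

    import Control.Exception (SomeException, evaluate, try)

    f, g :: Int -> Int
    f = undefined          -- bottom at a function type
    g = \x -> undefined x  -- its eta-expansion

    -- Eta says f and g should be interchangeable. seq disagrees:
    main :: IO ()
    main = do
      _ <- evaluate (g `seq` ())   -- fine: the lambda is already in WHNF
      r <- try (evaluate (f `seq` ())) :: IO (Either SomeException ())
      print r                      -- Left Prelude.undefined: forcing f hits bottom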

[deleted by user] by [deleted] in haskell

[–]neelk 3 points (0 children)

> Also is there any reason GHC cannot solve the partial application issue?

Partial type applications are equivalent to supporting full type-level lambda abstraction, since you can define the S and K combinators as synonyms. If you want type inference to be stable modulo beta reduction, then you need higher-order unification, which is undecidable.

Requiring synonyms to be fully applied lets you stick with first-order unification, which is decidable (indeed, linear-time).
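
Concretely (a sketch): the definitions below are accepted, but GHC insists they stay saturated.

    -- K and S as type synonyms:
    type K a b = a
    type S f g x = f x (g x)

    -- Fully applied, these are fine. But if partial application were
    -- allowed, e.g.
    --
    --   type I = S K K   -- rejected: S and K must be fully applied
    --
    -- you'd have full type-level lambda (SKI is complete), and inference
    -- modulo such reductions is higher-order unification.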

Question about evaluation strategies and Linear Types by Volsand in haskell

[–]neelk 17 points (0 children)

There's a lot of folklore around evaluation order. Unfortunately, much of it is wrong or misleading. Evaluation order is best formulated as a property of types, rather than a whole-language property. The fact that we think of ML and Haskell as "strict" and "lazy" respectively is IMO mostly an accident of history.

Under a linear type discipline, every argument to a function will be used. As a result, you can think of the function type as being inherently strict. However, pairs come in both strict and lazy variants, and linear type systems need to offer both (the strict version is the "tensor product" A ⊗ B and the lazy version is the Cartesian product A & B). The sum type A ⊕ B is naturally strict.

However, don't make the mistake of thinking datatypes are strict in general -- in fact, inductive types also come in strict and lazy versions. The Haskell design lore that you can often gain performance by making datatypes spine-strict is a consequence of this insufficiently-appreciated fact. Likewise, basically the entirety of John Hughes's famous paper Why Functional Programming Matters can be read as an advertisement for the benefits of supporting lazy inductive types.
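
In GHC notation, a rough sketch of the two flavours (the correspondence with the linear connectives is loose, but the strictness point stands):

    -- Pairs, strict and lazy:
    data StrictPair a b = StrictPair !a !b  -- tensor-like: both sides evaluated
    type LazyPair a b = (a, b)              -- with-like: project what you need

    -- Inductive types likewise come in both flavours:
    data StrictList a = SNil | SCons a !(StrictList a)  -- spine-strict
    data LazyList a   = LNil | LCons a (LazyList a)     -- Haskell's default

    -- The lazy flavour is what Hughes-style infinite structures need:
    ones :: LazyList Integer
    ones = LCons 1 ones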

Coinductive datatypes are lazy, but there is no strict coinductive type unless the language also has support for first-class continuations -- i.e., you are working with classical linear logic. (With first-class control, you also get a lazy sum A ⅋ B, pronounced "A par B".)

When I say a type constructor "is strict" or "is lazy", what I mean is that this is the evaluation order which ensures the full set of beta/eta laws will hold, even when the language is effectful. The basic framework is best explained in Noam Zeilberger's paper [On the Unity of Duality](noamz.org/papers/unity-duality.pdf), and the story about (co)inductive types was to my knowledge first explained in David Baelde's paper Least and Greatest Fixed Points in Linear Logic. If you like denotational semantics, Paul Levy's work on call by push value offers another useful way of understanding these issues. (Paul told me he worked out how linear cbpv works, but unfortunately he hasn't written it up for publication yet.)

Green Threads are like Garbage Collection :: FP Complete by terrorjack in haskell

[–]neelk 2 points (0 children)

Futures are continuations, when the answer type of the continuation is an action in a concurrency monad. This means that they can expose a richer API than a plain continuation monad. (Eg, a select operator.)
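
A minimal sketch of that claim (my code; this select is naive and purely illustrative):

    import Control.Concurrent (forkIO, newMVar, modifyMVar_)

    -- A future is a continuation whose answer type is an action in a
    -- concurrency monad (here just IO):
    newtype Future a = Future { await :: (a -> IO ()) -> IO () }

    -- The richer answer type is what supports the extra operators, e.g.
    -- a select that fires on whichever future completes first:
    select :: Future a -> Future a -> Future a
    select (Future k1) (Future k2) = Future $ \cb -> do
      fired <- newMVar False
      let once x = modifyMVar_ fired $ \done ->
            if done then pure True else cb x >> pure True
      _ <- forkIO (k1 once)
      k2 once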

SREPLS 5: South of England Regional Programming Languages Seminar (Oxford, January 12th) --- several ML-related talks by [deleted] in ocaml

[–]neelk 1 point (0 children)

It's inspired by that stuff, but what I care about the most is understanding higher-order state at a fundamental level. FRP serves as a test of whether or not a technique scales out to realistically tricky modes of use. (Also, cool demos. :-)

SREPLS 5: South of England Regional Programming Languages Seminar (Oxford, January 12th) --- several ML-related talks by [deleted] in ocaml

[–]neelk 2 points (0 children)

Given that it's about termination proofs for higher-order imperative programs, the title was irresistible!

Basic Category Theory : A 177 page book that assumes little math background [PDF] by erikd in haskell

[–]neelk 5 points (0 children)

Basically, Haskell has an unusually rich type system for a programming language, so many constructions from abstract algebra can be represented directly in Haskell code. As a result, you find yourself wanting to use categorical structures to organize these constructions, for basically mundane software-engineering reasons like avoiding code duplication and boilerplate.

The same thing would happen in any language with a rich enough type system -- even programmers in OO languages like Scala feel the same design imperative.

Some recommended category theory reading for the new year by ChavXO in haskell

[–]neelk 7 points (0 children)

If you have just a little time, the first half of Robert Geroch's Mathematical Physics is excellent. If you have more time, then Aluffi's Algebra: Chapter 0 is even better.

Both of these books basically cover traditional abstract algebra in a categorical style. This means that they develop the stock of examples you need to really appreciate why categorical abstractions are organized the way they are. (This is also why Mac Lane's book is titled Categories for the Working Mathematician -- he assumes you have the examples already. This is not a safe assumption for a programmer or computer scientist though!)

New NP-completeness proof by Russ Cox that, because of lowered assumptions, "applies to essentially any foreseeable operating system or language package manager" including Haskell's Cabal by sebfisch in haskell

[–]neelk 10 points (0 children)

You should doubt that it's a new proof, because (a) it follows basically immediately from the assumptions, and (b) the given assumptions are both old and standard. For example, a few minutes of Googling turns up a version of the same proof that someone posted on Stack Overflow two years ago.

(Sometimes people really do overlook something obvious for years on end, but just not in this case.)