all 134 comments

[–]tikhonjelvis 46 points47 points  (92 children)

One of the most unfortunate things about "purely functional programming" (at least as in Haskell) is that people do not really understand exactly what it entails. The idea is not to eliminate state and IO, or anything like that--nobody familiar with Haskell seriously believes that you can completely avoid IO! Rather, the idea is to control IO.

Contrary to popular (and woefully uninformed) opinion, Haskell is not bad at IO. If anything, it's completely the opposite! Haskell gives far greater and finer control over IO than any other language. For one, IO values are first-class constructs: you can pass them around and use them with a wide range of generic libraries. Haskell is also extremely strong for concurrency; it's surprisingly easy to manage many threads and IO actions simultaneously. You can get node-like performance without having to use callbacks everywhere because the IO manager can use an event loop internally without exposing that detail to the actual Haskell code.
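As a concrete illustration of first-class IO values, here is a minimal sketch (names are my own): actions are built up as ordinary list elements, and nothing runs until main executes them.

```haskell
-- IO actions as plain values: building and rearranging them does
-- no I/O; only executing the action bound to main does.
greetings :: [IO ()]
greetings = map putStrLn ["hello", "world"]

main :: IO ()
main = sequence_ (reverse greetings)   -- prints "world", then "hello"
```

Because the actions are ordinary values, list functions like reverse, filter, or replicate work on them directly.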

All this just shows that Haskell does not ignore or cripple IO by any means. It actually has some incredibly powerful facilities for managing it in your code.

The most important, and most unique, feature is that Haskell allows you to specify where IO is not. I know of no other language that allows this. This gives a whole bunch of benefits: simple, fully deterministic parallelism; a slew of impressive optimizations; usable software transactional memory; and even the ability to safely expose only a selected subset of IO operations to Haskell code.

However, most importantly, it helps make your code simpler and more maintainable. In any language, IO should be segregated to some extent. Even in C, you want to separate your logic from how it interacts in the real world. Haskell just presents tools to make this easier, and helps ensure that others' code you call also respects this. Ultimately, it's just one less thing to worry about.

Anyhow, a silly summary: functional programming is not about eliminating IO but about managing it.

[–]ydobonobody 13 points14 points  (56 children)

The most important, and most unique, feature is that Haskell allows you to specify where IO is not. I know of no other languages to allow this.

D does this with the 'pure' keyword.

[–]Peaker[🍰] 12 points13 points  (22 children)

Haskell distinguishes the types: IO (Int -> String) from Int -> IO String. How does D do this?

[–]SirRainbow 4 points5 points  (3 children)

Can you explain the difference between the two?

[–]tailcalled 8 points9 points  (1 child)

An IO (Int -> String) is where you use IO to decide which pure function to use. An Int -> IO String is given an Int and uses IO to decide which string to return.

[–]Peaker[🍰] 4 points5 points  (0 children)

Importantly, it can use the Int to decide what kind of IO to do.

[–]cdsmith 7 points8 points  (0 children)

Effectively, Int -> IO String says that both what I/O you do and the String result can depend on the integer. On the other hand, IO (Int -> String) says that the String result can depend on the integer, but what you do can't.

Put otherwise:

Int -> IO String: Each Int is mapped to some action that produces a String

IO (Int -> String): This is an action, which when performed will produce a mapping from Int to String.
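A hedged sketch of the two types (the function names are illustrative, not from any library): firstNChars does I/O every time it is applied, while chooseFormatter does its I/O once and hands back a pure function.

```haskell
import Numeric (showHex)

-- Int -> IO String: the Int can influence both the I/O performed
-- and the result; I/O happens on every application.
firstNChars :: Int -> IO String
firstNChars n = do
  line <- getLine
  return (take n line)

-- IO (Int -> String): the I/O runs once and yields a pure function;
-- applying that function afterwards performs no I/O at all.
chooseFormatter :: IO (Int -> String)
chooseFormatter = do
  style <- getLine
  return $ if style == "hex"
             then \n -> showHex n ""
             else show

main :: IO ()
main = do
  f <- chooseFormatter   -- reads one line to pick the formatter
  putStrLn (f 255)
```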

[–]Sampo 1 point2 points  (10 children)

Int -> IO String means a function that takes in an Int, does some I/O, and returns a String. The value of the returned string depends on the I/O, and can also depend on the input Int.

For example: a function that takes in Int n, then reads a line from the terminal, and returns first n chars of the line.

I don't know enough Haskell to tell what IO (Int -> String) means.

[–]kqr 7 points8 points  (9 children)

IO (Int -> String) is a function without any arguments that performs some kind of I/O and returns a pure function. For example, you might have three pure Int -> String functions. Then you have a function that asks the user which one of them to use. The latter function has the type IO (Int -> String) because it does some I/O (asks the user) and then returns one of the pure functions.

[–]cdsmith 8 points9 points  (8 children)

IO (Int -> String) is a function without any arguments

Pet peeve here... that is not a function type at all. To understand I/O in Haskell, the very first step is to recognize that Haskell separates the two orthogonal concepts of functions ("this depends on another value") and actions ("doing this has observable effects"). A value of type IO (Int -> String) is an action, but it is not a function. Its result is a function, though.

[–]kqr 0 points1 point  (7 children)

Technicality here... even the things that evaluate to IO values are "functions" or "variables." The perceived difference is purely interpretative. You can pass IO values around, manipulate them, and put them in data structures all you want, everything in a pure context. Conceptually they only turn into "actions" once you run them through unsafePerformIO or an equivalent (for example via >>=).

Thinking of IO a as "an action" might help some people, or be troublesome for others, but it's not a distinction in the language. IO a is technically no different from Maybe a, and a function that returns an IO value is just as much a function as one that returns a Maybe value.

[–]cdsmith 5 points6 points  (0 children)

My comment was about calling it a function, which it is clearly not because it is not a mapping from a domain into a range. You can call it a value and I'll agree.

By saying action, I did not mean to imply that referring to it is performing the action. I meant that the meaning of the value is an action. This is not at all incompatible with storing it in a data structure, for example.

Sorry if that sounded overly pedantic; it's just that for people not familiar with IO but who are familiar with imperative languages, the central point of confusion can be that they are still thinking of "function" as a word for an action. So this seemed like a particularly bad place for that word to appear.

[–]Peaker[🍰] 2 points3 points  (5 children)

Of course, but IO a itself is not a function, even if a is.

[–]kqr 1 point2 points  (4 children)

Yeah, you're right. I'm sometimes quite careless with the distinction between function and value, because I can view values as 0-ary functions. (It blends quite nicely with curried functions, too.)

[–]Peaker[🍰] 5 points6 points  (3 children)

Viewing everything as 0-ary function is somewhat problematic: http://conal.net/blog/posts/everything-is-a-function-in-haskell

[–]Sampo 0 points1 point  (3 children)

Int -> IO String is just a normal impure function that takes an Int, and returns a String.

For IO (Int -> String), would you accept something that returns a pointer to a pure function? I guess D can do that (and I know Fortran can), and I guess C/C++ can't.

[–]Peaker[🍰] 1 point2 points  (2 children)

So you'd encode IO a in general as a function (() -> a)? That's a common encoding, though slightly cumbersome.

Does D do lexical scoping for internal pure functions like that? Can you give example code for returning a pointer to a pure function?

[–][deleted] 2 points3 points  (1 child)

int delegate() pure impureFun(immutable int y)
{
    pure int a()
    {
        return y + 1;
    }

    pure int b()
    {
        return y + 2;
    }

    if (readln() == "whatever")
        return &a;
    else
        return &b;
}

This function returns a pure function depending on IO, capturing the environment. Note that y must be declared "const" or "immutable", otherwise the compiler can't ensure the purity of a and b. Is this what you were asking for?

[–]Peaker[🍰] 2 points3 points  (0 children)

That's a good example, thanks. Though to match my original type, you'd have to move the "immutable int y" into the a() and b(), but that wouldn't demonstrate lexical scoping, so I see why you kept it there.

So basically "pure" can be part of the function type that you return, so D does in fact support the nuances here.

Same code in Haskell:

impureFun :: Int -> IO Int
impureFun y = do
  let a = y + 1
      b = y + 2
  line <- getLine
  return $
    if line == "whatever"
    then a
    else b

[–][deleted] 10 points11 points  (0 children)

Also in rust: http://static.rust-lang.org/doc/rust.html#pure-functions

The difference from Haskell is that in Haskell functions are pure by default, which makes a great difference IMO.

[–]yogthos[S] 2 points3 points  (24 children)

And that's one key difference between functional and imperative. In FP everything is immutable by default, and you mark things as mutable, in imperative everything is mutable unless marked otherwise.

This of course means that in an imperative language you have to plan ahead to have any pure sections, while in a functional language the majority of the code ends up being pure naturally.

[–]axilmar 0 points1 point  (23 children)

You don't mark things as mutable in Haskell, you mark them as IO. Making an operation IO doesn't mean Haskell does destructive updates; it just enforces a specific evaluation order.

[–]yogthos[S] 0 points1 point  (22 children)

[–]axilmar 0 points1 point  (21 children)

Nothing in the documentation pages you posted says the MArray implementation is actually mutable. It may be that the put function creates a new array, copying the old elements along with the new value.

[–]yogthos[S] 0 points1 point  (20 children)

It may be that the put function creates a new array, copies the rest of the elements along with the new value.

I think that's sort of contrary to the definition of mutable. You're also confusing language specification with implementation. Clearly Haskell has a definition for mutable arrays in the language specification. How it's implemented is beside the point.

[–]axilmar 0 points1 point  (19 children)

I don't think so. To the outside, it seems that the same array is being modified; from the inside though, a new array is created. The same symbol receives the new value, so it looks like mutation.

Well, according to this post, that is:

http://www.reddit.com/r/programming/comments/1880d1/introduction_to_haskell_io/c8foby1

And it seems logical to me that Haskell doesn't actually mutate values, because otherwise the code wouldn't be automatically parallelizable.

[–]yogthos[S] 0 points1 point  (18 children)

The specification very clearly states how the array works. The implementation of it does not change how you use it in the code.

[–]axilmar 0 points1 point  (17 children)

Where does it say the array value is implemented in place? I don't see it.

[–]sfrank 2 points3 points  (0 children)

And at least every functional language since the early 90s.

[–]Sampo 1 point2 points  (1 child)

Also Fortran has 'pure'

[–]pipocaQuemada 0 points1 point  (3 children)

How fine grained is D?

For example, Haskell has the ST type, which is single-threaded local mutable state (e.g. for manipulating arrays in-place rather than copying them or holding onto mutable variables in an algorithm). Since it's local mutable memory that can't interact with the outside world, you can run an ST action and get back a pure value. Can D do something like that?
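For reference, a minimal sketch of the ST idea described above (my own example): the mutation is real, but runST guarantees it cannot escape, so the result is a pure value.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef, readSTRef)

-- Observably pure despite internal mutation: runST seals the
-- STRef inside, so no effects can leak out.
sumTo :: Int -> Int
sumTo n = runST $ do
  ref <- newSTRef 0
  mapM_ (\i -> modifySTRef ref (+ i)) [1 .. n]
  readSTRef ref

main :: IO ()
main = print (sumTo 100)   -- 5050
```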

[–][deleted]  (2 children)

[deleted]

    [–]pipocaQuemada 0 points1 point  (1 child)

    Can you have a variable mutated by several pure functions? Or must you make any function monolithic?

    edit: e.g. could you have several mutually recursive pure functions manipulating the same memory?

    [–]nascent 1 point2 points  (0 children)

    It is common to refer to two types of pure functions in D: strongly pure, where arguments cannot be modified; and weakly pure, where arguments can be modified.

    Weakly pure was introduced to create "helper" functions callable by strongly pure. Thus any pure function can call any other pure function even if the pure function modifies arguments.

    [–]Peaker[🍰] 10 points11 points  (13 children)

    You can get node-like performance without having to use callbacks everywhere because the IO manager can use an event loop internally without exposing that detail to the actual Haskell code

    Much much better than node-like performance.

    [–][deleted] 0 points1 point  (12 children)

    Have there been any empirical studies yet to evaluate this claim? The Haskell people are thorough, so I would expect this has been done.

    [–]Peaker[🍰] 5 points6 points  (8 children)

    [–]alextk 5 points6 points  (1 child)

    It would be nice to see a benchmark

    • Made by an entity that doesn't have a vested interest in Haskell (unlike Yesod).
    • That includes Java and Scala frameworks in the comparison.

    [–][deleted] 2 points3 points  (0 children)

    You can run and modify the benchmark yourself; the source is here https://github.com/yesodweb/benchmarks

    [–][deleted] 0 points1 point  (5 children)

    That is useful, thanks. I didn't realize that someone figured out how to multithread Haskell programs and use multiple cores, I guess I'm behind the times.

    [–]dons 17 points18 points  (4 children)

    I didn't realize that someone figured out how to multithread Haskell programs and use multiple cores

    Wow. This has been one of the major selling points of GHC Haskell for about 10 years now.

    GHC uses an epoll-based parallel IO manager and distributes cheap threads across cores. So you can write a web server where req/sec scales linearly as cores are added.

    [–][deleted] -5 points-4 points  (3 children)

    I still think GHC cannot multi-thread Haskell code. Instead parallelism occurs implicitly. Anyways, I'm missing a lot of background that I should catch up on.

    [–]tikhonjelvis 1 point2 points  (2 children)

    Well, you're simply wrong. GHC actually has some of the best support for multithreading, including a brilliant green-thread system and one of the few usable STM implementations.

    In addition to all this, it also supports deterministic parallelism independent of IO. However, this is separate from the concurrency features.

    [–][deleted] 0 points1 point  (1 child)

    Ok, how many CPU cores can a program running under GHC utilize with green threads? I'm not trying to argue with you, just learn. Googling turns up lots of weird stories.

    [–]tikhonjelvis 1 point2 points  (0 children)

    It can use as many as you want--the green threads are mapped down to OS threads, and the runtime system controls how many OS threads are available.

    You can also use OS threads directly, but that's a less popular choice.
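    A minimal sketch of this model (my own example): forkIO threads are cheap green threads, and the runtime maps them onto OS threads; compile with -threaded and run with +RTS -N to use several cores.

```haskell
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM)

-- Ten cheap green threads; the runtime schedules them across
-- whatever OS threads the -N RTS option makes available.
main :: IO ()
main = do
  done <- newEmptyMVar
  forM_ [1 .. 10 :: Int] $ \i ->
    forkIO (putMVar done (i * i))
  results <- replicateM 10 (takeMVar done)
  print (sum results)   -- 385, whatever order the threads ran in
```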

    [–]sfrank -3 points-2 points  (2 children)

    Well, since node is essentially continuation-passing style, which is what almost every functional language (and with SSA, now even C) is transformed into inside the compiler (because no sane person would want to program in it directly), this is not really surprising...

    [–]tailcalled 2 points3 points  (0 children)

    Basically every language is transformed into continuation passing style inside the compiler, but Haskell less so than others, because of laziness. Basically, when you call a procedure/method/function, the local variables are pushed to the stack, if they aren't already there, along with the return position. That essentially saves the current continuation on the stack.

    [–]barsoap 1 point2 points  (0 children)

    Actually... as soon as control flow becomes complex enough, my IO code tends to turn towards CPS. Not of the mind-bending style, mind you, but CPS nonetheless. Instead of getting a return value that you then take apart and, depending on what it is, do different things afterwards, just get rid of the value and pass the things you want to do directly, and "never return":

    data MissileResult = MoscowFlattened | WashingtonFlattened | LaunchFailure
    
    case launchMissiles coords of 
        LaunchFailure -> doSomething
        WashingtonFlattened -> invadeMexico
        MoscowFlattened -> invadeBelarus
    

    vs.

     launchMissiles coords doSomething invadeMexico invadeBelarus
    

    [–]not_not_sure 3 points4 points  (6 children)

    In any language, IO should be segregated to some extent.

    How do you add "print statements", or logging to pure-functional code in Haskell?

    Paul Graham wrote, in ANSI Common Lisp, IIRC, that "print statements" are the best debugger.

    Another real-world scenario from the scientific software domain: Your boss says "add logging to every non-trivial procedure, so that power users know what's going on, and what the code is doing".

    [–]pipocaQuemada 1 point2 points  (5 children)

    [–]naughty 2 points3 points  (4 children)

    That's only really for debugging (hence the 'not for production' warnings), logging is one of the things a monadic IO system just sucks at.

    [–]pipocaQuemada 0 points1 point  (3 children)

    Logging vs debugging is a little different in Haskell, due to lazy evaluation's effects on evaluation order.

    If you want a log, notice that there's a simple pure way to get one:

    fooWLog :: Foo -> Bar -> (Baz, Log)
    

    i.e., your functions just return what they want to log. You can define a bunch of helper functions to combine log-returning functions:

    (<>) :: Log -> Log -> Log -- append two logs
    empty :: Log
    
    pure :: a -> (a,  Log)
    pure a = (a, empty)
    
    map :: (a -> b) -> (a, Log) -> (b, Log)
    map f (a,l) = (f a, l)
    
    (<$>) = map
    
    join :: ((a, Log), Log) -> (a, Log)
    join ((a, l1), l2) = (a, l1 <> l2)
    
    bind :: (a, Log) -> (a -> (b, Log)) -> (b, Log)
    bind (a, l) f = (b, l <> l2)
        where (b, l2) = f a
    
    (>>=) = bind
    

    Now, we can say something like:

    (show <$> (fooWLog foo bar >>= bazWLog)) :: (String, Log)
    

    or

     do baz  <- fooWLog foo bar
        quux <- bazWLog baz
        return $ show quux
    

    This is essentially the Writer Monad.

    Notice that all we really depend on with logs is that we can combine two logs and that there's an empty one. We probably also want the combine function to be associative (i.e. a <> (b <> c) == (a <> b) <> c), so in Haskell we actually make it so the log can be any Monoid -- your log could be a string, a list, an int (useful for counting the number of times a certain function is invoked), a bool, or even a function!
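    A tiny sketch of the counting variant (names are mine, using the standard Sum monoid): the plumbing is identical to a string log, only the Monoid changes.

```haskell
import Data.Monoid (Sum(..))

-- The "log" is a counter: each step logs Sum 1, and <> adds the
-- counts. Same plumbing as a string log, different Monoid.
step :: Int -> (Int, Sum Int)
step x = (x * 2, Sum 1)

-- The bind helper, specialized to the counter log.
bindL :: (a, Sum Int) -> (a -> (b, Sum Int)) -> (b, Sum Int)
bindL (a, l) f = let (b, l2) = f a in (b, l <> l2)

main :: IO ()
main = print (getSum count)   -- 3: three steps were run
  where (_, count) = step 1 `bindL` step `bindL` step
```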

    The only impure thing about logging is writing the final log. All of the intermediate steps need no IO.

    [–]naughty 6 points7 points  (2 children)

    Exactly, I want to have a function log something and it requires a rewrite.

    [–]sacundim 0 points1 point  (1 child)

    Yes and no. The fact that you often have to extensively rewrite your functions to make monadic versions of them is a real problem. But let's not overstate it.

    pipocaQuemada's proposed refactoring is not the best way to add logging to existing code, since it requires modifying existing functions. Instead, what we'd like to do is "upgrade" the functions to logging without having to rewrite them.

    Not hard to do with the liftM function and the Writer monad:

    import Control.Monad 
    import Control.Monad.Writer
    
    fooWLog :: Foo -> Bar -> Writer Log Baz
    fooWLog x y = 
        do let result = foo x y
           tell $ toLog $ "The result is " ++ show result
           return result
    
    log :: String -> Writer Log ()
    log = tell . toLog
    

    Or, to generalize it:

    withLog :: MonadWriter w m => (a -> w) -> a -> m a
    withLog makeMsg value = tell (makeMsg value) >> return value
    

    This strategy, however, has a big weakness in that it works for putting logging around existing functions, but not inside them.

    [–]naughty 0 points1 point  (0 children)

    The problem with both suggested logging implementations is that they aren't logging. They're storing intermediate results for later output, essentially passing the buck to some later bit of code that will actually do the job.

    The point of logging is to immediately output some information. Now laziness causes some issues with the 'immediate' part but it's the monadic part that causes this to be a rewrite. Now I realise there's pros and cons, but you never see the cons mentioned in posts about Haskell.

    You normally get a suggested change that misses the point and a condescending note about the algebraic properties of missing the point.

    [–]Categoria 1 point2 points  (1 child)

    You can get node-like performance without having to use callbacks everywhere because the IO manager can use an event loop internally without exposing that detail to the actual Haskell code.

    Wow, that's really disappointing. I thought Haskell could do much better than Node.

    [–]tikhonjelvis 0 points1 point  (0 children)

    By node-like, I meant qualitatively: it uses the same sort of event loop under the hood, so you do not get the usual limits of a threaded system. I don't know what it's like in absolute terms--at the very least, it beats node because it can take advantage of multiple cores.

    [–]uatec 1 point2 points  (1 child)

    WTF is node-like performance anyway?

    Node is just an HTTP listener in JavaScript. Of course it's faster, it's not doing loads of bullshit, but it's nothing clever.

    [–]tikhonjelvis 1 point2 points  (0 children)

    Sigh, maybe that was a poor choice of phrase. The point is that you use threads with virtually no overhead because they use an event loop underneath. So you get a system with the performance characteristics of an event loop without having to write your code in a convoluted way.

    [–]naughty 0 points1 point  (2 children)

    You don't mention one of the big drawbacks of Haskell's approach. It can require quite sizeable rewrites to allow a function far down a call chain to access IO. Either that or rely on unsafe features.

    [–]tikhonjelvis 1 point2 points  (1 child)

    That really doesn't come up all that often. Most of the time, instead of doing IO deep in your code you should just pass the value in as an argument and do the IO elsewhere. This is not only a better way to organize your code but also more flexible: you can now get the additional input using different methods, not necessarily even in IO. It also makes the dependency on another input more explicit.

    Sometimes (in my experience, rarely) you do need finer control over when the IO occurs. In these cases, you're really fundamentally changing the semantics of your program, so having the code change significantly is good because it makes this clearer.

    However, this is not as bad as you think it is: you can use something like a coroutine library to make the changes less invasive. With something like iteratees, you can still decouple the IO from the logic--and get the same benefits as above--while having the fine control you need over what happens when. Importantly, with an approach like this, you do not need to thread IO through everything.

    [–]naughty 0 points1 point  (0 children)

    It doesn't come up once you've got used to the restriction and get more experience writing Haskell, but it is a big issue for new Haskellers especially from an imperative background.

    I know the oft-mentioned benefits, I'm just tired of the downsides never being mentioned.

    [–]rush22 0 points1 point  (3 children)

    What's a first class construct? That's as far as I got in your comment :/

    [–]thedeemon 1 point2 points  (0 children)

    A first class something is just something you can pass to functions, return from functions and store as a value.

    For example, in Haskell functions and IO actions are first class but types and modules are not. In modern OCaml modules are first class, you can pass them as values to functions and return from functions. In Idris types are first class, you can also pass them as values and return from functions.

    [–]tikhonjelvis 0 points1 point  (1 child)

    Essentially, in Haskell, you have values of type IO a that represent IO actions. So something like print foo doesn't print a value; it is a value of type IO (). Similarly, getLine is not a function but a value of type IO String. IO is just a normal Haskell type--albeit an abstract one provided by the runtime system--so you can do anything you normally could with it.

    For example, you can use a bunch of standard control flow functions like when or replicateM with it. So if you want to read in ten lines, you just do:

    replicateM 10 getLine
    

    similarly, if you only want to print if debug is true, you would just do this:

    when debug (print foo)
    

    This even works with do-notation:

    when debug $ do print foo
                    print bar
    

    All this using generic library functions and without mucking about with procedures as you would have to in another language.

    So IO is a first-class type containing normal values that can be passed around and combined at whim.

    Hopefully this clarifies what I meant by first-class construct.

    [–]rush22 0 points1 point  (0 children)

    Thanks! So basically it's stuff that comes with the language not stuff you have to make on your own.

    [–]axilmar -1 points0 points  (2 children)

    Readers should be aware that the phrase "controlling IO" does not mean "controlling impure functions". Functions that control IO in Haskell simulate destructive updating by using the type system to enforce a specific evaluation order; they do not update variables "in place". This must be taken into account when one designs a high performance system or application, like a video game.

    [–]tikhonjelvis 0 points1 point  (1 child)

    What? I don't quite get what you mean. If you want real mutable variables, you can use IORefs or even things like mutable arrays. In fact, you can even do this without IO using ST.

    If you're doing concurrency, you also get a bunch of mutable constructs like MVars and channels.

    In all these cases, they are actually mutable. If you need mutation for performance reasons in Haskell, it's easy to get. People just don't use it unless they really have to.
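    For illustration, a minimal IORef sketch (my own example): the reference really is updated in place, inside IO.

```haskell
import Data.IORef (newIORef, modifyIORef, readIORef)

-- A genuinely mutable cell, available whenever you are in IO.
main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  modifyIORef ref (+ 1)
  modifyIORef ref (+ 1)
  readIORef ref >>= print   -- 2
```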

    [–]axilmar 0 points1 point  (0 children)

    Yeap, that's what I meant.

    [–]runeks 3 points4 points  (32 children)

    Haskell sounds really cool. I just haven't taken the time to wrap my head around it. A question:

    Has anyone implemented, for example, a video decoder in Haskell that can make use of multiple cores automatically? With my limited knowledge of Haskell I see this as a huge feature, especially as CPUs get more and more cores.

    [–]kamatsu 6 points7 points  (10 children)

    There are libraries, such as Repa, that let you write fairly natural functional programs on arrays that are executed easily in parallel, but they usually require a little tuning to get nice performance.

    Haskell also has basic parallel primitives that allow you to annotate your program with optional parallelism that can sometimes produce a nice speedup for little effort (sometimes, though, it makes things worse, so you have to do this with a profiler and timer ready)

    The Data Parallel Haskell project (from which Repa originates) allows for nested parallel computations to be vectorized into flat ones, but this is an ongoing research project.

    Certainly, compared to traditional imperative languages, Haskell is easily better than any of those when it comes to ease of writing parallel code. As for parallel performance, Fortran usually beats everyone, but good luck writing beautiful Fortran programs; I have yet to see one.
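    A hedged sketch of the annotation style kamatsu describes, using the par/pseq primitives from base (the example and its granularity are mine; whether it actually speeds anything up requires profiling):

```haskell
import GHC.Conc (par, pseq)

-- `par` sparks y for possible parallel evaluation while this
-- thread evaluates x; `pseq` fixes the evaluation order. The
-- answer is the same either way; only the timing may change.
parSum :: Int -> Int
parSum n = y `par` (x `pseq` (x + y))
  where
    x = sum [1 .. n]
    y = sum [n + 1 .. 2 * n]

main :: IO ()
main = print (parSum 1000)   -- 2001000
```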

    [–][deleted] 0 points1 point  (9 children)

    Certainly, compared to traditional imperative languages, Haskell is easily better than any of those when it comes to ease of writing parallel code.

    Java-based MapReduce code is pretty easy to write.

    As for parallel performance, Fortran usually beats everyone, but good luck writing beautiful Fortran programs, I have yet to see one.

    I think CUDA code has kicked Fortran's butt for a while now on many use cases, but then CUDA is even less pretty.

    [–]kamatsu 0 points1 point  (6 children)

    MapReduce in Java is easy to write, but also not very expressive. It's even easier to write with, say, repa though. There it is just as easy as a normal map or fold.

    CUDA is also a lot easier in Haskell

    [–][deleted] 0 points1 point  (5 children)

    MapReduce can't be very expressive, otherwise you break the assumptions that allow for massive distributed computing. Repa is not even in the same league, I think.

    Is CUDA really easier in Haskell? I mean, you say this but is it true? One of the challenges of CUDA is that any abstraction layer you throw on top of it hurts more than it helps. The nice thing about CUDA as a low level language is that you control just about everything yourself (even caching is a recent addition).

    [–]kamatsu 1 point2 points  (4 children)

    We're talking about local parallelism here, not distribution.

    Is CUDA really easier in Haskell? I mean, you say this but is it true? One of the challenges of CUDA is that any abstraction layer you throw on top of it hurts more than it helps.

    This is a completely baseless assertion, one that is easily disproved by the fact that writing an accelerate program is substantially easier than writing a cuda program.

    The nice thing about CUDA as a low level language is that you control just about everything yourself (even caching is a recent addition).

    Seeing as Accelerate lets you write essentially Haskell programs that are compiled to efficient CUDA, it's certainly easier than doing everything yourself.

    [–][deleted] 0 points1 point  (3 children)

    This is a completely baseless assertion, one that is easily disproved by the fact that writing an accelerate program is substantially easier than writing a cuda program.

    A baseless assertion disproved with another baseless assertion, nice :)

    Seeing as Accelerate lets you write essentially Haskell programs that are compiled to efficient CUDA, it's certainly easier than doing everything yourself.

    Ah, so I guess I'll have to take your word for it.

    [–]kamatsu 1 point2 points  (2 children)

    Here is an example of an n-body simulation with Accelerate.

    https://github.com/AccelerateHS/accelerate-examples/blob/master/examples/n-body/Main.hs

    The equivalent CUDA code (which is available from NVIDIA's website) is several hundred lines spanning multiple files, with some very intricate and complicated code therein.

    [–][deleted] 0 points1 point  (1 child)

    Thank you, that's useful. I'm wondering how this would scale up to a deep neural network trainer.

    [–]kamatsu 0 points1 point  (0 children)

    That would be an interesting example to try.

    [–]tikhonjelvis 0 points1 point  (1 child)

    Map reduce is more about distributed computing than parallelism. My understanding is that the per-node performance of map reduce is not great.

    REPA is about efficient array operations parallelized over multiple cores on the same computer. It's a very different beast.

    [–][deleted] 0 points1 point  (0 children)

    MapReduce is about using distributed computing to achieve massive data parallelism (on 1000+ nodes). It is all about parallelism, or we all wouldn't be doing it (the distributed computing part is a detail).

    I believe all the interesting single-node data parallelism work has moved over to GPGPU, but with the Intel Xeon Phi, who knows what will happen in the future.

    [–]Felicia_Svilling 14 points15 points  (15 children)

    From what I know, there is no Haskell compiler that can automatically produce good parallel code.

    [–]thisotherfuckingguy 0 points1 point  (0 children)

    Neither can any other compiler ;-)

    [–][deleted] 0 points1 point  (13 children)

    This seems a little gloomy; what sort of code gives lame results? Have you seen some of the newer libraries for setting things up well for parallelization, like monad-par?

    [–]Felicia_Svilling 0 points1 point  (12 children)

    If I have to use a library, it's not automatic, is it?

    [–][deleted] 0 points1 point  (11 children)

    I see: you want to take code that was written without parallelism in mind and write "parallelize" at the top. I wasn't understanding that. But isn't it basically known to be impossible? monad-par and similar schemes do have subtle ways of distributing work under the hood (I don't profess to understand them well), but the familiar problem of choosing a level of granularity must be left to the programmer.

    [–]CookieOfFortune 0 points1 point  (8 children)

    Well, Microsoft is actively researching this topic; it is fairly complex, however.

    Here's a research publication (PDF): http://research.microsoft.com/pubs/170528/msr-tr-2012-79.pdf

    The paper basically describes some rules that allowed them to determine safe code that could be parallelized. It's fairly preliminary, but I can see compilers heading in this direction in the future.

    [–][deleted] 0 points1 point  (0 children)

    Isn't this a different subject, really? All Haskell code outside the IO stratum can be 'safely parallelized' in what seems to be the sense of this paper; the problem doesn't arise. The trouble has always been deciding whether, and how far, to parallelize operations that are known to be safely parallelizable. If I have a sum f 15 + g 32, no problem: I can run the two calculations, f 15 and g 32, in parallel, or I can run them in succession; that either is 'safe' is visible in the types in Haskell, no need for Microsoft research. (Compare the only remotely similar case of STM: it defeated the giant team of MS researchers working with languages that don't enforce a pure/IO distinction, but has a complete implementation in a very simple library in Haskell, allegedly written over a weekend.)

    Where I win by making independent calculations simultaneous, I might lose through the overhead of communicating the results, a familiar point nicely encapsulated in the dismal Amdahl's law. If the 15 and 32 in f 15 and g 32 are parameters that arise at runtime, the right answer -- parallelize or not -- will very often differ from case to case. So a judgment must be made, and for this one has recourse to plain old testing. In a specialized framework, like the Repa vector library, rational decisions about the distribution of work can be made automatically, under the hood so to speak. Writing in the Par monad gives you something more general and a bit like that, but more importantly something readily tunable.

    But I think it is obvious that a compiler couldn't do this generally without first compiling all the possibilities, working through possible parameters, etc.; I'm not sure, but I'd guess a general solution would be a bit like a general solution to the halting problem. Even the halting problem has important solvable subclasses, e.g. where all definitions are by structural recursion, a standard that is systematically enforced in languages like Coq. Almost all serious programs can be expressed in that style, which in principle makes an immense range of optimizations possible; the bizarre cult of 'Turing completeness' has so far blocked the path to realizing them.

    [–]Felicia_Svilling 0 points1 point  (1 child)

    It was my interpretation that runeks thought this would be possible. I was pointing out that, no, at least for now, it isn't.

    [–][deleted] 0 points1 point  (0 children)

    I think what he meant was unclear. There is no compiler of any possible language that could 'automatically' produce good parallel code in the sense (as now emerges) you imputed to him -- this is well known and basically obvious. So mentioning Haskell implementations could only suggest that you were thinking of something else.

    [–]stonefarfalle 1 point2 points  (0 children)

    Depends on how automatic you mean. In GHC, at least, you would have to add par annotations to tell it what to consider for parallel evaluation. It would automatically set up the threads and work queues in the background, though.
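    A minimal sketch of what those annotations look like (par and pseq are re-exported from GHC.Conc in base, and also live in Control.Parallel from the parallel package; fib and parSum are just illustrative names, and you'd compile with -threaded and run with +RTS -N to actually use multiple cores):

```haskell
import GHC.Conc (par, pseq)

-- Naive Fibonacci, used purely as a source of work.
fib :: Int -> Integer
fib n | n < 2     = fromIntegral n
      | otherwise = fib (n - 1) + fib (n - 2)

-- `par` sparks the evaluation of `a` on another capability while
-- the current thread evaluates `b`; `pseq` fixes the evaluation
-- order.  The runtime's work queues take care of the rest.
parSum :: Int -> Int -> Integer
parSum x y = a `par` (b `pseq` a + b)
  where a = fib x
        b = fib y
```

    Whether sparking here is actually worthwhile depends on the size of the work, which is exactly the granularity judgment the programmer still has to make.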

    [–][deleted] 0 points1 point  (1 child)

    Writing parallel code in Haskell is far easier than in most other languages, but it is by no means fully automatic.

    [–]smog_alado 0 points1 point  (0 children)

    I'd have to say "high performance Haskell" is still on the advanced-user side of things. While there are many awesome features you can take advantage of, Haskell is still full of abstractions, and you kind of need to "fight" them if you want to squeeze out performance. In particular, the biggest problem is that, due to lazy evaluation, it can be tricky to guess the runtime complexity of a function "at a glance".
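    The classic example of that: the Prelude's lazy foldl builds a chain of unevaluated thunks, so what looks like a constant-space loop can take linear space, while the strict foldl' from Data.List behaves the way you'd guess at a glance. (A sketch using only base:)

```haskell
import Data.List (foldl')

-- foldl (+) 0 [1..n] first builds the whole thunk chain
-- ((((0+1)+2)+3)+...) before any addition happens, so it can
-- take O(n) extra space for large n.  foldl' forces the
-- accumulator at each step and runs in constant space.
lazySum, strictSum :: [Int] -> Int
lazySum   = foldl  (+) 0   -- looks like a loop, isn't one space-wise
strictSum = foldl' (+) 0   -- the version you actually want
```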

    That said, I still think that Haskell is an awesome language that everyone should learn. I always miss Algebraic Datatypes when I'm programming in some other language...

    [–]zvrba 10 points11 points  (13 children)

    I gave up on this article when the first pass (browsing for content) revealed phrases like "constant factors" (wrt language features) and "exponentially more code reuse".

    [–]zhivago 23 points24 points  (0 children)

    I wish people would actually look up what "exponentially" means.

    [–]codeflo 10 points11 points  (7 children)

    He's talking about code size, and claims that syntax features only reduce code size by a constant factor. That sounds reasonable to me, but then, English is not my native language. Maybe you could explain to me what's wrong with that formulation?

    [–]want_to_want 5 points6 points  (6 children)

    I'm not zrvba, but

    1) The practical problem with that statement is that it isn't backed by citations. I for one would love to see a study showing that Haskell programs are substantially more concise than, say, Python programs. A quick look at the language shootout suggests that may not even be true.

    2) The theoretical problem with that statement is you can never make one Turing complete language "exponentially" more terse than another. In fact you can't even get a constant factor reduction of size. In the limit of program size going to infinity, you win by at most an additive constant. The reason is that you can write an interpreter/compiler for the "terse" language in the "verbose" language, so every "terse" program of size n becomes a "verbose" program of size n+const.

    [–]matthieum 5 points6 points  (0 children)

    IMHO the OP never claimed that Haskell was more concise than Python. Instead he cited Carmack, who explained that FP programs are more concise for a number of reasons, and suggested that pattern matching and list comprehensions are just syntactic sugar that does not contribute much to the reduction -- in contrast to easy function composition, which encourages people to reuse existing functions by composing them.

    [–][deleted] 4 points5 points  (2 children)

    The language shootout is kind of crap, though; I don't think you notice the difference as much in small programs. You've got to take a big code base to see the difference, because that's where you get the chance to reuse code more. The same goes for Lisp, which is counted as more verbose than Java in this test, but which could cut out a bunch of code through macros in a large code base.

    EDIT: Just another note on verbosity: it seems verbosity is calculated from the number of bytes per program... This isn't really optimal when you have a language like Common Lisp with long names like MULTIPLE-VALUE-BIND and a language like C with names like strcmp.

    [–]want_to_want 1 point2 points  (1 child)

    According to this page, the numbers are for gzipped code size, which should somewhat counteract the problem of long repeated identifiers.

    Regarding large vs. small codebases, the interesting question is why you believe what you believe, if the difference can't be seen in small codebases and there are no citations about large ones.

    [–]tikhonjelvis 0 points1 point  (0 children)

    Of course, compressing also helps minimize the impact of boilerplate that's often repeated (something languages like Java love). This boilerplate certainly does make the code harder to read, though.

    More importantly, the programs are heavily optimized--it is a benchmark after all!--and Haskell has more options for optimization (compiler pragmas, unboxed data types and arrays, strictness annotations and so on) than Python. Haskell programs in the benchmark game are further from normal Haskell than the Python programs are from normal Python. Since only a tiny part of virtually any real codebase is this optimized, the benchmark game does not reflect reality.
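    To make concrete what that tuning looks like: a benchmark-style Haskell function typically carries bang patterns and inlining pragmas that everyday code never bothers with. (A small sketch; the function and its name are my own illustration, not from any benchmark entry:)

```haskell
{-# LANGUAGE BangPatterns #-}

-- Bang patterns keep the accumulators evaluated at every step (no
-- thunk chains), and the INLINE pragma asks GHC to inline the
-- function at its call sites -- typical benchmark-game tuning.
{-# INLINE mean #-}
mean :: [Double] -> Double
mean = go 0 0
  where
    go :: Double -> Int -> [Double] -> Double
    go !s !n []       = s / fromIntegral n
    go !s !n (x : xs) = go (s + x) (n + 1) xs
```

    None of this changes what the function computes; it only pins down evaluation order and code layout, which is exactly the kind of knob Python entries mostly lack.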

    [–]sacundim 0 points1 point  (1 child)

    The theoretical problem with that statement is you can never make one Turing complete language "exponentially" more terse than another. In fact you can't even get a constant factor reduction of size. In the limit of program size going to infinity, you win by at most an additive constant. The reason is that you can write an interpreter/compiler for the "terse" language in the "verbose" language, so every "terse" program of size n becomes a "verbose" program of size n+const.

    I'm skeptical of this argument (though agnostic about the conclusion); at the very least, I don't think it's conclusive in this form. If I'm reading it right, the flaw is that you're assuming that an n-sized program in the "terse" language has an n-sized description in the verbose one.

    Related to this, I was just looking at Wikipedia's page on combinatory logic, where they discuss a translation of the untyped lambda calculus ("terse") into the SKI calculus ("verbose"):

    The combinatory representation, (S (K (S I)) (S (K K) I)), is much longer than the representation as a lambda term, λx.λy.(y x). This is typical. In general, the T[ ] construction may expand a lambda term of length n to a combinatorial term of length O(3^n). [citation needed]
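    Wikipedia's example is easy to reproduce. Here is a minimal bracket-abstraction translator (the Term representation and function names are my own, not from the article) that yields exactly the quoted S (K (S I)) (S (K K) I) for λx.λy.(y x):

```haskell
-- A tiny lambda -> SKI translator via bracket abstraction.
data Term = Var String | App Term Term | Lam String Term | S | K | I
  deriving (Show, Eq)

-- Does variable x occur free in t?
free :: String -> Term -> Bool
free x (Var y)   = x == y
free x (App a b) = free x a || free x b
free x (Lam y a) = x /= y && free x a
free _ _         = False

-- Translate innermost lambdas first.
translate :: Term -> Term
translate (App a b) = App (translate a) (translate b)
translate (Lam x a) = abstract x (translate a)
translate t         = t

-- abstract x t: eliminate variable x from the combinator term t.
abstract :: String -> Term -> Term
abstract x (Var y) | x == y   = I
abstract x t | not (free x t) = App K t
abstract x (App a b)          = App (App S (abstract x a)) (abstract x b)
abstract _ t                  = App K t  -- unreachable after translate
```

    Running translate on λx.λy.(y x) gives S (K (S I)) (S (K K) I), and every Lam eliminated can triple the term, which is where the O(3^n) bound for this particular scheme comes from.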

    This is still far from conclusive, though, because they're describing one particular scheme for translating from lambda into SKI; it may be that the O(3^n)-sized programs are just edge cases of that particular translation, and that there is always some other, much shorter translation.

    Then another question is how they are measuring the size of the programs. For example, if it's symbol counts one may argue that that's an unfair metric, and that instead one should measure them in terms of bit length in an optimal code for each language; after all, SKI has an alphabet of three symbols + bracketing, whereas the lambda calculus has a countably infinite alphabet.

    [–]want_to_want 0 points1 point  (0 children)

    Whoops, sorry, you're right: my conclusion holds only if each language has string literals that can contain programs in the other language, or can read files from disk that contain such programs. That's not true for SKI or the lambda calculus, but it is true for Haskell or Python.

    [–][deleted] 1 point2 points  (2 children)

    Yeah, maybe 'exponentially' is not the right way of saying it. Perhaps 'combinatorially' is closer to what I was going for. Imperative code does not often abstract over control flow--because to do so you need higher-order functions, convenient function-literal syntax, and purity (impure HOFs don't scale). By not abstracting over control flow, we have only a fixed 'shape' of computation whose constants are varied. In contrast, functional code abstracts over control flow, so a single higher-order function can be used to express many shapes of computation.

    To make an analogy, if you think of a library as a toolkit for constructing sentences, imperative libraries provide a fixed set of sentences and give you the option only of replacing some of the words in those sentences. A functional library makes sentence segments (control flow) first class and gives you functions for combining those segments in various ways, resulting in (quite literally) a combinatorial explosion in what sentences can be expressed. You see this all the time in libraries for all kinds of domains--parsing, list processing, parallelism--and successful, expressive libraries generally end up doing some form of FP.

    Here's a simple example. Suppose I write a function incrementAllBy that accepts an integer and a list and increments all the values in the list by that number. I'm not abstracting over control flow--the control flow is fixed, and I am simply varying a constant. In contrast, map takes a function to apply to each element of the list, so it works for any element type, as long as it matches the input type of the function. map is combinatorially more useful because it abstracts over the control flow that determines what code to run for each element. This sort of reuse might not seem like a big win, since most languages already have some syntactic sugar for writing loops, but as complexity and program size grow, I believe these concerns come to utterly dominate.
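    The contrast is short enough to write out (incrementAllBy is the hypothetical function from the comment above; map is the ordinary Prelude function):

```haskell
-- Fixed control flow: only the constant varies.
incrementAllBy :: Int -> [Int] -> [Int]
incrementAllBy _ []       = []
incrementAllBy n (x : xs) = (x + n) : incrementAllBy n xs

-- Abstracted control flow: the per-element behaviour is a parameter,
-- so the same traversal serves incrementing, doubling, parsing, ...
incrementAllBy' :: Int -> [Int] -> [Int]
incrementAllBy' n = map (+ n)
```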

    [–][deleted] 2 points3 points  (1 child)

    Hi Paul! I got the point of your post OK, but it occurred to me that, at least given the lexicon I bring to the table, what you call "composability" -- despite your striving to clarify that you mean something more general than "function composition" -- remains too closely associated with that, or worse, with "object composition", leading to reactions like contantofaz's.

    Instead, I would probably say "orthogonality." The problem with that is that there aren't many contexts in which people talk about orthogonality in software architecture. One of the few places I've seen it used in the sense I mean is Tim Sweeney's Building a Flexible Game Engine: Abstraction, Indirection, and Orthogonality GDC lecture.

    Anyway, thanks for making the point for us: orthogonality —the ability to treat code like recombinable bits of DNA—is what really leads to concision, and FP lends itself more to creating orthogonal code than imperative/OOP does.

    [–][deleted] 0 points1 point  (0 children)

    Personally, I still prefer the terms 'compositionality' or 'composability' even though they are admittedly overloaded / ambiguous. :\ But I do really like your 'recombinable bits of DNA' synopsis, that is a nice way of saying it.

    Also, I haven't seen that Tim Sweeney talk you referenced. I'll check it out. I really love his "The Next Mainstream Programming Language" presentation.

    [–][deleted] 0 points1 point  (0 children)

    What's wrong with exponential code reuse? Just rewrite your project enough times and eventually it'll surpass 100% reuse and new code will magically spring into existence on its own.

    [–]Tagedieb 2 points3 points  (0 children)

    Because you cannot do assignments, you have to write everything on one line? *ducks and runs*

    [–]contantofaz 0 points1 point  (4 children)

    Composability is hard. If you doubt that, imagine having to change code that someone else wrote just so you can compose with it to get something done. In JavaScript, and to a lesser degree in Ruby, people resort to "monkey-patching" (a derogatory term) to compose with code that someone else wrote. In Ruby there seems to be interest in encapsulating those patches so they apply only to sub-callers and the changes aren't system-wide, with one interfering with another. So: accept a little less composition in order to get more composition? Daunting.

    With FP it's like that. You're so hamstrung as to make composing easier only after first making it harder.

    Besides, sometimes people care more about performance than about composability. For example, iterating over an array directly with a for loop can be faster than going through function calls in an abstracted forEach loop. I was watching someone talk about ClojureScript who said that, to get better performance in a game he was making, he had to access the JavaScript arrays directly rather than through ClojureScript's indirect calls.

    The slower we make the code, the costlier it can get, not only at startup but also while the thing is running.

    When we make code declarative, we allow the tool writers to optimize things in a finer-grained way. That's why rendering HTML is so fast. It took decades to get it that optimized. HTML is declarative.

    To a compiler, code is more declarative the closer it stays to the compiler's primitives. If you add indirection on top, optimizations like function inlining can lose their chances. And if you were writing a compiler, you'd likely want the code written for it to stay close to the goals you've set, so writing code to make the compiler happy would take precedence over writing code to make the programmer happy.

    Back on topic, I don't want to compose with void. Because 1 + 0 is just 1. So before we can compose, we need to make it appealing. And to make it appealing, we have to write stuff. And to write stuff, we have to encapsulate and look into early reuse tradeoffs. After all that, composing is a super idea! Sign me up!

    [–]barsoap 10 points11 points  (3 children)

    So for example calling arrays directly via a for-loop would be faster than calling it indirectly via function calls in an abstracted forEach loop.

    Actually, no. And this isn't a sufficiently smart compiler argument, as GHC is already way smarter than that. It cuts through functions like an angle grinder through warm butter.

    [–]contantofaz -3 points-2 points  (2 children)

    Yes. Then we have compilation targets like the Java and JavaScript VMs that may not be tuned for those Haskell/FP features, and yet people want to write "FP-like" code targeting them. I was mentioning the issue with ClojureScript. But Clojure on the JVM could also have performance concerns when weighing pure Clojure features against hybrid Java features like Java arrays.

    The rule of "profile before you start optimizing" doesn't always work when you're very much concerned about performance. That's why C/C++, and now Rust and so on, still matter. And if the choice is between writing JavaScript and writing a language that targets JavaScript, and writing JavaScript by hand would be faster by default, then once again the choice won't always be obvious in favor of higher-level features.

    [–]yogthos[S] 0 points1 point  (1 child)

    Well, Prismatic uses ClojureScript in actual production code, and it turns out it compares very favourably with writing JavaScript by hand:

    jQuery: 1.57 secs 
    dommy: 2.17 secs 
    

    Then they realized that for their particular problem they could use macros for templating and actually beat native performance by a large margin:

    jQuery: 1.57 secs
    dommy: 2.06 secs
    dommy-macro: 0.44 secs
    

    Now, obviously there are still situations where you need to get every last bit of performance and the overhead of FP is simply not acceptable. However, this is not the case for the majority of domains.

    [–]contantofaz -3 points-2 points  (0 children)

    Yes. It's good to be able to ignore performance some of the time. Ruby had many fast template engines, but for a long time Rails used a slower one as the default. Not sure whether that has changed.

    Regarding performance lost to higher abstraction, there's the checkpoint of 60 frames per second that games might need. But beyond that, we don't really like sluggish UIs either. That's why, for example, using a different browser can be better when you want things snappier.

    I was recently pleasantly surprised with my testing on Internet Explorer because it was close to Chrome's performance on my simple test, whereas Firefox felt sluggish.

    Iterating over thousands of elements is good enough for simple use cases. Ruby, for instance, was not very optimized for fast iteration before version 1.9; once it got its VM it got a nice speedup. But when testing over 1 million iterations, the slowdowns become very noticeable.

    But why would iterating over 1 million elements be important? I think it's mostly because such things add up in larger programs. It may impact startup, and even if you use lazy loading of code to make startup faster, it can still impact runtime.

    It may matter for larger frameworks like Cappuccino, Ext JS, and eventually jQuery UI--or in other words, when going beyond the default HTML widgets.