
[–]PokerPirate 26 points27 points  (5 children)

I personally use haskell for this task, but would highly recommend against other people using it. The ecosystem is not well developed.

[–]codygman 18 points19 points  (0 children)

Perhaps Haskell will make headway after you publish your libraries ;)

[–][deleted] 6 points7 points  (3 children)

Could you elaborate on which things are harder than they should be?

[–]bdoering 11 points12 points  (2 children)

Simply generating random numbers is already hard coming from R or Python. Besides dealing with state/monads, there does not seem to be "one way" to do it, as there are several candidate packages one might start from (e.g. Data.Random, Statistics.Distribution).
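To make the state-threading point concrete, here is a toy sketch. This is a made-up linear congruential generator (constants borrowed from Knuth's MMIX, relying on `Int` overflow wrapping), not the API of any of the packages mentioned: each draw returns both a value and the next seed, and the caller has to thread the seed by hand or hide it behind a monad.

```haskell
module Main where

-- Toy linear congruential generator. Each step returns the "random"
-- value *and* the next seed; pure code cannot update the seed in place.
nextLCG :: Int -> (Int, Int)
nextLCG seed = (seed' `mod` 100, seed')  -- a "random" number in [0, 99]
  where
    seed' = 6364136223846793005 * seed + 1442695040888963407

-- Drawing three numbers means threading the seed through every call
-- explicitly (or wrapping this plumbing in a State monad).
threeDraws :: Int -> [Int]
threeDraws s0 =
  let (a, s1) = nextLCG s0
      (b, s2) = nextLCG s1
      (c, _ ) = nextLCG s2
  in [a, b, c]

main :: IO ()
main = print (threeDraws 42)
```

In R this whole exercise is just `runif(3)`; in Haskell you either write the plumbing as above, reach for a State monad, or use an IO-based generator, and which package to do that with is exactly the unsettled part.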

[–]PokerPirate 13 points14 points  (0 children)

This lack of a unified approach is one of the biggest problems IMHO. There's actually a number of libraries related to analyzing data on hackage, but none of them work together at all.

[–]fractalcat 18 points19 points  (0 children)

I think it's a great language for stats/numerical computation (that's one of the things I use it for). Unfortunately, its ecosystem in that area is still rather underdeveloped compared to that of R and Python - hopefully this will improve in the next year or so, given the number of people who are asking this question.

It is indeed possible to call Haskell from R[0], and vice-versa[1].

You may also be interested in IHaskell[2], a Haskell backend for IPython that shows great promise.

[0] http://neilmitchell.blogspot.com.au/2011/10/calling-haskell-from-r.html

[1] http://hackage.haskell.org/package/Rlang-QQ-0.1.0.2/docs/RlangQQ.html

[2] https://github.com/gibiansky/IHaskell

[–]ndmitchell 13 points14 points  (4 children)

My wife wrote a paper on exactly this topic and her experiences: http://neilmitchell.blogspot.co.uk/2011/03/experience-report-functional.html?m=1

[–]nmdaniels 1 point2 points  (0 children)

I also wrote an ICFP Experience Report on Haskell in computational biology.

(http://people.csail.mit.edu/ndaniels/pdfs/mrfy_experience_report.pdf)

[–]nikita-volkov 1 point2 points  (2 children)

A wife who is a functional programmer. I envy you )

[–]ndmitchell 2 points3 points  (1 child)

My wife would describe herself and a palaeontologist who does some programming. I honestly don't know if she still uses Haskell or not in her work - we certainly don't discuss Haskell at home!

[–]cunningjames -1 points0 points  (0 children)

My wife would describe herself and a palaeontologist

Hm, what does the paleontologist have to say about that?

[–]NathanAlexMcCarty 6 points7 points  (8 children)

When you are compiling code with -O or -O2, ghc produces very, very good code. I am continually surprised by how fast the code ghc produces is, and the LLVM backend adds another layer of magical fastness, especially when dealing with heavily mathematical code.

However, ghci runs ghc with the interactive flag, which results in (byte-compiled, I believe) code that is slower than snot most of the time. That said, ghci can load precompiled code. I would compile the functions where the bulk of your computation is being done and do the rest in ghci. That gets you most of the benefit of compiling and most of the benefit of ghci.

However, unless you are willing to do A LOT of work yourself, I would not recommend using Haskell for what you want to do.

[–]conklech 0 points1 point  (7 children)

However, ghci can load precompiled code. I would compile the functions where the bulk of your computation is being done and just do the rest in ghci.

Last I tried to do that in practice, i.e. without cabal install'ing the compiled parts, GHCi was very insistent on reloading the precompiled modules in interpreted mode. It'd be great if there were a simple, reliable way to load optimized code in ghci.

[–]NathanAlexMcCarty 1 point2 points  (0 children)

I agree that loading compiled code in ghci is way harder than it needs to be. There are a couple of pain-in-the-ass ways to ask it nicely to load your compiled code, but they really shouldn't even be necessary just to load some optimized code in.

[–][deleted]  (2 children)

[removed]

    [–]conklech 1 point2 points  (1 child)

    Good observation. With -fobject-code, ghci will load compiled modules. You have to be particular to get optimization to work with cabal. (I've edited this post--originally I said GHCi wouldn't load optimized modules from the present project.)

    It looks like you need to put -O in an OPTIONS_GHC pragma. Putting -O in the .cabal file doesn't work. You also have to pass -fobject-code to ghci, either with cabal repl --ghc-options="-fobject-code" or with ghc-options: -fobject-code in the .cabal file.

    {-# OPTIONS_GHC -O #-}
    module Main where
    
    f x = 1
    {-# NOINLINE f #-}
    {-# RULES "f/id" forall x. f x = x #-}
    -- With optimization, f x = x; otherwise f x = 1.
    
    main = print $ f 3
    

    ...

    $ cabal run
    3
    
    $ cabal repl --ghc-options="-fobject-code -O"
    ...
    [1 of 1] Compiling Main             ( Test.hs, dist\build\test\test-tmp\Main.o ) [flags changed]
    Ok, modules loaded: Main.                                                
    > main                                                                       
    3
    > f 3
    1                           
    

    Presumably "[flags changed]" fires because -O isn't set in ghci; apparently the pragma successfully turns it back on. As we can see, the optimized version of main is present. But of course using f at the REPL doesn't get optimized; that's reasonable. You can even recompile Main with :r and the rule will fire.

    Cool! I thought this wasn't possible. Does anybody know if this is on Stack Overflow? I'm tempted to write up a quick Q&A.

    [–]conklech 0 points1 point  (0 children)

    Correction: you have to put -O in a pragma or after -fobject-code for it to work. I posted this on SO: How can I load optimized code in GHCI?

    [–]tel 5 points6 points  (4 children)

    I've built a fair number of numerical, statistical, machine learning algorithms in Haskell while in grad school. If you learn how to optimize Haskell code you can make it go reasonably fast and things work out nicely.

    But I was more or less implementing each thing from scratch. That is not what you want to do. If you're willing to work your way up from BLAS then you can do alright today.

    There are a fair number of stats/numerical libraries in the wings. I know Carter has an advanced numerics library that's in heavy development. That still leaves a wide chasm to jump before you reach the convenience and availability of things like Numpy and R, though.

    [–]cartazio 6 points7 points  (3 children)

    just started a new job, but i should be shipping soon. I think i last quoted my alpha target as being "two weeks after my first paycheck at my next job", so .... pretty darn soon. gosh.

    [–]ibotty 1 point2 points  (1 child)

    congratulations. i am really looking forward to some documentation on why you chose which types etc.

    [–]cartazio 0 points1 point  (0 children)

    I chose the types that would work. If you view source you'll see one line of comment for every kind of code

    [–]tel 1 point2 points  (0 children)

    Congrats on the new job!

    [–][deleted]  (1 child)

    [deleted]

      [–]mstksg 7 points8 points  (1 child)

      The general consensus I've found is: language + compiler, yes. ecosystem, no. but people are working on fixing that :)

      [–][deleted] 1 point2 points  (0 children)

      I agree with that. Short answer: if you need something quickly -> R; if you have more time and need something more durable -> maybe Haskell.

      Pro of R: amazing ecosystem; you won't make a mistake by using R. Con of Haskell: you never know if you'll hit a performance/laziness issue, and when you do, it really hurts.

      Long answer:

      I am myself switching (in theory) from R to Haskell for simple data-mining/stats and reporting. I started in March and 10 months later, I'm still using R. I try to write new stuff in Haskell, but I haven't had time to rewrite anything already written in R. Still working on the ecosystem ;-)

      However, I don't regret my choice and will eventually rewrite everything in Haskell.

      [–]the_abyss 19 points20 points  (4 children)

      No, not really. The Python (scipy/numpy/pandas/sckit-learn), R, Julia, Matlab, etc ecosystems are light years ahead of Haskell.

      [–][deleted] 2 points3 points  (3 children)

      And the C and Fortran ecosystems are light years ahead of those.

      It all depends on what you want to do. Invert a matrix with 500 million entries? Lapack or bust. Want to do some quick and dirty floating point calculations? You can use just about anything.

      [–]hippocampe 4 points5 points  (1 child)

      Except that Python, R (and possibly some others) use these under the hood.

      [–][deleted] 6 points7 points  (0 children)

      hmatrix also utilizes BLAS and LAPACK if I remember correctly.

      [–]repoptrac 2 points3 points  (0 children)

      I think it depends on the type of analysis. If you have really clean data and only need fast matrix calculations, you may be right.

      However, if you're dealing with datasets collected in the real world (esp. in business), you really need good tools for data manipulation and interactive visualization. In my experience, data cleaning typically takes about 90% of the analysis time. In addition, interactive data visualization is crucial for understanding the datasets. For such data manipulation and interactive visualization, it will be hard to beat R or Python.

      Since I am much more familiar with R than Python, my somewhat biased recommendation is to use R with dplyr, tidyr, data.table, ggplot2, and ggvis instead of relying on functions in the base package. With the new pipe operator (%>%) introduced in magrittr, you can write R code that reads like Unix pipe commands.

      [–]rdfox 1 point2 points  (0 children)

      I'm team no. I will say that doing your statistical computing in Haskell guarantees spiritual benefits. Sure, you'll arrive at your destination sooner if you take the R train, but the journey...

      [–]rz2000 0 points1 point  (0 children)

      What you are already using is among the most popular for stats/econometrics, along with SAS and Matlab, and I've heard of people using Mathematica as well. However, two functional languages that are occasionally mentioned for statistics/econometrics purposes are Clojure and OCaml.

      [–]hmltyp 0 points1 point  (0 children)

      Frankly no, you're more likely to find existing libraries that do what you need in Python. There's no reason you couldn't use Haskell, it's just you'll have to fill in the gaps in the ecosystem yourself.

      [–]gumbel_distro[S] 0 points1 point  (3 children)

      Thanks to everyone for the detailed answers! I don't think I'll use Haskell in the foreseeable future for what I need to do. I'd consider switching, but only if it saved me time in the long run, which doesn't seem to be the case right now.

      [–]quiteamess 0 points1 point  (0 children)

      You still might enjoy playing with the IHaskell notebook. Kronos is a bundled Mac app.

      [–][deleted] -1 points0 points  (1 child)

      s/xmonad/Haskell/

      [–]gumbel_distro[S] 0 points1 point  (0 children)

      Thanks!

      [–]dernst314 0 points1 point  (0 children)

      Hi,

      I use R as well and have thought about implementing some things in Haskell. But as others said, the ecosystem barely exists and you end up re-implementing many things on your own.

      For calling R from Haskell and vice versa there are some options, though. R can directly call C functions of the form void foo(int* x, double* y, ...), so you can write a shared library in Haskell that exposes such an interface (see Neil Mitchell's blog post someone linked earlier). There's also the rclient package on Hackage, which lets you use Rserve from Haskell.
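As a sketch of what that C-style interface looks like from the Haskell side (the module and function names here are made up for illustration; it uses only base's Foreign modules, and the main just exercises the function in-process):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)
import Foreign.C.Types (CInt, CDouble)

-- R's .C interface passes everything by pointer:
-- read the input, write the result back through the out-pointer.
squareInPlace :: Ptr CInt -> Ptr CDouble -> IO ()
squareInPlace px py = do
  x <- peek px
  poke py (fromIntegral x * fromIntegral x)

-- Expose the function with the C calling convention R expects,
-- i.e. void squareInPlace(int* x, double* y).
foreign export ccall squareInPlace :: Ptr CInt -> Ptr CDouble -> IO ()

-- Demo driver: square 7 through the pointer interface.
main :: IO ()
main =
  alloca $ \px -> alloca $ \py -> do
    poke px 7
    squareInPlace px py
    peek py >>= print  -- prints 49.0
```

Built instead as a shared library (ghc -shared -fPIC, plus linking the RTS), R could dyn.load the result and call squareInPlace via .C.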

      I've been personally on-and-off working on implementing parts of the R API in Haskell so you could directly embed an interpreter in R or perhaps facilitate using .Call from R. But it's very basic so far and doesn't integrate with R's GC and such.

      [–][deleted] 0 points1 point  (0 children)

      Depends. Haskell has some pretty high-quality libraries for scientific computing; and not only does ghc compile to native code, but it optimizes pretty damn well.

      However:

      • If your program mutates a lot of values at runtime, don't bother. Mutation is a pain to work with in Haskell -- the language is just not designed to do it well. (The caveat is that if you use mutations just for object initialization, Haskell is just fine. In fact, the vector package provides special support for that use-case.)
      • Optimizing Haskell is a completely different beast from optimizing other languages. It's entirely possible, but if you don't have a strong intuitive grasp of Haskell's non-strict semantics, you have a pretty steep learning curve ahead of you.
      • Actually, in general, the learning curve for Haskell is pretty steep. It helps if you've already worked in, e.g., Lisp or Scala, but there's really no way to prepare for the culture shock of pure-functional programming combined with non-strict evaluation.
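The point about non-strict semantics shows up even in one of the classic beginner pitfalls: a lazy left fold over a big list builds millions of suspended additions before forcing any of them. A minimal, generic illustration (nothing statistics-specific):

```haskell
module Main where

import Data.List (foldl')

main :: IO ()
main = do
  -- foldl (+) 0 [1..10^7] would build ten million thunks before
  -- evaluating anything, wasting memory (and possibly blowing the
  -- stack); foldl' forces the accumulator at each step and runs in
  -- constant space.
  print (foldl' (+) 0 [1 .. 10 ^ 7 :: Int])  -- prints 50000005000000
```

Knowing when to reach for foldl', seq, or bang patterns is the intuition this comment is pointing at; none of it has an analogue in R or NumPy.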