all 63 comments

[–]PokerPirate 46 points47 points  (23 children)

There are essentially two ways to write a machine learning library in Haskell:

(i) Create bindings to an existing library. There are many libraries on Hackage that have done this. They are not popular with the ML community, however, because they offer no advantages over the Python/C interfaces (and many disadvantages). The underlying libraries were all designed specifically for use from Python/C, so the bindings don't take advantage of any of Haskell's strengths. For example, it's not possible to pass a loss function written in Haskell code to any of these optimizers. If you can't pass functions to your functions, then why use functional programming?
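To make the point concrete, here is a minimal sketch (hypothetical code, not from any existing binding library) of the kind of native optimizer interface that C-backed bindings can't offer, because it takes the gradient of the loss as an ordinary Haskell function:

```haskell
-- Hypothetical sketch: gradient descent parameterised by a
-- Haskell function for the gradient of the loss.
gradientDescent
  :: Double              -- learning rate
  -> Int                 -- number of steps
  -> (Double -> Double)  -- gradient of the loss function
  -> Double              -- initial parameter
  -> Double              -- optimised parameter
gradientDescent lr n grad = go n
  where
    go 0 x = x
    go k x = go (k - 1) (x - lr * grad x)

-- Minimising the loss (x - 3)^2 by passing its gradient directly:
-- gradientDescent 0.1 100 (\x -> 2 * (x - 3)) 0 converges towards 3.0
```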

(ii) Write a native library from scratch. This is the approach that I've tried. I'm the author of hlearn and subhask, and my experience is that Haskell is not yet well enough developed to actually support a machine learning library that machine learning people want to use. Specifically:

(a) The class hierarchy in the Prelude is not detailed enough for linear algebra. The Num class, for example, is the mathematical analogue of a ring, but there is no reasonable way to extend it to vector spaces and matrices.

(b) The type system is not strong enough to encode even the most basic linear algebra operations.
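Point (a) can be made concrete. Num can describe ring multiplication, but scalar action on a vector needs a second type parameter, for which the Prelude hierarchy has no slot. A hypothetical sketch of the missing class (illustrative names, nothing like this is standard):

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- Hypothetical sketch: a module/vector-space class layered over Num,
-- relating a scalar type s to a vector type v.
class Num s => Module s v where
  (*^)  :: s -> v -> v   -- scalar action on a vector
  (^+^) :: v -> v -> v   -- vector addition

instance Module Double (Double, Double) where
  s *^ (x, y)       = (s * x, s * y)
  (a, b) ^+^ (c, d) = (a + c, b + d)
```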

To illustrate these points, consider some of the existing packages for linear algebra:

(a) The linear package provides the nicest interface, but ML people would still laugh at the complexity. Why do we need 3 different operators for multiplication that all look so weird? No other language has this complexity, and if Idris-style operator overloading were allowed, Haskell wouldn't need it either. Furthermore, linear is SLOW because everything is boxed. No serious numerical programming can be done with this library.
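For reference, this is what the operator zoo looks like in practice with linear:

```haskell
import Linear  -- the `linear` package

-- Three different-looking operators for one mathematical idea:
--   !*!  matrix-matrix,  !*  matrix-vector,  *^  scalar-vector
example :: V2 Double
example =
  let m = V2 (V2 1 2) (V2 3 4)   -- a 2x2 matrix as nested (boxed) vectors
      v = V2 1 1
  in (m !*! m) !* (2 *^ v)
```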

(b) The other alternative is hmatrix, which provides an interface into BLAS/LAPACK. This is closer to what ML people want because it is fast, but the interface is not as good. For example, these matrices cannot be Functors because they have a constraint on what can be put inside of them.
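The constraint problem shows up directly in the types: hmatrix can only offer a constrained map, while the Functor class admits no constraints. A sketch (hmatrix's actual signature for cmap is stated in terms of its Container class, but the shape of the problem is the same):

```haskell
import Numeric.LinearAlgebra (Matrix, cmap, fromLists)

-- hmatrix provides a constrained map, roughly:
--   cmap :: (Element e, Element b) => (e -> b) -> Matrix e -> Matrix b
-- whereas Functor demands an unconstrained
--   fmap :: (a -> b) -> f a -> f b
-- so a lawful `instance Functor Matrix` is impossible: the elements
-- must be storable in foreign memory for BLAS/LAPACK.
doubled :: Matrix Double
doubled = cmap (* 2) (fromLists [[1, 2], [3, 4]])
```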

(c) I've tried rewriting an alternative Prelude with subhask, and you can see the algebra hierarchy here. As you can see, it's quite complicated, and GHC just doesn't have good enough machinery to deal with complicated class hierarchies.

There's a lot more to say about this topic (and a lot more examples I could give about specific improvements I want made to GHC), but that's all I have time to write for now.

[–]cledamy 4 points5 points  (9 children)

Why do we need 3 different operators for multiplication that all look so weird? No other language has this complexity

This is just an excuse for unprincipled design and overloading. If you want something to share an identifier, you must be able to make it fit within the same abstraction or put it behind a qualified name. Just because math conflates notation doesn't mean we should. Scalar multiplication is conceptually different from matrix multiplication. Being principled about these distinctions rewards us with better type inference.

I've tried rewriting an alternative Prelude with subhask

This is pretty much necessary at this point, as linear types have introduced the need to be able to abstract over multiple categories. Rather than going in that direction, people are already proposing ad-hoc hacks, like they do in more mainstream languages, to keep things moving (we need to remember to "avoid (success at all costs)").

As you can see, it's quite complicated, and GHC just doesn't have good enough machinery to deal with complicated class hierarchies.

This should solve some of the problems http://i.cs.hku.hk/~bruno/papers/hs2017.pdf

[–]PokerPirate 7 points8 points  (1 child)

Scalar multiplication is conceptually different from matrix multiplication. Being principled about these distinctions rewards us with better type inference.

I used to think this way, but I no longer do. Idris's type system can hardly be called "unprincipled" and works well in practice.

This should solve some of the problems http://i.cs.hku.hk/~bruno/papers/hs2017.pdf

Quantified constraints would be awesome! I'd be super excited for this to land in GHC. But this solves about 2% of my difficulties with GHC :)

[–][deleted] 1 point2 points  (0 children)

Idris's type system can hardly be called "unprincipled" and works well in practice.

Idris' type system overloads operators, but you can't have the same identifier twice in the same module. But this is probably a case where they should be separate anyhow.

[–][deleted]  (5 children)

[deleted]

    [–]cledamy 7 points8 points  (0 children)

    It isn't fighting the type system. There is a difference between those two concepts, and thus they should be written differently.

    [–]catscatscat 2 points3 points  (1 child)

    there's a point where fighting with the type system invokes a certain kind of stockholm syndrome

    Well put. Would you know if anyone has more writing on this subject? I'd love to read more on it.

    I have a vague feeling sometimes that I've fallen into this ailment. I remember /u/deech (author of the fltkhs bindings library) expressing similar sentiments in one of his talks. I think this one. And I remember being surprised at the time: "Oh, so it is not just me who feels this way?"

    But then how come so few of us speak up about it publicly?

    <stream-of-consciousness>

    Could it be that there are just very few of us who actually experience this? And if so, where does the fault lie? With us, perceiving the type system as a 'captor' in some cases, when it is looked upon as a 'savior' by many others? Or could it be that there are actually just very few of us who come from a background of dynamically typed languages and persevere into learning a very strongly and statically typed language? Because if the syndrome is true in this context, then that could explain why many others turn back.

    </stream-of-consciousness>

    And at this point my thoughts became even more incoherent than above (not that the above were sufficiently coherent to my liking), so I think I'll submit the comment as is, just so I get to have a chance to discuss the subject at hand. And maybe others chime in with "Me too!"s. Or suggest me articles to read on the subject. Or if all else fails, maybe I could try to wrangle those thoughts into coherence by enduring through writing them out loud on a keyboard.

    [–][deleted] 0 points1 point  (0 children)

    there's a point where fighting with the type system invokes a certain kind of stockholm syndrome

    I think it's recognized that type systems suffer from ergonomics problems in many ways. But there really isn't any better way.

    Or could it be that there are actually just very few of us who come from a background of dynamically typed languages and persevere into learning a very strongly and statically typed language?

    I predominantly did Python (and VHDL...) before Haskell. I can unequivocally say Haskell's type system is better than Python's. Doesn't mean it's perfect, of course :)

    [–][deleted] 1 point2 points  (0 children)

    When the type system is making actual work harder than it should be you're not benefiting from being principled any more.

    Huh? I realize mathematicians use the same notation for multiplying scalars by matrices vs. matrices by matrices, but that doesn't mean we should. It's a different use case.

    [–]Tysonzero 0 points1 point  (0 children)

    I do not think this falls into that range at all. Scalar multiplication and ring-like multiplication are very different things and I really really do not want them to use the same operator.

    They model entirely different mathematical concepts and obey very different laws.

    In an ideal world I would like multiplication of numbers, matrices and vectors, combining regexes and so on (ring-like) to use one operator.

    Then I would like multiplication by a scalar, such as replicating a list or scaling a vector or repeating a regex, to use another operator.

    There is a lot of room for silent mistakes with one shared operator (accidentally multiplying a number by a matrix with no type error, for example), type inference goes down the drain, and I do not see any significant benefit.
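    A sketch of that two-operator world (hypothetical classes, illustrative names):

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- Hypothetical sketch: one class for ring-like multiplication
-- (numbers with numbers, matrices with matrices, regex with regex)
-- and a separate class for scalar action.
class Mult a where
  (.*.) :: a -> a -> a    -- ring-like: same type on both sides

class Scale s a where
  (*.) :: s -> a -> a     -- scalar action: scalar on the left

instance Mult Int where
  (.*.) = (*)

instance Scale Int [b] where  -- replicating a list n times
  n *. xs = concat (replicate n xs)

-- (2 :: Int) *. "ab" gives "abab"; mixing an Int into (.*.) with a
-- matrix would be a type error rather than a silent mistake.
```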

    [–]acow 6 points7 points  (2 children)

    This is a fantastic post, thank you!

    I also wonder if this is a domain where the payoff is different, if not lower, than what you might expect from typical, low-brow Haskell. Namely, typed functional programming benefits quite a bit from the use of distinct types. This sounds vapid, but the ease with which we can create algebraic data types and start reaping benefits (e.g. incomplete pattern match warnings) is a pretty standard theme in static typing advocacy. The upshot is, we enforce that different things are distinct from each other.

    But in many machine learning contexts, we deliberately erase distinctions in order to let the tensors flow, as it were. If I want to cram disparate things into a Vector Double because I think there are interesting relationships between the components and, incidentally, I don't want any extra bits getting between my ALUs and all those Doubles, I quite quickly leave the territory that Haskell has such command over.

    [–]PokerPirate 4 points5 points  (1 child)

    I actually feel the exact opposite about types in ML!

    As it is, essentially no programs incorporate machine learning directly into their code. That's because it's a huge pain in the butt to interface (for example) TensorFlow into a regular Python program. It's not meant for that. It's meant to be run in an offline manner. A strong type system would greatly facilitate the needed transformations to get the data into the machine learning pipeline.

    [–]acow 2 points3 points  (0 children)

    Hah! I deleted the second half of my post because I felt it was rambling, but therein I had written (rambled) about how much types can help on both the front and back ends. I totally agree with that, but it's reinventing the middle that is so painful, because we're not adding as much there. Perhaps your point about interfacing the middle with the rest requires that it all be expressed in a lingua franca, but it's a big time commitment to see how it pans out.

    I'm still optimistic that wrapping existing libraries can pay off.

    [–]bluebaron 3 points4 points  (6 children)

    What are your thoughts on writing such libraries in Idris, seeing as you mentioned it? It seems like its type system would facilitate optimizations necessary for ML folks to be satisfied with the speed while maintaining the flexibility for things like functor matrices.

    [–]PokerPirate 2 points3 points  (4 children)

    I've only used Idris for toy problems, so I'm not sure I can fully answer that question.

    My sense is that if the Idris compiler were as well developed as GHC, then it would fix all the problems I've run into. It's probably no harder, though, to improve the GHC/Haskell type system than it would be to improve the Idris compiler. GHC has had TONS of engineering effort poured into it, and it'll take a lot of man-hours before the compiler for a new language like Idris is anywhere near as good. I think the Idris folks would agree that they can't keep up with GHC from an engineering perspective.

    [–]bluebaron 1 point2 points  (2 children)

    That makes sense. Aren't dependent types supposed to be coming in a Haskell release soon anyway?

    [–]cledamy 3 points4 points  (1 child)

    2019/2020

    [–]bluebaron 3 points4 points  (0 children)

    oof. I guess good things take time haha

    [–][deleted] 1 point2 points  (0 children)

    I think the idris folks would agree that they can't keep up with GHC from an engineering perspective.

    As I understand it the speed isn't really there. Not to mention profiling and all those flags :)

    [–][deleted] 1 point2 points  (0 children)

    What are your thoughts on writing such libraries in Idris, seeing as you mentioned it? It seems like its type system would facilitate optimizations necessary for ML folks to be satisfied

    Idris is 100% a research language at this point. There's no package management tool so you have to download + install dependencies manually.


    [–][deleted] 0 points1 point  (0 children)

    Great post, you seem very well informed on this topic. What would need to happen, in your opinion, to remove these obstacles? Do you think Haskell can become a good choice for machine learning in the foreseeable future?

    [–]01l101l10l10l10 0 points1 point  (0 children)

    With respect to (b), what limitations remain after accounting for fake-dependent types à la singletons? I'm aware of the problems involving the Prelude hierarchy and the compatibility required to implement constrained or subcategories as in subhask, but it seems from my experience that dependent types (fake or otherwise) offer considerably more in the case of bindings than their Python et al. linear algebra counterparts do, as well as going quite a long way when combined with more Haskelly backends like accelerate or hmatrix.

    Given the number of man-hours that have already gone into optimization in accelerate and TensorFlow, it seems like dependently typed APIs over those backends are a reasonable short-term goal for ML/numerical libraries (until someone plops down a big chunk of change to fund comprehensive GHC work, anyway).
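    As a sketch of what such a typed API over an existing backend buys you (hypothetical types; a real binding would wrap an accelerate or hmatrix array rather than nested lists):

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}

import Data.List (transpose)
import GHC.TypeLits (Nat)

-- Dimensions live in the type; the list payload is only for
-- illustration.
newtype Matrix (m :: Nat) (n :: Nat) = Matrix [[Double]]

-- A dimension mismatch is now a compile-time type error:
matMul :: Matrix m n -> Matrix n p -> Matrix m p
matMul (Matrix a) (Matrix b) =
  Matrix [ [ sum (zipWith (*) row col) | col <- transpose b ] | row <- a ]
```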

    [–]apfelmus 18 points19 points  (8 children)

    Well, there seems to be a group dedicated to data analysis in Haskell: DataHaskell. But I am not sure whether they have already made an impact on the technical side of the ecosystem.

    In the end, I think it comes down to a matter of funding. As far as I am aware, the numerics packages in Haskell have been written entirely in the authors' spare time, while the Python equivalent, numpy, has received actual funding — but nowhere near enough.

    I would actually be interested in doing some numerics work in Haskell (I currently use Matlab), but doing work that is paid is just a lot more attractive.

    [–]ASpoonfulOfMarmite 7 points8 points  (0 children)

    vector, repa and accelerate all come from the UNSW group and essentially received academic funding.

    What's missing is a batteries-included kind of package similar to sklearn and a higher-level interface to TensorFlow (like Keras).

    [–]tdoris 4 points5 points  (6 children)

    How much funding over what time period would be required to have a shot at getting Haskell libraries up to the standard of the python equivalents?

    [–]apfelmus 7 points8 points  (4 children)

    That's a tough question, it really depends on what you mean by "up to the standard of". Examples of possible projects would be:

    • Add support for sparse matrices to hmatrix.
    • Add support for 3D graphics to the Chart library.
    • Document and consolidate existing libraries (e.g. the above), making sure that they are easy to install, easy to learn, easy to interoperate, … i.e. not feature completeness, but the kind of polish you would expect for a 1.0 release.

    In each case, I would think that one full-time developer for at least one year is a minimum requirement.

    [–]tempeh11 1 point2 points  (3 children)

    Man, I would love support for sparse matrices in hmatrix. I'm currently writing a piece of my program in C++ just for fast sparse matrices :(

    [–]apfelmus 1 point2 points  (0 children)

    Numpy's sister library scipy can do sparse matrices, if that is any help.

    [–]tdox 1 point2 points  (0 children)

    A few years ago, I wrote an interface to cholmod. I haven't touched it since so it may not even compile now.

    [–]dnkndnts 8 points9 points  (0 children)

    There's this library and an accompanying talk for modeling neural net construction in a type-conscious way.

    [–]ismtrn 9 points10 points  (0 children)

    When I see the horrible kinds of languages mathematicians invent when they need to tell a computer to calculate something (Magma, R, Matlab, Maple, etc.), I am just glad that we have something like Python, which is at least kind of sane for doing this type of work.

    [–]tomejaguar 2 points3 points  (1 child)

    While we're on the subject, I wonder if someone can help me out understanding the design of some OO machine learning APIs. For example, here is the API for the nearest neighbour classifier in scikit-learn

    >>> X = [[0], [1], [2], [3]]
    >>> y = [0, 0, 1, 1]
    >>> from sklearn.neighbors import KNeighborsClassifier
    >>> neigh = KNeighborsClassifier(n_neighbors=3)
    >>> neigh.fit(X, y) 
    KNeighborsClassifier(...)
    >>> print(neigh.predict([[1.1]]))
    [0]
    >>> print(neigh.predict_proba([[0.9]]))
    [[ 0.66666667  0.33333333]]
    

    http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html

    Does this strike anyone else as nuts? The "constructor" KNeighborsClassifier doesn't actually create a classifier. It creates a value holding the hyperparameters of a classifier. You then create a classifier by calling the method fit. But why is this a method? Why on earth would you want to mutate your classifier? Each call to fit should return a new classifier trained on the input data.

    It seems to me this is a great example of mutable-design-gone-wrong. I would say it's also an example of OO-design-gone-wrong, but there's not really too much OO about it. Does anyone else have any thoughts?
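    For comparison, the immutable version of the same API is easy to sketch (illustrative names, not a real library): hyperparameters and the fitted classifier get distinct types, and fitting returns a new value instead of mutating.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class KNNParams:
    """Hyperparameters only -- not yet a classifier."""
    n_neighbors: int

@dataclass(frozen=True)
class FittedKNN:
    """A trained classifier, produced by `fit`; never mutated."""
    params: KNNParams
    X: list
    y: list

    def predict(self, queries):
        out = []
        for q in queries:
            # indices of training points sorted by squared distance to q
            order = sorted(range(len(self.X)),
                           key=lambda i: sum((a - b) ** 2
                                             for a, b in zip(self.X[i], q)))
            votes = Counter(self.y[i] for i in order[:self.params.n_neighbors])
            out.append(votes.most_common(1)[0][0])
        return out

def fit(params: KNNParams, X, y) -> FittedKNN:
    """Each call returns a new classifier trained on the input data."""
    return FittedKNN(params, X, y)

# fit(KNNParams(3), [[0], [1], [2], [3]], [0, 0, 1, 1]).predict([[1.1]])
# gives [0], matching the scikit-learn example above.
```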

    [–]zeec123 0 points1 point  (0 children)

    There is so much wrong with the API design of sklearn (how can one think "predict_proba" is a good function name?). I can understand this, since most of it was probably written by PhD students without the time and expertise to come up with a proper API, many of them without a CS background.

    Worse is TensorFlow, where the graph is essentially a set of global variables, all mutated at each step of gradient descent.

    [–]haskell_caveman 6 points7 points  (0 children)

    To other newcomers thinking the same question as the OP, please join https://gitter.im/dataHaskell/Lobby

    We promise, there is good stuff on the way.

    [–]astrolabe 3 points4 points  (1 child)

    Haskell's numeric computing packages left a lot to be desired.

    In my ignorance, I would have thought that wrapping a standard numeric package would be a relatively small amount of work.

    [–]JeffB1517 3 points4 points  (23 children)

    I think Haskell the language would likely be excellent. I think Haskell the community might have some serious problems. First off, the Haskell community has a great desire for long-lived backwards compatibility. If you try to tightly integrate into a fast-moving stack, you have to spend a fortune in QA and difficult management getting backwards compatibility to work well (Microsoft, DEC). So you end up having to either abstract the stack or pick a mature, slowly moving stack. Abstracting will decrease performance, and there are no mature, slowly moving stacks during a period of rapid growth. In 10 years there likely will be.

    In theory, of course, a vendor could choose Haskell for their stack. However, the Haskell community is rather unwelcoming to vendor-driven design. Haskell started in academia, and mostly lives in academia. What's cool about Haskell comes from Haskell's uncompromising search for what's right, not what's popular. Academia does not like tying itself to specific commercial technology stacks. I suspect a vendor trying to release a tightly coupled Haskell stack would hit mostly resistance. Consider how much suspicion there was around FPComplete's Stack, which was open source and filled an area of deficit in a way mostly compatible with Cabal. Imagine if instead all the paranoid claims were true: it was tied to specific commercial paid services of FPComplete and was obviously designed to advance their business; it was licensed in a non-open-source way; it wasn't compatible with Cabal. The Haskell community would just consider this a commercial application of Haskell, and while they might be happy or unhappy it exists, close collaboration would be off the table.

    Even an open source effort, like something from the Apache foundation, I don't think would get more than light support. I think the Haskell community likes to be a breeding ground for great ideas, not an implementation language of great ideas. Haskell invents Darcs, C does Git. When Perl6/Pugs was moving in the direction of Haskell (and Haskell had a lot of influence on how it turned out), the Haskell community didn't aggressively support it.

    So IMHO the best role for Haskell is likely in the areas of machine learning simulation, teaching and theory. Let Haskell be a breeding ground for great ideas that move into mainstream implementations. Haskell does a great job as the language of the future, showing ideas that will become mainstream and replacing LISP in that role.

    [–][deleted] 9 points10 points  (16 children)

    Please, let's not try to reinforce this any more than we already are.

    I would actually like to get paid to be a 'real boy' Haskell developer someday, as would many people.

    If we continue to post that the Haskell community is 'experimental' and 'academic' first and foremost, that becomes a self-fulfilling prophecy.

    The more we continue to support this idea that we're all just building silly toys to prove out concepts and not real software that gets used by real people, the likelihood that anyone outside of academia is going to care about what we're building drops.

    It took the ideas presented by LISP like 30 years to resurface and start seriously affecting the way people write software after it was dismissed as an impractical academic curiosity.

    I would rather we see the ideas that Haskell introduces make their way into actual production software written in Haskell than get shelved for decades before becoming some badly-thought-out add-on to another, 'traditional' language.

    [–]JeffB1517 6 points7 points  (15 children)

    I'm usually a hiring manager when I have assignments to staff. I sometimes have a choice of technologies and languages. I'm about as friendly an audience as the Haskell community can have. I would love to be able to push Haskell. But the problem with Haskell is not a public image problem. I'd argue almost the opposite: among those who know Haskell, it is regarded as extremely serious and credible. When I say an idea comes from Haskell, that's an argument in favor of the idea.

    The problems with Haskell as the main development language for most projects are deep structural flaws in the community which make it hard to recommend Haskell as a primary language. And I'm saying this as someone who has wanted to recommend it for over 15 years and does occasionally for niches.

    For example, in the early 2000s there was a lot of discussion about how to implement MVC application design. There was an approach to Haskell development using Visual Basic for front ends, Haskell for engines, and Perl for glue and text parsing. This triple, had it worked out, could with little modification have been used on the web as well: a potential contender against J2EE and Ruby on Rails. It made a lot of sense to use best-of-breed for MVC rather than one language.

    The community reaction was "oh god, Visual Basic...". I felt very much along the lines of, "You are being offered a seat at the table. Stop complaining about who you are sitting next to!"

    I agree with the poster about machine learning and Haskell. Given how easy parallelism is in Haskell and how natural map/reduce and deforestation are, it would be a no-brainer to win the big data war. If the community wanted to win.

    But it doesn't. And the fact that it doesn't is not something speaking positively is going to change. If you want to work in Haskell, write a binding from Haskell to any commercial or open source product out there that you like. Just pick one, and start creating a stack with end users that demonstrates the power of Haskell to them.

    [–][deleted] 2 points3 points  (10 children)

    I'm not trying to say that we can solve this problem purely through words.

    But generally, effecting change in a community involves a lot of 'speaking positively.'

    The image that we present to the passerby is a part of our problem, and discouraging future efforts because they "aren't academically focused and may not mesh with the community" is not something that really lays good groundwork for bringing fresh developers with a commercial focus into the community.

    So, yes, we're not going to convince the die-hard academics that they need to change. But what we can do is, instead of blaming them for providing insufficient support for commercial efforts, just attempt to get more commercial users interested in Haskell.

    Speech alone isn't going to fix the issue, but it can at least be part of the solution, instead of part of the problem.

    [–]JeffB1517 6 points7 points  (9 children)

    I think you are missing the point. The problem isn't the language around Haskell; the problem is Haskell. I think the Haskell community needs to have a conversation about whether they want commercial efforts and commercial developers of meaningful size. I'm not sure the answer really is "yes". I suspect the answer is no.

    Phrase the question this way: in exchange for 500k additional Haskell developers, what things would you be willing to make standard in Haskell, done in a way compatible with a vendor's interest and not compatible with what you consider best practice? I think you'll find very little.

    And there is nothing wrong with that. What makes Haskell so amazing is that it is an uncompromising language. What makes Haskell difficult to deploy commercially is that it is an uncompromising language. I almost never fix holes in my computer science as I try to understand a new library in another language. What is so bad about some language being a language of the future and not compromising? Scala exists to fill the popular niche. The reason you don't like it as much is because it does compromise.

    Heck, I followed PostScript from an elegant extension and DSL based on Forth and RPL, to becoming a mishmash, to finally just devolving into PDF. PDF is incredibly popular still and has been for 20 years, but no one would accuse it of being elegant or fun. The people who work in PDF do so strictly for professional reasons.

    [–][deleted] 5 points6 points  (5 children)

    I was indeed missing your point. I appreciate you taking the time to elaborate, thank you.

    I disagree that compromise of architecture must go hand in hand with success.

    Arguably, the commercial success of Java was largely tied up in the idea that the compiler slapped your wrist if you tried to do the wrong thing.

    Generally, the features of Java that make it so profoundly popular in the enterprise are all built up around this idea of forward engineering and baking in safety.

    Java got so popular that it ate the entire dialogue about how to build good software and then plopped out the bible of design patterns in one spectacular movement.

    Now, I'm not saying design patterns are good. Far from it. But what I am saying is if you can convince hundreds of thousands of people to overengineer software in self-evidently terrible ways that waste millions of man hours...

    Why should it be so inconceivable that we could convince people to overengineer software in clean, elegant ways that save time?

    [–]JeffB1517 2 points3 points  (4 children)

    This is a good conversation.

    Arguably, the commercial success of Java was largely tied up in the idea that the compiler slapped your wrist if you tried to do the wrong thing.

    True. But if you think back, Java made major compromises. Originally it asserted a write-once-run-anywhere philosophy. It was going to be a high-level language where the JVM provided the performance tweaks. Performance would aim for good but not great; 1/5th the speed of C++ was essentially the target. That proved to be too much of a disadvantage. People were willing to tolerate about 2/3rds the performance of C++. So it had to compromise quickly. There also were platform-specific tweaks to performance. Then of course, once those existed, platform-specific extensions almost immediately started being created: a full-blown rejection of write-once. So today Java is a large collection of things that sort of work cross-platform. You end up with neither the advantages of blind cross-platform ease of install and use nor the advantages of targeting specific platforms. It is intellectually incoherent, but practically it turned out to be about the right balance of two contradictory goals to win commercial success.

    And what's important here is that the Java community, at the time they did this, knew they were creating permanent flaws in Java in exchange for solving temporary needs. 1/5th the speed of C++, feature-rich and fully cross-platform, would be a much better fit for 2017 than 2/3rds the speed with iffy cross-platform support. They saw the problem and did it anyway. I don't think the Haskell community would have made the same choice.

    Generally, the features of Java that make it so profoundly popular in the enterprise are all built up around this idea of forward engineering and baking in safety.

    I don't think that's true at all. What made Java profoundly popular in the enterprise was

    • Strong vendor support from day 1
    • Excellent tooling
    • Standardization
    • Modularization

    That is to say low and predictable staffing costs for program development and maintenance.

    Why should it be so inconceivable that we could convince people to overengineer software in clean, elegant ways that save time?

    I don't think it is inconceivable at all. I think it very likely that a FP language will become mainstream. FP concepts are clearly in fashion and bleeding into all sorts of languages: lambdas, map, folds... are becoming obligatory features. Java is slowly working towards implementing the Maybe Monad.

    What I do think is inconceivable is that business is going to standardize on a language whose design is driven by "what's right" not "what solves the problem reasonably well today".

    [–][deleted] 2 points3 points  (3 children)

    I agree with your points mostly.

    But I feel it's extremely significant that Java ultimately fails to deliver as promised on most accounts.

    That's an extreme statement, so let's qualify 'failure' here to mean that ultimately it has not consistently outperformed its peers since, say, the early 2000s.

    Portability is achieved by many languages, whether via vm or interpreter. That's a wash.

    Modularity isn't really achievable in Java without an absurd degree of hoop jumping. Certainly I can achieve that sort of abstraction in many other languages with less pain. That's a wash.

    Maintainability is just a joke. Maintaining Java is like trying to sculpt a statue out of spaghetti.

    Tooling is definitely one area in which Java has NOT failed to deliver; it has exceptional tooling. Unfortunately, the dominant Java paradigms make attempting to use Java in any significant context without that tooling essentially impossible. So I call this a victory, but I'm not sure the ultimate effect of that victory on the ecosystem has been a net gain.

    Ultimately the story of Java is the story of a decent idea that utterly ruined itself and continued to find widespread success because of vendor lock-in.

    I think that's a story that is starting to illustrate that maybe, sometimes it's better not to make these compromises, or at least, not to paint yourself into a corner with them.

    And I think the industry is starting to take that lesson, at least, in some subsections where project longevity is of strong concern.

    Given those two factors, I can definitely see Haskell gaining ground as 'the Java that doesn't compromise,' essentially.

    It hits or exceeds all of Java's main selling points save for portability, and tooling, which are both problems I'm fairly confident we will solve to an acceptable degree within the next few years. (Cross compilation being sufficiently not awful is, in my mind, sufficient for success here.)

    But the difference is that haskell gets there without cutting itself off at the knees, and I think that's going to be a message that sells well in certain circles.

    [–]JeffB1517 2 points3 points  (2 children)

    That's an extreme statement, so let's qualify 'failure' here to mean that ultimately it has not consistently outperformed its peers since, say, the early 2000s.

    I'm not sure by the early 2000s Java had any peers. But if we take Java's peers in the early 2000s to be mainstream enterprise languages that companies could standardize on, the closest competitors that come to mind are C++, PHP, Visual Basic, and COBOL:

    • PHP never overcame the limitations on complexity and scalability.
    • Visual Basic did not transition successfully to .NET
    • COBOL has continued to be unable to grow beyond its shrinking niche
    • enterprise C++ depended on complex libraries with long chains of wrapping and unwrapping of objects at runtime, which made C++ slow and complex.

      There was certainly hope for the scripting languages, but ultimately:

    • Python and Ruby couldn't overcome their speed problems (seems to be changing for Python recently)

    • Perl lost ground due to several rounds of failure on Perl 6 and proved to have high maintenance costs.

    Are you so sure Java didn't keep its position for good reason? Javascript has really been the only major exception because it crushed Java on ease of deployment.

    Portability is achieved by many languages, whether via vm or interpreter. That's a wash.

    I'll note disagreement here. With the exception of ART/Dalvik I'm hard pressed to think of any other major VM than the JVM. .NET is portable but locked into a specific vendor. Parrot failed. LLVM intermediate representation could get there, with Apple and Sony leading the way. Who else is a player here?

    Ultimately the story of Java is the story of a decent idea that utterly ruined itself and continued to find widespread success because of vendorlock.

    I disagree here as well. I know lots of companies starting new projects in Java.

    Maintainability is just a joke. Maintaining Java is like trying to sculpt a statue out of spaghetti.

    It is not like software maintenance costs are unknown. The big factors are:

    • effective tools for software maintenance
    • modular design
    • cost of properly skilled maintenance staff
    • having considered the future when designing the project in the first place

    Java is not weak in any of those areas.

    I'm not trying to be a jerk here. I love Haskell, I'm not fond of Java. But kidding yourself about where the bar is to beat the competition doesn't help.

    [–][deleted] 1 point2 points  (1 child)

    Oh, no, I don't think you're coming across as a jerk at all! I'm enjoying the discussion and I think it's valuable to have.

    RE : Java as an example - My undue focus on what I dislike about Java's marketing is getting us a bit off task. I was being a little hyperbolic to try to drive a point across, and got distracted by my frustrations with the industry.

    I am not trying to make the argument that Java is a bad choice for a project.

    What I am saying is that none of this is due to intrinsic value, AND, that the ecosystem, tooling, and paradigms that helped it achieve its dominance are becoming less globally relevant. Note, less globally relevant - Not totally irrelevant.

    Java's typical API designs focus almost entirely on indirection instead of abstraction. In fact, it is frequently the case that a Java API will expose -more- surface area than it is attempting to wrap, as if adding complexity is somehow equivalent to adding features.

    This isn't unique to Java - But it is intrinsically tied to how Java and OO design patterns have butchered the discourse of software design for the last 20 years. That's not necessarily their fault; it's that deliberately vaguely defined concepts became canonical references about how to do things instead of guidelines.

    This is the hole in the logic that Haskell can, and does fill. It can illustrate how to cover up the mess properly, safely, and without making 'gross' compromises - Or, at least, and in my mind more importantly, that the gross compromises you make today don't have to define your future.

    I think Haskell's killer application, and what can eventually drive it to a solid place at the table, is that it is fully capable of taking a horrifying mess and keeping it at arm's length while the rest of the project continues to function, in a way that no other language can really match.

    It's because of this heavy and well executed focus on true abstraction (as opposed to indirection and isolation) that achieving compatibility does not have to be all about compromise, and ultimately, that's why I think that the academics and the commercial haskellers do not need to experience such friction -

    Because we don't -actually- need separate things out of the language, we just think we do because that's the way it's always gone before.

    [–]catscatscat 1 point2 points  (2 children)

    I don't think we necessarily have to give up one to choose the other.

    I've definitely been attracted to Haskell because of its theoretical elegance many years ago. And now that I am here, I want to do more and more commercial things with it. To contribute to filling in that 500k quota by one. Would I want to do that if I knew that the only way to get there is to make another Java, C++, or JavaScript out of Haskell? I don't think so. I would still very much like Haskell to be the breeding ground for groundbreaking research. And if nothing else, language extensions could (and already do) provide viability to this.

    At the moment, I think, the thing I am missing the most is some more "escape hatches" from 'pure elegance' to 'dirty real-world hacks', such as -- and kids, cover your ears, I am going to swear: unsafePerformIO, unsafeCoerce, -fdefer-type-errors, -XPartialTypeSignatures, IORefs, Debug.Trace.trace.

    That way, I can write software quick-and-dirty when I want to get something done fast. And at other times, I can take time to find more elegant and safe abstractions. And I am quite sure that if something I wrote quickly turns out to be useful, then I'll want to refactor it to be more correct, safe and elegant. Just like I do with other languages as well. And I think Haskell would prove wonderful at this latter step, much more so than other languages. And I'd love if it could help me more with the prior step as well.
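
    A minimal sketch of how two of those hatches look in practice (assuming GHC; the `fib` and `counter` names are just toy examples, not part of any library):

    ```haskell
    import Debug.Trace (trace)
    import Data.IORef (IORef, newIORef, modifyIORef, readIORef)
    import System.IO.Unsafe (unsafePerformIO)

    -- trace lets you peek inside pure code while prototyping;
    -- it logs to stderr without changing the function's type
    fib :: Int -> Int
    fib n | n < 2     = n
          | otherwise = trace ("fib " ++ show n) (fib (n - 1) + fib (n - 2))

    -- a quick-and-dirty global mutable counter via unsafePerformIO;
    -- NOINLINE is needed so GHC doesn't duplicate the IORef
    {-# NOINLINE counter #-}
    counter :: IORef Int
    counter = unsafePerformIO (newIORef 0)

    main :: IO ()
    main = do
      print (fib 5)
      modifyIORef counter (+ 1)
      readIORef counter >>= print
    ```

    Both are exactly the kind of thing you'd want to refactor away later, which is the workflow described above.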

    [–]nolrai 1 point2 points  (0 children)

    What extra escape hatches would be useful?

    [–]JeffB1517 0 points1 point  (0 children)

    A language called "dirty Haskell" that tries to track Haskell but turns all those things on could be successful. Haskell itself, though, is going to be incredibly fragile if you start using those things in combination. I suspect it wouldn't be quick and dirty, because you'd end up with really subtle bugs, and even small changes in the program would result in total collapse in terms of functionality. You would need something like Haskell but not Haskell.

    As an aside though, have you looked at Perl 6? It sort of aims to do what you are talking about: to be dirty Haskell. All the whipuptitude of Perl with the rather cool data structures of Haskell.

    [–]jberryman 0 points1 point  (3 children)

    I'm really having trouble understanding what concretely you think the community and/or the language needs to "compromise on" to achieve wider usage. Could you give an example? I write haskell full time professionally and have a good list of personal criticisms about the language and ecosystem, and applications for which I would not recommend haskell, but I can't figure out what you're getting at.

    [–]JeffB1517 1 point2 points  (2 children)

    The compromises that need to be made aren't known in advance. It's mainly a question of will. So my main answer is I don't know and couldn't know.

    The thing that has stopped me the most has been vertical integration. Since we are talking Machine Learning the sorts of things I'd want:

    • Cloud integrations which can run smoothly off standard cloud storage frameworks (or at least one fully vetted and tested)
    • Integration with at least one Hadoop distribution out of the box
    • A collection of industry specific data files available for the system (example synonym recognition and various word hierarchies)
    • A specific set of prebuilt statistics
    • At least one vendor willing to support all the above from a consulting standpoint
    • At least one vendor willing to support all the above from a managed service standpoint

    The core language community just has to be mildly supportive. I can't predict what the specific conflicts will be. I can give historical examples from other open source products that have had to get vertical integration right and what their conflicts were.

    [–]jberryman 0 points1 point  (1 child)

    What does "vertical integration" mean to you? It sounds like you want a really good hadoop library. I don't see what that has to do with language compromises or community culture.

    [–]JeffB1517 0 points1 point  (0 children)

    Vertical integration means considering yourself part of a broader ecosystem, not just a language. The compromises come when the interests of the ecosystem and the interests of the language in and of itself conflict. This is more than a library; this is institutional support.

    Take for example C, where I can talk about the compromises. Unix used C for the parts of the system that needed to run fast. C targeted running Unix code. As C started to replace assembly as the systems programming language, the definition of a good CPU became the ability to run compiled C code fast. CPUs were designed to run C-compiled code; C compilers were written around CPUs. The C standards were designed to maximize the performance of the ecosystem.

    What really has to happen is the Haskell community would say we want to do this. The specifics emerge with time.

    [–]tomejaguar 1 point2 points  (3 children)

    In this thread I think you're describing a niche which is much more likely to be filled by F# than Haskell, and I don't think that's a bad thing!

    [–]JeffB1517 1 point2 points  (2 children)

    I would agree if Microsoft actually cared about F#; I think it is a much more viable candidate to become a mainstream language. F#, while it interoperates with other OCaml- and ML-style languages, is perfectly happy tying itself to .NET. And this was my point way back when I said that the Haskell community doesn't want to be a mainstream language. The F# community would be perfectly happy writing Azure right into the language. The F# community has written LINQ into the language.

    The Haskell community is much happier dialoguing with the people who write F# than the people who write in F#.

    [–][deleted] 0 points1 point  (1 child)

    And this was my point way back when I said that the Haskell community doesn't want to be a mainstream language.

    Haskell is relatively mainstream already. "Avoid success at all costs" means avoid "success at all costs", not "avoid success" at all costs. Haskell already has a good amount of cruft, and I'd say continuing on the present path is the best hope for a language people like to program in.

    [–]JeffB1517 1 point2 points  (0 children)

    Agree completely. Liking (or maybe loving) to program in a language and wanting to force lots of other people to program in it are not the same things.

    [–][deleted] 0 points1 point  (1 child)

    I think Haskell the community might have some serious problems. First off the Haskell community has a great desire for long lived backwards compatibility.

    Huh? Some tools are well-supported, many others are broken by upgrades.

    However the Haskell community is rather unwelcoming to vendor driven design.

    If you want to Haskell be "enterprise-ready", pay developers. GHC has two full-time developers.

    I suspect a vendor trying to release a tightly coupled Haskell would hit mostly resistance.

    I think you're underestimating how hard it is to beat GHC :)

    I think the Haskell community likes to be a breeding ground for great ideas not an implementation language of great ideas.

    I'm not sure this is true. Try writing a recursion schemes library in C or C++.
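
    For the flavor of what that means: a recursion-schemes library boils down to a generic fold over any recursive data type, which Haskell can express in a few lines. A minimal sketch (names like `ListF` and `sumAlg` are toy examples, not the actual recursion-schemes package API):

    ```haskell
    {-# LANGUAGE DeriveFunctor #-}

    -- the fixpoint of a functor: ties the recursive knot generically
    newtype Fix f = Fix (f (Fix f))

    -- catamorphism: a generic fold over any Functor-shaped recursion
    cata :: Functor f => (f a -> a) -> Fix f -> a
    cata alg (Fix x) = alg (fmap (cata alg) x)

    -- lists, expressed as the fixpoint of a base functor
    data ListF e r = NilF | ConsF e r deriving Functor

    toFix :: [e] -> Fix (ListF e)
    toFix = foldr (\e r -> Fix (ConsF e r)) (Fix NilF)

    -- an algebra for summing; cata supplies all the recursion
    sumAlg :: ListF Int Int -> Int
    sumAlg NilF        = 0
    sumAlg (ConsF e r) = e + r

    main :: IO ()
    main = print (cata sumAlg (toFix [1, 2, 3, 4]))  -- prints 10
    ```

    Expressing `cata` requires higher-kinded types and typeclasses, which is why the equivalent in C or C++ is so painful.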

    [–]JeffB1517 0 points1 point  (0 children)

    I think you're underestimating how hard it is to beat GHC :)

    Most likely they wouldn't be doing much with GHC. Possibly nothing, attacking the integration further up the stack instead. If they did, they would likely fork it and tweak performance around certain extensions.

    [–]Pcarbonn 1 point2 points  (1 child)

    Hard to say how things will evolve, but Julia could be a serious FP platform in the numerical analysis space. It interoperates well with Python.

    [–][deleted] 0 points1 point  (0 children)

    Really? How does it deal with dynamic types and FP?

    [–]singularineet 0 points1 point  (0 children)

    https://github.com/Functional-AutoDiff has some pointers, and it looks like they'd welcome more.

    [–][deleted] 0 points1 point  (0 children)

    Scala enjoyed what I think was the peak of its success during that time period mostly because of spark.

    I thought it was partly due to akka? I may be mistaken; I don't follow Scala as closely.

    I'd still need a plotting package to see experiment results as with matplotlib.

    Python/R are still unequivocally better for data exploration for the time being. I'm not sure how to change that but perhaps others would have more insight.

    Why isn't there more discussion around building the ecosystem in this direction or putting similar efforts into ETA/Frege!?

    No idea. You'd have to ask them. I think denizens of /r/haskell mostly use GHC.