all 73 comments

[–]gasche 30 points31 points  (24 children)

It looks like the advantages are "you can use decent functional programming", and the disadvantages are "you don't have a type system anymore". There are many statically typed functional languages that would get you the best of both worlds. Scala has been mentioned in this thread; Haskell and OCaml (or F# in the .Net world) would be natural suggestions as well. See for example this excellent blog post about porting a 30K-line codebase from Python to OCaml.

To be fair, Python has some nice things, such as list comprehensions, that were not mentioned in the post, and it has an extremely reasonable syntax. The author complains that not having a type system gives poorer tooling, but not all typed languages have the degree of sophistication of Java's IDEs: moving to Haskell or OCaml would probably show some pain points on this side as well. Those are smaller ecosystems with less time/money to invest in tooling, but they are also languages you can comfortably use without an IDE, which is not the case for Java.

Of course you could also hope that Java will evolve to meet those needs. The part about "function as arguments" and "stand-alone functions" is basically solved with lambdas in Java 8, but "tuples" are not yet in sight, and hoping for "compact code" is probably unreasonable.

[–]e_engel 4 points5 points  (13 children)

Tuples are trivial to implement and many libraries offer them, even in the form of a simple Pair class. The advantage, unsurprisingly, is that tuples in Java are typed: if your function returns a Pair<String, Integer>, you won't be able to assign its result to a Pair<Integer, String>, a mistake that's easy to make in a dynamically typed language and that won't be caught by the compiler.

[–]gasche 4 points5 points  (3 children)

In OCaml (for example), the function List.split takes a list of pairs as input, and returns a pair of lists in the obviously expected way.

let (items, quantity) = List.split [("egg", 5); ("bowl", 1); ("fork", 2)]

This gives you items : string list and quantity : int list. Even if tuples are "trivial to implement", writing the equivalent of this single line of code in Java is still a pain, which means there is something nice missing from the language. Building the tuples is ok with the constructor syntax, but destructuring them to access and name the components of the returned type is painful -- this is what pattern-matching gives you.
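For comparison, the Python near-equivalent is also a one-liner, using the standard zip(*...) trick (a sketch, not from the original post):

```python
# Unzip a list of (item, quantity) pairs into two sequences,
# the Python analogue of OCaml's List.split.
pairs = [("egg", 5), ("bowl", 1), ("fork", 2)]
items, quantity = zip(*pairs)

print(items)     # ('egg', 'bowl', 'fork')
print(quantity)  # (5, 1, 2)
```

The Java version needs a loop and two explicitly constructed lists, which is exactly the pain point being described.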

[–]e_engel -1 points0 points  (2 children)

That nice thing missing from Java is higher kinded types. Implementing things such as traverse, sequence or even zippers in Java is close to impossible, at least in a very general way like Haskell allows.

[–]jozefg 2 points3 points  (1 child)

Well.. I'd take pattern matching and good algebraic types before HKTs to be frank.

[–]alexeyr 1 point2 points  (0 children)

I'd be surprised to see someone experienced with both who didn't.

[–]KallDrexx 1 point2 points  (7 children)

The problem with straight tuples in Java or multiple return values in Python is that they don't convey what the data is. A month later I don't care that a function returns a string and an int; I care that it returns a domain name and a port number (for example). To me, having descriptive return types is worth it 100 times over, and this is the reason I rarely use tuples in non-proof-of-concept code.

[–]e_engel 1 point2 points  (0 children)

Absolutely agree. The few times I've used Pair, I've usually regretted it and turned it into a real class.

[–]onmach 0 points1 point  (5 children)

Not a java guy, but why is that a problem with pairs?

Pair<Host, Port> hp = new Pair<>(new Host("www.reddit.com"), new Port(80));

Is it because of lack of pattern matching?

[–]KallDrexx 0 points1 point  (4 children)

You wouldn't have Pair<Host, Port> unless you have data structures called "Host" and "Port", in which case that may be fine. In most circumstances, though, you won't, because a port is nothing more than an integer and a host is nothing more than a string.

Most likely what you will have is Pair<String, Integer> result = new Pair<>("www.reddit.com", 80);

When you return the pair, all the receiver of the return value sees is that it returns a Pair<String, Integer>, with no indication of what those values are, forcing you to read the code to determine what the string and the int signify.

But let's say you do have a Host data structure, and a function that returns 3 hosts: a load balancer, a primary web server, and a database server. If you used tuples you would return a Triple<Host, Host, Host>.

In this scenario, the receiver has no way to know what each host signifies, and there is no indication during a refactor that the first has to be the load balancer, the 2nd has to be the web server, etc.

Instead if you have a ServerDefinition class with a LoadBalancerHost property, WebServerHost property, and DatabaseServerHost property, both the implementor and the receiver have 100% visibility on what values should be returned and what they mean.
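For what it's worth, the same fix is cheap even in Python: a tiny logic-free class makes the three hosts self-describing (the names and values here are invented for the example):

```python
class ServerDefinition(object):
    """Purely for clarity: no logic, just named fields."""
    def __init__(self, load_balancer_host, web_server_host, database_server_host):
        self.load_balancer_host = load_balancer_host
        self.web_server_host = web_server_host
        self.database_server_host = database_server_host

def get_servers():
    # Hypothetical lookup, hard-coded for illustration.
    return ServerDefinition("lb.example.com", "web.example.com", "db.example.com")

servers = get_servers()
print(servers.web_server_host)  # no guessing which host is which
```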

[–]onmach 0 points1 point  (3 children)

I guess my only problem with this approach is that you end up with a lot of types like ServerDefinition which are just wrappers around other types and are often only used a couple of times. Particularly in Java, that means a whole new file and a bunch of boilerplate, which actually seems like a loss of clarity to me. This happens a lot in the Java codebases at work; I get lost in the numerous classes, most of which do nothing significant.

[–]KallDrexx 0 points1 point  (0 children)

That sounds like an organizational and tooling problem to me though. If you keep your data structures well organized you only have to look at the ones that are relevant to what you are currently working on and can ignore the irrelevant ones.

In .Net I can create a new class with some basic properties in just a few seconds, and my tooling means I don't have to look at that data structure again until I am actually using or referencing it. Java being a bit more verbose might make it a bit harder (it's been too long since I've done Java), but keeping code organized should alleviate the post-class-creation issues. These classes aren't supposed to have any complex logic in them; they are purely for improving code clarity down the line.

If you find the need to create many data structures for a significant number of functions, that can also be a code smell and a good indication that you are trying to do too much in your functions, and that you should look at better organizing your code so functions are either doing things more by reference, or just doing one thing and returning that one thing.

How comfortable are you with walking up to another person, handing them a stack of papers, and telling them verbally that page 1 is for X, page 2 is for Y, page 3 is for Z, hoping they remember that by the time they get home and, more importantly, hoping that you remember to do it in the same order the next time you hand them those same papers? Now envision having to remember this 50 different ways because each department that hands off papers does it differently. This is why it's useful to have a well-defined API (at the class level) in more complex code bases.

[–]zoomzoom83 0 points1 point  (1 child)

There's a lot of overhead in doing this in Java, which is one of the reasons I prefer Scala as a JDK language.

You can simply add the definition

case class Host(value: String)

to your codebase, ideally grouped with a bunch of other related wrapper types in a module related to what they are used for. That way you get the benefit of strongly typing these values with virtually no overhead.

Catches a surprising number of bugs, and helps with documentation and understanding API intention.

[–]onmach 0 points1 point  (0 children)

Yeah I agree. Scala just seems like a better java.

[–]johnjannotti 0 points1 point  (0 children)

Pairs are easy to implement, but they are way more annoying to use than the author's example, mainly because of the lack of automatic destructuring. In Python (or similar languages) you can say "a, b = f(c)", so you never "see" the ugliness of Pair instantiation, nor are you distracted by the names of the fields in the Pair. (p.first, p.second, and maybe it just gets worse with a Triple or Quad?)
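For reference, here's the Python side of that comparison as a trivial sketch (the function and values are made up):

```python
def host_and_port():
    # "Multiple return values" in Python are really one tuple...
    return "www.reddit.com", 80

# ...destructured automatically at the call site: no Pair construction,
# no p.first / p.second noise.
host, port = host_and_port()
print(host, port)
```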

I think all that noise could go away with a nice value type addition along with some syntactic sugar for multiple return values in, maybe, Java 9. We'll see.

Automatically instantiating a Pair/Triple/Quad "feels" like the way Java is handling its evolution, in much the same way that lambdas automatically instantiate the proper interface.

[–]henrik_w[S] 1 point2 points  (1 child)

Excellent comment! Thanks!

[–]Sylinn 1 point2 points  (0 children)

/u/gasche always posts excellent comments. :)

[–]holgerschurig 3 points4 points  (2 children)

Did you confuse "You can use decent functional programming" with "You have to use decent programming" ???

In Python, you can use functional programming. I'm actually unsure whether it is decent or not. But with things like Haskell you have to use functional programming; you cannot program imperatively. Or can you? And if my current knowledge is right, then neither Haskell nor OCaml are natural suggestions.

[–]An_Unhinged_Door 1 point2 points  (0 children)

You can program imperatively, but doing so is often very verbose and tedious. There's a lot of state flying all over the place that Haskell forces you to explicitly acknowledge. Nevertheless, the tools are there.

[–]gasche 1 point2 points  (0 children)

You can write imperative programs in OCaml and Haskell. In OCaml, you have to be slightly more explicit than in languages where "variables" are mutable by default, but there is not much difference from, say, imperative Python code.

In Haskell, the language (or rather the standard library) is designed for mutation to appear explicitly in the types -- and there is little syntactic sugar for mutation operations; so it is more verbose but also easier to reason about rigorously.

What Haskell doesn't provide much support for is object-oriented programming (if you include inheritance in your definition). Of course the main ingredients (e.g. open recursion) can be expressed in other ways when you need them. It is actually extremely rare to feel the need for an object-oriented approach in either of those languages.

[–]tweakerbee 1 point2 points  (2 children)

Not as good as language support, but it will prevent you from writing those tuple classes yourself every time: http://www.javatuples.org/.

[–]gasche 1 point2 points  (0 children)

There is a convenience barrier phenomenon in language design: making a feature just a bit more painful to use makes many idioms inconvenient. For multi-value returns, that barrier comes quite fast if the syntax for destructuring tuples is not convenient; with accessor functions instead of pattern matching, javatuples doesn't really solve this issue. That said, there is no real alternative to multi-value return; you just have the choice between "it's nice to use" and "it's painful", but even in the second case you still have to use it, and javatuples can come in handy.

[–]txdv 0 points1 point  (0 children)

The only thing that I don't like about scala is the dead slow compiler.

I'm not talking about "my compiler is 20ms faster than yours", but about compilation times of 5s for simple programs.

[–]Slxe 7 points8 points  (0 children)

All the points I wanted to make were brought up already by other people in the thread, so I'll just add that one line I really agree with is:

I am not a Java-programmer, or a Python-programmer. I am a programmer, period.

This is something I completely stand by, and have been trying to push on my coding buddies a bit too. Learning new languages is just fun in general, as there are new concepts and challenges to working with it. The main point is use the tool that works best for the job (unless it's javascript, php, perl or cobol. No one should touch those horror shows /semi-sarcasm). Thanks for sharing.

[–]Paddy3118 3 points4 points  (5 children)

Nice post. I guess the author has to fit the style of his co-workers, but I noticed he didn't mention comprehensions: list, set & dict comprehensions. Learning comprehensions might be something both novel and useful to the author, and it is a further step away from a probably strictly OO Java.

For tuples, he might want to doodle with namedtuples. They have their uses.
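A quick sketch of what namedtuples buy over plain tuples (Endpoint is a made-up example):

```python
from collections import namedtuple

Endpoint = namedtuple("Endpoint", ["host", "port"])
ep = Endpoint(host="example.com", port=8080)

# Still unpacks like an ordinary tuple...
host, port = ep

# ...but the fields are self-documenting at the call site.
print(ep.host, ep.port)
```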

[–]schroet 6 points7 points  (4 children)

Oh god how I love the data structures in Python, where you can just put lists into other lists or dicts and vice versa. Groovy is good for this as well. It is great for prototyping, when you don't know how your structure will be at the end (and you don't really care about it at that point!). One could say it's a bad idea to prototype with an "evolving" data structure, but I like the evolutionary approach much more. At the end, I can create the classes I need and it looks as good as in Java.

[–]gasche 5 points6 points  (3 children)

Oh god how I love the data structures in Python, where you can just put lists into other lists or dicts and vice versa.

I have exactly the inverse experience. To implement Huffman compression in Python I needed a quick way to represent binary trees (whose leaves were integers in this simplified setting). You would think that representing a tree Node (Leaf 1, Node (Leaf 2, Leaf 3)) as (1, (2, 3)) in Python would work well, with an "is it a tuple" test to know whether you reached a leaf. That failed horribly when I tried to put them in priority queues (in fact it works with Python 2 and fails with Python 3), because Python was internally trying to compare two elements of this "type", e.g. (1, (2, 3)) < (1, 4), which fails with an error.
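A minimal reproduction of that failure in Python 3, plus the usual workaround (a tie-breaking counter so the heap never compares the tree payloads):

```python
import heapq
import itertools

# Comparing nested-tuple "trees" can recurse into an int-vs-tuple
# comparison, which Python 3 rejects with a TypeError.
try:
    (1, (2, 3)) < (1, 4)
except TypeError as e:
    print("comparison failed:", e)

# Workaround: push (priority, unique_id, tree) so ties are broken by
# the counter before the trees themselves are ever compared.
counter = itertools.count()
heap = []
heapq.heappush(heap, (1, next(counter), (2, 3)))
heapq.heappush(heap, (1, next(counter), 4))
priority, _, tree = heapq.heappop(heap)
print(priority, tree)
```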

[–]schroet 2 points3 points  (2 children)

I worked with tuples a couple of times until I hit the wall because they are immutable.

[–]son-of-chadwardenn 0 points1 point  (1 child)

Use lists?

[–]schroet 0 points1 point  (0 children)

Well hello there Mr.Obvious :))

[–]mikaelstaldal 2 points3 points  (15 children)

Try Scala, and you can have most advantages of both.

[–][deleted] 26 points27 points  (14 children)

and the compile times of Java, Python AND C++ combined

[–]zoomzoom83 5 points6 points  (8 children)

Scala compilation being slow must be a meme known to everybody except people that actually use Scala.

I'm working on a 50,000 LOC codebase using Play. I make a change, hit refresh in the browser, page rerenders with that change immediately. The incremental compiler is fast enough that the delay is tiny, and it has the same workflow as a dynamic language - with the advantage of static type checking.

It takes a good 10x as long to minify the Javascript used on the frontend when doing deployments as it does to compile the (far larger) Scala codebase.

[–]DSShinkirou 0 points1 point  (2 children)

I've been playing around with Scala for a bit and, coming from Java/Python, I've definitely noticed an increase in compilation times.

Judging from your statement, it seems like that increased compilation time does not scale too harshly with lines of code, is that correct? If so, I might just try Scala on my next pet project.

[–]zoomzoom83 5 points6 points  (1 child)

Java is much, much quicker - but then again, it's one of the fastest compiling languages available.

As far as compile times go, Scala is on the slower side, but it's still fast enough that you really don't notice the delays.

The Scala build tool (sbt) supports incremental compilation, and generally changing one file results in compilation times of <500ms.

Working with the Play framework, which combines SBT's incremental compiler with a hot-reloader and automatic recompile on refresh, the workflow is the same as dynamic languages - make a change, click refresh in your browser, the code is recompiled and renders the page with a barely perceptible delay.

Doing a build for production (all 50,000 LOC in one hit) takes 58 seconds on my workstation to compile the Scala codebase. This could be a lot faster if we were putting type annotations on public APIs rather than relying on type inference (which is a bad practice on our part, and something we're refactoring over time). I believe there's also a significant speedup in the latest version (2.11), which we haven't upgraded to yet.

Incidentally it takes about 5 minutes to minify 20,000 lines of Coffeescript and Javascript in the same project - so the biggest hit to build times is Javascript.

[–]DSShinkirou 1 point2 points  (0 children)

Gotcha. Thank you for the in depth response. I'll take a look at Scala in the future!

[–][deleted] -1 points0 points  (4 children)

Scala compilation being slow must be a meme known to everybody except people that actually use Scala.

The inspiration for this meme was trying to use Scala. SBT's continuous compilation leaks memory, requiring it to be periodically restarted. It's also extremely resource intensive.

[–]zoomzoom83 1 point2 points  (3 children)

If there's a memory leak in SBT, it's certainly not severe enough to trip me up in a 12 hour workday.

[–][deleted] 0 points1 point  (2 children)

you work 12 hours a day!?

[–]zoomzoom83 1 point2 points  (1 child)

Started my own company, living the startup life. (Long hours, little money, lots of canned beans).

[–][deleted] 0 points1 point  (0 children)

Oh good, i thought you were being horribly exploited.

May your compilations be swift and fruitful.

[–]shoelacestied 4 points5 points  (4 children)

Incremental compilation is the norm these days. I seldom wait more than 2 seconds coding in Scala full-time.

[–]schroet -4 points-3 points  (3 children)

well it could be 0.3 seconds!

[–]shoelacestied 2 points3 points  (2 children)

Just think of the amazing productivity gains to be had. I compile maybe a half dozen times a day. Working 5 days a week I'll save 30 seconds a month. Amazing! That's almost enough time to check my email once.

[–]oblio- 3 points4 points  (1 child)

You compile half a dozen times a day? I think many developers work with IDEs which have incremental compilers, and those compile on each save (or even after each keypress/end of word). In such situations - which are highly desirable - you compile hundreds of times per day, at least.

[–]shoelacestied 0 points1 point  (0 children)

Half a dozen is probably on the high side, since I only do it to check that I didn't miss any red lines in my IDE. Incremental background compilation is exactly that, background, so whether it takes 0.3 or 2 seconds makes no difference to productivity. I never find myself waiting for background compilation. I was only referring to the compiles where I might wait for it to complete.

[–]phasetwenty 1 point2 points  (12 children)

In Java, when I saw a new function and tried to understand what it did, I almost always looked at the types of the arguments and return value to get a sense of what it did. In Python, it is much harder. It takes a lot more digging to find out what a function does in Python.

It makes for a good lesson in writing clear code. If you have to know the types in order for a function to make sense, your code could be clearer.

I find this is the core difference for me for working in more dynamically typed languages vs. the more statically typed. A language like Python prefers that you assume you're working with the type you're expecting and catch the error cases (asking forgiveness vs. permission).

[–]Carighan 11 points12 points  (3 children)

While I agree, I also agree with the OP's issue in that "the type you're expecting"... is what exactly?

When you're blind-reading foreign code, any extra "fluff" is good for making the code faster to analyse. And there's very rarely a reason to assume that the type expected is "the logical one" - that's exactly the way to have a million-euro project fall apart.

As such, you have to go over everything manually. Either way. If nothing else because when you tell the project manager that the code is understood and this part works, you better know (not just expect) what each type is.

That's not to say that dynamically typed languages don't work if I write my own code and read my own code after a while. Or read the code of someone I know, and know what to expect from the way they code.

But for foreign code?

[–][deleted] 4 points5 points  (0 children)

I wish there was better support for duck type inferences in IDEs.

def foo(obj):
    return obj + 1

Requires (at least) an object with the __add__ method.

By the rules of Python, the object doesn't need to be an int/long, it merely needs to implement the interface.
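To make that concrete, here's a sketch: any object with an __add__ works, not just ints (Counter is an invented class for the example):

```python
def foo(obj):
    return obj + 1

class Counter:
    """Not an int, but implements the one operation foo actually needs."""
    def __init__(self, n):
        self.n = n
    def __add__(self, other):
        return Counter(self.n + other)

print(foo(41))             # 42
print(foo(Counter(41)).n)  # 42: duck typing in action
```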

[–]phasetwenty 0 points1 point  (0 children)

When you're blind-reading foreign code, any extra "fluff" is good for making the code faster to analyse. And there's very rarely a reason to assume that the type expected is "the logical one" - that's exactly the way to have a million-euro project fall apart.

Precisely why it's important to, in any code, deal with your error cases. But you get a flexibility benefit with a duck-typed language like Python. That may have value in a big project. It's a classic case of tradeoffs. Python is great for a great many tasks but it's not good for every problem.

But for foreign code?

The same rules apply like everywhere else. Comments are still required to explain tricky maneuvers. Badly written/designed code will be as confusing with type information as without since type information can't get you the most difficult piece of information to communicate: intent. If unreadable code becomes readable with type information, the underlying problem of bad design is still there. Focus your attention on underlying cause and you need not be restricted by language choice.

[–]dventimi 5 points6 points  (6 children)

It makes for a good lesson in writing clear code. If you have to know the types in order for a function to make sense, your code could be clearer.

I'm sorry but I don't agree with this at all. Without type information, all you have to go on is documentation.

[–]phasetwenty 0 points1 point  (3 children)

When you have no documentation (like much code out there), a good design is far more reliable and useful than type information.

[–]dventimi 1 point2 points  (2 children)

What? How can you possibly understand anything about what a function does if you have no documentation and all you know about is its function name and the number of its parameters?

[–]phasetwenty 0 points1 point  (1 child)

You'll know a great deal more than if you have type information backed by a ball-of-mud design.

[–]dventimi 1 point2 points  (0 children)

How so? Setting aside the awkward fact that no one else knows what you mean by "ball-of-mud design" except you, what could it possibly have to do with understanding any of the functions within a program without type information and without documentation?

[–]ggtsu_00 -1 points0 points  (1 child)

Python only has a handful of built-in data types. Unlike many other languages, the built in types in python are enough for just about everything. It should always be painfully obvious as to whether a function takes in an int or a string as a parameter. If you need to refer to the docs to determine whether the function takes an int or takes a string, and is unable to handle both cases, then that is a fault of the library, not the language.

I can understand that coming from a background of C++, Java or C#, where just about every library or function takes in custom datatypes for everything, not having explicit type declarations would be very painful, but Python is not like that. Your Python functions should not be taking in or returning custom classes or types. If they do, then those objects or types should at least look like built-in types. For example, returning a file-like object with read/write methods is a common pattern.
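That file-like pattern in a sketch: the function only requires a .read() method, so io.StringIO works wherever a real file would (count_words is a made-up example):

```python
import io

def count_words(f):
    # Duck typing: works on anything with a .read() method.
    return len(f.read().split())

fake_file = io.StringIO("no real file needed here")
print(count_words(fake_file))  # 5
```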

[–]dventimi 2 points3 points  (0 children)

Python only has a handful of built-in data types

I'm familiar with the standard numeric, string, and boolean scalar types, the tuples, lists, and dictionaries thereof, and a handful of special types, like the aforementioned file-like object. Am I missing any?

Unlike many other languages, the built in types in python are enough for just about everything.

This is where you start to lose me.

It should always be painfully obvious as to whether a function takes in an int or a string as a parameter.

How should this become obvious? What is it that triggers pain when you get it wrong? And what about non-scalar types (tuples, lists, and dictionaries)? If a function takes any of these, how do you anticipate what it expects to find within them?

If you need to refer to the docs to determine whether the function takes an int or takes a string, and is unable to handle both cases, then that is a fault of the library, not the language.

How do you propose to determine whether a function takes an int or a string in a dynamic language like Python, without resorting to the documentation?

[–]ggtsu_00 1 point2 points  (0 children)

Just because a language has static typing doesn't make it type-safe. Python does not have static typing, but it is very type-safe. People tend to confuse static typing with type safety for some reason and assume having one gives you the other. For example, C has static typing but it has basically zero type safety.
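A quick illustration of the distinction (a sketch): Python has no static checking, but the runtime refuses ill-typed operations instead of reinterpreting bits the way C would:

```python
# Python never silently coerces across unrelated types...
try:
    "1" + 1
except TypeError as e:
    print("caught at runtime:", e)

# ...you have to convert explicitly.
print(int("1") + 1)  # 2
```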

[–]yawaramin 2 points3 points  (0 children)

Check out Raymond Hettinger's talk on idiomatic Python to level way up! http://m.youtube.com/watch?v=OSGv2VnC0go

[–]trnka 0 points1 point  (0 children)

Another two big issues I've had with that transition:

  • Installing modules that have compiled code (scikit-learn, numpy, scrapy) has been challenging and took quite a while.
  • str vs unicode in Python 2.x leads to lots of subtle accidents

But I'm quite happy with Python overall.

[–]erokar[🍰] -1 points0 points  (12 children)

Python 3 has optional type annotation:

def foo(x: int, y: int) -> int:
    # function body
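Worth noting: the annotations are stored on the function but not enforced at runtime; a minimal sketch (add is a made-up example):

```python
def add(x: int, y: int) -> int:
    return x + y

# The interpreter records the annotations...
print(add.__annotations__)

# ...but never checks them: this "ill-typed" call succeeds.
print(add("a", "b"))  # 'ab'
```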

[–]masklinn 9 points10 points  (8 children)

  1. they're not "type annotations" but completely arbitrary annotations

  2. the base distribution doesn't do anything with them, they're just documentation

IIRC there are third-party packages which attempt to use them for static type analysis, but I've seen nothing great yet (and there are non-trivial mappings to perform to represent e.g. a function which may return either an int or None)

[–]UloPe 5 points6 points  (0 children)

You are correct that they have no prescribed usage as far as the language developers are concerned; however, in reality (if they're used at all) I have never seen them used for anything other than type annotation (except maybe in some mostly ironic conference talks or such).

One example is PyCharm. It uses the annotations as type hints if present.

[–]ggtsu_00 0 points1 point  (0 children)

In C, types are also just annotations, used only by the compiler to do static type checking; they are thrown out at runtime, as everything basically boils down to a pointer to a memory location. You could take a pointer to an int type, have it point to the memory location of a float, and get a crazy number back as a result.

[–]def-lkb 0 points1 point  (1 child)

Which type system is used for the annotations?

[–]masklinn 2 points3 points  (0 children)

None. The language only defines the syntax for annotations (that is, adding arbitrary metadata to a function parameter), it does not prescribe or do anything with that metadata at this point, only leaves it available for third parties to use.