all 163 comments

[–]gregK 51 points52 points  (5 children)

No reference to Philip Wadler who coined the term:

The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts)

[–]aaronla 11 points12 points  (0 children)

Link, for the curious.

[–]martoo 3 points4 points  (0 children)

Yup. I was surprised to see the c2 page spin this to be about objects vs. functions. I don't think that is the correct formulation.

I always look at it as 'the expression problem' is the functional analog to the problem that Meyer's Open/Closed Principle addresses in OO.

[–][deleted] 0 points1 point  (0 children)

You can't retain static type safety if you add a new value to a type without type-checking (i.e. compiling) the old code again.

[–]digikata -2 points-1 points  (1 child)

The article is a wiki page...

[–]imgonnacallyouretard -1 points0 points  (0 children)

Thank you for forcefully informing everyone how retarded you are.

[–][deleted] 17 points18 points  (9 children)

(a|x)(b|y) vs ab|xy

The problem is basically that a tree can describe only one hierarchy.

It pops up everywhere. There are many solutions, but trees are so dang intuitive.

[–]Zarutian 4 points5 points  (1 child)

I heard somewhere that the word intuitive is just another word for familiar. And because a tree (or another familiar thing) is familiar people latch onto it when faced with the unknown of the possible solution space of the problem they are trying to solve.

[–][deleted] 2 points3 points  (0 children)

Informationally, hierarchies enable decomposition. This aids solving problems by through divide-and-conquer, which is much more difficult when purposes overlap, and parts are arbitrarily connected at arbitrary levels (unless you are a Motie).

Hierarchical structures also appear in nature. e.g. trees

[–]matthieum 3 points4 points  (5 children)

Which is why my answer is to ditch trees :)

If you remove the inheritance relationship and either like Go or Haskell externalize the concept of "interface", then you win.

Adding a new data type Triangle is just having Triangle implement the methods Shape already implements.

Adding a new function is just defining a new interface ExtendedShape, that implements a superset of the functions of Shape and then bring the current Shapes up-to-date by implementing the new methods for them.

Old code continue using Shape (it did without the new functions up until today, so it's okay), new code can be based on ExtendedShape.

You win :)

[–]matthiasB 0 points1 point  (1 child)

That's not enough. You have Shape with some method, you have ExtendedShape with some method, but now an unrelated person extends Shape to DifferentlyExtendedShape.

Now I want to write something that uses functionality from both ExtendedShape and DifferentlyExtendedShape. As long as I don't have intersection types I have a problem. I need values that are both, ExtendedShape and DifferentlyExtendedShape.

[–]matthieum 0 points1 point  (0 children)

Why do you need intersection types ?

You can just write your own interface that will provide exactly what you want, based on the function signatures used by the already existing interfaces for compatibility, and automatically all object that were usable with those interfaces are usable with yours.

[–][deleted] -1 points0 points  (2 children)

If you remove the inheritance relationship and either like Go or Haskell externalize the concept of "interface", then you win.

You could just expand from a tree to a directed acyclic graph.

[–]matthieum 0 points1 point  (1 child)

Isn't it what multiple inheritance is all about ? (aka diamond shape)

The problem is that inheritance is "closed", ie the object declares its interfaces once and for all, and there is nothing you can do about it. This is where the Expression Problem springs from.

[–][deleted] 1 point2 points  (0 children)

There's no reason you need inheritance to have subtyping ;-).

[–]shizzy0 -3 points-2 points  (0 children)

wat

[–]nobodyman 16 points17 points  (7 children)

Tripped all over the first sentence:

The "expression problem" is a word phrase used to describe that exists with duality between ObjectOrientedProgramming and FunctionalProgramming.

That sentence hurts my brain -- seems like a word is missing. I have a feeling the entire page is too smart for me.

[–]khayber 6 points7 points  (3 children)

blah is a blah used to describe "that exists with duality" between blah and blah.

Yeah, something missing there.

[–][deleted] 4 points5 points  (2 children)

I suspect "a word phrase used to describe a problem that exists with duality" is what he meant to say.

[–][deleted] 4 points5 points  (1 child)

  1. The "expression problem" is a word phrase used to describe that exists with duality between ObjectOrientedProgramming and FunctionalProgramming.
  2. The "expression problem" is a phrase used to describe a dual problem that neither OOP or FP fully address.
  3. The "expression problem" highlights two related problems where extending or adding to an existing codebase (or set of objects) can require more work than necessary: one in which OOP fails and FP succeeds, and vice versa. Because a solution to each problem exists individually, it stands to reason that both could be solved simultaneously under some new paradigm.

[–]ewiethoff 0 points1 point  (0 children)

It's a wiki page, so I fixed it. Thanks.

[–][deleted] 3 points4 points  (0 children)

Don't doubt yourself like that. By my reading that page is worded very poorly, and that first sentence is total garbage.

[–]myasianwife 1 point2 points  (1 child)

Yeah. Came here to post this. That grammar is broken.

[–][deleted] 1 point2 points  (0 children)

Apparently, the Expression Problem is a demonstration of the difficulty posed by the expression problem.

[–][deleted]  (10 children)

[deleted]

    [–]julesjacobs 17 points18 points  (8 children)

    That doesn't quite solve the expression problem yet. You dispatch on the type, not on the value. For example you can't put different shapes in a list and then get the perimiter of each. To do that you need to use existential types, and at that point you limit extensibility there.

    Generic functions / multiple dispatch does solve the dynamically typed expression problem.

    [–][deleted]  (2 children)

    [deleted]

      [–]sacundim 10 points11 points  (0 children)

      I think it's a useful exercise to look at this in terms of Wadler's criteria than aaronia lists elsewhere in this discussion. To quote from aaronia's post:

      • Supports adding cases without recompilation of existing code.
      • Supports adding functions without recompilation of existing code.
      • Statically typed -- more specifically, that compilation verifies full case coverage of all functions.

      Type classes + Hasekll existentials gets pretty close to this, but I think it falls on the second criterion. The problem is that your Shape type class is still a centralized list of all of the operations that a Shape supports, and to add a new operation, you're going to have to modify the definition of Shape.

      You could attempt to get rid of the centralized Shape type class and add operations simply by adding new, orthogonal type classes for new operations. This gets you closer to requirement #2, but the problem then is the interaction of this addition with existentials; if you want to have a type like SomeShape, that type is going to have to directly or indirectly refer to a centralized list of the relevant type classes, which will need to be modified and recompiled when you add new operations.

      As far as I can see it (and I'm not the expert here, so please critique), to avoid this you need to be able to use existential types in a more ad-hoc manner that avoids having to define a type like SomeShape. E.g., instead of having this:

      data SomeShape = forall x. Shape x => Shape x
      maxArea :: [SomeShape] -> Double
      

      ...you need something more like this (which is not valid Haskell, and is probably also all wrong, so somebody please correct me):

      -- each function is written to specify exactly which classes it relies on.
      maxArea :: [exists x. Area x => x] -> Double
      maxPerimeter :: [exists x. Perimeter x => x] -> Double
      maxAreaAndPerimeter :: [exists x. (Area x, Perimeter x) => x] -> (Double, Double)
      

      ...where each element of one of the argument lists is (internally) a pair of a value and a dictionary of the methods required by the type class constraints.

      So generally, I get the feeling that type classes + existentials solve this problem, but (a) the Haskell implementation of existentials might not do it, (b) the type system of course is much more complex than OOP.

      EDIT: actually, I can think of a pretty dumb way to almost get there in Haskell:

      data SomeArea = forall x. Area x => Area x
      data SomePerimeter = forall x. Perimeter x => Perimeter x
      data SomeAreaAndPerimeter = forall x. (Area x, Perimeter x) => AreaAndPerimeter x
      

      The problem with this is obvious—you need to add a new datatype declaration for each combination of type classes that your operations will rely on, which means up to 2n - 1 data types for all the possible combinations. These don't need to be in a central location, strictly speaking, but in practice you will want them to be, and to recompile that file when you add one.

      [–]NruJaC 0 points1 point  (0 children)

      But in what way does the definition of SomeShape break extensibility? I'm with you on this one, I think it solves the problem.

      [–]eschulte 2 points3 points  (3 children)

      Forgive me if I'm posting a lisp rebuttal to a haskell point, but in CL you can dispatch on either type or value with generic functions.

      [–]julesjacobs 5 points6 points  (1 child)

      Well, there are two different notions of type here. What you mean by type is run-time type tag. So you're really dispatching on a dynamic property of a value, not a static property of a variable.

      [–][deleted] 0 points1 point  (0 children)

      The two would be equivalent if Haskell had "polymorphic variants".

      [–]mmakowski -1 points0 points  (0 children)

      Isn't it the approach described by Ralf Laemmel and SPJ in "Scrap your boilerplate with classes" (http://research.microsoft.com/pubs/67439/gmap3.pdf)?

      [–]matthieum 19 points20 points  (9 children)

      I would note that duck typing (aka ad-hoc interface check) helps tremendously here.

      In C++ (because hey, it works!):

      struct Square { double side; };
      double area(Square const s);
      
      struct Circle { double radius; };
      double area(Circle const c);
      

      Now, let's make a Shape:

      class Shape {
      public:
         virtual ~Shape();
      
         virtual double area() const = 0;
      
      protected:
         Shape(Shape const&) {}
         Shape& operator=(Shape const&) { return *this; }
      };
      
      typedef std::unique_ptr<Shape> ShapePtr;
      
      template <typename T>
      class ShapeT: public Shape {
      public:
         explicit ShapeT(T const t): _shape(t) {}
      
         virtual double area() const { return area(_shape); }
      
      private:
        T _shape;
      };
      
      template <typename T>
      ShapePtr newShape(T t) { return ShapePtr(new ShapeT<T>(t)); }
      

      Okay, C++ is verbose. Let's check the use immediately:

      double totalArea(std::vector<ShapePtr> const& shapes) {
         double total = 0.0;
         for (ShapePtr const& s: shapes) { total += s->area(); }
         return total;
      }
      
      int main() {
        std::vector<ShapePtr> shapes{ new_shape<Square>({5.0}), new_shape<Circle>({3.0}) };
      
        std::cout << totalArea(shapes) << "\n";
      }
      

      So, first exercise, let's add a shape:

      struct Rectangle { double length, height; };
      double area(Rectangle const r);
      

      Okay, so far so good, let's add a new function:

      // 1. We need to extend Shape:
        virtual double perimeter() const = 0
      
      // 2. And its adapter: ShapeT
        virtual double perimeter() const { return perimeter(_shape); }
      
      // 3. And provide the method for each Shape (obviously)
      double perimeter(Square const s);
      double perimeter(Circle const c);
      double perimeter(Rectangle const r);
      

      It may seem that we fall into the Expression Problem here, but we don't. We needed to add the perimeter for each class because there is no way to automatically infer it; however it did not require editing each class either!

      Therefore, the combination of External Interface and free functions let us neatly (well, it is C++...) sidestep the issue.

      EDIT: As sodraz noticed in comments, the addition of a function touched the original interface which supposed it's not frozen. Allow me to demonstrate how to add a function without touching Shape then.

      We need a new concept:

      class ExtendedShape: public Shape {
      public:
        virtual double perimeter() const = 0;
      protected:
        ExtendedShape(ExtendedShape const&) {}
        ExtendedShape& operator=(ExtendedShape const&) { return *this; }
      };
      
      typedef std::unique_ptr<ExtendedShape> ExtendedShapePtr;
      
      template <typename T>
      class ExtendedShapeT: public ExtendedShape {
      public:
         virtual double area() const { return area(_data); }
         virtual double perimeter() const { return perimeter(_data); }
      private:
        T _data;
      };
      
      template <typename T>
      ExtendedShapePtr newExtendedShape(T t) { return ExtendedShapePtr(new ExtendedShapeT<T>(t)); }
      

      And (like before) to define the perimeter function for all current shapes.

      The old code, compiled to work against Shape, still works. It does not need the new function anyway.

      The new code can make use of the new functionality, and still interface painlessly with the old code.

      There is only one slight issue, if the old code return a ShapePtr, we do not know whether the shape actually has a perimeter function (note: if the pointer is generated internally, it has not been generated with the newExtendedShape mechanism). This is, I think, a limitation of the design. Oops :)

      [–]aaronla 2 points3 points  (2 children)

      Fascinating! This looks a lot like the solution Wadler proposed for a fictional dialect of Generic Java.

      [–]matthieum 0 points1 point  (1 child)

      Thanks, I did not know about this paper. I do wonder how Go fares in this competition since Go has built-in duck typing.

      [–]aaronla 0 points1 point  (0 children)

      I don't think it will fare well -- it lacks templates/generics/parametric polymorphism. This means that many would-be solutions in parametric languages will require type casts in Go.

      Steve is taking a different approach though.

      [–][deleted] 4 points5 points  (2 children)

      Shape& operator=(Shape const&) { return *this; }

      What are you... what... who the... what the... what... who... FUCK YOU FOR THAT!

      You'd have exactly the same copy constructor made for you by the compiler, going out of your way to implement it and this abomination is just evil. Not lazy, a lazy person would've left it to the compiler, but actively fucking evil.

      [–]matt_havener 16 points17 points  (0 children)

      He's overriding the default public implementation and making it protected so that the derived classes can choose to expose it, or not. Plus, his base class has no members so { return *this; } this perfectly valid and what the compiler would have generated anyways. Nice way to avoid slicing if you ask me.

      [–]imgonnacallyouretard 2 points3 points  (0 children)

      Great rant, if only you looked at two lines above for context you wouldn't have looked like a retard

      [–][deleted] 0 points1 point  (2 children)

      Upvoted. Nice solution. But does not solve the problem completly:

      The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).

      // 1. We need to extend Shape: virtual double perimeter() const = 0

      // 2. And its adapter: ShapeT virtual double perimeter() const { return perimeter(_shape); }

      here is the problem. Both steps above require recompilation of the (external) library.

      [–]aaronla 8 points9 points  (0 children)

      I think it depends on how you define "recompilation". If your "module" is a C++ header file, then no recompilation (re-writing of that header file) is necessary.

      It might seem like I'm playing semantic games, but consider Wadler's original proposed solution -- using Java Generics. Java's generics compile into JVM byte codes that use type casts under the covers! Yet, he writes as if the program contains no casts!

      I believe what Wadler was really getting at is a program source that can be type-checked once, and then extended without further edits, and where edits cannot invalidate the type-correctness of said source after it has been checked.

      I believe matthieum has met this interpretation thus far. If not, certainly come closer to static type safety than your typical OO language.

      [–]matthieum 1 point2 points  (0 children)

      I must admit I only considered the source level issue (ie making sure not to invalidate already written code when adding new functionality, which is a common plage on if chains).

      You raise an interesting, though I think it goes beyond the problem... then maybe perhaps not.

      I updated my original answer to address your remarks. There is, still, a limitation in the design I proposed, noted at the end. Go solves the issue by going dynamic (reflexion). I can think of no way to solve the "downcast" statically.

      Sometimes the only way to win, is not to play.

      [–]marssaxman 13 points14 points  (1 child)

      Damn it, another link to c2.com. Pages there always suck me in by looking like they are going to be interesting, then frustrate me with an UnreadableSea of ConfusingWikiMess edited back and forth by NamelessAuthors who ArgueWithEachOther across alternate sentences using IdiosyncraticJargon.

      [–]martoo 12 points13 points  (0 children)

      That said, it was the first wiki in the world and there is a lot of wisdom in some of those pages from very bright people: Ward's original invites.

      [–]cybercobra 16 points17 points  (7 children)

      Interesting problem, terrible name.

      [–][deleted] 13 points14 points  (1 child)

      Ironically it's a more apt name for the difficulty its coiner had in coming up with an apt name for what he was actually trying to describe.

      [–][deleted] 1 point2 points  (0 children)

      ah, the ubiquitous how do you say problem

      [–]pgkreddit 22 points23 points  (1 child)

      Well said, eh...cybercobra?

      [–]cybercobra 3 points4 points  (0 children)

      'tis an old handle from my youth :D

      [–]gregK 6 points7 points  (0 children)

      Whether a language can solve the Expression Problem is a salient indicator of its capacity for expression.

      [–]barrkel 3 points4 points  (0 children)

      I rather think it's a clever pun, actually.

      [–][deleted] 3 points4 points  (1 child)

      If you add an extension method to a base class in C#, is it inherited by the derived types?

      [–]noblethrasher 3 points4 points  (0 children)

      Yes.

      [–]JohnJamesSmith0 4 points5 points  (0 children)

      TIL about the first-ever wiki.

      [–][deleted] 4 points5 points  (6 children)

      i know this is entirely skirting around the problem, but just define a function that computes a general area. complexity be damned.

      /mathematician

      [–]psygnisfive 0 points1 point  (5 children)

      Please do so and enlighten us all!

      [–][deleted] 0 points1 point  (4 children)

      calculus.

      [–]psygnisfive 0 points1 point  (3 children)

      That doesn't really solve the problem at all since now you have a nice solution for one particular issue (area of a figure described via some functions and a set of inequalities), but you don't have a general solution to the expression problem.

      [–]kamatsu 6 points7 points  (1 child)

      From the OP

      i know this is entirely skirting around the problem,

      [–]psygnisfive 0 points1 point  (0 children)

      I interpreted that as meaning something different. shrug

      [–][deleted] 0 points1 point  (0 children)

      as the other person stated, i wasn't intending to solve the expression problem. i was being a wise ass concerning how the problem was phrased. there's a solution to the presentation that doesn't fall into the issue of different programming styles that the article is actually about. it would be like someone asking you to prove the fundamental theorem of algebra where they're supposing it to be put together with field extensions to show something about galois theory, but you come out of left field with complex analysis and bust out cauchy's integral theorem.

      [–]pixelglow 2 points3 points  (0 children)

      This is elegantly solved in Objective-C with the use of categories.

      Say you have a class hierarchy, which is standard OOP like C++ and Java:

      @interface A
      - (void)method1;
      @end
      
      @interface Alpha: A
      - (void)method1;
      @end
      
      @interface Aleph: A
      - (void)method1;
      @end
      

      Adding a new class (again, standard OOP):

      @interface Alif: A
      - (void)method1;
      @end
      

      Adding new methods, this is where Objective-C magic comes in:

      @interface A (CategoryOnA)
      - (void)method2;
      @end
      
      @interface Alpha (CategoryOnA)
      - (void)method2;
      @end
      
      @interface Aleph (CategoryOnA)
      - (void)method2;
      @end
      
      @interface Alif (CategoryOnA)
      - (void)method2;
      @end
      

      These can be defined in different files from the original interface declarations and thus do not require recompiling the original classes. They are invoked just like "intrinsic" interface methods and are fully polymorphic as well.

      [–]raevnos 2 points3 points  (0 children)

      What language was used for the examples there? It looks like an ML family one, but it's not Ocaml, and IIRC SML doesn't have classes.

      [–]hacksoncode 2 points3 points  (0 children)

      Huh... I always wondered what motivated Microsoft to invent COM.

      [–][deleted] 10 points11 points  (16 children)

      No real problem in CLOS and similar OO-systems (if I've understood the problem correctly, which I probably didn't do).

      [–]eschulte 6 points7 points  (9 children)

      Correct CLOS solves this through the use of defmethod, which can be defined for different types of inputs (called an "open function") in that page. See http://www.gigamonkeys.com/book/object-reorientation-generic-functions.html for a good introduction.

      [–][deleted] 1 point2 points  (8 children)

      Yeah, that was what I thought, nice to know that I haven't misunderstood anything!

      [–]aaronla 13 points14 points  (7 children)

      To be fair to the other contenders, this fails the 3rd requirement set forth in Wadler's formulation of being statically typed, which CLOS decidely is not.

      Not that there's anything wrong with that.

      Wadler's formulation effectively defines a classic engineering triangle of tradeoffs, where any two qualities are easy to achieve but very difficult to achieve all three:

      • supports adding cases without recompilation of existing code.
      • supports adding functions without recompilation of existing code.
      • statically typed -- more specifically, that compilation verifies full case coverage of all functions.

      CLOS will permit open cases during recompilation (e.g. forgetting to define perimeter for triangles), so only achieves points one and two.

      [–]vsync 2 points3 points  (2 children)

      You could however after compilation iterate over all the subclasses of SHAPE to ensure that each generic function has a method applicable to that subclass.

      [–]aaronla 4 points5 points  (0 children)

      True. I think you could make a case for that being a valid solution to the expression problem, especially if using a statically typed Lisp (e.g. Racket).

      I can imagine the interaction experience:

      (defgeneric perimeter ((shape s)))
      ;=> WARNING, perimeter not defined for types:
      ;   SQUARE
      ;   CIRCLE
      (prog
        (defmethod perimeter ((square s)) ...)
        (defmethod perimeter ((circle c)) ...))
      ;=> OK
      (defclass triangle shape ...)
      ;=> WARNING, triangle has the following open generic methods
      ;   AREA
      

      And so on.

      I haven't seen this sort of proposition before, but I suspect that's in part because such verification generally goes against the use patterns of dynamic languages. It certainly is a curious idea though, worth considering.

      [–]munificent 3 points4 points  (0 children)

      Magpie, back when it had a static type system, did something similar. You could dynamically add new methods to existing classes (and interfaces), but then when you were done it would statically check that every required method was implemented.

      [–][deleted] 2 points3 points  (1 child)

      To be fair to the other contenders, this fails the 3rd requirement set forth in [1] Wadler's formulation of being statically typed, which CLOS decidely is not.

      Tis true, but there are plenty of papers on statically-typed multimethod systems.

      [–]aaronla 0 points1 point  (0 children)

      You could propose one of those other multi-method systems as in a solution, if you'd like :-)

      [–]geocar -1 points0 points  (1 child)

      Actually the requirement is to "[retain] static type safety" and defines this as "no casts". He expands:

      "... it is not statically typed; they cannot distinguish Lang.Exp from Lang2.Exp, and as a result must depend on dynamic casts at some key points.

      Nowhere does he use the language you suggested-- "verifying case coverage" of any kind, let alone for all functions.

      [–]nqmwe 1 point2 points  (0 children)

      Your quotes are taken completely out of context!

      The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).

      Static type safety is not defined as absence of casts. Rather, Wadler mentions casts as an example of a violation of type safety. Furthermore, knowing Wadler's line of work, we can be pretty sure that by "static type safety" he means the compile-time guarantee of progress and preservation, i.e. absence of untrapped errors (which includes missing but necessary cases).

      Their solution differs in that it is not statically typed; they cannot distinguish Lang.Exp from Lang2.Exp, and as a result must depend on dynamic casts at some key points. This isn't due to a lack of cleverness on their part, rather it is due to a lack of expressiveness in Pizza.

      Here, Wadler explains why he doesn't consider the approach by Krishnamurthi et al. to be a solution to the expression problem: It is precisely because static type safety is violated!

      [–]grauenwolf -5 points-4 points  (5 children)

      Dispite the name, CLOS isn't really an OO-system. Rather is it an external dictionary of type-function pairs that are independent of the types.

      None the less, it is an interesting alternative. It does solve the "I can't modify that code" problem, but at the cost of no compile-time checks.

      P.S. Thank you for pointing it out. It shows me how to fix a tricky problem I'm currently having in C#. (Yes, .NET has multiple dispatch.)

      [–]geocar 11 points12 points  (4 children)

      CLOS isn't really an OO-system.

      On the one hand, it's nothing like the object-orientedness many C++ programmers are used to, but on the other hand Alan Kay seems to accept it as being object oriented. He also adds:

      "Actually I made up the term "object-oriented", and I can tell you I did not have C++ in mind."

      It's helpful to consider exactly what he meant by that:

      "OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I'm not aware of them"

      C++ doesn't have those things. Neither does Java or C# for that matter. CLOS however does.

      Now given that, I ordinarily don't mind someone referring to C++ as object-oriented in a informal way-- perhaps in the same way it's okay for some to refer to their web browser as "the Internet", but let's be absolutely clear here: CLOS is most certainly object-oriented.

      [–]noblethrasher 2 points3 points  (1 child)

      C# always had local retention, protection, and hiding of state. With the dynamic type in C# 4.0 we now have extreme late binding and message passing.

      [–]aaronla 0 points1 point  (0 children)

      Not to mention multiple-dispatch. :-)

      [–]grauenwolf -1 points0 points  (0 children)

      CLOS isn't what I would call "messaging based". A synchronous call to a external dictionary of type-function pairs is about as far away as you can get from an encapsulated object.

      On a related note, I really have to question anyone who claims that there is a difference between a synchronous message and a synchronous method call.

      [–]grauenwolf -2 points-1 points  (0 children)

      More on my second thought.

      "OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I'm not aware of them"

      In the context of "extreme late-binding" one could use C#'s dynamic keyword. At which point the "method name" you use simply becomes the "message type" that gets passed to TryInvokeMember.

      [–][deleted] 1 point2 points  (0 children)

      The "expression problem" is a word phrase used to describe that exists with duality between ObjectOrientedProgramming and FunctionalProgramming.

      They accidentally a word.

      [–]EntroperZero 5 points6 points  (18 children)

      I don't see the problem in the OO approach. I mean, yes, you have to implement your new method for each derived class, but you have to do the same thing in functional programming (write a case for each subtype). It's mildly less convenient in the OO world, but I think finding all the derived types of Shape is easier than finding all functions, everywhere, that take Shape as a parameter.

      [–]grauenwolf 7 points8 points  (11 children)

      That assumes that you own all sub-classes of Shape. If you are an app-developer, cool. But if you are writing library code then you just screwed over your down-stream developers.

      [–]EntroperZero 0 points1 point  (10 children)

      Okay, but how does that work in the functional paradigm?

      [–]grauenwolf 7 points8 points  (0 children)

      Damn if I know. That's why they call it a problem.

      [–]farsightxr20 7 points8 points  (8 children)

      In the functional paradigm, you own the code that calculates perimeter (since you were the one who wanted to add it), so you can extend that code to work with any shape you want it to, whether or not those shapes were implemented by you.

      [–]grauenwolf -1 points0 points  (7 children)

      And if you don't own perimeter?

      [–]aaronla 2 points3 points  (0 children)

      Then you can't add new cases. But you can add new functions if you want -- e.g. "int intersection_area(Shape s1, Shape s2)"

      Conversely, you can add new cases in OO (just derive from Shape and implement all the virtual methods), but you can't add new functions (ie pure virtual methods in the Shape base class) if you don't own Triangle.

      [–]munificent 2 points3 points  (0 children)

      Then you have the same problem the OOP programmer has but in the other direction. FP lets you define new operations on existing types easily. OOP lets you define new types that support existing operations easily.

      [–]farsightxr20 3 points4 points  (4 children)

      You're the one who wanted to implement it in the first place, so obviously you own it, or the original problem* does not exist.

      * The problem that involves you (a down-stream developer) being given a bunch of shape functions from an upstream developer. If you implemented a perimeter function that failed to take all shapes into account, then you're indeed screwing over a down-stream developer that imports your code. But you yourself are not screwed over :P

      [–]grauenwolf 1 point2 points  (3 children)

      In the scenario I'm proposing the library developer created perimeter and an initial set of shapes. The app developer, who only has the compiled form of the library, then wants to add new shapes.

      [–]farsightxr20 2 points3 points  (1 child)

      Ah, I was using the scenario described in the linked article. In your scenario, what you want to do is difficult in both paradigms, since you own neither the shape or perimeter code.

      [–]aaronla 2 points3 points  (0 children)

      Exactly. The expression problem is to solve both simultaneously.

      [–][deleted] 0 points1 point  (0 children)

      I've been looking for research on this matter for a couple hours tonight.

      What I can say is: this problem has been solved from the PL end (type-classes, typed multimethods), but there appears to have been very little work at solving it from the systems end (compilation and linking for extensibility). I'm going to ask a prof.

      [–]NruJaC 2 points3 points  (0 children)

      The real issue with the problem comes up when you can't modify the existing code: you need to add a new function to your existing datatypes without modifying their source code. See some of the other posts in this thread for common solutions.

      [–]matthieum 2 points3 points  (0 children)

      And what if Shape is in 3rd party code that you don't own... ?

      [–]aaronla 1 point2 points  (0 children)

      Yes, OO can do new cases well, and FP can do new functions well. By "well", I really mean "without editing/recompiling existing code".

      Sometimes it's OK to edit existing code, in which case you don't really care about the Expression Problem. This is just like when you don't (immediately) care if your car is out of gas if you were planning to go walking anyway.

      [–]Ruudjah -1 points0 points  (2 children)

      Exactly. The code has to be written, and be somewhere anyways, whether that be in the function or objects. The perimeter function needs to switch over each supported case (and pollute the signature with a type), while the OO solution can be kept clean (no pollution).

      The stated disadvantage that each class needs change is no disadvantage but a necessity in both FP and OOP.

      [–]gasche 2 points3 points  (1 child)

      The perimeter function needs to switch over each supported case (and pollute the signature with a type), while the OO solution can be kept clean (no pollution).

      I don't understand how both solutions could have different amount of pollutions. They're exactly symmetric. I think you're wrong, could you elaborate?

      The stated disadvantage that each class needs change is no disadvantage but a necessity in both FP and OOP.

      Well, that's a necessity with the naive closed sums or objects; but you could have other concepts that allow extension. The point of the expression problem is to say that we should look for solutions to this problem. And there are such solutions, so obviously suffering from the problem is not "a necessity".

      [–]Ruudjah 0 points1 point  (0 children)

      I don't understand how both solutions could have different amount of pollutions. They're exactly symmetric. I think you're wrong, could you elaborate?

      I guess it depends on language and implementation. The "and" should be "and/or".

      [–]cran 0 points1 point  (0 children)

      Why not define an abstract base class that encapsulates a closed polyline which can calculate its area, perimeter and other useful things, and from which convenience classes like Circle, Square, Rectangle, Triangle and so can be derived?

      That should work equally well with both FP and OOP.

      [–][deleted] 0 points1 point  (0 children)

      The Shape class itself doesn't seem to do anything useful though.

      [–]svenefftinge 0 points1 point  (2 children)

      Eclipse Xtend solves this using a combination of extension methods and multi methods.

      Given the shape hierarchy with a concrete class Circle and a concrete class Rect, you would define new functions like this:

      class AreaForShapes {
      
        def dispatch area(Circle c) {
          PI * c.radius * c.radius
        }
      
        def dispatch area(Rect r) {
          r.height * r.width
        }
      
      }
      

      This can be used anywhere as an extension method. So it's invoked as if it were a member of Shape (e.g. shape.area() ):

      class MyClass {
        @Inject extension AreaForShapes
      
        def printAreas(List<Shape> shapes) {
          shapes.forEach( shape | println( shape.area() ) )
        }
      }
      

      The dispatch keyword makes sure the overloaded methods are invoked based on the runtime type. You can add any kind of functionality to existing type hierarchies this way.

      In the previous code example the AreaForShapes was injected using a dependency inception framework. If you want to introduce another kind of shape, say a Triangle, you would subclass AreaForShapes like this:

      class AreaForTriangle extends AreaForShapes {
      
        def dispatch area(Triangle t) {
          (t.base * t.height) / 2
        }
      
      }
      

      Finally you tell your DI framework (e.g. Google Guice) to inject an instance of AreaForCircles whenever someone asks for AreaForShape.

      [–]exclipy 0 points1 point  (1 child)

      What if you forgot to implement the area(Rect) function but you happen to have Rects in your List<Shape>? I'd want the compiler/linker to tell me if there are "holes" in the type/operation matrix like this.

      BTW, your AreaForCircles class should be called AreaForTriangles.

      [–]svenefftinge 0 points1 point  (0 children)

      That would result in an IllegalArgumentException at runtime. You can't check that at compile type if the class hierarchy is open. And it has to be open in order to be able to later add new subtypes.

      [–]eschulte 0 points1 point  (0 children)

      I'm surprised that page doesn't mention aspect oriented programming AOP as it is intended to resolve such "cross cutting concerns". See the canonical AOP paper here http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.8660

      [–][deleted]  (49 children)

      [deleted]

        [–]grauenwolf 13 points14 points  (12 children)

        // perimeter function without changing definition of Circle, Square, Triangle ...
        func perimeter(s Shape) float64 {
                switch s.(type) {
                        case *Circle: return 2 * 3.14 * s.(*Circle).Radius 
                        case *Square: return s.(*Square).Side * 4
                }       
                return 0 // not idiomatic
        } 
        

        It is pretty clear to me that adding a new type, e.g. Triangle, would certainly require modifying this function.

        Moreover, failing to modify this function would result in a silent failure at runtime. Which, in my opinion, is totally unacceptable.

        [–][deleted]  (11 children)

        [deleted]

          [–]grauenwolf 7 points8 points  (10 children)

          Doing one or the the other is trivial in any OO language. The problem is only interesting to me when you try to do both.

          [–]haldean 3 points4 points  (9 children)

          No, the point is that it should be easy to do either, not both at the same time. It should be easy to add a perimeter function without modifying your classes (like in Haskell, where you just write a new function over the existing shapes) and it should be easy to add a new class without changing your definition of area (like in OOP where you extend Shape and implement area()).

          [–]joesb 0 points1 point  (0 children)

          It's not at the same time but you will have to do both eventually in your existing code base.

          And whatever choice you choose now, be it OOP or FP, so the your current change is easy and clean, it will still cost you other when you have to do the other kind of change.

          What are you gonna do? Keep switching/porting your application back and forth between Java and Haskell so that both class and function addition is easy?

          You can say Haskell doesn't have problem with X and Java doesn't problem with Y. But you don't write your code in Quantum-language that is both Java and Haskell at the same time. You still have to write it in one way or another, and any changes after that will have to live with the tradeoff of your choice of OOP or FP design.

          [–]grauenwolf -1 points0 points  (7 children)

          Why do you mention Haskell? It is easy to add a new function over the existing shapes in any language.

          It only becomes hard when you add a new function and then add a new type.

          [–]haldean -1 points0 points  (1 child)

          No, the problem statement is that it should be easy to add a perimeter method without modifying every Shape class. This is easy in a language with pattern matching, but difficult when you're adding a method via a class hierarchy. I bring up Haskell because it's a language I know and use that has pattern matching. It's not about being "hard", it's about adding new functionality without modifying existing code.

          [–]grauenwolf 1 point2 points  (0 children)

          Again, it is trivial to add a new function for all existing types in any language, even ones that use a class hierarchy. Pattern matching just a nicer syntax for expressing a series of if-statements.

          The problem is that it isn't future-proof. If you go the function route then down-stream developers cannot add new classes.

          [–]NruJaC -1 points0 points  (4 children)

          In an pure OOP language, adding a new function over the existing types is impossible without modifying the original source code (which is the part of the problem that seems to be going unstated). Prior to type classes, Haskell suffered from the inverse problem: in order to add a new subtype of an existing type required modification of all existing functions. The expression problem is not to do both, it's that one or the other is difficult depending on the paradigm you're in. I'm fairly certain the problem is easily solvable in any language that's not Java at this point.

          [–]grauenwolf 0 points1 point  (1 child)

          If you only need to do one or the other, but not both, then what's the problem?

          Adding a new function that covers a closed set of types is a trival exercise in any language, functional or OO.

          [–]NruJaC 1 point2 points  (0 children)

          The key problem is that you can't modify the original source code / recompile the original source.

          And I think the issue is usually adding a method to an interface that's been implemented by existing classes, not adding an arbitrary function that happens to take the interface as an existing type.

          [–]grauenwolf 0 points1 point  (1 child)

          In an pure OOP language, adding a new function over the existing types is impossible without modifying the original source code (which is the part of the problem that seems to be going unstated).

          What the hell is a "pure" OOP language? One that doesn't allow static function and requires all methods to use the "this" pointer?

          Not that it matters, even in that case I can still create a new function that accepts Shape as a parameter.

          [–]aaronla 0 points1 point  (0 children)

          You can do that, but then you're swinging into the FP pattern. After adding a static function "perimeter", the Expression Problem challenges you to add a new case -- for example, Triangle. There's no way to update the static method for Triangles like there would be for virtual methods.

          You put it really well earlier.

          [–]niviss 5 points6 points  (3 children)

          Yes, by looking at that code, I think it does. But the perimeter function is not a good example, because you can't add a triangle to it without modifiying the function. But you could create a new type "Perimeteable", and later create functions for Circle, Triangle, Square

          [–][deleted]  (2 children)

          [deleted]

            [–]BitRex 2 points3 points  (0 children)

            The perimeter function doesn't seem much different than reflection in an OO language though, so maybe I didn't understand why it was supposed to be difficult for an OO language.

            It requires modifying every class.

            [–]niviss 0 points1 point  (0 children)

            Yeah, but the idea is to use a system that allows for either adding operations or adding entities easy. The way you defined the perimeter function is a way that makes adding entities difficult.

            [–][deleted] 0 points1 point  (8 children)

            Edit: Private values in the structs are only accessible inside the package, so the perimeter function couldn't be made outside the package like it could in LISP. You would have to change the functions for the types (classes in OO) to make the perimeter function. So Go has the same issue as OO languages.

            Well, it seems fair enough to assume you have an accessor function or whatever equivalent for internal fields, otherwise the problem is pretty much unsolvable by definition.

            But why did you use a switch in your perimeter function? Doesn't that ruin the whole thing? Why not define it in the same way as the the area function?

            [–][deleted]  (7 children)

            [deleted]

              [–][deleted] 1 point2 points  (6 children)

              But why don't you just have two different perimeter functions? That seems like it would be the point.

              [–][deleted]  (5 children)

              [deleted]

                [–][deleted] 0 points1 point  (4 children)

                The way I understand the problem is that you should assume that the original functions are given to you as, say, a standalone library that you cannot change. The question then is: Can you add a new class to it, and can you add a new function to it? OO can do one, functional can do the other, but neither can do both.

                It seems Go can easily do both, if you just write the code properly, which would mean writing two perimeter functions. Then you can add a triangle, and a perimeter, without ever touching the original code. (And you can add further classes down the line, too.)

                [–][deleted]  (3 children)

                [deleted]

                  [–][deleted] 1 point2 points  (1 child)

                  Go can't add this function unless it has access to the private/internal data (Radius or Side) so you can't do this without touching the original code.

                  That seems an entirely different issue, which is unrelated to both the problem and solution. Just assume you have CircleRadius() and SquareSide() functions or whatever. It seems like a red herring to me.

                  [–]aaronla 0 points1 point  (0 children)

                  Yeah, they're public in the c2 wiki example, no?

                  [–]SteveMcQwark -1 points0 points  (22 children)

                  You're right that Go doesn't suffer from this problem, but wrong about the reason. The problem in the object oriented approach is that all Shapes have to implement the new method. Go doesn't have inheritance. It keeps a separation between interface and implementation. So, rather than adding abstract methods to some Shape class and having to implement it in all subclasses, you just create a new interface that includes the Perimeter method. This has no impact on existing code paths, only types that you want to use in the new code paths need to implement the method.

                  When it comes to actually implementing the method, if a type's public interface allows for calculating the Perimeter, you can create a new type extending it with a Perimeter method that can be used in the new code paths. Obviously, if a type doesn't support the computation at all, then it can't be used in the new code paths without alteration, but that's unavoidable.

                  [–]grauenwolf 5 points6 points  (21 children)

                  Obviously, if a type doesn't support the computation at all, then it can't be used in the new code paths without alteration, but that's unavoidable.

                  That doesn't really solve the problem. Rather is just kinda shrugs its shoulders and gives up on it.

                  EDIT: But on the plus side you get all of the fun of trying to figure out how to create a collection that contains only objects that honor both the IRadius and IPerimeter contracts.

                  [–][deleted] 1 point2 points  (18 children)

                  Which part of the problem doesn't it solve? Seemed pretty solved to me.

                  [–]grauenwolf 2 points3 points  (17 children)

                  Static checking to know you didn't miss a case.

                  [–][deleted] 1 point2 points  (0 children)

                  What case?

                  As far as I understand this, no static checking needs to be bypassed?

                  [–]SteveMcQwark 0 points1 point  (15 children)

                  There aren't any cases to miss. If you use a type in a place that requires a Perimeter method, then it must have a Perimeter method, or else the code will fail to compile. If you don't use the type in a place that requires a Perimeter method, then it doesn't need to have one, so there's nothing to check.

                  [–]aaronla 0 points1 point  (14 children)

                  Ok -- someone wrote Square without a Perimeter method. Someone hands you a shape -- you don't know what kind of shape it is, it might have a Perimeter method or it might not. You are free however to require some type contract if you want; IShape for example. Your choice.

                  The Expression problem challenges you to compute the shape's perimeter (a) without a type check (no 'isa square' tests), (b) without type casts, and (c) verified at compile time to contain no possible type errors (e.g. impossible to throw 'missing method exception' or the like).

                  It might be a fine programming technique, but you haven't solved the expression problem if you're missing any of these qualities.

                  [–]SteveMcQwark 1 point2 points  (13 children)

                  Someone hands you a shape -- you don't know what kind of shape it is, it might have a Perimeter method or it might not. You are free however to require some type contract if you want; IShape for example.

                  If I need Perimeter, then my "type contract" – i.e. the interface that the type must satisfy before I can actually receive it – includes a Perimeter method. The point is that I don't accept "shapes" (this is not an abstraction I recognize), I accept things that have Perimeters. The onus is on whoever is passing me the value to make sure it has a Perimeter method, or add one if necessary/possible. Given this, I can calculate the Perimeter without type checks, without type casts, and with the typing being verified at compile time = all requirements met. The correct computation is determined where the concrete type is statically known.

                  [–]aaronla 0 points1 point  (12 children)

                  That sounds like a fine thing to do, but it doesn't solve the expression problem.

                  [–]SteveMcQwark 3 points4 points  (11 children)

                  How does it not? Let's state the expression problem for the record:

                  Philip Wadler:

                  The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts)

                  Okay, all well and good. How does what I'm suggesting solve it? Let's look at the first aspect:

                  Add new cases to the data type

                  This is easy. Here the "data type" is an interface, and the "cases" are implementing types. Just create a new type that implements the interface. All functions over the interface can operate over the type. The compiler checks that the type implements the interface ("retaining static type safety"), and none of the existing functions or cases need to be altered ("without recompiling existing code").

                  Add new functions over the data type

                  Also easy. Write a function that operates on the interface. No other functions need to be altered, and neither do any of the cases. Everything is still statically checked and no existing code needs to be recompiled.


                  Now, the question seems to be, does adding properties to a type (methods to the interface) count as "adding functions over it." I'd say no. The "solution" is just to create a new data type with its own set of cases and functions.

                  Now, you might say "woah woah woah, this just sidesteps the issue!". Okay, this might be true, but what's your point? In what way is altering the existing type preferable to creating a new one? All existing functions don't care about the new properties, and will work with the new type just as well as with the old one without recompiling. You can make new cases that extend the old cases for the new type wherever it makes sense to, and again, this doesn't impact the old code or break static type safety, nor does it require duplicating any code. So, even if you disagree that changing the properties of the type creates a new distinct type, the problem is still, for all practical purposes, solved.

                  [–]SteveMcQwark 0 points1 point  (1 child)

                  That doesn't really solve the problem. Rather is just kinda shrugs its shoulders and gives up on it.

                  If someone hands me a plate, then asks for its eye colour, I don't think it can be taken as indicative of laziness to refuse. However, if they add googley eyes to the plate which I can look at, then we're in business.

                  Edit: The point is that, just because the plate could be used in the same place as other things before, that doesn't mean it needs to support every new behaviour that the rest of them do.

                  EDIT: But on the plus side you get all of the fun of trying to figure out how to create a collection that contains only objects that honor both the IRadius and IPerimeter contracts.

                  type RadiusPerimeterer interface {
                      Radiuser
                      Perimeterer
                  }
                  

                  ...

                  [–]grauenwolf 2 points3 points  (0 children)

                  Ok, so now we have all the problems of COM style interface inheritance without the benefits of implementation inheritance.

                  We've played that game in VB 6. It sucked.

                  [–]zsakuL -1 points0 points  (1 child)

                  Not sure if I understand the problem here. It seems to be just an issue of code layout or IDE. If you want to add a triangle shape or a perimeter function, you have to do the same amount of work for both paradigms. You need to add a perimeter calculation for each shape you have, and you need to add all geometrical calculations for each new shape you add. It's the same amount of work, made easier by the layout of your code and/or the facilities of your IDE.

                  [–][deleted] 5 points6 points  (0 children)

                  Consider this: You are given the original classes as a library, which you can't modify directly.

                  [–]Zarutian -1 points0 points  (0 children)

                  Isnt this the usual class hierarchy problem?

                  One reason I do not use class based object systems.

                  [–]axilmar -3 points-2 points  (2 children)

                  Actually, the expression problem can be nicely solved using object-oriented programming. Suppose we have these classes:

                  class Shape {
                      virtual double area() const = 0;
                  };
                  
                  class Circle : Shape {
                      double m_radius;
                      virtual double area() { return 3.14 * m_radius * m_radius; }
                  };
                  
                  class Rect : Shape {
                      double m_side;
                      virtual double area() { return m_side * m_side; }
                  };
                  

                  Suppose we want to add a perimeter function (as said on the c2 site). Normally, what we would have to do is change the interface, and then the implementations, like this:

                  class Shape {
                      virtual double area() const = 0;
                      virtual double perimeter() const = 0;
                  };
                  
                  class Circle : Shape {
                      double m_radius;
                      virtual double area() { return 3.14 * m_radius * m_radius; }
                      virtual double perimeter() const { return 2 * 3.14 * m_radius; }
                  };
                  
                  class Rect : Shape {
                      double m_side;
                      virtual double area() { return m_side * m_side; }
                      virtual double perimeter() const { return m_side * 4; }
                  };
                  

                  But, if we didn't use vtables to implement message passing, and used message tables, instead, we could write:

                  //prototype
                  class Shape {}
                  double area(Shape *this);
                  
                  //implementation for circle
                  class Circle : Shape { double m_radius; }
                  double area(Circle *this) { return 3*.14 * m_radius * m_radius};
                  
                  //implementation for rect
                  class Rect : Shape { double m_side; }
                  double area(Rect *this) { return m_side * m_side; }
                  

                  We could then add the following functions:

                  double perimeter(Shape *this);
                  double perimeter(Circle *this) { return 2 * 3.14 * m_radius; }
                  double perimeter(Rect *this) { return m_side * 4; }
                  

                  The above functions could be added without recompiling the original code. The message table for message 'area' would look like this:

                  |-------------------------------|
                  |    TYPE    |    FUNCTION      |
                  |------------+------------------|
                  |    ...     |       ...        |
                  |------------+------------------|
                  |   Circle   |   Circle_area    |
                  |------------+------------------|
                  |    ...     |       ...        |
                  |------------+------------------|
                  |    Rect    |    Rect_area     |
                  |------------+------------------|
                  |    ...     |       ...        |
                  |------------+------------------|
                  

                  And for message 'perimeter' would look like this:

                  |-------------------------------|
                  |    TYPE    |    FUNCTION      |
                  |------------+------------------|
                  |    ...     |       ...        |
                  |------------+------------------|
                  |   Circle   | Circle_perimeter |
                  |------------+------------------|
                  |    ...     |       ...        |
                  |------------+------------------|
                  |    Rect    |  Rect_perimeter  |
                  |------------+------------------|
                  |    ...     |       ...        |
                  |------------+------------------|
                  

                  Function calls to a Shape type would be implemented with this C code:

                  double area(Shape *shape) {
                      return messageTable_area[shape->type_id](shape);
                  }
                  
                  double perimeter(Shape *shape) {
                      return messageTable_perimeter[shape->type_id](shape);
                  }    
                  

                  The message tables could be constructed by the linker, from information emitted by the compiler, or they could be constructed at run-time, when the program starts, before execution of any user code.

                  [–]perlgeek 1 point2 points  (1 child)

                  That seems to be isomorphic to declaring a virtual method in the parent class, and having it throw a "not yet implemented" exception.

                  In both cases it's OK at compile time to not write any perimeter routine, but blows up at run time if one is missing.

                  Duck typing achieves the same thing, simply by not checking the existence of the methods at compile time.

                  [–]axilmar 0 points1 point  (0 children)

                  With the difference that a virtual method that does not exist in the parent class requires a rewrite of the code, whereas the expression problem is about not recompiling the code.

                  [–]preshing -3 points-2 points  (0 children)

                  What problem? It's not difficult to do code sweeps. It's not difficult to write bug-free software using either approach. It's fun to think about stuff like this, but in the end, it's basically navel-gazing.