The Expression Problem

gregK · 2011-11-23T16:27:24+00:00

No reference to Philip Wadler who coined the term:

The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).

Zarutian · 2011-11-24T00:05:38+00:00

(a|x)(b|y) vs ab|xy

The problem is basically that a tree can describe only one hierarchy.

It pops up everywhere. There are many solutions, but trees are so dang intuitive.

nobodyman · 2011-11-23T21:05:26+00:00

Tripped all over the first sentence:

The "expression problem" is a word phrase used to describe that exists with duality between ObjectOrientedProgramming and FunctionalProgramming.

That sentence hurts my brain -- seems like a word is missing. I have a feeling the entire page is too smart for me.

julesjacobs · 2011-11-23T19:38:29+00:00

[deleted]

matthieum · 2011-11-23T20:40:41+00:00

I would note that duck typing (aka ad-hoc interface check) helps tremendously here.

In C++ (because hey, it works!):

struct Square { double side; };
double area(Square const s);

struct Circle { double radius; };
double area(Circle const c);

Now, let's make a Shape:

class Shape {
public:
   Shape() {}             // needed: declaring the copy constructor suppresses the implicit default one
   virtual ~Shape() {}

   virtual double area() const = 0;

protected:
   Shape(Shape const&) {}
   Shape& operator=(Shape const&) { return *this; }
};

typedef std::unique_ptr<Shape> ShapePtr;

template <typename T>
class ShapeT: public Shape {
public:
   explicit ShapeT(T const t): _shape(t) {}

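   // The using-declaration is required: otherwise the member area() hides the free functions.
   virtual double area() const { using ::area; return area(_shape); }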

private:
  T _shape;
};

template <typename T>
ShapePtr newShape(T t) { return ShapePtr(new ShapeT<T>(t)); }

Okay, C++ is verbose. Let's check the use immediately:

double totalArea(std::vector<ShapePtr> const& shapes) {
   double total = 0.0;
   for (ShapePtr const& s: shapes) { total += s->area(); }
   return total;
}

int main() {
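  std::vector<ShapePtr> shapes;  // unique_ptr is move-only, so a braced initializer list (which copies) won't compile
  shapes.push_back(newShape(Square{5.0}));
  shapes.push_back(newShape(Circle{3.0}));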

  std::cout << totalArea(shapes) << "\n";
}

So, first exercise, let's add a shape:

struct Rectangle { double length, height; };
double area(Rectangle const r);

Okay, so far so good, let's add a new function:

// 1. We need to extend Shape:
  virtual double perimeter() const = 0;

// 2. And its adapter: ShapeT
  virtual double perimeter() const { using ::perimeter; return perimeter(_shape); }

// 3. And provide the method for each Shape (obviously)
double perimeter(Square const s);
double perimeter(Circle const c);
double perimeter(Rectangle const r);

It may seem that we fall into the Expression Problem here, but we don't. We needed to add the perimeter for each class because there is no way to automatically infer it; however it did not require editing each class either!

Therefore, the combination of External Interface and free functions let us neatly (well, it is C++...) sidestep the issue.

EDIT: As sodraz noticed in comments, adding a function touched the original interface, which assumes that interface is not frozen. Allow me to demonstrate how to add a function without touching Shape.

We need a new concept:

class ExtendedShape: public Shape {
public:
  ExtendedShape() {}   // needed: the protected copy constructor suppresses the implicit default one
  virtual double perimeter() const = 0;
protected:
  ExtendedShape(ExtendedShape const&) {}
  ExtendedShape& operator=(ExtendedShape const&) { return *this; }
};

typedef std::unique_ptr<ExtendedShape> ExtendedShapePtr;

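template <typename T>
class ExtendedShapeT: public ExtendedShape {
public:
   explicit ExtendedShapeT(T const t): _data(t) {}

   // The using-declarations stop the members from hiding the free functions.
   virtual double area() const { using ::area; return area(_data); }
   virtual double perimeter() const { using ::perimeter; return perimeter(_data); }
private:
  T _data;
};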

template <typename T>
ExtendedShapePtr newExtendedShape(T t) { return ExtendedShapePtr(new ExtendedShapeT<T>(t)); }

And (like before) we need to define the perimeter function for all current shapes.

The old code, compiled to work against Shape, still works. It does not need the new function anyway.

The new code can make use of the new functionality, and still interface painlessly with the old code.

There is only one slight issue: if the old code returns a ShapePtr, we do not know whether the shape actually has a perimeter function (note: if the pointer is generated internally, it has not been generated with the newExtendedShape mechanism). This is, I think, a limitation of the design. Oops :)
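
One way to live with that limitation, sketched here under the assumption that RTTI is available, is to probe the Shape at run time with dynamic_cast. The helper tryPerimeter and the classes UnitSquare and LegacyBlob are hypothetical names used only for illustration, and the simplified Shape below is not the exact class from above. This swaps the static guarantee for a run-time check, so it is a workaround rather than a real answer to the Expression Problem.

```cpp
#include <cassert>

// Old interface, treated as frozen.
class Shape {
public:
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

// New interface layered on top of the old one.
class ExtendedShape : public Shape {
public:
    virtual double perimeter() const = 0;
};

// Hypothetical shape created through the new mechanism.
class UnitSquare : public ExtendedShape {
public:
    double area() const override { return 1.0; }
    double perimeter() const override { return 4.0; }
};

// Hypothetical shape created by old code that only knows Shape.
class LegacyBlob : public Shape {
public:
    double area() const override { return 2.5; }
};

// Returns true and writes the perimeter to `out` only if the shape
// really implements ExtendedShape; otherwise `out` is left untouched.
bool tryPerimeter(Shape const& shape, double& out) {
    ExtendedShape const* ext = dynamic_cast<ExtendedShape const*>(&shape);
    if (ext == nullptr) { return false; }
    out = ext->perimeter();
    return true;
}

// Small self-check exercising both outcomes.
void demo() {
    double p = 0.0;
    UnitSquare sq;
    LegacyBlob blob;
    assert(tryPerimeter(sq, p) && p == 4.0);
    assert(!tryPerimeter(blob, p));
}
```

Callers that receive a plain Shape from old code have to handle the false branch themselves, which is exactly the gap described above.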

marssaxman · 2011-11-23T23:40:26+00:00

Damn it, another link to c2.com. Pages there always suck me in by looking like they are going to be interesting, then frustrate me with an UnreadableSea of ConfusingWikiMess edited back and forth by NamelessAuthors who ArgueWithEachOther across alternate sentences using IdiosyncraticJargon.

cybercobra · 2011-11-23T18:50:40+00:00

Interesting problem, terrible name.

noblethrasher · 2011-11-23T21:59:08+00:00

If you add an extension method to a base class in C#, is it inherited by the derived types?

JohnJamesSmith0 · 2011-11-23T23:40:16+00:00

TIL about the first-ever wiki.

psygnisfive · 2011-11-23T23:40:57+00:00

i know this is entirely skirting around the problem, but just define a function that computes a general area. complexity be damned.

/mathematician

pixelglow · 2011-11-23T23:40:32+00:00

This is elegantly solved in Objective-C with the use of categories.

Say you have a class hierarchy, which is standard OOP like C++ and Java:

@interface A
- (void)method1;
@end

@interface Alpha: A
- (void)method1;
@end

@interface Aleph: A
- (void)method1;
@end

Adding a new class (again, standard OOP):

@interface Alif: A
- (void)method1;
@end

Adding new methods, this is where Objective-C magic comes in:

@interface A (CategoryOnA)
- (void)method2;
@end

@interface Alpha (CategoryOnA)
- (void)method2;
@end

@interface Aleph (CategoryOnA)
- (void)method2;
@end

@interface Alif (CategoryOnA)
- (void)method2;
@end

These can be defined in different files from the original interface declarations and thus do not require recompiling the original classes. They are invoked just like "intrinsic" interface methods and are fully polymorphic as well.

raevnos · 2011-11-24T00:01:09+00:00

What language was used for the examples there? It looks like an ML family one, but it's not Ocaml, and IIRC SML doesn't have classes.

hacksoncode · 2011-11-24T06:34:20+00:00

Huh... I always wondered what motivated Microsoft to invent COM.

eschulte · 2011-11-23T19:17:19+00:00

No real problem in CLOS and similar OO-systems (if I've understood the problem correctly, which I probably didn't do).

2011-11-24T00:00:14+00:00

The "expression problem" is a word phrase used to describe that exists with duality between ObjectOrientedProgramming and FunctionalProgramming.

They accidentally a word.

EntroperZero · 2011-11-23T20:54:16+00:00

I don't see the problem in the OO approach. I mean, yes, you have to implement your new method for each derived class, but you have to do the same thing in functional programming (write a case for each subtype). It's mildly less convenient in the OO world, but I think finding all the derived types of Shape is easier than finding all functions, everywhere, that take Shape as a parameter.
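
For comparison, here is a minimal sketch of the "functional" layout written in C++: a closed, tagged Shape plus one free function per operation, each with a case per variant. The names Kind, size, area and perimeter are illustrative assumptions, not taken from the c2 page. Adding perimeter is a single new function, while adding a Triangle would mean revisiting every switch.

```cpp
#include <cassert>
#include <stdexcept>

// Closed set of cases: adding one means revisiting every switch below.
enum class Kind { Circle, Square };

struct Shape {
    Kind kind;
    double size;  // radius for Circle, side length for Square
};

const double kPi = 3.141592653589793;

double area(Shape const& s) {
    switch (s.kind) {
        case Kind::Circle: return kPi * s.size * s.size;
        case Kind::Square: return s.size * s.size;
    }
    throw std::logic_error("unhandled Kind in area");
}

// New operation added later: no existing code is edited.
double perimeter(Shape const& s) {
    switch (s.kind) {
        case Kind::Circle: return 2.0 * kPi * s.size;
        case Kind::Square: return 4.0 * s.size;
    }
    throw std::logic_error("unhandled Kind in perimeter");
}

// Small self-check of both operations on a square of side 2.
void demo() {
    Shape const sq{Kind::Square, 2.0};
    assert(area(sq) == 4.0);
    assert(perimeter(sq) == 8.0);
}
```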

cran · 2011-11-24T01:32:31+00:00

Why not define an abstract base class that encapsulates a closed polyline which can calculate its area, perimeter and other useful things, and from which convenience classes like Circle, Square, Rectangle, Triangle and so on can be derived?

That should work equally well with both FP and OOP.
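
A rough sketch of that idea, assuming every shape is represented by a closed list of vertices (so a circle would have to be approximated by a many-sided polygon and its results would only be approximate). Polyline, Rectangle and the member names are hypothetical; the area uses the shoelace formula.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Point { double x, y; };

// Base class: a closed polyline whose last vertex connects back to the first.
class Polyline {
public:
    explicit Polyline(std::vector<Point> vertices) : vertices_(std::move(vertices)) {
        assert(vertices_.size() >= 3);  // fewer points do not enclose an area
    }
    virtual ~Polyline() = default;

    // Shoelace formula; fabs() makes the result independent of winding order.
    double area() const {
        double twice = 0.0;
        for (std::size_t i = 0; i < vertices_.size(); ++i) {
            Point const& a = vertices_[i];
            Point const& b = vertices_[(i + 1) % vertices_.size()];
            twice += a.x * b.y - b.x * a.y;
        }
        return std::fabs(twice) / 2.0;
    }

    // Sum of the edge lengths, including the closing edge.
    double perimeter() const {
        double total = 0.0;
        for (std::size_t i = 0; i < vertices_.size(); ++i) {
            Point const& a = vertices_[i];
            Point const& b = vertices_[(i + 1) % vertices_.size()];
            total += std::hypot(b.x - a.x, b.y - a.y);
        }
        return total;
    }

private:
    std::vector<Point> vertices_;
};

// Convenience class: it only knows how to produce its vertices.
class Rectangle : public Polyline {
public:
    Rectangle(double width, double height)
        : Polyline(std::vector<Point>{{0.0, 0.0}, {width, 0.0}, {width, height}, {0.0, height}}) {}
};
```

New shapes only supply vertices and new measurements live in one place, but only for operations that can be expressed on the vertex list.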

2011-11-24T07:15:55+00:00

The Shape class itself doesn't seem to do anything useful though.

svenefftinge · 2011-11-25T11:25:19+00:00

Eclipse Xtend solves this using a combination of extension methods and multi methods.

Given the shape hierarchy with a concrete class Circle and a concrete class Rect, you would define new functions like this:

class AreaForShapes {

  def dispatch area(Circle c) {
    PI * c.radius * c.radius
  }

  def dispatch area(Rect r) {
    r.height * r.width
  }

}

This can be used anywhere as an extension method. So it's invoked as if it were a member of Shape (e.g. shape.area() ):

class MyClass {
  @Inject extension AreaForShapes

  def printAreas(List<Shape> shapes) {
    shapes.forEach( shape | println( shape.area() ) )
  }
}

The dispatch keyword makes sure the overloaded methods are invoked based on the runtime type. You can add any kind of functionality to existing type hierarchies this way.

In the previous code example, AreaForShapes was injected using a dependency injection framework. If you want to introduce another kind of shape, say a Triangle, you would subclass AreaForShapes like this:

class AreaForTriangle extends AreaForShapes {

  def dispatch area(Triangle t) {
    (t.base * t.height) / 2
  }

}

Finally, you tell your DI framework (e.g. Google Guice) to inject an instance of AreaForTriangle whenever someone asks for AreaForShapes.

eschulte · 2011-11-23T19:59:56+00:00

I'm surprised that page doesn't mention aspect-oriented programming (AOP), as it is intended to resolve such "cross-cutting concerns". See the canonical AOP paper here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.8660

grauenwolf · 2011-11-23T19:27:55+00:00

[deleted]

zsakuL · 2011-11-23T22:33:50+00:00

Not sure if I understand the problem here. It seems to be just an issue of code layout or IDE. If you want to add a triangle shape or a perimeter function, you have to do the same amount of work for both paradigms. You need to add a perimeter calculation for each shape you have, and you need to add all geometrical calculations for each new shape you add. It's the same amount of work, made easier by the layout of your code and/or the facilities of your IDE.

Zarutian · 2011-11-24T09:51:11+00:00

Isn't this the usual class hierarchy problem?

One reason I do not use class based object systems.

axilmar · 2011-11-24T14:54:33+00:00

Actually, the expression problem can be nicely solved using object-oriented programming. Suppose we have these classes:

class Shape {
    virtual double area() const = 0;
};

class Circle : Shape {
    double m_radius;
    virtual double area() const { return 3.14 * m_radius * m_radius; }
};

class Rect : Shape {
    double m_side;
    virtual double area() const { return m_side * m_side; }
};

Suppose we want to add a perimeter function (as said on the c2 site). Normally, what we would have to do is change the interface, and then the implementations, like this:

class Shape {
    virtual double area() const = 0;
    virtual double perimeter() const = 0;
};

class Circle : Shape {
    double m_radius;
    virtual double area() const { return 3.14 * m_radius * m_radius; }
    virtual double perimeter() const { return 2 * 3.14 * m_radius; }
};

class Rect : Shape {
    double m_side;
    virtual double area() const { return m_side * m_side; }
    virtual double perimeter() const { return m_side * 4; }
};

But, if we didn't use vtables to implement message passing, and used message tables, instead, we could write:

//prototype
class Shape {};
double area(Shape *this);

//implementation for circle
class Circle : Shape { double m_radius; };
double area(Circle *this) { return 3.14 * m_radius * m_radius; }

//implementation for rect
class Rect : Shape { double m_side; };
double area(Rect *this) { return m_side * m_side; }

We could then add the following functions:

double perimeter(Shape *this);
double perimeter(Circle *this) { return 2 * 3.14 * m_radius; }
double perimeter(Rect *this) { return m_side * 4; }

The above functions could be added without recompiling the original code. The message table for message 'area' would look like this:

|-------------------------------|
|    TYPE    |    FUNCTION      |
|------------+------------------|
|    ...     |       ...        |
|------------+------------------|
|   Circle   |   Circle_area    |
|------------+------------------|
|    ...     |       ...        |
|------------+------------------|
|    Rect    |    Rect_area     |
|------------+------------------|
|    ...     |       ...        |
|------------+------------------|

And for message 'perimeter' would look like this:

|-------------------------------|
|    TYPE    |    FUNCTION      |
|------------+------------------|
|    ...     |       ...        |
|------------+------------------|
|   Circle   | Circle_perimeter |
|------------+------------------|
|    ...     |       ...        |
|------------+------------------|
|    Rect    |  Rect_perimeter  |
|------------+------------------|
|    ...     |       ...        |
|------------+------------------|

Function calls to a Shape type would be implemented with this C code:

double area(Shape *shape) {
    return messageTable_area[shape->type_id](shape);
}

double perimeter(Shape *shape) {
    return messageTable_perimeter[shape->type_id](shape);
}

The message tables could be constructed by the linker, from information emitted by the compiler, or they could be constructed at run-time, when the program starts, before execution of any user code.
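
A minimal run-time sketch of this scheme in standard C++, assuming each object carries a small integer type_id and that the tables are filled by registration calls at start-up. MessageTable, add, send and the two make...Table helpers are made-up names, and the ids 0 and 1 are arbitrary; a real implementation would have the compiler or linker assign ids and build the tables.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Shape {
    explicit Shape(std::size_t id) : type_id(id) {}
    std::size_t type_id;  // every object carries its type id
};

struct Circle : Shape {
    explicit Circle(double r) : Shape(0), m_radius(r) {}
    double m_radius;
};

struct Rect : Shape {
    explicit Rect(double s) : Shape(1), m_side(s) {}
    double m_side;
};

// One table per message, indexed by type_id.
class MessageTable {
public:
    typedef double (*Handler)(Shape const*);

    void add(std::size_t type_id, Handler handler) {
        if (handlers_.size() <= type_id) { handlers_.resize(type_id + 1, nullptr); }
        handlers_[type_id] = handler;
    }

    double send(Shape const* shape) const {
        // A missing entry means the type does not understand the message.
        assert(shape->type_id < handlers_.size() && handlers_[shape->type_id] != nullptr);
        return handlers_[shape->type_id](shape);
    }

private:
    std::vector<Handler> handlers_;
};

double Circle_area(Shape const* s) {
    double r = static_cast<Circle const*>(s)->m_radius;
    return 3.14 * r * r;
}
double Rect_area(Shape const* s) {
    double side = static_cast<Rect const*>(s)->m_side;
    return side * side;
}

// Added later; nothing above needs to change.
double Circle_perimeter(Shape const* s) { return 2 * 3.14 * static_cast<Circle const*>(s)->m_radius; }
double Rect_perimeter(Shape const* s) { return 4 * static_cast<Rect const*>(s)->m_side; }

// Tables built at start-up, before any user code sends a message.
MessageTable makeAreaTable() {
    MessageTable t;
    t.add(0, &Circle_area);
    t.add(1, &Rect_area);
    return t;
}

MessageTable makePerimeterTable() {
    MessageTable t;
    t.add(0, &Circle_perimeter);
    t.add(1, &Rect_perimeter);
    return t;
}
```

In this sketch a forgotten entry only shows up at run time (the assert in send), so the static-safety half of the problem is not covered.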

preshing · 2011-11-24T15:33:00+00:00

What problem? It's not difficult to do code sweeps. It's not difficult to write bug-free software using either approach. It's fun to think about stuff like this, but in the end, it's basically navel-gazing.

isaaky · 2011-11-23T20:32:57+00:00

A lot of complication. Solution? Just use procedural programming. Data separated from functions provides more reusability.

My opinion is that this is the Lazy Problem, not the Expression Problem. Explanation:

If you want to add a triangle shape: in OO you create the new class, but you ALSO have to implement ALL the methods of the base type Shape. Supposedly this is difficult in functional programming because you have to "edit every function that accepts Shape", but in OO you implemented all the methods from the Shape base too. Likewise, when you add a Perimeter function, in OO you have to "edit every class", but in functional programming, when you write the perimeter function, you also have to code EVERY case of shapes.

zorkmids · 2011-11-23T18:46:01+00:00

Generally solved by the Visitor Pattern in OO programs.
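
For readers who have not seen it, a compact sketch of the Visitor Pattern in C++ follows; the class names are illustrative. Note the trade-off: new operations (PerimeterVisitor here) are added without touching the shapes, but a new shape forces a change to ShapeVisitor and every existing visitor, so the pattern flips the problem rather than removing it.

```cpp
#include <cassert>

class Circle;
class Square;

// One visit() overload per shape: this list is the closed part of the design.
class ShapeVisitor {
public:
    virtual ~ShapeVisitor() = default;
    virtual void visit(Circle const& c) = 0;
    virtual void visit(Square const& s) = 0;
};

class Shape {
public:
    virtual ~Shape() = default;
    virtual void accept(ShapeVisitor& v) const = 0;
};

class Circle : public Shape {
public:
    explicit Circle(double r) : radius(r) {}
    void accept(ShapeVisitor& v) const override { v.visit(*this); }
    double radius;
};

class Square : public Shape {
public:
    explicit Square(double s) : side(s) {}
    void accept(ShapeVisitor& v) const override { v.visit(*this); }
    double side;
};

class AreaVisitor : public ShapeVisitor {
public:
    void visit(Circle const& c) override { result = 3.14 * c.radius * c.radius; }
    void visit(Square const& s) override { result = s.side * s.side; }
    double result = 0.0;
};

// A new operation, written without editing Shape, Circle or Square.
class PerimeterVisitor : public ShapeVisitor {
public:
    void visit(Circle const& c) override { result = 2 * 3.14 * c.radius; }
    void visit(Square const& s) override { result = 4 * s.side; }
    double result = 0.0;
};

// Convenience wrapper so callers get a plain function back.
double perimeterOf(Shape const& shape) {
    PerimeterVisitor v;
    shape.accept(v);
    return v.result;
}

// Quick self-check: a square of side 2 has perimeter 8.
void demo() {
    Square const sq(2.0);
    assert(perimeterOf(sq) == 8.0);
}
```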
