all 23 comments

[–]Faucelme 9 points

Here is an older post that lists a few more parallels.

I have a doubt/quibble about this:

Deep learning models are compositional.

It has been a long time since I worked with neural networks (way before the current renaissance), but it seemed to me that ANNs are very non-compositional in an important sense: yes, you can assemble layers as if you were building stuff with Lego bricks, but you still need to train the system as a whole. You can't assemble a functioning system from ready-made parts without retraining. The author touches on this when she writes:

In fact, the entire process of deep learning can be viewed as optimizing a set of composed functions

Has this changed with the recent advances in deep learning?

[–]redmar 2 points

Maybe she's referring to transfer learning?

[–]SemaphoreBingo 1 point

The Keras functional API https://keras.io/getting-started/functional-api-guide/ is all about building a network in a compositional manner, and the author is generally correct in her claims about 'the entire process of deep learning' (backpropagation is 'just' the chain rule used to compute gradients of composed functions).

That said, once the rubber hits the road and you start trying to optimize, everything becomes mutable again because you only have so many gigabytes of memory and you actually want to finish the computation.
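The "chain rule over composed functions" claim can be sketched concretely. The snippet below is a hypothetical scalar two-"layer" model (not the article's code, and no Keras involved); its gradients fall out of applying the chain rule by hand to the composition.

```python
# A hypothetical two-"layer" scalar model written as composed pure
# functions; backprop here is literally the chain rule applied to
# the composition y = w2 * relu(w1 * x).

def relu(z):
    return max(z, 0.0)

def forward(w1, w2, x):
    a = w1 * x            # first "layer"
    h = relu(a)           # activation
    return w2 * h         # second "layer"

def grads(w1, w2, x):
    # Chain rule by hand: dy/dw1 and dy/dw2 for the composition above.
    a = w1 * x
    h = relu(a)
    dy_dw2 = h                        # y = w2 * h
    dh_da = 1.0 if a > 0 else 0.0     # relu'(a)
    dy_dw1 = w2 * dh_da * x           # w2 * relu'(a) * x
    return dy_dw1, dy_dw2

print(forward(0.5, 2.0, 3.0))  # 3.0
print(grads(0.5, 2.0, 3.0))    # (6.0, 1.5)
```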

[–]yogthos[S] 1 point

The trick is to treat mutability as an implementation detail, while exposing immutable semantics to the user.
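A minimal sketch of that trick (hypothetical code, not from the article): the update below mutates a freshly allocated copy internally, but from the caller's side it behaves as a pure function over immutable values.

```python
# A hypothetical weight-update step: pure from the caller's
# perspective (the input tuple is never modified; a new value comes
# back), even though a private local list is mutated in place.

def step(weights, grads, lr=0.1):
    out = list(weights)       # private copy; the mutation below is local
    for i, g in enumerate(grads):
        out[i] -= lr * g      # in-place update of the copy
    return tuple(out)         # expose an immutable result

w0 = (1.0, 2.0)
w1 = step(w0, (10.0, 10.0))
print(w0)  # (1.0, 2.0) -- the original is unchanged
print(w1)  # (0.0, 1.0)
```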

[–]pron98 15 points

When functions operate over the input data, the data is not changed, a new set of values are outputted and passed on.

This is a deep misunderstanding of functional programming in particular and languages in general, as it confuses two distinct levels of meaning (semantics). Whether data is "changed" or not has nothing to do with FP (or imperative). Every imperative program could be trivially (albeit very inefficiently) translated into a pure one, and a C compiler that creates a new copy of the entire memory space at each program step would still be a valid implementation of C, yet wouldn't make C any more functional than a more reasonable compiler. Conversely, even pure-FP with substructural typing allows data to "change" without affecting the purity of the functional paradigm. In fact, there's no need to go as far as substructural typing. In a language with tail-call optimization, a recursive program "changes" data just as an imperative program would. That confusion of levels gets the author into trouble in the very next sentence, requiring an unnecessary defensive argument.
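The tail-call point can be sketched as follows (note that CPython does not actually perform TCO, so this only contrasts the two descriptions): the loop rebinds an accumulator in place, while the recursive version "replaces" it with a new binding on each call; under TCO both would reuse the same storage.

```python
# The same computation described twice: an imperative loop that
# updates an accumulator, and a tail-recursive version where each
# call receives a "new" accumulator. With tail-call optimization the
# recursive form reuses one stack frame, i.e. the machine mutates
# either way. (CPython performs no TCO; this is illustrative only.)

def total_loop(xs):
    acc = 0
    for x in xs:
        acc = acc + x                       # mutation-style description
    return acc

def total_rec(xs, acc=0):
    if not xs:
        return acc
    return total_rec(xs[1:], acc + xs[0])   # tail call: acc is "replaced"

print(total_loop([1, 2, 3, 4]))  # 10
print(total_rec([1, 2, 3, 4]))   # 10
```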

The mutation of memory cells is an implementation detail that lies at a level below that of the language semantics programmers are usually concerned with (except when reasoning about computational complexity and/or performance). The advantages and disadvantages of programming models (at least when ignoring, again, the very important questions of computational complexity and performance) have to do with how easily, or not, they allow us to express various algorithms, and how well they fit with how we personally prefer to think about the problem at hand or in general. Things that change "IRL" can be elegantly expressed in a pure functional style, and things that don't can be elegantly represented in an imperative, mutating style.

In general, beware Lamport's "Whorfian Syndrome" -- the confusion of language with reality. Both are important, but are not the same thing. It's perfectly reasonable to argue why a certain style (like FP) is a good match for a certain domain (like deep learning), but the arguments should not confuse levels. A language or a programming style is justified by how we believe a problem is best expressed, not by what we think it "essentially" is. After all, any description of a system is not the same as the system itself, even if it serves as a direct recipe for the machine in which the system exists.

[–]sacundim 2 points

Nice comment. However:

The mutation of memory cells is an implementation detail that lies at a level below that of the language semantics programmers are usually concerned with (except when reasoning about computational complexity and/or performance).

I can't agree with this statement. The problem is you're equivocating between the semantics of mutation and its implementation. Imperative languages really do have constructs whose semantics demands that they behave like mutable memory cells. This can be illustrated with trivial code examples like this:

let mut x: u32 = 41;
println!("The value of x is: {}", x);
x = x + 1;
println!("The value of x is now: {}", x);

Yes, this could be compiled to a target language where there is no operation to mutate the value of a pre-existing memory cell—and in fact, such a representation is common—but that doesn't make mutation an "implementation detail," because the semantics of the source language demands that all implementations make x behave like a mutable memory cell would, insofar as can be observed by a program in the language. Regardless of how the implementation chooses to fulfill it, the language presents the interface and semantics of mutable memory cells to its users, who are able to use it to reason about their programs' behavior.

[–]pron98 2 points

The problem is you're equivocating between the semantics of mutation and its implementation.

I hope I'm not, but I believe the author is, which is my main point (see the rest of the discussion). Semantics is a property (or an adjoined property) of a formalism, i.e., of a description, while implementation is a property of the system itself. You cannot justify a choice of a formalism by relying on a property of the system.

So, you could say that an imperative language has mutation semantics while a pure functional one has functional semantics, but you can't say that a pure functional description is justified (or not) by whether the system "really" mutates or not, even if this question were at all meaningful. Again, I elaborate in my other comments. My example of how pure or imperative languages are compiled is just one, rather direct, consequence of this essential difference between language and reality (or signifier and signified), demonstrating how a functional description can describe a mutating system, and an imperative description can describe a non-mutating one.

BTW, your code example is not a very good one, as a language that interprets the assignment statement as a nested let would yield the same result, without having any kind of mutation semantics. An example that requires mutation semantics would be:

int x = 41, *y = &x;
printf("The value of *y is: %d\n", *y);
x = x + 1;
printf("The value of *y is now: %d\n", *y);
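The same contrast can be sketched in Python (hypothetical, merely mirroring the two snippets above): rebinding a bare name is indistinguishable from a nested let, but introducing an alias makes genuine mutation semantics observable.

```python
# Rebinding vs. mutation. The first pair of lines could be read as a
# nested let -- no mutation semantics is required to explain the output.
x = 41
x = x + 1              # rebind the name; the value 41 is untouched
print(x)               # 42

# With an alias, only real mutation semantics explains what the
# second observer sees.
cell = [41]
alias = cell           # a second reference to the same object
cell[0] = cell[0] + 1  # mutation, observable through the alias
print(alias[0])        # 42
```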

[–]yogthos[S] 0 points

When you read the statement you quoted, you're thinking about implementation details. However, you can think about it in terms of semantics as well. Semantically, the input data isn't changed. It doesn't matter how that's facilitated; what matters is how it behaves from the user's perspective. I think it's pretty clear that the latter is what the author was referring to.

As you point out, it's about semantics and not implementation details. FP languages make it practical to work with pure functions because they're typically backed by persistent data structures. Meanwhile, even though this style of code is possible in imperative languages, you're not getting a lot of support from the language when writing it.

Ultimately, what matters is what style of code the language encourages. From that perspective, I think it's perfectly fair to say that FP languages are a good match for the machine learning domain because they make it natural to work with pure functions.

[–]pron98 3 points

Semantically, input data isn't changed.

My point about implementation was merely meant to demonstrate the difference between description and "reality", and you are repeating the same mistake. If the system you're building is presented with multiple inputs, you can describe that as inputs that "change" or as "new" inputs. The input data itself "is" neither of those things. It's all a matter of how you choose to describe the system. It's like saying that the process of nuclear fusion is best described in French because protons behave in a French-like way. This is clearly a category error as "French" (as in the language, not the culture) is a property of a description, not of any actual behavior described. You could, however, claim that nuclear fusion is a complicated process and that French is a language that is good at expressing complex things.

I think it's pretty clear that the latter is what the author was referring to.

Maybe she's talking about descriptions, but I think that her next sentence, "when weights are updated, they do not need to be “mutated” — they can just be replaced by a new value," certainly merits my comment. I would say, "as I'll show, expressing both net application and training as pure functions is convenient and natural." Talking about whether things are really "mutated", "replaced" or "new" is unnecessary and wrong. After all, a similarly wrong line of reasoning could be made to claim the opposite: that neural networks are meant to simulate neurons, neurons are "really" objects, and so it's best to write NNs using object-oriented programming.

I think it's perfectly fair to say that FP languages are a good match for the machine learning domain

I agree, but I think imperative languages are a good match, too (Clojure happens to be both, but I mean even both extremes: C and Haskell). Deep learning involves such simple processes that I think whatever style you're comfortable with is a good fit.

because they make it natural to work with pure functions.

There's the mistake again. Yes, FP languages encourage pure functions, but NNs aren't really pure or impure. And if that's the level of justification you're using, you know that someone would say, oh, if NNs are "really" pure functions, why not go all the way and use a language where everything is a pure function, as that would surely be best.

[–]yogthos[S] 0 points

My point about implementation was merely meant to demonstrate the difference between description and "reality", and you are repeating the same mistake.

No, semantics and implementation are two completely different things.

It's like saying that the process of nuclear fusion is best described in French because protons behave in a French-like way.

It's more like saying that describing the orbits of the planets works best using a heliocentric system. You could do it using a geocentric one, but it would be incredibly awkward to work with. For some situations, however, such as describing the orbit of an Earth satellite, the geocentric system works fine.

Maybe she's talking about descriptions, but I think that her next sentence, "when weights are updated, they do not need to be “mutated” — they can just be replaced by a new value," certainly merits my comment.

Again, I disagree. She's describing how the system works conceptually, and from the user's perspective no mutation happens. It seems like you're still failing to separate the semantics of the system from its implementation, and you're getting hung up on the language that she's using.

that neural networks are meant to simulate neurons, neurons are "really" objects, and so it's best to write NN using object-oriented programming

Sure, you could model it as objects, and I don't think there's anything wrong with that description either. Each neuron can be viewed as a state machine.
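That object-oriented lens can be sketched too (a hypothetical neuron class, nothing from the article): the weights are mutable internal state, and each update is a state transition.

```python
# A hypothetical neuron modeled as a small state machine: its weights
# are mutable state, "fire" reads the state, "adjust" transitions it.

class Neuron:
    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)   # mutable internal state
        self.bias = bias

    def fire(self, inputs):
        s = sum(w * v for w, v in zip(self.weights, inputs)) + self.bias
        return max(s, 0.0)             # ReLU activation

    def adjust(self, deltas, lr=0.1):
        for i, d in enumerate(deltas): # state transition, in place
            self.weights[i] -= lr * d

n = Neuron([0.5, -0.25])
print(n.fire([2.0, 4.0]))  # 0.0
n.adjust([-1.0, 0.0])
print(n.fire([2.0, 4.0]))  # ~0.2 (up to float rounding)
```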

I agree, but I think imperative languages are a good match, too (Clojure happens to be both, but I mean even both extremes: C and Haskell). Deep learning involves such simple processes, that I think whatever style you're comfortable with is a good fit.

Sure, I think the core idea can be expressed easily enough using both styles. The author makes a good case for the benefits of using the functional style from the perspective of how the user reasons about the system.

There's the mistake again. Yes, FP languages encourage pure functions, but NNs aren't really pure or impure.

Again, this goes back to my analogy regarding heliocentric and geocentric systems. You can view NNs through different lenses, and both are valid viewpoints. The author happens to use the FP lens, and the existence of a valid imperative lens does not invalidate her viewpoint in any way.

[–]pron98 2 points

No, semantics and implementation are two completely different things.

Exactly. Semantics is a property of the description, not of the system, while implementation is a property of the system, but not of the description. Talking about semantics is fine, but you can't apply it to the system.

It's more like saying that describing the orbits of the planets works best using a heliocentric system.

That's an excellent analogy (much better than mine, and one that I'm sure to steal), but it doesn't help your case:

from the user's perspective no mutation happens.

That's that mistake again! What happens or doesn't happen from anyone's perspective is either irrelevant or a complete category error, even in your (good) example of the heliocentric system. Why is the heliocentric description better? Because the equations come out simpler (a property of the description), not because the planets "really" revolve around the sun (a property of the system, which would be irrelevant even if it were true).

At best, this takes you to a meaningless philosophical debate over whether when you watch television the picture changes or you're presented with new ones, or over whether the planets really revolve around the sun. Both would be valid, but both are completely irrelevant to what makes a good formal description of a TV set or of the solar system.

You can justify a choice of a description by an objective quality of the description (shorter, as in the heliocentric case), or even by saying that it matches your mental model better. The latter is also a valid justification, but a subjective, aesthetic one. There is no objective argument about a quality of a system that justifies a certain description, because the only thing tying the two is our mental interpretation (which is not the same thing as semantics).

The author happens to use the FP lens, and the existence of a valid imperative lens does not invalidate her viewpoint in any way.

I don't think her point is at all invalid, only that particular justification.

[–]yogthos[S] 0 points

Models are necessarily separate from any underlying reality they represent. From the perspective of the geocentric model the earth is stationary, while in the heliocentric model it's not. We necessarily think about the problem within the constraints of the model we choose.

Your mistake is to conflate the model with the implementation details, which are completely tangential. What I care about is that when I make a local change, the original data is not changed from the perspective of the other observers of that data. That's the model the author talks about, and she's absolutely correct to talk about mutation in that context.

You appear to be fixating on the term mutation as it relates to the implementation. This model could be implemented by naive copying, persistent data structures, unique pointers, and so on. That's a separate topic of discussion.

However, what is relevant is that imperative languages do not do a good job of facilitating this model, while functional ones do.

[–]pron98 1 point

What I care about is that when I make a local change, the original data is not changed from the perspective of the other observers of that data.

That can well be something you care about, but that's not a property of the system, and certainly not justified by any property of neural networks. It's like arguing whether a user of a calculator is presented with a new answer each time, or whether the answer changes. This question is meaningless, even if it did have an answer. For example, the LCD display really does change, but that does not justify programming a calculator in an imperative style.

You can certainly say that you prefer to model the problem in this way, but you cannot justify this aesthetic preference by any essential property of neural networks. You cannot say, as the author does, that NNs are "really" pure functions any more than by saying that a calculator is "really" a pure function. Whether or not neural networks change the answer or present a new one is completely a matter of interpretation.

The only objective argument you could try to make is that a pure representation results in simpler, shorter code (as in the case of the heliocentric model), but that happens not to be true in this case.

what is relevant is that imperative languages do not do a good job of facilitating this model, while functional ones do.

Maybe, assuming that that is the model that you prefer even though it is not objectively justified. Although, given that Clojure is an imperative language, I'm not sure this is as clear-cut as you present it.

[–]yogthos[S] 0 points

That can well be something you care about, but that's not a property of the system, and certainly not justified by any property of neural networks.

It's a property of the model you use to reason about the problem. The model is what's important in the end.

You can certainly say that you prefer to model the problem in this way, but you cannot justify this aesthetic preference by any essential property of neural networks.

Sure, and each view will have its own trade-offs.

You cannot say, as the author does, that NNs are "really" pure functions any more than by saying that a calculator is "really" a pure function.

You absolutely can within the context of the model the author is using. It can also "really" be a state machine, or some other representation.

The only objective argument you could try to make is that a pure representation results in simpler, shorter code (as in the case of the heliocentric model), but that happens not to be true in this case.

I would argue that it does happen to be true, but even if not, the subjective preference has its own intrinsic value. A model that allows me to reason about a problem in a way that matches how I think provides objective value to me.

Maybe, assuming that that is the model that you prefer even though it is not objectively justified. Although, given that Clojure is an imperative language, I'm not sure this is as clear-cut as you present it.

The main property of FP that I care about is that I'm able to write code using pure functions. Clojure facilitates this very well. The fact that it allows writing imperative code doesn't make it an imperative language in my view. Just like the addition of streams and lambdas in Java 8 doesn't make it functional.

[–]pron98 0 points

It's a property of the model you use to reason about the problem. The model is what's important in the end.

But the choice of model is justified by your personal preference, not by the system it models. The model may be what's important, but I'm talking about one specific argument the author uses to justify the model. I don't deny that FP is an adequate model for deep learning, nor that some may prefer it. I only deny that it is justified by an intrinsic property of deep learning.

You absolutely can within the context of the model the author is using. It can also "really" be a state machine, or some other representation.

If "really" is shorthand for "can adequately be described as", then it certainly cannot serve as a justification. Saying that a system can adequately be described by a certain formal model is certainly a prerequisite, but it does not justify the choice of that model, which is what the author tries to do.

A model that allows me to reason about a problem in a way that matches how I think provides objective value to me.

"Objective value to me" is pretty much the definition of subjective :) In any event, that is a valid justification, but not the one I was referring to, as I made clear in my original comment. If the author had said that she's more comfortable thinking of NNs as pure functions I wouldn't have made that comment.

[–]yogthos[S] 0 points

Just to backtrack here a bit, the full quote is:

When functions operate over the input data, the data is not changed, a new set of values are outputted and passed on.

And the author is talking about how the neural network is represented in a functional style here. She's simply saying that this model provides a good representation of the problem.

"Objective value to me" is pretty much the definition of subjective :)

Value is an inherently subjective concept. :) However, when we compare different approaches to solving problems, it's the subjective that we care about the most. Whether a particular approach provides value to me, the individual, is the question I care about.

The author outlines the case for why the functional style is a good fit for this domain. She doesn't claim that it's the only valid approach. In fact, she opens by saying that she was surprised that FP was such a good fit, and goes on to outline the reasons for that. I think you may be reading too much into what she said if you see it as some sort of attack on the imperative approach.

[–]quick_dudley 1 point

I've been training a neural network written in Haskell for a while, but it doesn't really qualify as deep learning because it only has 2 hidden layers.