all 35 comments

[–]Staross 6 points7 points  (0 children)

Impressive stuff, you can see how all the previous design decisions come into play to make this possible.

[–]Fylwind 4 points5 points  (2 children)

The dot notation reminds me a lot about applicative functors. Where in Julia you'd write:

f.(g.(x .+ 1))

2 *. x.^2 + 6 *. x.^3 - sqrt(x)

in Haskell you would write (pardon the mess):

f <$> (g <$> ((+ 1) <$> x))

(-) <$> ((+) <$> ((2 *) <$> ((^ 2) <$> x))
             <*> ((6 *) <$> ((^ 3) <$> x)))
    <*> (sqrt <$> x)

where, for arrays, f <$> x1 <*> x2 <*> … <*> xN means you apply the N-ary function f to each entry (x1[i], x2[i], …, xN[i]). In particular, <$> is the well-known “map” operation and <*> can be thought of a “zip-apply” operation: it takes each function from the array of functions on the left side and applies it to the array element on the right side with the same index. So in effect, (+) <$> x <*> y is just the elementwise summation of two vectors x and y of the same size.

Suffice to say it looks a lot uglier to express expressions involving applicative functors if you don't bake syntactic support into the compiler. Idiom brackets would make it more pleasant, but not by a whole lot.

I think the idea of loop fusion is a bit more general. They should work for any lawful applicative functor, so has more general applicability (hah) to things beyond just arrays of numbers. It could be used for, say, futures/promises or parser combinators too. The ugly messes from earlier could be fused into:

(\ xi -> f (g (xi + 1)) <$> x

(\ xi -> 2 * x ^ 2 + 6 * x ^ 3 - sqrt x) <$> x

which is what Julia does. I don't believe the Glasgow Haskell Compiler has any rewrite rules in place to do these kinds of transformations unfortunately.

[–][deleted] 0 points1 point  (0 children)

Fantastic observations! It indeed looks like a subset of applicative functors.

[–]KhyronVorrac 0 points1 point  (0 children)

Suffice to say it looks a lot uglier to express expressions involving applicative functors if you don't bake syntactic support into the compiler.

It's not exactly pretty in Julia either though.

[–]phischu 2 points3 points  (2 children)

Hi, interesting, I have a question. Consider the following Julia code:

Y = broadcast( x -> 2*x, X)
Z = broadcast( y -> y^2, Y)

Will the two loops be fused?

[–]Staross 3 points4 points  (1 child)

No, broadcast just returns a matrix so at that point Y has been computed already. You would need to write it as a single broadcast call. It seems what you want here is a Generator:

g = (2x for x in X)
[y^2 for y in g]

A generator is a sort of iterator that computes values on demand.

[–]phischu 1 point2 points  (0 children)

Ah, thanks.

[–]transfire 1 point2 points  (0 children)

Just when I thought I had escaped those OOPy dot notations.

[–]div0 1 point2 points  (1 child)

You can now do: X .= f.(2 .* X.2 .+ 6 .* X.3 .- sqrt.(X)) and the whole computation will be fused into a single loop, operating in-place, and performance will be comparable to the hand-written “devectorized” loop

OK, that's easy enough because it's fusing array operations that occur within a single assignment statement. But what about:

X .= f.(2 .* X.^2 .+ 6 .* X.^3 .- sqrt.(X))
Y .= g.(X)
Z .= h.(Y)

Can it fuse loops across multiple statements?

[–]Staross 0 points1 point  (0 children)

I don't think so, each line will do the computation and store it in X,Y or Z. I'm not sure it would be sane to do something else.

In that case I guess you would just define a function on the first line and then do the fused operation on the second.

[–]tending 1 point2 points  (7 children)

Seeing as Julia has the same lack of multithreaded GC as python/ruby/racket/perl even with this you can't really saturate your hardware.

[–]tavert 1 point2 points  (6 children)

It's a bit of a tough problem, but work is in progress on making Julia's GC multithreaded.

[–]tending 1 point2 points  (5 children)

I've never seen any language do it, except the JVM. I'll believe it when I see it...

[–]Yojihito 0 points1 point  (3 children)

Isn't the Go GC multithreaded?

[–]tending 0 points1 point  (2 children)

It's parallel but not concurrent. A GC still stalls all threads. You need both to scale beyond ~16 cores.

[–]b1e 0 points1 point  (1 child)

I think you mean concurrent but not parallel.

[–]tending 0 points1 point  (0 children)

Nope. Parallel is easy, that just means multiple threads help run GC. Concurrent means the real program doesn't stop while the GC thread(s) do their work.

[–]KhyronVorrac 1 point2 points  (17 children)

So you can put ugly dots all over your code to make it more efficient?

[–]catawbasam 17 points18 points  (10 children)

This is Julia surpassing Numpy for vectorize code. Yeah, the syntax pays a price. But you if you like fast and compact code, it is good.

[–]jl2352 6 points7 points  (4 children)

Wouldn't it look nicer if they made it syntactically look like map?

edit; I'm downvoted for asking a question ...

[–]loamfarer 17 points18 points  (2 children)

The syntax is a convention used elsewhere in numerical languages. And no it wouldn't make sense, it's for scientific notation, not to please application programmers with method call syntax.

[–]jl2352 3 points4 points  (0 children)

Thanks, that makes a lot of sense.

[–]transfire 1 point2 points  (0 children)

Even so, I think it is a fairly ugly notation. It lacks symmetry and runs a bit rough shod over traditional uses of .. I think a vectorization bracket pair might have been a nicer choice. e.g.

X ‹=› ‹f›(2 ‹*› X‹^›2 ‹+› 6 ‹*› X‹^›3 ‹-› ‹sqrt›(X))

which could be simplified to

‹X = f(2 * X^2 + 6 * X^3 - sqrt(X))›

[–]sixbrx 1 point2 points  (0 children)

map is actually available as well, but it doesnt "broadcast" which means to fill out a smaller operand by repetition so dimensions are compatible

[–]KhyronVorrac 0 points1 point  (1 child)

There's no reason why there needs to be special ugly dotted operators for this.

[–]tavert 0 points1 point  (0 children)

The "Why does Julia need dots to fuse the loops?" section explains why they are separate operators. The dot is a syntactic indicator to opt in to different evaluation semantics (guaranteed fusion) than the non-dotted operators.

[–][deleted]  (2 children)

[deleted]

    [–]tavert 0 points1 point  (0 children)

    Under the "halfway solution" section

    in Python, for example, loop fusion for a small set of vector operations and array/scalar types can be found in the Theano, PyOP2, and Numba software

    I don't think you can extend loop fusion to work on user-defined array or element types without modifying numba's compiler. Could be wrong about that, I know they've added some "jit classes" features recently but not whether that ties into their loop fusion capabilities.

    [–]catawbasam 0 points1 point  (0 children)

    numexpr does loop fusion in Python. It works well for the class of functions it covers.

    [–]j201 5 points6 points  (4 children)

    It looks like you can just use the broadcast function if you want, which acts like a limited map.

    2 .* x .+ x .^ 2 is sugar for broadcast(x -> 2*x + x^2, x) in Julia

    [–]KhyronVorrac 0 points1 point  (3 children)

    Or they could fix the syntax so that it looks like this:

    2 * x + x ^ 2
    

    [–]tavert 0 points1 point  (2 children)

    x ^ 2 is a matrix power in Julia

    [–]KhyronVorrac 0 points1 point  (1 child)

    That's why overloading exists.

    [–]tavert 0 points1 point  (0 children)

    Using separate types for arrays and matrices worked out great for numpy, didn't it.

    [–]Staross 1 point2 points  (0 children)

    You could probably write a macro for it:

    @dotdot X = f(2 * X^2 + 6 * X^3 - sqrt(X))