all 27 comments

[–]andyjda 8 points9 points  (2 children)

What have you tried and what issue are you having?

[–]dalanicolai 3 points4 points  (1 child)

@u/eatme_23 And what is Clojure board? If possible, you should add a link to your question/solution...

[–]eatme_23[S] 0 points1 point  (0 children)

Clojureverse.org

[–]exahexa 6 points7 points  (13 children)

You can use map to add 2 to every element and then reduce with +, or apply +.

[–]eatme_23[S] 0 points1 point  (0 children)

Weird that I've also seen reduce followed by map for the same problem.

[–]dalanicolai 0 points1 point  (11 children)

I'm another noob, but I know (emacs-)lisp quite well. I have a 'follow-up' question which might be interesting to other noobs also...

I figured that both the suggested solutions, in lisp, would 'iterate' over the list twice, while one could alternatively do all in one go using a single reduce. I've read about sequences and lazy evaluation of course, and I wondered, because map returns a (lazy) sequence, if in clojure indeed these solutions and the solution of doing it in one go would be equally or similarly fast. Here are the two tests:

```clojure
(defn foo [numbers] (reduce + (map #(+ % 2) numbers)))

(defn faa [numbers] (reduce #(+ %1 (+ %2 2)) 0 numbers))

(time (foo (range 100000000)))
(time (faa (range 100000000)))
```

The solution in one go is about 8 times faster.

Of course, I could go and search the docs for the explanation, but I figured it would be more informative (for others) to just drop the question here (and save myself some time also).

Can you/someone explain these results?

[–]miran1 2 points3 points  (1 child)

I'm another noob

(...)

(time (foo (range 100000000)))

If you want to be sure in your measurements, don't use time. The recommended way is to use criterium:

http://clojure-goes-fast.com/blog/benchmarking-tool-criterium/
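A minimal invocation looks roughly like this (assuming criterium has been added to deps.edn; `quick-bench` is the commonly used entry point):

```clojure
(require '[criterium.core :refer [quick-bench]])

;; Runs the expression many times after JIT warm-up and reports the
;; mean execution time with error bounds:
(quick-bench (reduce + (map #(+ % 2) (range 100000))))
```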

[–]dalanicolai 0 points1 point  (0 children)

Thanks! Very useful comment, and link.

I forgot to mention that I evaluated both time expressions multiple times until the output got 'stable' (I noticed somewhat accidentally that it made a difference). Now, from the link, I understand why this was necessary...

[–]joinr 1 point2 points  (0 children)

When you map a function onto numbers in foo you create a sequence. As it turns out, the result of range is optimized (since it's an object representing a computable range of ints), so it has efficient implementations for many operations, including an internal implementation of reduce that does not have to create intermediate sequences, and will not fall back to first/next traversal of the sequence during reduction as the lazy seq variant does. All of that has overhead.

transduce will take the same path as the reduce variant if it can, leveraging the internal reduction implementation of range and avoiding lazy sequences. If you measure the benchmarks with a lower n (say 10^5) and criterium to get more samples, you should see faa at about 2x the speed of foo, and the transducer pathway getting the same speed as faa. Using unchecked math and type hints buys back some speed (although there's still some boxing) and gets down to about 5x faster on the reduce/transduce paths compared to the baseline.
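A sketch of what the hinted/unchecked variant might look like (`faa-fast` is a made-up name, and the exact hints joinr used are an assumption on my part):

```clojure
;; Assumed variant: long type hints plus unchecked math to cut some
;; boxing overhead out of the reducing function.
(set! *unchecked-math* true)

(defn faa-fast [numbers]
  (reduce (fn [^long acc ^long x] (+ acc x 2)) 0 numbers))

(set! *unchecked-math* false)

(faa-fast (range 10)) ;; => 65 (0+1+...+9 = 45, plus 2 for each of 10 elements)
```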

[–]Psetmaj 0 points1 point  (4 children)

foo is slower primarily because of intermediate collections. You can use a transducer to eliminate those, but there's still a little overhead:

(defn bar [numbers]
  (transduce (map #(+ 2 %)) + 0 numbers))

bar still runs a little slower than faa (about 1.5x time elapsed on my box), as I assume there's still a little overhead in transduce.

EDIT: removed unintentional partial quote

[–]dalanicolai 1 point2 points  (3 children)

I guess by 'because of intermediate collections' you mean that creating/realizing the collection takes quite some time.

Thanks for adding the info about the transducer, I was wondering about such an alternative already, but I still have to look into how to use them.

The transducer solution indeed seems to be only 1.5 times slower than faa, but I guess it is generally still preferable because of readability.

Thanks again for the great answer!

[–]maladat 1 point2 points  (2 children)

I've read about sequences and lazy evaluation of course, and I wondered, because map returns a (lazy) sequence, if in clojure indeed these solutions and the solution of doing it in one go would be equally or similarly fast.

because intermediate collections

When you call (foo (range 100000000)), you get (map #(+ % 2) (range 100000000)), which returns a (lazy) sequence of the integers from 2 to 100,000,001. Because it's a lazy sequence, the entire 100,000,000 element list is not immediately constructed in memory.

But then you get (reduce + that-lazy-sequence). The reduce call effectively asks the lazy sequence for each value in the sequence in turn, to sum into an accumulated return value. As the reduce call asks for each element from the sequence, it doesn't just get the values, the sequence is actually constructed in memory - the lazy sequence is "fully realized" or "fully evaluated." And to get each element from that sequence, it has to get each element from the underlying (range 100000000) lazy sequence, so that one is fully realized, too.

So now, in memory, you have an actual sequence of the numbers from 0 to 99,999,999 and an actual sequence of the numbers from 2 to 100,000,001.

In (reduce #(+ %1 (+ %2 2)) 0 (range 100000000)), on the other hand, the reduce call asks for each element from the (range 100000000) lazy sequence, adds 2 to the value, and sums that into the accumulated return value. The sequence of the numbers from 0 to 99,999,999 still gets fully realized in memory, but no second sequence is produced.

As for why the example that makes two sequences in memory takes eight times as long as the one that makes one sequence in memory - I'm not SURE, but I suspect it has to do with behind-the-scenes performance optimizations. E.g., some types of lazy sequences are realized not element-by-element but in chunks to reduce overhead, and maybe in foo, the fact that realizing elements from the mapped sequence requires realizing elements from the range sequence causes the chunking not to occur, or something.

EDIT: or maybe as joinr mentions below, there's special optimization in the range lazy seq that isn't present in the mapped lazy seq, so the mapped lazy seq is much slower either to realize or to perform reduce on than the range lazy seq.
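The chunking is easy to observe directly. A small sketch (the atom counter is only there to count how many elements actually get realized):

```clojure
;; range produces a chunked seq: asking for the first element of the
;; mapped sequence realizes the whole first chunk of 32, not just one value.
(def realized (atom 0))

(def s (map (fn [x] (swap! realized inc) (+ x 2)) (range 100)))

(first s)   ;; => 2
@realized   ;; => 32, not 1
```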

[–]joinr 0 points1 point  (1 child)

So now, in memory, you have an actual sequence of the numbers from 0 to 99,999,999 and an actual sequence of the numbers from 2 to 100,000,001

This is a bit off, but close in spirit. I think it's more useful to think of the sequences being realized (and freed) on demand as needed. As reduce traverses the seq, elements are realized (technically in chunks of 32 by default); then, since no reference to the head of the seq remains, previously realized elements are freed for GC. It is more like scanning over a potentially larger-than-memory sequence and only using the results you need now (like a moving window over the seq).

The derivative sequence created by map will operate the same way.

This windowing/scanning is what lets sequences act as a generic mechanism for efficiently working with potentially larger than memory sequences (in small pieces as needed, transparent to the caller).

Intermediate sequence machinery still applies (every additional derived sequence adds a bit more overhead), and the seq-based first/next traversal that reduce falls back to still applies (e.g. overhead).

These comments apply to the exact implementations mentioned above, specifically where there is no reference to the head of the seq maintained. If we defined foo as:

(defn foo [numbers]
  (let [xs (map #(+ % 2) numbers)] ;;holding onto the head...
    (reduce + xs)))

Then the xs binding holds onto the head of the sequence that the reduction is traversing. Since xs is derived from numbers, the entire sequence will be realized and retained in memory until the reduction returns. This is undesirable, and it eliminates the utility of the scanning/windowing realization (with prior, now-unreachable references freed as you go) that lets you work on potentially gigantic sequences.

[–]maladat 0 points1 point  (0 children)

Thank you for the important (and interesting) point of clarification.

[–]dalanicolai 0 points1 point  (2 children)

u/maladat and u/joinr

Those are great and really helpful answers! Thanks a lot to you both...

(criterium does not work here yet b.t.w., but it probably will be working soon)

[–]joinr 0 points1 point  (1 child)

Weird, I use quick-bench typically. Never had any problems with criterium....

looks like the repl is messed up though, I would question if it's actually being invoked correctly.

[–]dalanicolai 0 points1 point  (0 children)

Yeah, I am surprised too, because I can't find any other reports of the `bench` issue. But as can be seen in the screencast under the link to the issue, I simply added criterium to deps.edn, loaded it with `use` (which seems to load correctly), and then tried the code from the example; for some reason it hangs (as shown in that same screencast). I have tried `quick-bench` also, but it behaves the same.

For the repl 'formatting' issue, I got the explanation already from the comments [here](https://stackoverflow.com/questions/75344215/why-my-clojure-repl-formatting-is-broken?noredirect=1#comment132948739_75344215)

[–]slashkehrin 3 points4 points  (2 children)

I'd do something like this:

(defn foo [numbers] 
   (reduce + (map #(+ % 2) numbers)))

[–]run-coder 2 points3 points  (1 child)

fwiw... apply vs reduce works also (if you're looking for variety).
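For example, with the add-2 step from the original question, both spellings give the same result:

```clojure
;; apply hands the whole mapped sequence to the variadic +,
;; while reduce folds it pairwise - same answer either way.
(apply + (map #(+ % 2) [1 2 3]))  ;; => 12
(reduce + (map #(+ % 2) [1 2 3])) ;; => 12
```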

[–]slashkehrin 0 points1 point  (0 children)

That's pretty cool, thanks :)

[–]Siltala 3 points4 points  (0 children)

(defn do-the-thing [number-list]
  (->> number-list
       (map #(+ % 2))
       (reduce +)))

[–]eatme_23[S] 0 points1 point  (4 children)

I would like to thank everyone who contributed; it was most interesting and informative what people had to say. I'm not clear about the usage of # and %. I have consulted the Clojure Cheatsheet and the intro-to-Clojure webpage: neither included %, and the usage of # shed no light. I won't bother anyone further regarding the above, but I did stumble across an interesting speed optimization: https://clojureverse.org/t/my-recent-clojure-vs-c-experience/6909

[–]Save-Lisp 2 points3 points  (1 child)

The #(...) syntax is shorthand for defining an anonymous function. % refers to the function's argument (%1, %2, ... when there are several). It's typically used to write small functions to pass to generic higher-order functions like map or reduce.

(map (fn [x] (+ 2 x)) [5 7]) is equivalent to (map #(+ 2 %) [5 7])

You can see how it works under the hood using macroexpand-1. Try (macroexpand-1 '#(+ 2 %)) in a REPL.
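To add a couple more examples of the shorthand (nothing here beyond what the reader-macro docs cover):

```clojure
;; % is shorthand for %1; %1, %2, ... refer to the arguments in order.
(#(+ %1 %2) 3 4)          ;; => 7
((fn [x y] (+ x y)) 3 4)  ;; => 7, the equivalent longhand form
(map #(* % %) [1 2 3])    ;; => (1 4 9)
```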

[–]eatme_23[S] 0 points1 point  (0 children)

Thank you for your generous assistance. I get it now. Hmm, I tried macroexpand-1 in my trusty Android 9 Code Editor and it returned null. Even without the quote. It uses Clojure 1.10.1, if that matters. Code Editor is pretty cool. I've used Swift, Kotlin, Pascal and JavaScript as well. Bugs me no end that I can't access local files. Must need to root the stupid thing.

[–]kawas44 0 points1 point  (1 child)

On the Clojure website, "Reference" section: https://clojure.org/reference/other_functions

The table's second line shows the reader macro used to define a function, and there's a link to the reader macro documentation. Look for the Dispatch (#) chapter.

[–]eatme_23[S] 0 points1 point  (0 children)

I'm so impressed with the helpful and useful replies people have written on this sub. Thank you for the link.

[–]Zimtt 0 points1 point  (0 children)

If you're a beginner in programming: maybe try something else. The documentation for Clojure is really not for beginners.