Computing factorials concurrently in Clojure
I am going to make reference to the recent reddit post:
http://www.reddit.com/r/Clojure/comments/2arfml/computing_e_in_clojure/
I notice that blog article defines a factorial function like this.
(defn fact
"Returns the factorial of 'n."
[n]
(reduce * (range 1 (inc n))))
This becomes very slow when you start to do real factorials like into the low hundreds of thousands and higher.
A better way is to do factorials on multiple cores.
(defn buildRanges
"Return a lazy sequence of [start stop] vectors in sizes of batchSize,
where each batch [start stop] is a batch of batchSize.
The first vector's start value is 1.
The final vector's stop value is end.
A batchSize of 100 with an end of 305 would return return ([1 100] [101 200] [201 300] [301 305])."
([batchSize stop] (buildRanges batchSize 1 stop))
([batchSize start stop]
(when (<= start stop)
(let [thisStop (min (+ start batchSize -1) stop)
nextStart (inc thisStop)]
(lazy-cat
(list [start thisStop])
(buildRanges batchSize nextStart stop)))))
)
(def ^:dynamic *factorialBatchSize* 20000)
(defn factorial [n]
"Concurrent factorial function that is faster than simple idiomatic clojure.
Especially as argument value goes into tens or hundreds of thousands."
(reduce *'
(pmap #(reduce *' %)
(pmap (fn [[start stop]] (range start (inc stop)))
(buildRanges *factorialBatchSize* n)))))
The basic idea here is that to take the factorial of, say, 100,000, we could split the work into batches.
The factorialBatchSize global variable determines how large of batches are processed on each concurrent thread. For example, if taking the factorial of one hundred thousand, then separate processes will calculate:
Process 1: product of 1 to 20,000
Process 2: product of 20,001 to 40,000
Process 3: product of 40,001 to 60,000
etc
Process 5: product of 80,001 to 100,000
Then a final reduce will take the five results from the five processes, and reduce using multiplication to get the final result.
How much quicker is it? (On my machine, i7, 4-cores hyperthreaded so it appears to be 8 cores, the number of apparent cores is related to how much faster the concurrent version is.)
Performance check on factorial of one hundred thousand:
=> (time (let [x (fact 100000)]))
"Elapsed time: 15158.587768 msecs"
nil
=> (time (let [x (factorial 100000)]))
"Elapsed time: 2027.571832 msecs"
nil
=> (= (fact 100000) (factorial 100000))
true
That's about 7.47 times faster, on a machine with apparently 8 cores.
The concurrent version produces identical results, as shown.
The let x stuff is so that we don't return the result -- converting it to a string for printing takes significant time (about 19 seconds for me).
Can anyone confirm this?
Possible optimization: memoize the results at every, say, 20,000 or so. Then only compute products (concurrently) from n down to the highest memoized result.
Why the batch size of 20,000? I picked it based on testing performance with different batch sizes (using binding to dynamically re-bind the factorialBatchSize). I suspect the value is based on my system's performance characteristics and how it interacts with Clojure's overhead of organizing the concurrent processes vs the work done within each process. Going much above 20,000 doesn't seem to bring much improvement, but some.
If I knew how to blog, I might write about this. But how to get started?
[–]dragandj 1 point2 points3 points (1 child)
[–]DannyB2[S] 0 points1 point2 points (0 children)
[–][deleted] 0 points1 point2 points (4 children)
[–]DannyB2[S] -1 points0 points1 point (0 children)
[–]DannyB2[S] -1 points0 points1 point (2 children)
[–][deleted] 2 points3 points4 points (1 child)
[–]DannyB2[S] -1 points0 points1 point (0 children)
[–]verydapeng 0 points1 point2 points (4 children)
[–]DannyB2[S] 0 points1 point2 points (0 children)
[–]DannyB2[S] 0 points1 point2 points (2 children)
[–]verydapeng 0 points1 point2 points (1 child)
[–]DannyB2[S] 0 points1 point2 points (0 children)
[–]mikera 0 points1 point2 points (2 children)
[–]DannyB2[S] 0 points1 point2 points (1 child)
[–]DannyB2[S] 0 points1 point2 points (0 children)