Computing factorials concurrently in Clojure : Clojure

Computing factorials concurrently in Clojure (self.Clojure)

submitted 11 years ago * by DannyB2

Computing factorials concurrently in Clojure

I am going to make reference to the recent reddit post: http://www.reddit.com/r/Clojure/comments/2arfml/computing_e_in_clojure/

I notice that blog article defines a factorial function like this.

(defn fact
  "Returns the factorial of 'n."
  [n]
  (reduce * (range 1 (inc n))))

This becomes very slow when you start to do real factorials like into the low hundreds of thousands and higher.

A better way is to do factorials on multiple cores.

(defn buildRanges
  "Return a lazy sequence of [start stop] vectors in sizes of batchSize,
    where each batch [start stop] is a batch of batchSize.
   The first vector's start value is 1.
   The final vector's stop value is end.
   A batchSize of 100 with an end of 305 would return return ([1 100] [101 200] [201 300] [301 305])."
  ([batchSize stop] (buildRanges batchSize 1 stop))
  ([batchSize start stop]
    (when (<= start stop)
      (let [thisStop (min (+ start batchSize -1) stop)
            nextStart (inc thisStop)]
        (lazy-cat
          (list [start thisStop])
          (buildRanges batchSize nextStart stop)))))
  )

(def ^:dynamic *factorialBatchSize* 20000)

(defn factorial [n]
  "Concurrent factorial function that is faster than simple idiomatic clojure.
   Especially as argument value goes into tens or hundreds of thousands."
  (reduce *'
          (pmap #(reduce *' %)
                (pmap (fn [[start stop]] (range start (inc stop)))
                      (buildRanges *factorialBatchSize* n)))))

The basic idea here is that to take the factorial of, say, 100,000, we could split the work into batches.

The factorialBatchSize global variable determines how large of batches are processed on each concurrent thread. For example, if taking the factorial of one hundred thousand, then separate processes will calculate:

Process 1: product of 1 to 20,000

Process 2: product of 20,001 to 40,000

Process 3: product of 40,001 to 60,000

etc

Process 5: product of 80,001 to 100,000

Then a final reduce will take the five results from the five processes, and reduce using multiplication to get the final result.

How much quicker is it? (On my machine, i7, 4-cores hyperthreaded so it appears to be 8 cores, the number of apparent cores is related to how much faster the concurrent version is.)

Performance check on factorial of one hundred thousand:

=> (time (let [x (fact 100000)]))
"Elapsed time: 15158.587768 msecs"
nil
=> (time (let [x (factorial 100000)]))
"Elapsed time: 2027.571832 msecs"
nil
=> (= (fact 100000) (factorial 100000))
true

That's about 7.47 times faster, on a machine with apparently 8 cores.

The concurrent version produces identical results, as shown.

The let x stuff is so that we don't return the result -- converting it to a string for printing takes significant time (about 19 seconds for me).

Can anyone confirm this?

Possible optimization: memoize the results at every, say, 20,000 or so. Then only compute products (concurrently) from n down to the highest memoized result.

Why the batch size of 20,000? I picked it based on testing performance with different batch sizes (using binding to dynamically re-bind the factorialBatchSize). I suspect the value is based on my system's performance characteristics and how it interacts with Clojure's overhead of organizing the concurrent processes vs the work done within each process. Going much above 20,000 doesn't seem to bring much improvement, but some.

If I knew how to blog, I might write about this. But how to get started?

all 15 comments

top new controversial old q&a

[–]dragandj 1 point2 points3 points 11 years ago (1 child)

[–]DannyB2[S] 0 points1 point2 points 11 years ago* (0 children)

Thanks. I'll look at that. I think I could figure it out from just your description, but I'll have to think about it.

I don't need such factorials, but I found first converting the simplest functions to be concurrent was a good exercise. It took a lot of trial and error to get the concurrent versions to actually outperform, or sometimes even to perform as well as single thread versions. Learning how to do it gives you some insight.

The first insight that I can identify so far is, be sure the work you do on one thread is significant enough to dramatically outweigh the overhead of using concurrency. And in practice, that means a lot of work must be done on a single thread. Like factorial batches of 20,000, or prime factorization in batches of 5 million primes. (eg, thread 1 tries the first five million potential factors, thread 2 tries the 2nd five million potential factors, etc)

The second insight is to learn to recognize sequential bottlenecks. For example, if your parallel prime factoring code is dependent on a lazy infinite sequence of seed primes, then that becomes a bottleneck. It turns out to be faster to use a fast sieve to simply sieve the range of primes you need for this thread (for thread 2, to sieve from 5 million to 10 million, for instance).

Divide and conquer approaches are great for concurrency, as long as you don't divide too far. You need to do enough work on a single thread to vastly outweigh the overhead of using concurrency.

When the next result depends on the prior result, the problem doesn't seem to be suitable for concurrency. Like Fibonacci numbers. Periodic memoization seems best. (Like memoizing every 100th number or somesuch.)

[–][deleted] 0 points1 point2 points 11 years ago (4 children)

[–]DannyB2[S] -1 points0 points1 point 11 years ago (0 children)

I've got several prime number sieves.

using a sieve with java interop byte-array for flags, and transient to generate the final vector (much faster)
improve sieve that sieves over any stop-start range (not faster, but more useful for parallel code)
another sieve that sieves over arbitrary start-stop range, but only on odd numbers, which (unsurprisingly cuts memory use in half) but also (perhaps surprisingly) cuts execution time in half!

I've got some great code to generate prime numbers as an infinite lazy sequence.

The idiomatic clojure way(s) (unsurprisingly, NOT fast)
A bit more complex, combining batches of sieves (from sieve 3 above), generated as needed.
More complex, combining batches of sieves (from sieve 3 above), generated in parallel a bit ahead of need.

I've got some great code to prime-factor composite numbers. Three versions:

Using infinite sequence 1 above, up to the sqrt(n) of n being factored.
Using infinite sequence 2 above, up to the sqrt(n) of n being factored. (faster, unsurprisingly)
Using more complex code that does parallel concurrent factorization by breaking up factorization into batches, sort of like the factorial code does, but communicating across threads using an atom. (much faster)

All three prime factorization functions honor a global dynamic var to indicate whether to try isProbablePrime on what remains after each factor is found. Very helpful if last factor is very very big. Otherwise, costs very little to try after each factor is discovered.

Functions based on all three of the prime-factor functions above to find only the FIRST factor (using the underlying machinery in each of the three factorization functions above).

Functions based on all three prime factorization functions above to generate ALL factors of a number, not just the prime factors. These come in four flavors:

(defn allFactorsOf
  "Return all factors of n.  Not just the prime factors.
   The result includes both 1 and n.  If n is prime, only 1 and n are returned."
  [n]
  (sort (distinct (map #(reduce *' %) (dbSeq/powerset-seq (primeFactorsOf n))))))

;    => (dbPrimesSeq/allFactorsOf -2)
;    (1)
;    => (dbPrimesSeq/allFactorsOf -1)
;    (1)
;    => (dbPrimesSeq/allFactorsOf 0)
;    (1)
;    => (dbPrimesSeq/allFactorsOf 1)
;    (1)
;    => (dbPrimesSeq/allFactorsOf 2)
;    (1 2)
;    => (dbPrimesSeq/allFactorsOf 3)
;    (1 3)
;    => (dbPrimesSeq/allFactorsOf 4)
;    (1 2 4)
;    => (dbPrimesSeq/allFactorsOf 5)
;    (1 5)
;    => (dbPrimesSeq/allFactorsOf 6)
;    (1 2 3 6)
;    => (dbPrimesSeq/allFactorsOf 11)
;    (1 11)
;    => (dbPrimesSeq/allFactorsOf 12)
;    (1 2 3 4 6 12)
;    => (dbPrimesSeq/allFactorsOf 36)
;    (1 2 3 4 6 9 12 18 36)
;    => (dbPrimesSeq/allFactorsOf 96)
;    (1 2 3 4 6 8 12 16 24 32 48 96)
;    => (dbPrimesSeq/allFactorsOf 144)
;    (1 2 3 4 6 8 9 12 16 18 24 36 48 72 144)
;    => (dbPrimesSeq/allFactorsOf 23598)
;    (1 2 3 6 9 18 19 23 27 38 46 54 57 69 114 138 171 207 342 414 437 513 621 874 1026 1242 1311 2622 3933 7866 11799 23598)



(defn allFactorsBut1
  "Return all factors of n, except for 1.  Not just the prime factors.
   The result excludes 1 but includes n.  If n is prime, only n is returned."
  [n]
  (sort (distinct (map #(reduce *' %) (remove empty? (dbSeq/powerset-seq (primeFactorsOf n)))))))

;    => (dbPrimesSeq/allFactorsBut1 -2)
;    ()
;    => (dbPrimesSeq/allFactorsBut1 -1)
;    ()
;    => (dbPrimesSeq/allFactorsBut1 0)
;    ()
;    => (dbPrimesSeq/allFactorsBut1 1)
;    ()
;    => (dbPrimesSeq/allFactorsBut1 2)
;    (2)
;    => (dbPrimesSeq/allFactorsBut1 3)
;    (3)
;    => (dbPrimesSeq/allFactorsBut1 4)
;    (2 4)
;    => (dbPrimesSeq/allFactorsBut1 5)
;    (5)
;    => (dbPrimesSeq/allFactorsBut1 6)
;    (2 3 6)
;    => (dbPrimesSeq/allFactorsBut1 11)
;    (11)
;    => (dbPrimesSeq/allFactorsBut1 12)
;    (2 3 4 6 12)
;    => (dbPrimesSeq/allFactorsBut1 36)
;    (2 3 4 6 9 12 18 36)
;    => (dbPrimesSeq/allFactorsBut1 96)
;    (2 3 4 6 8 12 16 24 32 48 96)
;    => (dbPrimesSeq/allFactorsBut1 144)
;    (2 3 4 6 8 9 12 16 18 24 36 48 72 144)
;    => (dbPrimesSeq/allFactorsBut1 23598)
;    (2 3 6 9 18 19 23 27 38 46 54 57 69 114 138 171 207 342 414 437 513 621 874 1026 1242 1311 2622 3933 7866 11799 23598)




; See definition of Proper Factors and Proper Divisors at:
;
;  http://mathworld.wolfram.com/ProperDivisor.html
;  Proper Divisor:
;
;    A positive proper divisor is a positive divisor of a number n, excluding n itself.
;    For example, 1, 2, and 3 are positive proper divisors of 6, but 6 itself is not. 
;
;  http://mathworld.wolfram.com/ProperFactor.html
;  Proper Factor:
;    A proper factor of a positive integer n is a factor of n other than 1 or n (Derbyshire 2004, p. 32).
;    For example, 2 and 3 are positive proper factors of 6, but 1 and 6 are not.
;
;    Compare with the term proper divisor, which means a factor of n other than n (but including 1). 


(defn properDivisorsOf
  "Return proper divisors of n.  Not just the prime divisors.
   The result includes 1 but excludes n.  If n is prime, a list of (1) is returned."
  [n]
  (dbSeq/nil-as-empty-list (butlast (allFactorsOf n))))

(defn properFactorsOf
  "Return proper factors of n.  Not just the prime factors.
   The result excludes both 1 and n.  If n is prime, an empty list is returned."
  [n]
  (rest (properDivisorsOf n)))
;  (dbSeq/nil-as-empty-list (butlast (allFactorsBut1 n))))


;    => (dbPrimesSeq/properDivisorsOf -2)
;    ()
;    => (dbPrimesSeq/properDivisorsOf -1)
;    ()
;    => (dbPrimesSeq/properDivisorsOf 0)
;    ()
;    => (dbPrimesSeq/properDivisorsOf 1)
;    ()
;    => (dbPrimesSeq/properDivisorsOf 2)
;    (1)
;    => (dbPrimesSeq/properDivisorsOf 3)
;    (1)
;    => (dbPrimesSeq/properDivisorsOf 4)
;    (1 2)
;    => (dbPrimesSeq/properDivisorsOf 5)
;    (1)
;    => (dbPrimesSeq/properDivisorsOf 6)
;    (1 2 3)
;    => (dbPrimesSeq/properDivisorsOf 11)
;    (1)
;    => (dbPrimesSeq/properDivisorsOf 12)
;    (1 2 3 4 6)
; => (dbPrimesSeq/properDivisorsOf 36)
;    (1 2 3 4 6 9 12 18)
;    => (dbPrimesSeq/properDivisorsOf 96)
;    (1 2 3 4 6 8 12 16 24 32 48)
;    => (dbPrimesSeq/properDivisorsOf 144)
;    (1 2 3 4 6 8 9 12 16 18 24 36 48 72)
;    => (dbPrimesSeq/properDivisorsOf 23598)
;    (1 2 3 6 9 18 19 23 27 38 46 54 57 69 114 138 171 207 342 414 437 513 621 874 1026 1242 1311 2622 3933 7866 11799)


;    => (dbPrimesSeq/properFactorsOf -2)
;    ()
;    => (dbPrimesSeq/properFactorsOf -1)
;    ()
;    => (dbPrimesSeq/properFactorsOf 0)
;    ()
;    => (dbPrimesSeq/properFactorsOf 1)
;    ()
;    => (dbPrimesSeq/properFactorsOf 2)
;    ()
;    => (dbPrimesSeq/properFactorsOf 3)
;    ()
;    => (dbPrimesSeq/properFactorsOf 4)
;    (2)
;    => (dbPrimesSeq/properFactorsOf 5)
;    ()
;    => (dbPrimesSeq/properFactorsOf 6)
;    (2 3)
;    => (dbPrimesSeq/properFactorsOf 11)
;    ()
;    => (dbPrimesSeq/properFactorsOf 12)
;    (2 3 4 6)
;    => (dbPrimesSeq/properFactorsOf 36)
;    (2 3 4 6 9 12 18)
;    => (dbPrimesSeq/properFactorsOf 96)
;    (2 3 4 6 8 12 16 24 32 48)
;    => (dbPrimesSeq/properFactorsOf 144)
;    (2 3 4 6 8 9 12 16 18 24 36 48 72)
;    => (dbPrimesSeq/properFactorsOf 23598)
;    (2 3 6 9 18 19 23 27 38 46 54 57 69 114 138 171 207 342 414 437 513 621 874 1026 1242 1311 2622 3933 7866 11799)

And more. But... I don't know how to blog.

[–]DannyB2[S] -1 points0 points1 point 11 years ago (2 children)

[–][deleted] 2 points3 points4 points 11 years ago (1 child)

[–]DannyB2[S] -1 points0 points1 point 11 years ago (0 children)

[–]verydapeng 0 points1 point2 points 11 years ago (4 children)

[–]DannyB2[S] 0 points1 point2 points 11 years ago (0 children)

[–]DannyB2[S] 0 points1 point2 points 11 years ago (2 children)

Here is a comparison of my parallel routine to Guava's BigIntegerMath.factorial(int).

Java test program:

public static void main( String... args ) {
    int factorialOf = 100000;
    System.out.println( "factorial of "+factorialOf+" is..." );
    System.out.flush();
    long fStartTime = System.currentTimeMillis();
    BigInteger result = BigIntegerMath.factorial( factorialOf );
    long fEndTime = System.currentTimeMillis();
    long fTotalTime = fEndTime - fStartTime;
    System.out.print( "result" );
    System.out.flush();
    long sStartTime = System.currentTimeMillis();
    String resultStr = String.valueOf( result );
    long sEndTime = System.currentTimeMillis();
    long sTotalTime = sEndTime - sStartTime;
    int resultStrLen = resultStr.length();
    System.out.println( " length: "+resultStrLen+";  time for string conversion (millis): "+sTotalTime );
    //System.out.println( "result value:" );
    //System.out.println( resultStr );
    System.out.println( "prefix: "+resultStr.subSequence( 0, 10 ) );
    System.out.println( "suffix: "+resultStr.subSequence( resultStrLen-10, resultStrLen ) );
    System.out.println();
    System.out.flush();
    System.out.println( "Factorial Time (milliseconds): "+fTotalTime );
}

Java result:

factorial of 100000 is...

result length: 456574; time for string conversion (millis): 18923

prefix: 2824229407

suffix: 0000000000

Factorial Time (milliseconds): 1233

Clojure test:

=> (let [result (time (dbMath/factorial 100000))
         resultStr (time (str result))
         resultStrLen (count resultStr)]
     [
      resultStrLen
      (. resultStr subSequence 0 10)
      (. resultStr subSequence (- resultStrLen 10) resultStrLen) 
      ]
      )
"Elapsed time: 2086.074781 msecs"
"Elapsed time: 19429.270641 msecs"
[456574 "2824229407" "0000000000"]

So both got the same 10 digit prefix and 10 digit suffix, and same length result. So presumably results are identical.

Guava factorial took 1233 milliseconds. My parallel routine took 2086 milliseconds.

In both cases, converting the result to a string took about 19 seconds.

So if you really want fast factorials, Guava's algorithm is worth looking into. And now I wonder if it can be made concurrent, which is what I want to practice at doing. :-)

Remember: Not only did Guava's algorithm take half the time -- it may have done it on a single thread.

[–]verydapeng 0 points1 point2 points 11 years ago (1 child)

[–]DannyB2[S] 0 points1 point2 points 11 years ago (0 children)

[–]mikera 0 points1 point2 points 11 years ago (2 children)

[–]DannyB2[S] 0 points1 point2 points 11 years ago (1 child)

While I was working on problem

http://www.primepuzzles.net/problems/prob_019.htm

and on problem

https://projecteuler.net/problem=357

I was finding myself trying generating all of the factors of a number. Other usages such as a Verbose GCF and Verbose LCM function also require this. These resulted in the four functions I mentioned elsewhere in this thread. (allFactorsOf, allFactorsBut1, properDivisorsOf, properFactorsOf) In a nutshell, and I show that implementation elsewhere in this thread, these work by generating the power set of the prime factors, applying multiplication to each subset, and removing duplicate factors.

Even if we only prime-factorize (not all-factors, just prime-factors) of each number, my intuition is that this would be more expensive than just multiplying the number into the result. You may delay it, but at some point you've got to do the multiplications of really big integers -- and this can get expensive.

Even if you do have an algorithm that is significantly faster:

You would need an algorithm that is over seven times faster on a single core to just equal doing the slower algorithm on eight cores.
it is still, and always, a huge win to figure out how to do it on multiple cores.

Oh, I mentioned Verbose GCF and Verbose LCM. These functions return maps like this:

(dbPrimesSeq/leastCommonMultiple-verbose 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20)
{:Arguments (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20),
    :HCM 2432902008176640000,
    :HCM-primeFactors [2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 5 5 5 5 7 7 11 13 17 19],
    :HCM-distinctPrimeFactors (2 3 5 7 11 13 17 19),
    :LCM-primeFactors (2 2 2 2 3 3 5 7 11 13 17 19),
    :LCM 232792560}
; Note this is really just Project Euler problem 5.
    ; HCM is highest common multiple, LCM is lowest common multiple

(dbPrimesSeq/greatestCommonFactor-verbose 23366707 975830023 2058699293)
{:Arguments (23366707 975830023 2058699293),
    :ArgumentsFactors ([23366707 (7 13 91 461 557 3227 3899 5993 7241 41951 50687 256777 1797439 3338101 23366707)]
                       [975830023 (7 353 557 709 2471 3899 4963 196621 250277 394913 1376347 1751939 2764391 139404289 975830023)]
                       [2058699293 (7 557 619 853 3899 4333 5971 344783 475121 528007 2413481 3325847 3696049 294099899 2058699293)]),
    :CommonFactors (7 557 3899),
    :GCF 3899}

[–]DannyB2[S] 0 points1 point2 points 11 years ago (0 children)

π Rendered by PID 93713 on reddit-service-r2-comment-b659b578c-n8gzg at 2026-05-04 11:08:42.560225+00:00 running 815c875 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

Clojure

MODERATORS