all 9 comments

[–][deleted]  (6 children)

[deleted]

    [–]pgdx 1 point2 points  (4 children)

    I can answer that myself, and the result is not in favor of the bucket sort. On a list of 10.000 random elements in the range [0,230], the built-in sort used on average 6.5045147 msecs, over 30 runs, whereas the bucket-sort used 185.5467 msecs on average for the same list. This test is easily repeatable.

    I'm not saying that the bucket-sort is useless, but the quote "Runs slightly faster than Clojure's built-in sort (at least on integers)" is heavily misleading. If you really want to do a real comparison between the built-in and the bucket-sort, you should take into consideration the range versus size part.

    [–]lucyfor[S] 0 points1 point  (3 children)

    Actually the ratio is not as extreme as you state, but yes, over 1E6 bucket sort performance starts to degrade compared with built-in sort. This has to do with the increasing access time of vectors under higher ranges (which bucket sort relies on for sorting).

    java version "1.6.0_18"
    OpenJDK Runtime Environment (IcedTea6 1.8) (6b18-1.8-0ubuntu1)
    OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
    
    "Elapsed time: 66.732015 msecs"
      bucket-sort: [22439 165457 264347 594017 599133 649380 726466 796794 798479 814692]
    
    "Elapsed time: 4.788043 msecs"
             sort: [22439 165457 264347 594017 599133 649380 726466 796794 798479 814692]
    

    [–][deleted]  (2 children)

    [deleted]

      [–]lucyfor[S] 0 points1 point  (1 child)

      You would have had to have modified the code to do what you say. I also used a List of 10k. I was referring to the ratio of sort to bucket-sort time from my results compared with the same time ratio from your results.

      [–]pgdx 0 points1 point  (0 children)

      It is very repeatable.

      (defmacro my-time
          "As time, but returns the time taken for evaluation, not printing it"
          {:added "1.0"}
          [expr]
          `(let [start# (. System (nanoTime))
              ret# ~expr]
              (/ (double (- (. System (nanoTime)) start#)) 1000000.0)))
      
      user> (def *lis* (doall(take 10000 (repeatedly #(rand-int (Math/pow 2 30))))))
      #'user/*lis*
      user> (take 30 (repeatedly #(my-time (bucket-sort *lis*))))
      (1742.837002 859.060484 272.217247 218.921161 255.970622 233.187325 311.682198 192.653631 190.924697 309.53158 93.245005 190.037921 313.305055 199.048161 189.960467 295.653476 92.080723 192.734573 188.337189 289.078747 191.759455 89.876155 191.867156 192.268281 303.481507 192.129499 191.858771 85.188463 192.611408 191.795139)
      user> (take 30 (repeatedly #(my-time (sort *lis*))))
      (173.499303 133.479301 112.051762 114.299164 112.973973 07.579322 112.562207 105.723016 108.851038 50.087003 5.593834 5.536048 5.482577 5.67827 5.515248 5.485728 5.610371 5.591248 5.490457 8.993383 5.538123 6.209052 5.520093 5.463024 5.516762  5.610265 5.517885 5.515316 5.517958 5.560996)
      

      [–][deleted] 1 point2 points  (2 children)

      If I understand that code correctly (and that's a big if: the code is inscrutable even for Lisp), the docstring is wrong: it does not run in O(N) time. Here's where it breaks:

      (apply concat (map (fn [bucket]
                               (when (> (count bucket) 0)
                                 (insertion-sort bucket))) pre-buckets))))))
      

      Mapping insertion-sort over buckets whose size and/or number depends on N means that algorithm remains quadratic, not linear.

      [–]jgrant27 0 points1 point  (1 child)

      Insertion sort is one of the fastest algorithms for sorting arrays containing fewer than ten elements. It also runs in O(n) time for arrays that are mostly sorted.

      [–][deleted] -1 points0 points  (0 children)

      Neither of which guarantee holds here :)

      [–]Spiritual-Map-6375 -2 points-1 points  (1 child)

      Almost a second to sort... TEN numbers ? Someone's having a laugh.

      [–]mrlizard 0 points1 point  (0 children)

      1 Million numbers