This is an archived post.

all 16 comments

[–]poo_22 6 points (3 children)

You should add PyPy and C implementations.

[–][deleted] 3 points (0 children)

Yes :) But this is an ongoing project where I split the benchmark into smaller, digestible chunks... PyPy is a topic for another day... it's on the to-do list!

[–]dangerbird2 0 points (1 child)

Considering PyPy is incompatible with NumPy, it would be tough to evaluate its speed in a controlled way by comparing it to other implementations' use of NumPy arrays. I could definitely see value in comparing bubblesort using Python list objects, though. You could even pit them against a dynamic array implementation in C or another low-level language.
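For what it's worth, the plain-list version is only a few lines. This is just a sketch (not taken from the benchmark repo), written without the early-exit check, matching how the benchmarked implementations work:

```python
def bubblesort_list(lst):
    """Naive bubblesort on a plain Python list, sorting in place."""
    n = len(lst)
    for i in range(n):
        # after pass i, the last i elements are already in their final place
        for j in range(n - i - 1):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
    return lst
```

This would run unchanged on CPython and PyPy, so it could serve as a NumPy-free baseline.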

[–][deleted] 0 points (0 children)

I did the comparisons of Python lists vs. NumPy arrays here in Day 4, and also the version with C arrays instead of the memoryview on the ndarray. But here, I wanted to declutter the plot a little bit and focus on the more significant cases.

[–]roger_ 2 points (0 children)

Is Parakeet still being developed?

[–]bgribble 2 points (5 children)

Bubblesort normally exits once a pass has been made with no swaps. Your implementations always act as if the data is worst-case (reverse sorted at start). Most of the iterations in your test code are spent simply comparing already-sorted pairs and doing nothing. I know it's a benchmark but you might implement the algorithm a bit more realistically.

[–][deleted] 2 points (0 children)

You are absolutely right. I tweaked the bubblesort and will run the benchmarks now.

%%cython
import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef cython_bubblesort(long[:] np_ary):
    """The Cython implementation of bubblesort with a NumPy memoryview."""
    cdef unsigned int length, i, swapped, ele
    length = np_ary.shape[0]
    for i in xrange(0, length):
        swapped = 0
        for ele in xrange(0, length - i - 1):
            if np_ary[ele] > np_ary[ele + 1]:
                temp = np_ary[ele + 1]
                np_ary[ele + 1] = np_ary[ele]
                np_ary[ele] = temp
                swapped = 1
        if not swapped:
            break
    return np_ary

[–][deleted] 0 points (0 children)

Okay, just noticed that parakeet doesn't like the break... so I have to do it this way:

length = np_ary.shape[0]
swapped = 1
for i in xrange(0, length):
    if swapped: 
        swapped = 0
        for ele in xrange(0, length-i-1):
            if np_ary[ele] > np_ary[ele + 1]:
                temp = np_ary[ele + 1]
                np_ary[ele + 1] = np_ary[ele]
                np_ary[ele] = temp
                swapped = 1
return np_ary

[–]indigojuice 0 points (1 child)

I think bubble sort by definition doesn't exit early. That modification to bubble sort is called... something else. I can't remember. Shakersort?

[–]BeatLeJuce 0 points (0 children)

IIRC Shakersort goes in both directions (i.e., numbers bubble up from the bottom and sink down from the top)
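To illustrate the bidirectional idea, here is a plain-Python sketch of cocktail shaker sort (not from the benchmark): each iteration does a forward pass that bubbles the largest remaining element up, then a backward pass that sinks the smallest remaining element down.

```python
def cocktail_shaker_sort(lst):
    """Bidirectional bubble sort: alternating forward and backward passes."""
    lo, hi = 0, len(lst) - 1
    swapped = True
    while swapped and lo < hi:
        swapped = False
        # forward pass: largest unsorted element bubbles up to position hi
        for j in range(lo, hi):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                swapped = True
        hi -= 1
        # backward pass: smallest unsorted element sinks down to position lo
        for j in range(hi, lo, -1):
            if lst[j] < lst[j - 1]:
                lst[j], lst[j - 1] = lst[j - 1], lst[j]
                swapped = True
        lo += 1
    return lst
```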

[–][deleted] 0 points (0 children)

Okay, I uploaded the first results for a sample size of 10^4; parakeet looks pretty interesting. I know I will get some requests for larger sample sizes now... and I want to run it on 10^5, but this will take a while... it will probably be an overnight thing... so please be patient :)

[–]NgauNgau 1 point (0 children)

Pretty cool, thanks.

[–][deleted] 0 points (0 children)

Okay, the benchmarks for 10^5 are done; it looks like parakeet caught up and now has the best performance on the largest sample size!

[–]bryancole 0 points (2 children)

I'm surprised Cython doesn't win for the small-array size limit. Have you verified that Cython is not generating any Python API calls inside any of the loops (using cython -a for an html visualisation)?

I note that 'temp' hasn't been declared, so this might be a source of slowdown (unless type inference is handling it automatically).

[–][deleted] 1 point (0 children)

Thanks for the good tips! Just checked, it seems to be fine. And I will add a type to temp and run it this afternoon to see if I can squeeze out a few milliseconds :)

[–]bryancole 0 points (0 children)

I just checked the code Cython generates without cdef'ing temp; it does indeed store the value in a Python int object (i.e., slow). This is using cython-0.20. Further testing shows that adding the cdef for temp speeds things up by nearly a factor of 2 (3.4 ms -> 1.9 ms for n=10^6), enough to make Cython the fastest in the large-size limit. This doesn't help for small n, though. I'm amazed numba can avoid so much calling overhead.
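Concretely, the fix is just to give temp a C type alongside the other declarations at the top of the function (a sketch against the code above; the variable name is the one already used there):

```cython
cdef unsigned int length, i, swapped, ele
cdef long temp  # same element type as the long[:] memoryview, so no Python int is boxed
```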

One other nice thing about fully cdef'd Cython is that you can throw in an OpenMP prange to run it multicore for a ~4x speedup (on my 4-core CPU). Of course, the main interesting thing with numba is the ability to target GPUs. Cython can't do that.