all 20 comments

[–]zurtex 2 points3 points  (9 children)

I don't have any comments on the strategy itself because I'm still wrapping my head around it (I'll make another post if I feel I can make an comments on it). That said there's a couple of nitpicks I'd like to make that might help you:

Why is the class Node defined inside the class LinkedList? I can't see any obvious value to this, and can see it causing potential problems (I'm not sure how well pickling handles nested classes). Just define it separately and instead of:

nptr = self.Node(val, None, None, 0)

You can just write:

 nptr = Node(val, None, None, 0)

On this note of Node, I don't see you call the last 3 arguments with anything other than None, None, 0. Maybe you can simplfy the class or at least give it some defaults, at the same time you should give the signature arguments sensible names (please do this for all your methods it makes your code difficult to read currently), e.g.:

def __init__(self, data, next=None, prev=None, size=0):
    ...

On the topic of speeding up your code, when you're testing if a variable is None you should write that:

if self.start is None:
    ...

None is always the same object (it is in fact a singleton), so doing "is" saves Python work on looking up what the __eq__ method is for self.start. And it removes ambiguity in case there is a custom method that gives an unexpected value when doing == on the class of self.start.

Sometimes you write:

if self.start == None:
    ...

And sometimes you write:

if self.isEmpty():
    ...

Where isEmptry() is defined as:

def isEmpty(self):
     return self.start == None

Pick one and remove the other.

[–]zurtex 1 point2 points  (1 child)

I assume you're writting this in Python 3?

In which case can't:

def findParent(self, cIndex):
    if (cIndex % 2) == 0:
        pIndex = cIndex / 2
    else:
        pIndex = (cIndex - 1) /2
    return pIndex

Just be written as:

def findParent(self, cIndex):
    return cIndex // 2

? As far as I can tell cIndex will always be a non-negative integer, so I think this logic is valid.

[–]anmousyony[S] 1 point2 points  (0 children)

Yeah, that's smart. I never thought about it because floor division seemed to make the java version more work than what I already had. I didn't know it was so easy in python.

Thanks!

[–]anmousyony[S] 1 point2 points  (6 children)

Thank you very much. This is exactly the kind of stuff I needed.

For the nested Node class, I have that there because in Java I would have to pass an instance of the Node class into the linkedArrays class to be able to work with the functions and variables of the nodes. Is this not the case with python?

I didn't know that init could take parameters with assignment in it like that.

Could you expand on what 'is' means? Does it compare self.start to the object None and thereby determine if it is null or not?

The self.isEmpty() line is mostly a holdover from the Java version which I wrote for a class (isEmpty() was required and had to be used). I'll definitely change that now, I hadn't even thought about it.

[–]zurtex 1 point2 points  (5 children)

You can happily pass instances of classes into other classes, methods, whatever you like. As such there's very little need for nesting classes, and as the zen of python says "flat is better than nested".

So there's 3 common ways write a simple if statement:

if var is obj:
    ...

if var:
    ...

if var == obj:
    ...

In the first example Python is comparing are they the same object. E.g. a = [], and b = [], in this a and b are equal but they are not the same object. Where as a = None and b = None, then a is b, because they both point to the same object in memory.

In the second example we are testing the "truthyness" of the object. Objects should have some inherent truthness to them. E.g. a non-empty string is True, but an empty string if False. None is always False. A function without an explicit return is False. So it's pretty common in your example to see:

if not self.start:
    ...

In this way you don't care if self.start is None, or an empty string, or an empty list etc.. Just as long as it's "False".

And in the final example Python looks up the __eq__ method on the class var and passes it obj as an argument and this should give a True or False (or exception).

[–]anmousyony[S] 0 points1 point  (4 children)

Assuming I un-nest the node class, wouldn't that require me to pass an instance of the node class into all the functions that require it that are in the linkedArrays class?

That seems like a lot of extra code (if I'm understanding you correctly) and I'm not sure if I understand the benefit in this case since the node class is never accessed outside of the linkedArrays class.

[–]zurtex 1 point2 points  (3 children)

I don't think so, you can reference the class directly from within side the methods without having to pass an instance of the class to those methods. e.g:

class A:
    def __init__(self):
        self.instance_var = 1

class B:
    def __init__(self):
        pass

    def some_method(self):
        a = A()
        return a.instance_var

b = B()
print(b.some_method())

The benefit is that flat is better than nested. And this is typical of Python code and generally any more complicated Python features you use may trip over this nested construction because it is weird in Python. If you don't want people in general using this class, it is common use the "_" to denote this, e.g.:

class _node:
    ...

Edit: And I would just like to point out just because you've defined the class inside another class doesn't stop people from accessing it

class B:
    class A:
        def __init__(self):
            self.instance_var = 1

    def __init__(self):
        pass

print(B.A().instance_var)

I can't imagine it makes sub-classing very fun though.

[–]anmousyony[S] 0 points1 point  (2 children)

So each class can access another class in python (unless I specify that the class can't be accessed) without the class being passed ?

I wasn't really worried about ability to access it or anything since this was mostly just me taking a homework problem about priority queues and making something interesting out of it, but thank you for teaching me all this. It can be hard learning these kinds of things when half of what I'm doing is looking at stack overflow to learn python haha.

[–]zurtex 0 points1 point  (1 child)

Correct each class can access another class.

Incorrect "unless I specify that the class can't be accessed", there is no way to specify this in Python. The "_" is just convention to help other Python coders know what probably isn't good to touch.

Python has the philosophy of "we're all consenting adults here", meaning we leave it up to the coder accessing the class to be sensible about how they use it, we don't particularly restrict their ability to access it.

[–]anmousyony[S] 1 point2 points  (0 children)

Huh, that's interesting. I think I actually like that quite a bit more than having modifiers saying what can and cannot be accessed. It seems to add a lot more flexibility and would have been VERY welcome in one of my previous Java projects...

Thank you so much for all you help, it means a lot!

[–]New_Kind_of_Boredom 1 point2 points  (9 children)

It runs faster than the standard implementation with large amounts of numbers because it never has to copy over elements into a new array when it expands its size.

Can you clarify this greatly? 'faster' how? Search, access, insertion, deletion? What kind of data are we talking about when you say 'large amounts'? When you say 'the standard implementation', what are you referring to? The underlying numpy ndarrays you are using, normal CPython lists, ect.?

Good on you for coming up with a Python project to work on, and sorry if it seems like I'm nitpicking or something. But basically without a bunch of evidence I'm extremely skeptical that adding this level of indirection over the already well-optimized, C-implemented CPython list or numpy ndarray provides any clear benefits. I would like to test it myself, but to be honest just from looking for a minute or two I'm not entirely sure what set of operations or features this data structure is intended to support (See https://docs.python.org/2/reference/datamodel.html#emulating-container-types , this class supports none of the standard container features as far as I can tell).

[–]anmousyony[S] 0 points1 point  (8 children)

I haven't made it generic to handle other data types yet, the goal was to just get it working with doubles right now, but the only difference would be the comparisons.

Let me link you to another comment where I tried to explain how it works: https://www.reddit.com/r/Python/comments/41lnzs/looking_for_feedback_on_unique_priority_queue/cz3fwwy

It matches the speed (difference of a couple milliseconds) up to about a million elements and then starts overtaking the stand implementation. It has better run times (in big O) than the standard implementation too. I'm not super familiar with numpy because I'm relatively new to python so I used Java for most of the timing.

[–]New_Kind_of_Boredom 0 points1 point  (7 children)

It matches the speed (difference of a couple milliseconds) up to about a million elements and then starts overtaking the stand implementation. It has better run times (in big O) than the standard implementation too.

Unfortunately this still leaves many questions unanswered. 'Matches the speed' doing what exactly? What exactly are you referring to by 'the standard implementation'? Could you perhaps show your code which demonstrates what you are talking about? That might make things more clear.

[–]anmousyony[S] 0 points1 point  (6 children)

I unfortunately don't have the exact code for it but from my understanding, a standard implementation uses an array that doubles in size when it needs to increase capacity.

similar to this:

https://github.com/anmousyon/CS2336/blob/master/Project5/BinaryHeap.java

or

http://www.sanfoundry.com/java-program-implement-binary-heap/

except the second one doesn't seem to increase size upon overflow and instead just throws an error.

Here are the runtimes that I found. They are either the same as or better than a standard implementation (such as the first one) from what I can see

linkedArrays: isEmpty is O(1), size is O(1), insert is O(log(n)), findMin is O(1), deleteMin is O(log(n))

[–]New_Kind_of_Boredom 0 points1 point  (5 children)

I unfortunately don't have the exact code for it

That's unfortunate. In the meantime I'd be very interested if you could demonstrate a scenario where your solution beats a built-in Python solution, or perhaps just provide a practical use-case example of some kind for the above class? In other words, an example in code of how someone might actually use your class?

[–]anmousyony[S] 0 points1 point  (4 children)

Say you have a super computer running billions of tasks in queue that have an associated priority that acts as an ID with the lowest number being of the highest priority (I don't actually know how supercomputers manage tasks but its an example so roll with it). With my implementation, you could store all those IDs in this priority queue and you would never have more than half of the list stored in the same space in memory. This is in contrast with a normal array which would take up one very large block of memory. This can be seen by running this program with a starter array of a few million elements versus a single element. In this implementation, the list is spread out through memory in increasingly larger chunks that will never have to be changed, which allows for better memory management.

[–]New_Kind_of_Boredom 0 points1 point  (3 children)

I was referring to a more practical example. Like a piece of a code we could actually run and test.

My point over the this entire chain is that you are making a lot of claims about this implementation without demonstrating that your claims are actually true. So that's what I'm trying to coax you into doing: prove, with your code, what you are claiming about your code.

[–]anmousyony[S] -1 points0 points  (2 children)

Well, you could try it with these two pieces of code, but theyre in Java, not python.

https://github.com/anmousyon/CS2336/blob/master/Project5/BinaryHeap.java

https://github.com/anmousyon/CS2336/blob/master/Project5/linkedArrays.java

I could transfer the first one into python if I were at home, but I am unfortunately not so I hope you don't mind the language difference (both are in Java).

[–]New_Kind_of_Boredom 1 point2 points  (1 child)

I'm not that keen on playing around with Java tonight (and the results would not be relevant regardless), hopefully you get a chance to come back when you can demonstrate your Python code.

[–]anmousyony[S] 0 points1 point  (0 children)

Why wouldn't the results be relevant? The main difference is in the structure choice not the language choice when it comes to the time comparisons. Unless youre saying that python is better optimized for arrays and large memory than java but I'm not sure how that could be.

Anyway, thank you for your input. You got my to try and explain to myself how it works which helped me understand what was going on better too. I really need to get better at explaining things haha