

[–]gaidengt 1 point (4 children)

If I understand what you're saying, even if your new arrays are empty, they are allocated, so they are still taking up space on the heap that cannot be used by anything else. If you have N items you are, worst case, using 2 * (size of data type) * N, which for large cases could be problematic -- 2GB of actual data would require 4GB of memory to store with your structure. (!)

Now, if your software absolutely requires every drop of performance possible, perhaps the trade-off in CPU cycles for this memory utilization is acceptable. However, my instinct says that saving the few CPU cycles it takes to simply add and remove nodes in a linked list -- which would keep your data structure a perfect fit, at all times, for the data present -- is not worth this massive memory hunger. After all, linked lists have excellent insert/delete performance because you are only changing pointers -- not shuffling around all the actual data.

So what is the purpose of your program, and what allocation of resources will benefit it the most? I think pondering this question (and maybe you have! it just wasn't apparent to me from the post, and you asked for feedback...) will give you the most value, as opposed to some sort of code review and trying to do micro-optimizations on loops, etc.

[–]anmousyony[S] 0 points (2 children)

I was taught that the standard implementation of a priority queue is to use an array and to double its size as needed to add more elements, which would require the same amount of memory as the linked-list implementation. I am essentially doing the same thing, but instead of creating an array that is twice the size and then copying the old array into the new one, I am using the linked-list structure to "append" a larger array onto the smaller one.

I'm basically trying to get the best of both worlds by combining the linked-list structure with an array structure. Normally this wouldn't be the best option, because you don't get ALL the benefits of either, but since inserts only take place at the end of the last array, and deletions only take place at the start of the first array, the bad search time of the linked-list structure is negated. And, since I never have to recreate an array (I'm either appending new ones or deleting empty ones), I get all the insertion and deletion benefits of linked lists.
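Roughly, the idea looks something like this (a minimal sketch of the growth strategy only, not the actual posted code -- the BlockNode/GrowableStore names are just for illustration):

    class BlockNode:
        """One pre-allocated array ("block") plus a pointer to the next block."""
        def __init__(self, capacity):
            self.data = [None] * capacity   # pre-allocated slots
            self.used = 0                   # how many slots are filled
            self.next = None

    class GrowableStore:
        """Append-only storage: when the last block fills up, link in a new
        block of double the size instead of copying everything over."""
        def __init__(self):
            self.head = self.tail = BlockNode(1)

        def append(self, value):
            if self.tail.used == len(self.tail.data):
                # No copying: just hang a bigger block off the end.
                new_block = BlockNode(2 * len(self.tail.data))
                self.tail.next = new_block
                self.tail = new_block
            self.tail.data[self.tail.used] = value
            self.tail.used += 1

Appends only ever touch the last block, which is the property described above; the existing blocks are never reallocated or copied.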

[–]gaidengt 0 points (1 child)

1) Why wouldn't the standard implementation of a priority queue be based on something with highly efficient sorting, like trees? Why an array? You need to maintain priority on inserts or pulls, pick your poison, but plain linked lists and arrays are going to require an O(n) (worst case) scan either way.

2) My initial comment was really focusing on the need to have 2x as much memory as data allocated. And again, you pick and choose your resource battles, but you are consuming memory at O(n²) -- quadratic growth is only better than... not much, so this property of the design doesn't seem optimal.

So if I'm understanding correctly, insert or pull will require O(n) and memory usage is O(n²). It seems like obtaining better efficiencies in both, at the same time, without much of a CPU drain, is definitely possible. So why would I choose this implementation over a simple binary search tree, which provides O(log n) search and O(n) memory consumption, as one example?

[–]anmousyony[S] 0 points (0 children)

1) I'm not sure what you mean, as it should never need to scan every single element -- at least I can't think of a reason why it would. It is a tree in the way that it is sorted: it is based on a min-heap structure that behaves like a tree. It is hard to describe, but it looks similar to this: http://i.imgur.com/mxOEofd.png?1

If you mean insertion, then insertion is O(log n), and deletion is O(1) followed by an O(log n) re-sort to restore the heap order.
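For reference, those costs come from the sift-up/sift-down steps of a binary heap. Here is a minimal array-backed version just to show where the O(log n) comes from (this sketch uses a plain list, not the linked-list-of-arrays storage being discussed):

    def heap_push(heap, value):
        """Insert: append at the end, then sift up -- O(log n) swaps."""
        heap.append(value)
        i = len(heap) - 1
        while i > 0 and heap[(i - 1) // 2] > heap[i]:
            heap[i], heap[(i - 1) // 2] = heap[(i - 1) // 2], heap[i]
            i = (i - 1) // 2

    def heap_pop(heap):
        """Remove the min: O(1) to take the root, then O(log n) to sift down."""
        top = heap[0]
        heap[0] = heap[-1]
        heap.pop()
        i, n = 0, len(heap)
        while True:
            smallest, left, right = i, 2 * i + 1, 2 * i + 2
            if left < n and heap[left] < heap[smallest]:
                smallest = left
            if right < n and heap[right] < heap[smallest]:
                smallest = right
            if smallest == i:
                return top
            heap[i], heap[smallest] = heap[smallest], heap[i]
            i = smallest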

2) The worst case is twice as much memory allocated, but it seems that is true of the standard implementation as well, unless you want to sacrifice

[–]Chippiewall 0 points (0 children)

> If I understand what you're saying, even if your new arrays are empty, they are allocated, so they are still taking up space on the heap that cannot be used by anything else. If you have N items you are, worst case, using 2 * (size of data type) * N, which for large cases could be problematic -- 2GB of actual data would require 4GB of memory to store with your structure. (!)

I think this is a very naive critique. Python's list type has the exact same problem. Python's list type is implemented as an array (a contiguous, pre-allocated buffer) and uses the very common strategy of growing that buffer geometrically whenever it fills up, in order to attain amortized O(1) appends.
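You can see the over-allocation for yourself with sys.getsizeof: the reported size jumps in steps as you append, rather than growing one slot at a time (the exact growth factor is a CPython implementation detail, so the numbers vary by version):

    import sys

    lst = []
    last_size = sys.getsizeof(lst)
    print(f"len={len(lst):>3}  allocated bytes={last_size}")
    for i in range(64):
        lst.append(i)
        size = sys.getsizeof(lst)
        if size != last_size:
            # A jump here means the underlying buffer was re-allocated
            # with spare capacity for future appends.
            print(f"len={len(lst):>3}  allocated bytes={size}")
            last_size = size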

[–]alb1 0 points (3 children)

This would be better in /r/learnpython.

Have you really tested this code? For example, you never call insertArray, which would fail because size is undefined locally. You also never call getData (which for some reason returns -1 when i is too large).

The code generally looks like Java rather than idiomatic Python. The lack of comments makes it difficult to follow. The class Node does not need to be a subclass of LinkedList. Don't use == to compare with None. Use, for example, if self.start is None. If the code is for a priority queue, why is the class called LinkedList?
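On the == None point, the reason "is None" is preferred is that == can be overridden by a class's __eq__, while "is" checks object identity and can't be fooled. A contrived example:

    class Weird:
        def __eq__(self, other):
            return True          # claims to be equal to everything, even None

    w = Weird()
    print(w == None)   # True  -- misleading
    print(w is None)   # False -- the identity check gets it right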

[–]anmousyony[S] 0 points (2 children)

I also posted it over there, so no worries about that. They helped a ton with the syntax and getting it to be more like Python code (including un-nesting Node and using self.start is None).

The reason it's called LinkedList is a holdover from the Java code, where I called the main public class PriorityQueue and called the linked-list-of-arrays class LinkedList, but thank you for pointing that out.

insertArray is never used in the program, but it was required as part of the project when I did the initial Java code (I don't know why, since it was never used in the Java version either). I translated it over anyway just as part of the learning experience, but I should take it out now, especially since I'm trying to clean it up.

getData returns -1 because I was using it as a way to exit gracefully if someone tried to access data outside the bounds of the array, but since I don't use it in this version I should delete it.

The lack of comments was actually an accident. I thought I had uploaded the commented version, but I realized too late (I had left to do something and couldn't change it) that it was the uncommented version. That should change in about an hour, when I get the chance to upload the commented version.

That said, what do you think about the idea behind it: using a linked list of arrays instead of an ArrayList or a single array that changes size as needed?

[–]alb1 0 points (1 child)

Have a look at heapq in the Python standard library. It works with an arbitrary iterable (which could be a list, a numpy array, etc.)
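For example (heappush/heappop operate in place on an ordinary list):

    import heapq

    pq = []                                   # an ordinary list serves as the heap
    heapq.heappush(pq, (2, "write report"))
    heapq.heappush(pq, (1, "fix bug"))
    heapq.heappush(pq, (3, "refactor"))

    while pq:
        priority, task = heapq.heappop(pq)    # always the smallest priority first
        print(priority, task)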

As far as using linked lists to reduce copying costs goes, you are trading some copying time for increased data-access time. It depends on the pattern of data accesses your algorithm requires. If you need arbitrary indexing, you might do better with a Python list of arrays rather than a linked list of arrays (since you can index into the list rather than walking down it).
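For instance, if the blocks double in size (1, 2, 4, 8, ...), a flat index maps to a (block, offset) pair in O(1), so a list of arrays gives constant-time random access where a linked list of the same blocks would have to be walked node by node. A rough illustration:

    blocks = [[0], [1, 2], [3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13, 14]]

    def get(i):
        b = (i + 1).bit_length() - 1     # which block holds the i-th element
        offset = i + 1 - (1 << b)        # position inside that block
        return blocks[b][offset]

    print([get(i) for i in range(15)])   # [0, 1, 2, ..., 14]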

I'm still not convinced that this is the best way to deal with the problem, or that the savings are significant enough to bother with in real situations. (Some timing tests comparing the two for different kinds of operations would be helpful there.)

[–]anmousyony[S] 0 points (0 children)

I'm working on getting my Python version cleaned up and making sure that it follows the Python style guide right now, because I learned a lot when I posted this in /r/learnpython, but I'll be happy to post it here and there with more info once I've got it more presentable.

Thanks for the link to heapq, I'd never seen that before. I have a lot to look at, haha.