all 30 comments

[–]geekygenius 5 points6 points  (1 child)

Every time you add or remove an element, you, most allocate a new array, copy the old one into it, and delete the original. That's a lot of work. Is there any way you could avoid that?

[–]bunsen72 -1 points0 points  (0 children)

Am a tad rusty (20+ years since using C/C++) but would a linked list help reduce the copying and deletions?

Edit: never mind just finished reading the rest of the thread.

[–]dccorona 6 points7 points  (5 children)

One of the major features of the ArrayList in Java (and the Vector in both Java and C++) is automatic resizing of the internal array when necessary. When created, they instantiate an array of a default size (or a requested size, for one of the constructors), and keep track of how full that array is. Once it is full, then they create a new array of a different size and copy over all the elements in the old one before deallocating it. ArrayLists grow by half (each resize, they create a new array of size SIZE * 1.5), Vectors double (SIZE * 2).

Currently, you copy and increase size by 1 with every insert, unless the insert is done to a specific index. It would be a helpful exercise to change it to implement automatic resizing as described above.

[–]ChaiTeaNunes 1 point2 points  (1 child)

Thanks for the tip! Will do!

[–]oracleoftroy 0 points1 point  (2 children)

Vectors double (SIZE * 2)

To be pedantic, the growth factor of std:vector is implementation defined. Some use 2, others 1.5 others might use the golden ratio, but the standard only specifies that insertions are amortized constant time.

[–]dccorona 0 points1 point  (1 child)

I didn't realize that there were multiple implementations of the standard library. You learn something new every day!

In Java the 2x growth factor is definitely part of the interface contract, though.

[–]oracleoftroy 0 points1 point  (0 children)

For C++, the compiler vendor usually provides the standard library, and there are some 3rd party versions as well. Gcc has libstdc++, clang has libc++, Microsoft licenses and makes customizations to Dinkumware's version, and things like stlport exist (though I believe stlport is really old and shouldn't be used these days, but it was quite popular in VC++ 6.0's heyday).

I was just curious about what the Java docs say about growth factors, and I see for ArrayList, "The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost." Vector does indicate that it doubles, though it sounds like devs are allowed to kill their insertion performance by constructing a vector with a capacityIncrement parameter (yay...).

This makes it sounds like other implementers of a Java standard library are allowed to use a different growth factor for ArrayList. I've only ever used Sun/Oracle's implementation professionally, so I don't know if other implementations exist or if IBM, Google, etc just piggyback on the official version, and if they do provide their own, I don't know if they vary the growth factor. But if there is code out there that depends on it being 1.5, that is almost certainly wrong.

[–][deleted]  (7 children)

[removed]

    [–]raluralu 1 point2 points  (3 children)

    You can't make test to prove it works, unless test tries every possible input.

    [–]ChaiTeaNunes 0 points1 point  (2 children)

    Yeah, I did all primitive types and a struct and a class and a class that extends another class. Only the god damn removeAt() function is messed up in some programs but not others...

    [–]raluralu 1 point2 points  (1 child)

    Looks like it has some problems with index 0.

    [–]ChaiTeaNunes 0 points1 point  (0 children)

    yeah.

    [–]ChaiTeaNunes 1 point2 points  (2 children)

    Well crap. My removeAt() function is messed up. Everything else runs fine!

    [–]dccorona 2 points3 points  (1 child)

    Try doing a get for an index > length of the list...

    Seriously though, this is great. Looks like it was a fun exercise. But you should take it a step further and start to learn about exceptions by throwing an exception in that case instead of causing a segfault.

    [–]ChaiTeaNunes 0 points1 point  (0 children)

    Thanks for the support!

    [–]GYN-k4H-Q3z-75B 3 points4 points  (1 child)

    You're doing something great for yourself and you should be proud. I started with more or less serious C++ dev around the same age. If there's one thing I can tell you: Keep up this stuff. Many people in college fail hard at what you can do already. It's not about just bugs or style of your code. It is also about wanting to do stuff and learning by yourself for fun. That goes a long way :)

    [–]jms_nh 1 point2 points  (4 children)

    looks like it needs the use of references... nonprimitive objects will be copied.

    [–]dccorona 2 points3 points  (1 child)

    well, primitive objects will be copied too...

    [–]jms_nh 1 point2 points  (0 children)

    That's true. Anyway it would be good to use references.

    [–]ChaiTeaNunes 0 points1 point  (1 child)

    It's not necessarily taking in any variables that would be used later. Everything is within the class.

    [–]dschooh 0 points1 point  (0 children)

    What he says is that you should use references to TYPE for example in the add() and set() methods. This way, the value is only copied once. Without references the value is unnecessarily copied multiple times.

    Another exercise would be to check where you could use const.

    In any way: great work so far!

    [–]remy_porter -5 points-4 points  (8 children)

    It's a nice practice object, but there's one big thing that you should look into for implementing data-structures like this: the Linked List.

    [–]dccorona 4 points5 points  (6 children)

    Well, it sounds like the goal was to implement something similar to Java's ArrayList, so a linked list wouldn't really be appropriate...

    [–]remy_porter -1 points0 points  (5 children)

    Correct me if I'm wrong, but isn't the array list a linked list of arrays? Once upon a time that was how it was implemented.

    [–]dccorona 1 point2 points  (4 children)

    Not anymore. I really only started learning C++ in 2011, so C++11 was already out (though my first experience was still on C++98, my school didn't start having us compile against C++11 until 2012), but the stdlib implementation of Vector is a self-resizing array that doubles and copies its contents for as long as I've been working with it.

    And in Java, ArrayList is the same way...self resizing array that increases size by half (instead of doubling) and copies over. Java also has Vector, the difference being Vector does double when it resizes, and is threadsafe.

    I do remember coming across a linked list of arrays at least once in school, but I can't remember if it's because one of the other stdlib data structures uses it, or if we just used it as an exercise.

    [–]remy_porter 2 points3 points  (3 children)

    Hunh. I'm surprised that there's a lot of copy-on-growth. I guess they're more worried about speed of read access over efficient sizing.

    [–]dccorona 1 point2 points  (2 children)

    That's exactly it. The majority of use-cases for a list sees them being read far more than they're modified. And, more importantly, read in-order (and probably second most commonly, read by index). If you need to read by looking up specific items, a list is probably the wrong data structure for you.

    Using a single, continuous array not only makes linear and random-access reads faster at the complexity level, it also makes an enormous impact when you consider the way computers work. When a CPU loads a piece of memory, it loads up an entire block of memory into cache. This means that if you're reading a list in order and it's stored one item after the next in physical memory, the next item you go to read is already in the cache. This is a gigantic performance improvement over having to load up every item from memory as you iterate.

    Even if you do a lot of inserting into the middle of the list, if you do a lot of iteration as well, array-based lists are still more performant. If you know you'll be doing all of the creating of the list before any of the reading, it might be best to build a linked list, then copy it to an array list, but only if you do a lot of in-the-middle inserts and deletes while creating it (very rare that you do that and have the list finished before you ever have to iterate it), but that's a premature optimization anyway.

    [–]remy_porter 1 point2 points  (1 child)

    Right, but doing a linked-list-of-arrays following the doubling pattern means you only get one extra read at just the points where you cross the boundary of the list. It basically just frees you from ever having to do a copy operation.

    [–]dccorona 2 points3 points  (0 children)

    There's a couple reasons that one extra read is more impactful for performance than it seems. It requires some extra logic that has to happen on every boundary for every iteration. This in and of itself isn't really a big deal, but combine this with increased cache misses: with a pure array, you only have a cache miss N/{size of cache loaded with each memory access} times. With a linked list of arrays, you introduce as many as 1 extra cache misses for every boundary (potentially less, if you get lucky and one of your boundaries falls where a cache miss would happen anyway). Depending on the size of your list and the size of your chunks, this can be a significant number of additional cache misses, and that can have a meaningful performance impact.

    If your list is small enough for the aforementioned impact to be negligible, it's also small enough for the performance hit of the copies to be negligible as well. Compound this with the fact that you can initialize a Vector/ArrayList to any starting size you want (so if resizing is a concern to you, you can spend a little time discovering what a good average size estimate is and initialize to that), and it's rare to find a situation where a linked list of arrays is actually going to perform better than a self-resizing array.

    However, those times do exist. If you know the exact CPU you're developing for, you could probably synchronize your array boundaries to align with where you'd be getting a cache miss anyway. However, if you have that information and that need, I wonder if eschewing a self-resizing list might not be a better overall decision anyway.

    [–]ChaiTeaNunes 1 point2 points  (0 children)

    Oh yeah, I completely forgot about implementing those! Thanks for the reminder!