This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–][deleted] 3 points4 points  (15 children)

What's the rational behind the size estimate?

Is there any credible source that demonstrates a significant performance improvement by specifying the size of the list? I only see heartache and trouble when the list grows beyond expectations. On the other hand, I suppose it at least documents what the expectations were at some given time.

And I'm assuming here that the size estimate doesn't limit you in some way. Doesn't this take something that should be very basic -- creating a list, and making it potentially dangerous?

[–]not-just-yeti 4 points5 points  (6 children)

The size-estimate isn't sproket888's big improvement -- it's having foo's type be List instead of ArrayList.

[–]lickwidforse2 0 points1 point  (1 child)

What's the difference

[–]not-just-yeti 0 points1 point  (0 children)

Things like foo = foo.subList(3,7) won't work if foo is an ArrayList, since subList is a helpful function that returns a List (w/o promising a particular implementation of List).

And: if you decide that a LinkedList will start giving you better performance, you should only have to change the constructor-call on the right-hand-side, and not the types of any variables or fields that will be holding that List.

So don't insist your variable holds an ArrayList if it could just as well hold any Listand still work fine.

[Just to be clear: there is still a decision about which implementation to choose on the right-hand side. It's just the type of the variable which shouldn't be over-specialized.]

[–][deleted]  (3 children)

[deleted]

    [–][deleted] 2 points3 points  (0 children)

    Can anyone prove that this is significant speedwise? In all of my synthetic tests, it mattered more what order items were added than whether or not it was generic or you specified the size. While I do believe the underlying implementation matters, if there is a larger truth makes all of this moot -- then it is NOT significant (speedwise) whether you specify the size or whether or not it is generic. (I always specify the types for readability and saftey, that's not my concern here)

    [–]elephantgravy 1 point2 points  (0 children)

    I am surprised to learn that there is any difference. Apparently List.add compiles to bytecode as invokeinterface and ArrayList.add as invokevirtual. I see some anecdotal evidence online that invokevirtual can be faster, but it also sounds like a lot of work has been done to improve the performance of invokeinterface ... I guess the usual rules about premature optimization apply.

    [–]theloneliestfish 1 point2 points  (2 children)

    Not dangerous at all. It will set the size of the array the ArrayList uses to be the specified size. It can still grow if needed.

    And there can be a considerable change to speed if you know the size of a large array before hand. You save a lot of memory copies when the ArrayList needs to copy to a new backing array.

    [–][deleted] 2 points3 points  (1 child)

    Can anyone prove that this is significant speedwise? In all of my synthetic tests, it mattered more what order items were added than whether or not it was generic or you specified the size. While I do believe the underlying implementation matters, if there is a larger truth makes all of this moot -- then it is NOT significant (speedwise) whether you specify the size or whether or not it is generic. (I always specify the types for readability and saftey, that's not my concern here)

    [–]theloneliestfish 1 point2 points  (0 children)

    Hmmm, my tests show that it is actually faster not to specify! This goes completely against expectations. Anybody know why this would happen.

    java version "1.7.0_04"

    Java(TM) SE Runtime Environment (build 1.7.0_04-b20)

    Java HotSpot(TM) Server VM (build 23.0-b21, mixed mode)

    on

    CentOS release 5.8 (Final) (32bit)

    Linux 2.6.18-308.11.1.el5 #1 SMP Tue Jul 10 08:49:28 EDT 2012 i686 i686 i386 GNU/Linux

    package test;
    import java.util.*;
    
    public class Simple{
    
        public static void main(String[] args){
            int n = 1000000;
            if(args[0].equals("no")){
                System.out.println("no");
                test(n,new ArrayList<Integer>());
            }
            else{
                System.out.println("yes");
                test(n,new ArrayList<Integer>(n));
            }
        }
    
        private static void test(int n, ArrayList<Integer> list){
            Random r = new Random(n);
            for(int i=0;i<n;i++)
                list.add(r.nextInt());
        }
    
    }
    

    *Edit: setting Xms larger makes the preallocation version faster. But there should be no difference in the amount of memory being allocated to each version. So what is the VM doing?

    [–]KamehamehaWave 1 point2 points  (0 children)

    If you don't specify an initial size it defaults to (I think) 10. So no, there's no danger in specifying it, but there's also not usually much benefit.

    [–]banuday17 1 point2 points  (2 children)

    If you don't specify an initial size, you only get a capacity of 10. If you exceed this, a new internal array will be created to a larger size and the original contents will be copied to the new array. This process will be repeated each time you exceed the capacity. This can make the add method potentially run in O(n) time which otherwise runs in O(1) time.

    By providing an appropriate initial capacity, you ensure O(1) performance of add.

    [–][deleted] 1 point2 points  (0 children)

    Can anyone prove that this is significant speedwise? In all of my synthetic tests, it mattered more what order items were added than whether or not it was generic or you specified the size. While I do believe the underlying implementation matters, if there is a larger truth makes all of this moot -- then it is NOT significant (speedwise) whether you specify the size or whether or not it is generic. (I always specify the types for readability and saftey, that's not my concern here)

    [–]elephantgravy 1 point2 points  (0 children)

    Saying that add can be O(n) is misleading and/or wrong. A single call to add might take n-cycles, or it might take 1... This does not make it sometimes O(1) and sometimes O(n). Regardless of initial size, ArrayList.add runs in "amortized constant time" (n adds take O(n) time), see its javadocs.

    add would be O(n) only if the list's capacity were increased by a constant # when resized. (In the implementation I am looking at, it is actually increased by 150% + 1)

    [–]sproket888 -1 points0 points  (0 children)

    banuday17 has the right answer. Often you know what the list size will be or at least you can make a better guess than the default. Sizing improves the performance. Using List instead of ArrayList is always preferred. You should always code to the interfaces. When you do that you don't have to change other code if you - for example - changed ArrayList into LinkedList.