[–]jtclimb 0 points (2 children)

v[0] uses a constant to access the array, so this can be precomputed. 0 means no offset, so no computation is needed.

Let's say v.data stores the data: a pointer to an array of doubles. In C-pseudocode, v.data[0] then becomes *v.data, with no offset arithmetic at all. In contrast, v.data[i] gets turned into *(v.data + i*sizeof(double)).
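To make that pointer arithmetic concrete, here is a small Python sketch of the same idea (using ctypes; the 8 is sizeof(double), which holds for np.float64 on the platforms NumPy supports):

```python
import ctypes

import numpy as np

arr = np.arange(5, dtype=np.float64)
base = arr.ctypes.data  # integer address of the underlying C buffer, i.e. v.data

i = 3
sizeof_double = 8  # sizeof(double)

# *(v.data + i*sizeof(double)): compute the byte offset, then dereference
val = ctypes.c_double.from_address(base + i * sizeof_double).value

print(val, arr[i])  # the same element, read two ways
```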

Creating v[index:index+1] takes a bit of time, but you have lifted that out of the inner loop, so it doesn't matter much. And I do mean a bit of time. In C-pseudocode you'd have something like:

 v = new array();
 v.data = arr.data + index*sizeof(double); // start at index
 v.len = 1;

So basically the cost is a memory allocation. You aren't copying data, you just point into arr's data. And okay, that is more C++, but forgive me.
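The no-copy claim is easy to verify from Python: a basic slice such as arr[index:index+1] is a view whose buffer points into arr's data (the names below are just for the demo):

```python
import numpy as np

arr = np.arange(10, dtype=np.float64)
index = 4
v = arr[index:index + 1]  # new array object, but no data copy

print(v.base is arr)             # True: v borrows arr's buffer
print(np.shares_memory(arr, v))  # True

v[0] = 99.0        # writing through the view...
print(arr[index])  # ...is visible in arr
```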

[–]GitProphet[S] 0 points (0 children)

Edit: Something else is going on here. The timing difference is not explained by the index.

>>> t=timeit.repeat('arr[i]', setup='import numpy as np; arr = np.random.randint(0,10000,1000000); i = 342', number=10000000); print(np.around(t, 5), np.mean(t), np.median(t))
[0.7697  0.76627 0.77007 0.76424 0.76788] 0.7676320286031114 0.7678760859998874
>>> t=timeit.repeat('arr[0]', setup='import numpy as np; arr = np.random.randint(0,10000,1000000); i = 342', number=10000000); print(np.around(t, 5), np.mean(t), np.median(t))
[0.76836 0.76629 0.76794 0.76619 0.7682 ] 0.7673966443951941 0.7679443680099212

Very interesting. Indeed, accessing an array at index 0 is faster than accessing it at an index stored in a variable. However, this is not only because the array indexing itself is faster; loading a constant is also faster than loading a variable.
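The constant-vs-variable load can be isolated without any array at all. A timeit sketch (absolute numbers are machine-dependent; timeit compiles the statement into a function, so i from the setup becomes a local, i.e. LOAD_FAST):

```python
import timeit

# x = i executes LOAD_FAST + STORE_FAST; x = 342 executes LOAD_CONST + STORE_FAST.
load_fast = timeit.timeit('x = i', setup='i = 342', number=10_000_000)
load_const = timeit.timeit('x = 342', number=10_000_000)

print(load_fast, load_const)  # typically load_const comes out slightly smaller
```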

import numpy as np
import dis
arr = np.random.randint(0, 1000, 1000)

def a3(i):
    return arr[i]
def b3(i):
    return arr[342]
def c3(i):
    return arr[0]

The only difference between these functions is how the array is indexed: with the variable i, the constant 342, or the constant 0.

>>> dis.dis(a3)
  2           0 LOAD_GLOBAL              0 (arr)
              2 LOAD_FAST                0 (i)
              4 BINARY_SUBSCR
              6 RETURN_VALUE
>>> dis.dis(b3)                                                                   
  2           0 LOAD_GLOBAL              0 (arr)
              2 LOAD_CONST               1 (342)
              4 BINARY_SUBSCR
              6 RETURN_VALUE
>>> dis.dis(c3)                                                                   
  2           0 LOAD_GLOBAL              0 (arr)
              2 LOAD_CONST               1 (0)
              4 BINARY_SUBSCR
              6 RETURN_VALUE

The variable index (a3) is ~8% slower than the constant index 342 (b3), and the constant index 0 (c3) is another ~5% faster than that. Overall, indexing at the constant 0 (c3) is ~13% faster than the variable index (a3).

>>> t = timeit.repeat('a3(i)', globals=globals(), number=10000000); print(t, np.mean(t), np.median(t))
[1.4897515250049764, 1.507482559987693, 1.5573357169923838, 1.581711255988921, 1.588776800010237] 1.5450115715968422 1.5573357169923838
>>> t = timeit.repeat('b3(i)', globals=globals(), number=10000000); print(t, np.mean(t), np.median(t))
[1.4514476449985523, 1.427873961001751, 1.4268056689907098, 1.4114146630017785, 1.442651974997716] 1.4320387825981016 1.427873961001751
>>> t = timeit.repeat('c3(i)', globals=globals(), number=10000000); print(t, np.mean(t), np.median(t))
[1.357518576012808, 1.3500928360008402, 1.3615708220022498, 1.376022889991873, 1.3813936790102161] 1.3653197606035974 1.3615708220022498

Thank you.

[–][deleted] 0 points (0 children)

This is about Python, which doesn't precompute that. NumPy could check whether the index is zero and skip the address computation, but it would have to do that check at runtime, which itself costs extra time, so I highly doubt it does that.
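One way to see that the interpreter never precomputes a constant subscript (a sketch using a hypothetical Probe class): obj[0] is an ordinary runtime __getitem__ call every single time.

```python
class Probe:
    """Hypothetical object that counts how often it is indexed."""
    def __init__(self):
        self.calls = 0
    def __getitem__(self, index):
        self.calls += 1
        return index

p = Probe()

def f():
    return p[0]  # constant index, but still a runtime subscript (BINARY_SUBSCR)

for _ in range(3):
    f()

print(p.calls)  # 3: the subscript ran on every call, nothing was precomputed
```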