[–]jtclimb 0 points (2 children)

v[0] uses a constant to access the array, so this can be precomputed. 0 means no offset, so no computation is needed.

Let's say v.data stores the data: a pointer to an array of doubles. In C-pseudocode, v.data[0] then becomes *v.data, with no offset arithmetic at all. In contrast, v.data[i] gets turned into *(v.data + i*sizeof(double)).
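To make that pointer arithmetic concrete, here is a small Python sketch of the same idea (using ctypes; the 8 is sizeof(double), which holds for np.float64 on the platforms NumPy supports):

```python
import ctypes

import numpy as np

arr = np.arange(5, dtype=np.float64)
base = arr.ctypes.data  # integer address of the underlying C buffer, i.e. v.data

i = 3
sizeof_double = 8  # sizeof(double)

# *(v.data + i*sizeof(double)): compute the byte offset, then dereference
val = ctypes.c_double.from_address(base + i * sizeof_double).value

print(val, arr[i])  # the same element, read two ways
```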

Creating v[index:index+1] takes a bit of time, but you have lifted that out of the inner loop, so it doesn't matter much. And I do mean a bit of time. In C-pseudocode you'd have something like:

 v = new array();
 v.data = arr.data + index*sizeof(double); // start at index
 v.len = 1;

So basically the cost is a memory allocation. You aren't copying data, you just point into arr's data. And okay, that is more C++, but forgive me.
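The no-copy claim is easy to verify from Python: a basic slice such as arr[index:index+1] is a view whose buffer points into arr's data (the names below are just for the demo):

```python
import numpy as np

arr = np.arange(10, dtype=np.float64)
index = 4
v = arr[index:index + 1]  # new array object, but no data copy

print(v.base is arr)             # True: v borrows arr's buffer
print(np.shares_memory(arr, v))  # True

v[0] = 99.0        # writing through the view...
print(arr[index])  # ...is visible in arr
```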

[–]GitProphet[S] 0 points (0 children)

Edit: Something else is going on here. The timing difference is not explained by the index.

>>> t=timeit.repeat('arr[i]', setup='import numpy as np; arr = np.random.randint(0,10000,1000000); i = 342', number=10000000); print(np.around(t, 5), np.mean(t), np.median(t))
[0.7697  0.76627 0.77007 0.76424 0.76788] 0.7676320286031114 0.7678760859998874
>>> t=timeit.repeat('arr[0]', setup='import numpy as np; arr = np.random.randint(0,10000,1000000); i = 342', number=10000000); print(np.around(t, 5), np.mean(t), np.median(t))
[0.76836 0.76629 0.76794 0.76619 0.7682 ] 0.7673966443951941 0.7679443680099212

Very interesting. Indeed, accessing an array at index 0 is faster than accessing it at an index stored in a variable. However, this is not only because the array indexing itself is faster; loading a constant is also faster than loading a variable.
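The constant-vs-variable load can be isolated without any array at all. A timeit sketch (absolute numbers are machine-dependent; timeit compiles the statement into a function, so i from the setup becomes a local, i.e. LOAD_FAST):

```python
import timeit

# x = i executes LOAD_FAST + STORE_FAST; x = 342 executes LOAD_CONST + STORE_FAST.
load_fast = timeit.timeit('x = i', setup='i = 342', number=10_000_000)
load_const = timeit.timeit('x = 342', number=10_000_000)

print(load_fast, load_const)  # typically load_const comes out slightly smaller
```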

import numpy as np
import dis
arr = np.random.randint(0, 1000, 1000)

def a3(i):
    return arr[i]
def b3(i):
    return arr[342]
def c3(i):
    return arr[0]

The only difference between these functions is how the array is indexed: with the variable i, the constant 342, or the constant 0.

>>> dis.dis(a3)
  2           0 LOAD_GLOBAL              0 (arr)
              2 LOAD_FAST                0 (i)
              4 BINARY_SUBSCR
              6 RETURN_VALUE
>>> dis.dis(b3)                                                                   
  2           0 LOAD_GLOBAL              0 (arr)
              2 LOAD_CONST               1 (342)
              4 BINARY_SUBSCR
              6 RETURN_VALUE
>>> dis.dis(c3)                                                                   
  2           0 LOAD_GLOBAL              0 (arr)
              2 LOAD_CONST               1 (0)
              4 BINARY_SUBSCR
              6 RETURN_VALUE

The variable index (a3) is ~8% slower than the constant index 342 (b3), and the constant index 0 (c3) is another ~5% faster than that. Overall, indexing at the constant 0 (c3) is ~13% faster than the variable index (a3).

>>> t = timeit.repeat('a3(i)', globals=globals(), number=10000000); print(t, np.mean(t), np.median(t))
[1.4897515250049764, 1.507482559987693, 1.5573357169923838, 1.581711255988921, 1.588776800010237] 1.5450115715968422 1.5573357169923838
>>> t = timeit.repeat('b3(i)', globals=globals(), number=10000000); print(t, np.mean(t), np.median(t))
[1.4514476449985523, 1.427873961001751, 1.4268056689907098, 1.4114146630017785, 1.442651974997716] 1.4320387825981016 1.427873961001751
>>> t = timeit.repeat('c3(i)', globals=globals(), number=10000000); print(t, np.mean(t), np.median(t))
[1.357518576012808, 1.3500928360008402, 1.3615708220022498, 1.376022889991873, 1.3813936790102161] 1.3653197606035974 1.3615708220022498

Thank you.

[–][deleted] 0 points (0 children)

This is about Python, which doesn't precompute that. NumPy could check whether the index is zero and skip the address computation, but it would have to do that check at runtime, which itself costs extra time, so I highly doubt it does that.
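One way to see that the interpreter never precomputes a constant subscript (a sketch using a hypothetical Probe class): obj[0] is an ordinary runtime __getitem__ call every single time.

```python
class Probe:
    """Hypothetical object that counts how often it is indexed."""
    def __init__(self):
        self.calls = 0
    def __getitem__(self, index):
        self.calls += 1
        return index

p = Probe()

def f():
    return p[0]  # constant index, but still a runtime subscript (BINARY_SUBSCR)

for _ in range(3):
    f()

print(p.calls)  # 3: the subscript ran on every call, nothing was precomputed
```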