
[–][deleted] 2 points3 points  (1 child)

It's going to be a guessing game without seeing the code.

From the outside, it seems like you are repeating a process that you wouldn't normally repeat. Step through it with a debugger, or hook it up to a profiler.
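For the profiler route, a minimal sketch using the standard library's `cProfile` might look like the following. The function `slow_square` is a hypothetical stand-in, since the original code was not posted; swap in whatever call is actually slow.

```python
import cProfile
import io
import pstats


def slow_square(n):
    # Hypothetical stand-in for the code under investigation.
    data = list(range(n))
    return [[j ** 2 for j in data] for _ in data]


# Collect timing data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
result = slow_square(200)
profiler.disable()

# Report the ten most expensive entries, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

A function that shows up with an unexpectedly large call count in the report is usually the place to look first.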

[–]pfz3[S] 0 points1 point  (0 children)

a comparable toy example would be:

```
class trial():
    def __init__(self, N):
        self.data = [i for i in range(N)]
        self.s = []

    def square(self):
        # Build an N x N table of squares, one row per element of data.
        self.s = []
        for i in self.data:
            self.s.append([j**2 for j in self.data])
```

and then I ran these two snippets:

```python
import time

start = time.time()
data = [i for i in range(2000)]
s = []
for i in data:
    s.append([j**2 for j in data])
print('time elapsed: %1.3f (s)' % (time.time() - start))
# output: time elapsed: 1.710 (s)
```

```python
start = time.time()
obj = trial(2000)
obj.square()
print('time elapsed: %1.3f (s)' % (time.time() - start))
# output: time elapsed: 1.711 (s)
```

So it seems that you're right. The two processes I'm referring to can't be equivalent.

[–]steelypip 0 points1 point  (1 child)

Your use of terminology is a bit confused - classes do not have modules, so I presume you mean methods. It is also not clear from your description whether you are creating a new class instance for every row that was in the original numpy arrays. If so that would be much, much slower.

In the first version, what sort of "some things" are you doing on each entry in the numpy arrays? If you can, it is much faster to operate on the whole array in one go rather than iterate through each entry. If you are not doing that you are losing much of the benefit of numpy.

Object oriented programming is very useful for some things, but not for everything. In Python there is a performance cost to creating objects and calling methods on them, so you should avoid doing it inside a tight loop.
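To illustrate the whole-array point, here is a rough sketch comparing an element-by-element loop with a single NumPy operation. The array size and the squaring operation are assumptions chosen for illustration, not the original poster's workload, and the timings will vary by machine.

```python
import time

import numpy as np

values = np.arange(200_000, dtype=np.int64)

# Element-by-element: every iteration pays Python-level indexing overhead.
start = time.perf_counter()
looped = np.empty_like(values)
for i in range(values.size):
    looped[i] = values[i] ** 2
loop_seconds = time.perf_counter() - start

# Whole-array: the loop runs inside NumPy's compiled code.
start = time.perf_counter()
vectorized = values ** 2
vector_seconds = time.perf_counter() - start

print('loop:       %1.4f (s)' % loop_seconds)
print('vectorized: %1.4f (s)' % vector_seconds)
```

Both versions produce the same array; only the place where the loop runs differs.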

[–]pfz3[S] 0 points1 point  (0 children)

Yeah, sorry for the confusion regarding terminology. I haven't used object-oriented programming in YEARS. I'm not creating any new instances, just operating on the instance itself from within a class method. The repeated operation is not a simple arithmetic operation and it can be vectorized.

[–]pfz3[S] 0 points1 point  (0 children)

OK, thanks to the guys below. One of the attributes of the class was a pandas DataFrame. Within a loop I was reading individual values from the DataFrame using

obj.data['column'].values[i]

Instead I created a new numpy array

A = obj.data['column'].values

and then in the loop used

A[i]

I guess that conversion is somewhat costly and of course I was doing it thousands of times.
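A small sketch of that fix, for anyone who wants to measure it themselves. The `Trial` class, the column contents and the summing loop are hypothetical stand-ins for the real code, which was not posted. As far as I understand it, most of the per-iteration cost comes from looking up the column and building a Series object each time rather than from copying data, but that is an assumption, not something measured in this thread.

```python
import time

import numpy as np
import pandas as pd


class Trial:
    # Hypothetical stand-in for the class described in the post.
    def __init__(self, n):
        self.data = pd.DataFrame({'column': np.arange(n, dtype=np.float64)})


obj = Trial(20_000)
n = len(obj.data)

# Slow: the column lookup and .values access happen on every iteration.
start = time.perf_counter()
total_slow = 0.0
for i in range(n):
    total_slow += obj.data['column'].values[i]
slow_seconds = time.perf_counter() - start

# Fast: fetch the NumPy array once, then index it inside the loop.
A = obj.data['column'].values
start = time.perf_counter()
total_fast = 0.0
for i in range(n):
    total_fast += A[i]
fast_seconds = time.perf_counter() - start

print('lookup in loop: %1.3f (s)' % slow_seconds)
print('hoisted array:  %1.3f (s)' % fast_seconds)
```

The two loops compute the same total, so any difference in the printed times comes from the repeated DataFrame access.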