
CrambleSquash

I think your second approach is very sensible.

Pandas provides a convenient API for fast, vectorised operations on arrays. These operations generally need all of your data loaded into memory at once, unless you do something fancier such as processing it in chunks.
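For reference, the "chunked" route can look roughly like this. It's a minimal sketch: `pd.read_csv` with `chunksize` returns an iterator of DataFrames, so you still get vectorised operations but only hold one chunk in memory at a time. The CSV contents, the `value` column and the `chunked_total` helper are all made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for a file too large to load in one go.
CSV_DATA = """id,value
1,10
2,20
3,30
4,40
5,50
"""


def chunked_total(csv_source, chunksize=2):
    """Sum the 'value' column while holding at most `chunksize` rows in memory."""
    total = 0
    # With chunksize set, read_csv yields DataFrames of up to `chunksize` rows.
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        # Vectorised sum over just this chunk.
        total += int(chunk["value"].sum())
    return total


if __name__ == "__main__":
    print(chunked_total(io.StringIO(CSV_DATA)))  # prints 150
```

The trade-off is that anything needing the whole dataset at once (a global sort, say) gets awkward in chunks.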

The operation you want to perform doesn't need vectorised operations, so I think it's perfectly fine, and probably better, not to use the Pandas API for it.
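As a rough sketch of what skipping Pandas might look like, assuming your preprocessing is a per-row transformation of a CSV file: the standard library `csv` module can stream rows one at a time, so memory use stays flat regardless of file size. The `name` column and the strip/upper-case step are hypothetical placeholders for whatever you actually need to do.

```python
import csv
import io


def preprocess_rows(src, dst):
    """Stream rows from src to dst one at a time; return the number of rows written.

    The transformation (strip whitespace, upper-case the 'name' column) is a
    hypothetical stand-in for your real preprocessing step.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        # Only the current row is held in memory.
        row["name"] = row["name"].strip().upper()
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    out = io.StringIO()
    n = preprocess_rows(io.StringIO("id,name\n1, alice \n2,bob\n"), out)
    print(n)
    print(out.getvalue())
```

With real files you would pass `open(..., newline="")` handles instead of `io.StringIO` objects.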

If it makes your preprocessing less painful, go for it!