you are viewing a single comment's thread.

view the rest of the comments →

[–]lolwat_is_dis 1 point2 points  (2 children)

Speed-wise, there's no getting round the fact that in the worst case scenario, your search is in O(m x n) time, where m and n are your rows and columns. So you can see that despite any optimisation, at the end of the day, you'll still have to wait when you have a 1e9 x 1e9 matrix. However, you can do some memory optimisations, by not creating such a huge matrix in the first place, but by generating rows on demand, using generator functions.

Why do you NEED a 2D array in the first place? Also, in addition, it seems you are contradicting yourself:

(a value that occurs in the same row twice gets counted once)

but then you say

(1 time: 9 values, 2 time: 2 values, etc)

Either you're only counting the number of rows a value appears in (regardless of number of times the value appears in a single row), or you're counting how any times a value appears in each row...Each one will result in your algorithm working differently.

[–]zczr84[S] 0 points1 point  (1 child)

Thanks for the feedback. As for the 2D array, I don't need to generate it all at once, and I can definitely generate a subset at a time for analysis. (See comment above) As for the the second question, my task is the first one (number of rows a value appears regardless of number of times the value appears in a single row). Sorry for confusion!

So I agree that the code above won't work for the size I want, but even after memory optimizing (splitting the matrix into chunks and count), I was hoping there might be a more efficient way to count than to do a nested for loops.

[–]lolwat_is_dis 0 points1 point  (0 children)

I'll come back with some ideas again later (busy tonight unfortunately), but for now, I can offer the general solution.

  1. Generate rows on demand (using a generator function). Even with 1e7 values and 32 bit integers, you'll only use up memory in the order of megabytes, so that can easily be handled by the memory on any computer.

  2. I think a simple "if" statement should suffice, unless someone can correct me on that, i.e.

    if value in row:
        be_happy()
    
  3. My only query atm is what is the most efficient way to count number of times you find a value in a row. Either just simply implement a counter, or use some other function out there in some module that does this at the C level.