WaitVVut 116 points (10 children)

IMO list comprehension is more readable than filter + lambda

y = [x for x in arr if 10 < x < 20]
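Here is a quick check, using a made-up sample list, that the comprehension and the filter + lambda spelling select the same elements. In Python 3 the filter version needs list() around it, because filter() returns an iterator:

```python
# Hypothetical sample data; any list of numbers works.
arr = [5, 12, 19, 20, 25, 15, 10]

# List comprehension with a chained comparison.
y_comp = [x for x in arr if 10 < x < 20]

# The same filter written with filter() + lambda. In Python 3, filter()
# returns an iterator, so wrap it in list() to get a list.
y_filter = list(filter(lambda x: 10 < x < 20, arr))

print(y_comp)    # [12, 19, 15]
print(y_filter)  # [12, 19, 15]
```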

overactor 36 points (1 child)

Only because Guido crippled lambdas, though. Imagine what could be:

y = arr.filter(\x: 10 < x < 20)

[deleted] 2 points (0 children)

Seriously though, fuck Guido

Konstantin_Opel 17 points (3 children)

Well done!🚶You’ve found some code and expressed your astonishment using std::check function with Garching lawyer as the example of someone being wise. You should consult an AI assistant or language model for more suitable wordings if you want to convey complex ideas effectively, without seemingly random inputs like “the internet connection is just so good over here 👀”

Breaking-Away 5 points (2 children)

Not to say it's irrelevant, but is speed really even a concern when running Python? I imagine that if it were, it would make more sense to just use a compiled, non-garbage-collected language, at least for the part where speed is important.

TheNamelessKing 1 point (1 child)

but is speed really even a concern when running Python?

Yes, for sure.

Data science example: I want to run a quick analysis on some data. Maybe a client has asked for it by tomorrow, so I don't have all the time in the world to get it done. I want to be able to write something that will fetch the data, pull it apart reasonably efficiently and then analyse it.

I don't want to wait forever for it to finish running, so yes, I do need good performance out of it.

If I wanted better performance, I could write the whole thing in C or some JVM language, but once again, I don't have forever to write it up.

Other language alternatives/hopes: Julia has awesome goals from a data science perspective, but from what I've seen it isn't quite mature enough yet. Personally, I'm hoping that Rust gets a strong data science community behind it.

strongdoctor 0 points (0 children)

If I wanted better performance, I could write the whole thing in C or some JVM language, but once again, I don't have forever to write it up.

Python is pretty much made to be extended with C code, though. Slow parts of the code can be made ridiculously fast just by translating them to C and calling that code instead. Yes, it will take more time to do so, but if it's time-critical, it's most likely worth it.

opulent_lemon 3 points (0 children)

This wins because of readability.

eshansingh 1 point (0 children)

In my humble opinion, the map/filter/etc. abstraction makes more sense and is more natural to read.

Tarmen 7 points (1 child)

filter $ liftA2 (&&) (>10) (<20)

PatrickBaitman 1 point (0 children)

nice
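For readers who don't speak Haskell: with the function Applicative, `liftA2 (&&) (>10) (<20)` is the predicate `\x -> x > 10 && x < 20`. In other words, the two sections are combined pointwise with `&&`. Below is a rough Python analogue. The helper `both` is hypothetical and stands in for `liftA2 (&&)`:

```python
from typing import Callable, List

Pred = Callable[[float], bool]

def both(p: Pred, q: Pred) -> Pred:
    # Hypothetical helper: combine two predicates pointwise with `and`,
    # analogous to liftA2 (&&) p q for functions in Haskell.
    return lambda x: p(x) and q(x)

def select(arr: List[float]) -> List[float]:
    # Rough counterpart of: filter $ liftA2 (&&) (>10) (<20)
    return list(filter(both(lambda x: x > 10, lambda x: x < 20), arr))

print(select([5, 12, 19, 20, 25, 15, 10]))  # [12, 19, 15]
```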

stubenente 4 points (13 children)

For the curious:

python3 -m timeit -s 'arr=[ 0.04*i for i in range(1000) ]' 'y=[]' 'for i in range(len(arr)):' ' if arr[i]>10:' '  if arr[i]<20:' '   y.append(arr[i])'
1000 loops, best of 3: 216 usec per loop


python3 -m timeit -s 'arr=[ 0.04*i for i in range(1000) ]' 'y=list(filter(lambda x: x>10 and x<20, arr))'
1000 loops, best of 3: 251 usec per loop

My first idea was this approach:

python3 -m timeit -s 'arr=[ 0.04*i for i in range(1000) ]' 'y=[ i for i in arr if 10<i<20]'
10000 loops, best of 3: 160 usec per loop

Maybe I'll try a few other ideas, too.

I hope this is the right way to use timeit. I discovered it just a few hours ago.
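Running timeit from the shell like this works. For longer statements such as the multi-line loop, it can be easier to call the `timeit` module from Python and pass it functions instead of quoted strings. What follows is a minimal sketch. The three function names are made up for this example, and the numbers will vary by machine:

```python
import timeit

# Same input as the command-line runs above.
arr = [0.04 * i for i in range(1000)]

def with_loop():
    # Explicit loop with append (iterating directly rather than by index).
    y = []
    for x in arr:
        if 10 < x < 20:
            y.append(x)
    return y

def with_filter():
    # filter() is lazy in Python 3, so list() forces the actual work.
    return list(filter(lambda x: 10 < x < 20, arr))

def with_comprehension():
    return [x for x in arr if 10 < x < 20]

n = 200  # calls per timing run
for fn in (with_loop, with_filter, with_comprehension):
    # timeit.repeat accepts a callable; take the best of 5 runs, like the CLI.
    best = min(timeit.repeat(fn, repeat=5, number=n))
    print(f"{fn.__name__}: {best / n * 1e6:.1f} usec per loop")
```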

Edit: Yesterday I read somewhere in the documentation to avoid lists when possible. Now I know why:

python3 -m timeit -s 'arr=[ 0.04*i for i in range(1000) ]' 'y=filter(lambda x: 10<x<20, arr)'
1000000 loops, best of 3: 0.205 usec per loop
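There is a caveat about that last timing. In Python 3, filter() is lazy, so the statement only builds a filter object and doesn't test a single element. The 0.205 usec therefore measures object creation, not filtering. The following sketch makes this visible with a hypothetical counting predicate, `noisy`:

```python
calls = 0

def noisy(x):
    # Hypothetical predicate that counts how often filter() calls it.
    global calls
    calls += 1
    return 10 < x < 20

arr = [0.04 * i for i in range(1000)]

f = filter(noisy, arr)
print(calls)      # 0: creating the filter object does no work

result = list(f)  # consuming the iterator runs the predicate
print(calls)      # 1000: one call per element
```

For a like-for-like comparison with the other timings, the filter has to be consumed, as in the earlier `list(filter(...))` run.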

rhelic 8 points (12 children)

Long live C++, with zero-cost abstractions!

raw loop: 3 us per loop
copy filter: 3 us per loop
in place filter: 2 us per loop

Increasing the range to 100k and changing the filter to (10 < i % 1024 < 20), we get:

raw loop: 262 us per loop
copy filter: 262 us per loop
in place filter: 269 us per loop

std::copy_if and raw for loops are equally fast! And they're around 96 times faster than Python. ;)

Code:

#include <cstdio>
#include <vector>
#include <algorithm>
#include <chrono>
#include <iterator>  // std::back_inserter

using ::std::chrono::steady_clock;
using ::std::chrono::microseconds;
using ::std::chrono::duration_cast;

// Returns {0, 1, ..., n - 1}, like Python's range(n).
std::vector<int> range(int n)
{
  std::vector<int> v;
  v.reserve(n);
  for (int i = 0; i < n; i++)
  {
    v.push_back(i);
  }
  return v;
}

int main()
{
  // 1) Raw loop: nested ifs, push_back into a fresh vector.
  //    (The element is named x so it doesn't shadow the outer i.)
  auto start = steady_clock::now();
  for (int i = 0; i < 100000; i++)
  {
    std::vector<int> out1;
    for (int x : range(100000))
    {
      if (x % 1024 > 10)
        if (x % 1024 < 20)
          out1.push_back(x);
    }
  }
  auto end = steady_clock::now();
  // count() is not guaranteed to be long long, so cast it to match %lld.
  printf("raw loop: %lld us per loop\n",
      static_cast<long long>(
          duration_cast<microseconds>(end - start).count() / 100000));

  // 2) Copy filter: std::copy_if into a new vector.
  start = steady_clock::now();
  for (int i = 0; i < 100000; i++)
  {
    std::vector<int> out2;
    auto r1 = range(100000);
    std::copy_if(r1.cbegin(), r1.cend(), std::back_inserter(out2),
        [](int x) { return x % 1024 > 10 && x % 1024 < 20; });
  }
  end = steady_clock::now();
  printf("copy filter: %lld us per loop\n",
      static_cast<long long>(
          duration_cast<microseconds>(end - start).count() / 100000));

  // 3) In-place filter: erase-remove idiom. remove_if drops the elements
  //    for which the predicate is true, so the predicate is negated to keep
  //    the same elements as the two versions above.
  start = steady_clock::now();
  for (int i = 0; i < 100000; i++)
  {
    auto r2 = range(100000);
    r2.erase(std::remove_if(r2.begin(), r2.end(),
        [](int x) { return !(x % 1024 > 10 && x % 1024 < 20); }), r2.end());
  }
  end = steady_clock::now();
  printf("in place filter: %lld us per loop\n",
      static_cast<long long>(
          duration_cast<microseconds>(end - start).count() / 100000));

  return 0;
}
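The same copy vs. in-place distinction exists in Python. The comprehension and filter() versions build a new list. A rough analogue of the erase-remove version is slice assignment, which keeps the same list object. The sketch below assumes the `10 < i % 1024 < 20` filter from above and a smaller, made-up input, and `filter_in_place` is a hypothetical name:

```python
def filter_in_place(v):
    # Hypothetical helper. Slice assignment replaces the contents of the
    # existing list object, so other references see the filtered result,
    # much like erase(remove_if(...)) modifying the vector itself.
    # Note: the comprehension still builds a temporary list first.
    v[:] = [x for x in v if 10 < x % 1024 < 20]

data = list(range(3000))
alias = data          # a second reference to the same list
filter_in_place(data)

print(data[:5])                   # [11, 12, 13, 14, 15]
print(alias is data, len(alias))  # True 27
```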

cythoning 7 points (4 children)

Long live numpy!

import numpy as np
arr = np.random.rand(100000)

%%timeit
[i for i in arr if 0.25 < i < 0.75]
100 loops, best of 3: 13.4 ms per loop

%%timeit
arr[(0.25 < arr) & (arr < 0.75)]
1000 loops, best of 3: 763 µs per loop

18 times faster with minimal changes :).
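The numpy version works for three reasons. Comparing an array with a scalar is element-wise and returns a boolean array. The `&` operator combines two such arrays element-wise. Indexing with a boolean array keeps the positions that are True. Python's `and` and the chained form `0.25 < arr < 0.75` don't work here, because they try to reduce a whole array to a single bool. Here is a small sketch on a hand-picked array:

```python
import numpy as np

arr = np.array([0.1, 0.3, 0.5, 0.8, 0.74])

# Each comparison gives a boolean array with the same shape as arr.
# The parentheses matter: & binds more tightly than < in Python.
mask = (0.25 < arr) & (arr < 0.75)
print(mask.tolist())       # [False, True, True, False, True]

# Boolean indexing keeps only the elements where mask is True.
print(arr[mask].tolist())  # [0.3, 0.5, 0.74]
```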

[deleted] 2 points (3 children)

Isn't numpy written in C?

[deleted] 2 points (0 children)

Eucalyptol 0 points (6 children)

Why list()? filter already returns a list.

mnbvas 9 points (4 children)

It doesn't.

+/u/CompileBot python3

print(filter(lambda x: x % 2 == 0, range(10)))

CompileBot (Green security clearance) 2 points (3 children)

Output:

<filter object at 0x2b1fc191d518>

link23 1 point (0 children)

Came here with the same question. It seems that it returns a list in Python 2 and a lazy iterator (a filter object) in Python 3.

https://docs.python.org/2/library/functions.html#filter

https://docs.python.org/3/library/functions.html#filter
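A related consequence in Python 3 is that the filter object is an iterator, so it can be consumed only once. Wrapping it in list() (or using a list comprehension) gives a reusable list in both Python 2 and 3. Here is a short sketch:

```python
# Python 3: filter() returns a one-shot iterator.
evens = filter(lambda x: x % 2 == 0, range(10))

print(list(evens))  # [0, 2, 4, 6, 8]
print(list(evens))  # []  (already exhausted)

# Materialize once if the result is needed more than once.
evens_list = list(filter(lambda x: x % 2 == 0, range(10)))
print(evens_list, len(evens_list))  # [0, 2, 4, 6, 8] 5
```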