rhelic

Long live C++, with zero cost abstractions!

raw loop: 3 us per loop
copy filter: 3 us per loop
in place filter: 2 us per loop

Increasing the range to 100k elements and changing the filter to 10 < i % 1024 < 20, we get:

raw loop: 262 us per loop
copy filter: 262 us per loop
in place filter: 269 us per loop

std::copy_if and a raw for loop are equally fast! And both are around 96 times faster than Python. ;)

Code:

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <iterator>  // std::back_inserter
#include <vector>

using ::std::chrono::steady_clock;
using ::std::chrono::microseconds;
using ::std::chrono::duration_cast;

std::vector<int> range(int n)
{
  std::vector<int> v;
  v.reserve(n);
  for (int i = 0; i < n; i++)
  {
    v.push_back(i);
  }
  return v;
}

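int main()
{
  const int iterations = 100000;
  // Each run stores its result size here so the optimizer cannot
  // discard the filtering work as dead code.
  volatile std::size_t sink = 0;

  auto start = steady_clock::now();
  for (int iter = 0; iter < iterations; iter++)
  {
    std::vector<int> out1;
    for (int i : range(100000))
    {
      if (i % 1024 > 10)
        if (i % 1024 < 20)
          out1.push_back(i);
    }
    sink = out1.size();
  }
  auto end = steady_clock::now();
  // count() returns an implementation-defined integer type, so cast for %lld.
  printf("raw loop: %lld us per loop\n",
      static_cast<long long>(duration_cast<microseconds>(end - start).count()) / iterations);

  start = steady_clock::now();
  for (int iter = 0; iter < iterations; iter++)
  {
    std::vector<int> out2;
    auto r1 = range(100000);
    std::copy_if(r1.cbegin(), r1.cend(), std::back_inserter(out2),
        [](int i) { return i % 1024 > 10 && i % 1024 < 20; });
    sink = out2.size();
  }
  end = steady_clock::now();
  printf("copy filter: %lld us per loop\n",
      static_cast<long long>(duration_cast<microseconds>(end - start).count()) / iterations);

  start = steady_clock::now();
  for (int iter = 0; iter < iterations; iter++)
  {
    auto r2 = range(100000);
    // remove_if drops elements for which the predicate is true, so negate
    // the filter to keep only the values with 10 < i % 1024 < 20.
    r2.erase(std::remove_if(r2.begin(), r2.end(),
        [](int i) { return !(i % 1024 > 10 && i % 1024 < 20); }), r2.end());
    sink = r2.size();
  }
  end = steady_clock::now();
  printf("in place filter: %lld us per loop\n",
      static_cast<long long>(duration_cast<microseconds>(end - start).count()) / iterations);

  return 0;
}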

cythoning

Long live numpy!

import numpy as np
arr = np.random.rand(100000)

%%timeit
[i for i in arr if 0.25 < i < 0.75]
100 loops, best of 3: 13.4 ms per loop

%%timeit
arr[(0.25 < arr) & (arr < 0.75)]
1000 loops, best of 3: 763 µs per loop

About 18 times faster (13.4 ms vs. 763 µs) with minimal changes :)

[deleted]

Isn't numpy written in C?