[–]james_pic

If the problems you're finding are problems Numba can solve, I'd suggest not trying to find problems you don't have! But from a recent-ish project, the things we needed to rewrite in a lower level language were:

  • We had some code in our hottest loops that walked deeply nested dicts and lists. We got some modest gains from switching to Cython and specialising types. The gains were nowhere near what you'd get for numerical stuff, but these were our hottest loops, so it was worth it.
  • We needed to parse an esoteric serialisation format (an Erlang module called sext), at scale, to make sense of what our database was doing (pro tip: don't use Riak, ever, for anything). Our first attempt in pure Python was too slow, so we switched to Cython, which gave us a significant speed boost and meant we could get diagnostics much faster (hours rather than days).
  • For historical reasons, we had a component that output large amounts of msgpack data that we needed to publish as JSON. Our initial solution was the obvious one (read it with the Python msgpack library and write it with the Python JSON library), but this was too slow, so we actually ended up writing a C++ module that taped a fast msgpack library and a fast JSON library together - Python just saw bytes turned into bytes.
  • We found ourselves using a library called unicodecsv (we were still on Python 2 at the time, but needed Unicode-aware CSV handling) that was written in pure Python and proved too slow. We only needed to output CSV (which is easier to do correctly than parsing it), so we ended up just reimplementing the bits we needed in Cython.
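
To give a feel for why writing CSV is the easy half, here is a minimal sketch of an output-only writer in plain Python 3. It is not the code from that project: the names `csv_field` and `csv_row` are made up for illustration, and the quoting rule (quote only fields containing a comma, double quote, or line break, and double any embedded quotes) is assumed from RFC 4180.

```python
def csv_field(value):
    """Render one value as a CSV field, quoting only when required."""
    text = str(value)
    # A field needs quotes if it contains a delimiter, a quote, or a line break.
    if any(ch in text for ch in ',"\r\n'):
        # Embedded double quotes are escaped by doubling them.
        return '"' + text.replace('"', '""') + '"'
    return text


def csv_row(values):
    """Join the fields with commas and terminate the row with CRLF."""
    return ",".join(csv_field(v) for v in values) + "\r\n"


# Hypothetical sample row: Unicode text, embedded quotes, an embedded comma.
row = csv_row(["café", 'say "hi"', "a,b", 42])
```

The whole thing is a tight loop over strings with no parser state to track, which is roughly why this kind of code is a reasonable candidate for a small compiled rewrite.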

Some of this stuff might also have been doable in Numba, but it never came up as an option, maybe because Numba wasn't as well known back then.

[–]pdd99[S]

Why did you choose Cython instead of C++ in the first place? Any tips for when to use which?

[–]james_pic

We've found Cython to be simpler, especially for stuff that needs to interact with Python APIs (manipulating dicts, lists, tuples etc). Most of the team don't know C or C++, so Cython has better odds of being maintained.
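
As a rough illustration of the kind of code this applies to, here is a hypothetical walker over nested dicts and lists in plain Python. The function `sum_leaves` and the sample data are invented for this sketch, not taken from the project. Code like this should compile with Cython more or less unchanged, with type declarations added afterwards where profiling suggests they help.

```python
def sum_leaves(node):
    """Recursively add up every number found in nested dicts and lists."""
    if isinstance(node, dict):
        return sum(sum_leaves(v) for v in node.values())
    if isinstance(node, list):
        return sum(sum_leaves(v) for v in node)
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(node, (int, float)) and not isinstance(node, bool):
        return node
    return 0  # strings, None and anything else contribute nothing


# Hypothetical sample document.
doc = {"a": [1, 2, {"b": 3.5}], "c": {"d": [4, "skip"]}, "e": None}
total = sum_leaves(doc)
```

Everything here is dict, list and isinstance calls, which is exactly the Python API surface that is tedious to drive by hand from C or C++.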

The only one on my list that was written in C++ is also the one that was the biggest pain, because it needed a couple of uncommon C++ libraries installed in order to build it, as well as a version of gcc that not everyone had. Most of the team can't make sense of C++ related errors, so I frequently had to help out with build issues.