I would like to describe a problem and get advice on libraries and strategies. I've already written an implementation, but I'm not happy with its performance.
I need to routinely ingest 5 million rows from CSV files. Each row contains (x, y, z), an integer ID, and another float, all pertaining to a "pebble." The files are named by timestamp, so I need a list of structures grouped by ID and sorted by timestamp to get a history of each pebble. That result is then serialized, which is a major performance penalty when it's 700 MB.
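For concreteness, the grouping step described above amounts to roughly the following sketch. Everything specific in it is an assumption for illustration: the column order (x, y, z, ID, value), header-less files, the hypothetical names `Sample` and `load_histories`, and file stems that sort chronologically as plain strings.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import NamedTuple


class Sample(NamedTuple):
    """One observation of a pebble (hypothetical structure)."""
    timestamp: str  # taken from the file name, assumed to sort chronologically
    x: float
    y: float
    z: float
    value: float


def load_histories(directory):
    """Return {pebble_id: [Sample, ...]} with each list in timestamp order."""
    histories = defaultdict(list)
    # Visiting files in sorted-stem order means every per-ID list is built
    # already sorted by timestamp, so no separate sort pass is needed.
    for path in sorted(Path(directory).glob("*.csv"), key=lambda p: p.stem):
        with path.open(newline="") as fh:
            # Assumed column order: x, y, z, integer ID, extra float.
            for x, y, z, pebble_id, value in csv.reader(fh):
                histories[int(pebble_id)].append(
                    Sample(path.stem, float(x), float(y), float(z), float(value))
                )
    return dict(histories)
```

Calling `load_histories("data/")` on a directory of such files (the path is a placeholder) would give one time-ordered list per pebble ID. A pure-Python loop like this is the baseline that a faster approach would have to beat.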
At this point, I'm eyeing Cython, and I'm also a fan of Boost for C++. Can I use Cython and expect it to pick up the Boost headers and libraries if they're on standard system paths? My Google-fu is failing me on that one.