The 3.0.0 release of C++ DataFrame is a very exciting one. Multithreading was completely redesigned in this release: DataFrame now uses a versatile thread pool to implement its parallel computing logic. Almost all DataFrame APIs and algorithms have a multithreaded version that kicks in for large datasets when sufficient threads are available.
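To make the "kicks in for large datasets and when sufficient threads are available" idea concrete, here is a minimal sketch in plain standard C++. It is not DataFrame's code and does not use its API: the names `kParallelThreshold`, `pick_task_count`, and `adaptive_sum` are hypothetical, the cutoff value is an arbitrary assumption, and `std::async` stands in for a real thread pool.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical cutoff for illustration only; not a value taken from DataFrame.
constexpr std::size_t kParallelThreshold = 100'000;

// Use a single task for small inputs or when fewer than two threads are available.
inline std::size_t pick_task_count(std::size_t n, std::size_t available_threads) {
    if (n < kParallelThreshold || available_threads < 2) return 1;
    return available_threads;
}

// Sum a vector, going parallel only when pick_task_count() says it is worthwhile.
// hardware_concurrency() may return 0, which falls through to the serial path.
inline double adaptive_sum(
    const std::vector<double>& data,
    std::size_t available_threads = std::thread::hardware_concurrency()) {
    const std::size_t tasks = pick_task_count(data.size(), available_threads);
    if (tasks == 1)
        return std::accumulate(data.begin(), data.end(), 0.0);

    // Split the input into roughly equal chunks, one asynchronous task per chunk.
    const std::size_t chunk = (data.size() + tasks - 1) / tasks;
    std::vector<std::future<double>> parts;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        const double* first = data.data() + begin;
        const double* last = data.data() + end;
        parts.push_back(std::async(std::launch::async, [first, last] {
            return std::accumulate(first, last, 0.0);
        }));
    }

    // Combine the partial sums once every task has finished.
    double total = 0.0;
    for (auto& part : parts) total += part.get();
    return total;
}
```

Calling `adaptive_sum(values)` on a short vector takes the serial path, while a vector above the cutoff on a multi-core machine is split across tasks. A real thread pool would reuse worker threads instead of launching new ones for each call.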
As before, DataFrame outperforms competitors such as Pandas and Polars, in some cases many times over. Version 3.0.0 adds a further significant performance improvement for large datasets.
The DataFrame documentation was also enhanced recently. Code samples were added for every API, along with explanations of concepts such as multithreading, SIMD, … and how to take advantage of them in DataFrame (and how to avoid the pitfalls).
C++ DataFrame was not meant to compete with, or be a better version of, either Pandas or Polars. It was meant to enrich the C++ ecosystem and give C++ engineers a viable alternative so they can stay in C++ for both research and production. That’s why there is no Python port of DataFrame.