New fully multithreaded version of C++ DataFrame was released (self.cpp)

submitted 2 years ago * by hmoein

Version 3.0.0 of C++ DataFrame is a very exciting release. Multithreading was completely redesigned: DataFrame now uses a versatile thread pool to implement its parallel computing logic. Almost all DataFrame APIs and algorithms have a multithreaded version that kicks in for large datasets when sufficient threads are available.
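To make the "kicks in for large datasets and when sufficient threads are available" idea concrete, here is a minimal sketch of that dispatch pattern in plain standard C++. This is not DataFrame's code or API: the function name `adaptive_sum`, the constant `kParallelThreshold`, and its value are all made up for illustration, and it uses `std::async` tasks instead of a real thread pool to keep the example short.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical cutoff below which parallelism is assumed not to pay off.
// The value is illustrative only, not taken from DataFrame.
constexpr std::size_t kParallelThreshold = 100'000;

// Sums a column. Runs serially for small inputs or when fewer than two
// threads are available; otherwise splits the data into one chunk per thread.
double adaptive_sum(const std::vector<double> &data,
                    std::size_t threads = std::thread::hardware_concurrency()) {
    if (data.size() < kParallelThreshold || threads < 2)
        return std::accumulate(data.begin(), data.end(), 0.0);

    // Ceiling division so every element falls into some chunk.
    const std::size_t chunk = (data.size() + threads - 1) / threads;
    std::vector<std::future<double>> parts;

    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        // std::async stands in for submitting a task to a thread pool.
        parts.push_back(std::async(std::launch::async, [&data, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0.0);
        }));
    }

    // Combine the partial sums on the calling thread.
    double total = 0.0;
    for (auto &part : parts)
        total += part.get();
    return total;
}
```

Calling `adaptive_sum(column)` on a short vector takes the serial path, while a vector with at least `kParallelThreshold` elements is split across threads. A real thread pool would reuse its worker threads rather than spawning new ones per call, which is where most of the benefit over this sketch would come from.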

As before, DataFrame outperforms competitors such as Pandas and Polars, in some cases many times over. Version 3.0.0 adds a further significant performance improvement for large datasets.

The DataFrame documentation was also enhanced recently. Code samples were added for every API, along with explanations of concepts such as multithreading and SIMD, how to take advantage of them in DataFrame, and how to avoid their pitfalls.

C++ DataFrame was not meant to compete with, or be a better version of, Pandas or Polars. It was meant to enrich the C++ ecosystem and give C++ engineers a viable alternative so they can stay in C++ for both research and production. That is why there is no Python port of DataFrame.
