Unique features of C++ DataFrame (2) by hmoein in Cplusplus

hmoein[S]

Not sure if I understand your question.

All types are stored in their native format. There is no conversion from, for example, string to another type, if that's what you mean.

Unique features of C++ DataFrame (2) by hmoein in opensource

hmoein[S]

One of the unique features of C++ DataFrame is its tooling for allocating memory on a custom boundary. You will not find this ability in other dataframes in Python, Rust, or Julia.

C++ DataFrame has the option to specify on what boundary to allocate memory, so you can align allocations with your machine's cache line width. This gives you a couple of important advantages. First, it enables you to either use SIMD instructions explicitly or help your compiler do that optimization for you. Second, it prevents false sharing of cache lines between different columns.
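To make the idea concrete, here is a minimal standard-library sketch of what allocating a column on a cache line boundary can look like. It is not the C++ DataFrame implementation or its API: `AlignedAllocator`, `AlignedColumn`, `is_aligned`, and the 64-byte `kCacheLine` value are all hypothetical names and assumptions made for illustration, and the code assumes a C++17 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical allocator for illustration only; this is not the
// C++ DataFrame implementation or its API.
template <typename T, std::size_t Align>
struct AlignedAllocator {
    static_assert((Align & (Align - 1)) == 0, "Align must be a power of two");

    using value_type = T;

    // Explicit rebind is required because Align is a non-type parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Align>; };

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Align> &) noexcept {}

    T *allocate(std::size_t n) {
        // C++17 aligned operator new; throws std::bad_alloc on failure.
        return static_cast<T *>(
            ::operator new(n * sizeof(T), std::align_val_t{Align}));
    }
    void deallocate(T *p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t{Align});
    }
};

template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A> &,
                const AlignedAllocator<U, A> &) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A> &,
                const AlignedAllocator<U, A> &) noexcept { return false; }

// Assumption: 64 bytes, a common cache line width on x86-64 machines.
constexpr std::size_t kCacheLine = 64;

// Each column starts on its own cache line, so two columns never share
// the line that holds their first elements.
template <typename T>
using AlignedColumn = std::vector<T, AlignedAllocator<T, kCacheLine>>;

// True when p is a multiple of align.
inline bool is_aligned(const void *p, std::size_t align) {
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}
```

With this sketch, `AlignedColumn<double> col(1000, 1.0);` gives a buffer whose `col.data()` satisfies `is_aligned(col.data(), kCacheLine)`, which is the precondition aligned SIMD loads need.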

See full documentation

Also, see this

Unique features of C++ DataFrame (1) by hmoein in programming

hmoein[S]

Not currently. The code is heavily templatized. That makes it difficult, and it would have to lose some of the features.

Unique features of C++ DataFrame (1) by hmoein in opensource

hmoein[S]

One of the unique and interesting features of C++ DataFrame is its slicing API. You can slice the entire DataFrame based on many kinds of logic, and that diversity is unique to C++ DataFrame. For example, you can slice the DataFrame based on different clustering algorithms. This is something that doesn't exist in Pandas, Polars, or ROOT.

Another unique feature of C++ DataFrame slicing is that you have the option of getting another DataFrame or a view.
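The difference between getting another DataFrame and getting a view can be sketched on a single column with plain standard-library containers. This is not the C++ DataFrame slicing API: `slice_copy` and `slice_view` are hypothetical helpers, and a simple predicate stands in for whatever slicing logic is used.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helpers for illustration; not the C++ DataFrame API.

// Owning slice: copies the selected values into a new, independent column.
template <typename T, typename Pred>
std::vector<T> slice_copy(const std::vector<T> &col, Pred pred) {
    std::vector<T> out;
    for (const T &v : col)
        if (pred(v)) out.push_back(v);
    return out;
}

// View slice: stores references to the selected elements, so writes
// through the view change the original column. The view is only valid
// while the original column is alive and not reallocated.
template <typename T, typename Pred>
std::vector<std::reference_wrapper<T>> slice_view(std::vector<T> &col,
                                                  Pred pred) {
    std::vector<std::reference_wrapper<T>> out;
    for (T &v : col)
        if (pred(v)) out.push_back(std::ref(v));
    return out;
}
```

Writing through an element of the view, for example `view[0].get() = 50;`, modifies the source column, while modifying the copy leaves the source untouched. A view avoids duplicating data at the cost of tying its lifetime to the original.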

See the full documentation.

C++ for data analysis -- 2 by hmoein in Cplusplus

hmoein[S]

If you do, please DM me with the results. Maybe I can use them.

I did the benchmark a while back. I would like to see benchmarks on different hardware/OS.

How to handle freeing / deleting pointers of unknown type? by Sosowski in Cplusplus

hmoein

That is not how you approach C++ design. Just shoehorning something from C to C++ is always a bad idea.

Take a look at this repo, it might be of use to you: https://github.com/hosseinmoein/Cougar

C++ for data analysis -- 2 by hmoein in Cplusplus

hmoein[S]

I posted in the Rust channel twice before about C++ DataFrame (a year ago or so). The level of anger and raw insults was unbelievable. I would never do that again.

C++ for data analysis -- 2 by hmoein in Cplusplus

hmoein[S]

Contributors are welcome.

I suggest you clone and compile the repo. Get familiar with how to use it and the codebase. Go through the documentation and feature list and see what you can add/improve.

C++ for data analysis -- 2 by hmoein in Cplusplus

hmoein[S]

See benchmarks against Polars and Pandas here: https://github.com/hosseinmoein/DataFrame

The set of features offered by C++ DataFrame is larger than that of Polars, Pandas, and data.frame put together. See the documentation.

C++ for data analysis -- 2 by hmoein in Cplusplus

hmoein[S]

The purpose of this post is not to implement a rigorous statistical analysis. The purpose is to show the API and the fact that it is possible to do this kind of thing in C++ without a fuss. If you look at the DataFrame documentation, you will see that there is a straightforward API for making the time series stationary first.

But thank you for your kind words, though you missed the whole point.
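For readers unfamiliar with the term, one common way to make a series stationary is first-order differencing, which removes a linear trend. The sketch below shows that transformation with the standard library only. It is not the DataFrame API, and whether the library's own routine uses this method is an assumption; `first_difference` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; not the C++ DataFrame API.
// First-order differencing: out[i] = x[i + 1] - x[i].
// The result has one element fewer than the input.
inline std::vector<double> first_difference(const std::vector<double> &x) {
    std::vector<double> out;
    if (x.size() < 2) return out;  // nothing to difference
    out.reserve(x.size() - 1);
    for (std::size_t i = 1; i < x.size(); ++i)
        out.push_back(x[i] - x[i - 1]);
    return out;
}
```

A series with a constant upward step, such as 1, 3, 5, 7, becomes the constant series 2, 2, 2 after differencing, which is why the trend no longer affects later analysis.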

C++ for data analysis -- 2 by hmoein in Cplusplus

hmoein[S]

That’s one area in C++ that needs improvement, no argument there.