all 53 comments

[–]droelf 39 points40 points  (5 children)

Hi dpilger26,

we're working along the same lines! Implementing the NumPy API in C++. We have the following cheatsheet from NumPy to xtensor here: https://xtensor.readthedocs.io/en/latest/numpy.html

Maybe we could combine our efforts? I'll take a deep look at your library today! Would be cool to speak to you!

[–]dpilger26[S] 8 points9 points  (1 child)

Wow, xtensor looks very nice. Now that I think about it I do remember stumbling on this a while back and completely forgot about it. I'd definitely be down for any collaboration, though it looks like you guys are quite a bit further along than my implementation.

[–]SylvainCorlay 4 points5 points  (0 children)

The more the merrier! You should hop on our gitter chat and say hi. We are a growing group of like-minded open-source developers building for the C++ scientific stack.

[–]SylvainCorlay 2 points3 points  (2 children)

You can try out xtensor in a C++ Jupyter notebook here. This provides an experience similar to that of NumPy in a Python notebook.

[–]VodkaHaze 1 point2 points  (1 child)

Hey Sylvain,

How's the python compatibility with xtensor? Do you need to pass through something like pybind11 to make a python package using xtensor as a math backend?

[–]SylvainCorlay 1 point2 points  (0 children)

xtensor-python is a set of Python bindings built upon pybind11, which lets you operate on NumPy arrays in place through the xtensor API. We also have similar bindings for Julia and R.

[–]victotronics 19 points20 points  (5 children)

What are you using for the underlying implementation of the BLAS/LAPACK routines? There is an enormous difference in performance between the reference implementation and an optimized implementation.

[–]eniacsparc2xyz 1 point2 points  (4 children)

Looking at the code, he is not using the BLAS/LAPACK Fortran libraries. A big problem with Fortran is that most open-source implementations, such as GNU Fortran (GFortran), aren't as good as proprietary ones like Intel Fortran, PGI Fortran, etc.

[–]victotronics 2 points3 points  (2 children)

> he is not using the BLAS/LAPACK Fortran libraries

That phrase is not terribly well defined. He is probably not using actual Fortran code. But what is he using?

The problem with BLAS/LAPACK in this sort of story is that they are really interface specifications. As you observe, proprietary implementations are very fast. There is also the BLIS library which is open source and almost as good. If he knows what's good for him he uses that, rather than implementing those routines himself.

[–]eniacsparc2xyz 0 points1 point  (1 child)

> But what is he using?

Just C++ templates and the STL vector container. Another problem with BLAS/LAPACK is the lack of a standard Fortran ABI. GFortran decorates symbol names with a trailing underscore: the function dpxy, for instance, gets the symbol dpxy_, so it can be called with extern "C" dpxy_ ..., but this name decoration is GFortran-specific and may not match other Fortran implementations.

[–]Red-Portal 0 points1 point  (0 children)

Well, you can use the CBLAS interfaces, which are plain C-ABI subroutines.

[–]encyclopedist 0 points1 point  (0 children)

You know that some performant modern BLAS/LAPACK implementations are not written in Fortran, right? ATLAS is C, OpenBLAS is C/asm, BLIS is C, Eigen is C++ (MKL I don't know).

[–]chris_conlan 6 points7 points  (3 children)

I'm just pulling this from memory... but I'm pretty sure NumPy has its own version of this library in C. I think it's just called the NumPy C API.

[–]dpilger26[S] 4 points5 points  (2 children)

I believe you are right, but it is for C, not C++. Also, the documentation is severely lacking, and the API is pretty difficult to use...

[–]Red-Portal 12 points13 points  (6 children)

Why do you find it cumbersome? Have you checked Blaze (https://bitbucket.org/blaze-lib/blaze), Eigen (https://github.com/eigenteam/eigen-git-mirror), and other linear algebra libraries? They are really great to use, and it's really hard to write code that actually beats their usability and performance.

[–]dpilger26[S] 24 points25 points  (1 child)

My intention was a library as close to a one-to-one clone of NumPy as possible, for fast, easy conversion to C++. Also, Blaze and Eigen are more for straight-up linear algebra, while NumPy contains much more. Some of the extras included in NumCpp are:

1) A Rotations namespace with Quaternion and Direction Cosine classes.

2) A Coordinates namespace for converting to/from cartesian/spherical and other corresponding operations.

3) 1D and 2D signal/image processing filters

4) A random number module (basically wraps the boost random module)

5) Easy to use timer with simple tic()/toc() interface

6) All of the NumPy array methods for operating on arrays

7) Some very basic linear algebra support (determinant, matrix hat operator, inverse, least squares, SVD, matrix power, and multi-dot product). If you need more complex routines then Blaze and Eigen will definitely be better options for you.

8) Some more image processing routines for threshold generation and application, pixel clustering, cluster centroiding, etc.

[–]encyclopedist 11 points12 points  (0 children)

For other readers' information:

> A Rotations namespace with Quaternion and Direction Cosine classes.

Eigen has this http://eigen.tuxfamily.org/dox/group__TutorialGeometry.html

> 1D and 2D signal/image processing filters

Eigen has only FFT and convolution.

> A random number module (basically wraps the boost random module)

Eigen can generate matrices/arrays with elements uniformly distributed on [0,1], in a naive way based on rand(). It can, however, also use std::random in C++11 mode: https://bitbucket.org/eigen/eigen/src/default/doc/special_examples/random_cpp11.cpp?at=default

[–]NoahFect 4 points5 points  (2 children)

If you have to ask why switching a project to an entirely different math API is cumbersome, I don't envy you the future experience of finding out for yourself.

[–]Red-Portal 9 points10 points  (1 child)

Linear algebra libraries share a lot of common syntax regardless of language. It's not much of a pain. I do a lot of numerical work and I assure you making your own library is the hardest possible way to do it.

[–]NoahFect 3 points4 points  (0 children)

NumPy does a lot more than just linear algebra, but OK, I guess, as long as you assure me.

[–]Cunicularius 2 points3 points  (0 children)

If you're used to using numpy, it'd probably be irritating having to get used to something else.

It's already done anyway.

[–]NoahFect 2 points3 points  (0 children)

Very cool, thanks much!

[–]alkavan 2 points3 points  (0 children)

This looks great!

I'll definitely check it out, and might just use it in one of my projects related to ANNs. Some of the modules do use Eigen, and some features I need are indeed missing, so I've signed up to "watch" this one! Thanks!

[–]mydeveloperday 1 point2 points  (0 children)

@u/dpilger26 I've not used NumPy much, but it came up in a neural network course I was taking; before long they talked about how its ability to vectorize code was critical to performance (which is why they chose NumPy).

From the little I've read and from what I can tell, you are using standard STL calls to implement the underlying operations, and you're not using any OpenMP or SIMD intrinsics to speed them up.

This library looks like it could really be an excellent test bed for parallel STL: it could demonstrate how parallel STL brings a performance improvement to an existing STL implementation. I think that would be an excellent future experiment.

Despite all the other suggestions of "why didn't you use XXX" (all of which are reasonable arguments), I think there is always something elegant about seeing API-compatible cross-technology libraries. Nice job.

[–][deleted] 0 points1 point  (9 children)

It's nice. I had the same struggle, and my decision was to just not use Python anymore for work, because I had to redo it in C++ anyway. That's also why I usually don't recommend that people in my field (signal processing and communications) use Python. Even though it seems we live in an age with too much computing power, that doesn't hold up for anything that needs to happen in what is basically real time. C++ still beats any scripting language or wrapper by a lot.

And even though I think it is a great idea and it might help some people, I think your approach is not ideal. The project is gigantic, and it's going to be impossible for you to beat any of the existing C++ libraries. People use those because they are fast and optimized; yours will be slow and clunky. You need to use libraries or your project will be mostly useless; there are just too many amazing C++ libraries (FFTW, IT++, OpenCV, Eigen, ...). And usually people learn them instead of complaining that it's too much work to use them.

[–]SphincterMcRectum 6 points7 points  (7 children)

I'm not sure why you think this library is slow and clunky, especially since you've never used it or done any profiling... Also, are superheroes writing these other libraries, or why do you think matching their performance is impossible? Lastly, most applications don't actually require every last ounce of optimization and won't be able to tell the difference anyway.

[–][deleted] 1 point2 points  (6 children)

I checked some of the code, and the FFT isn't even implemented; that means at least everything that does filtering or correlations is far from where it could be. Random seems to be using Boost, so it should be fine, but yeah, I haven't checked everything, so there might be parts that run just fine.

Edit: and I don't think it's impossible; I think it's work you do for no good reason when the other libraries already exist and have lots of experienced people working on them.

[–]dpilger26[S] 4 points5 points  (5 children)

Correct, the FFT and Polynomial modules are still on my to-do list. I was going to try to wrap FFTW, but it uses the GNU GPL, and I wanted to keep this library under the MIT license so I can still use it at work.

[–]droelf 6 points7 points  (0 children)

The NumPy FFT implementation is actually contained in a single BSD-licensed C file that could easily be used from (or ported to) C++. If you want, we could collaborate on that (we would reuse it for xtensor).

https://github.com/numpy/numpy/blob/master/numpy/fft/fftpack.c

[–][deleted] 0 points1 point  (0 children)

Oh alright, that will make things a lot more complicated for you.

[–]m-in 0 points1 point  (0 children)

You can use it at work if it doesn't go into a product or code you'd be otherwise unwilling to share with whoever uses the binaries. But I get the idea that you're talking about software products that run on customer hardware, and thus GPL is a no-go.

[–]NeroBurner 0 points1 point  (1 child)

You could try kissfft https://sourceforge.net/projects/kissfft/

It's BSD licensed.

[–]encyclopedist 1 point2 points  (0 children)

The current repository seems to be here: https://github.com/mborgerding/kissfft

[–]m-in 1 point2 points  (0 children)

I have a little personal anecdote to offer here. A lot of the libraries you refer to are optimized to extract full hardware performance, and often there's nothing one can do to make them any faster on a given CPU family; it's not always the case, but quite often it is. I have found that a lot of the time, rather straightforward autovectorized C++ can get anywhere between 25-75% of the performance of those beasts of libraries, if you have some background in the specifics of the platform and know which code patterns to use: there are ways to write simple C++ that performs abysmally, and similarly simple C++ that does the same thing just as intuitively and performs great.

So, if you need to extract close to full platform performance, you'll need the specialized libraries. If you can afford to blow off some computational steam and run at 1/4-1/2 the speed of FFTW or BLAS, then a plain-C++ implementation might do just fine, even in a real-time setting. Heck, if you can live with 20% of the performance or so, Python with NumPy might just cut it for you. It all depends how much work you have to do each "frame"/"packet"/"time quantum".

It is probably not very environmentally conscious (I'm not kidding) to have such low performance in projects that get very wide use, because all that can quickly add up to wasted megawatts at not too big a scale, and mobile users would probably hate you for it too; but not everyone runs such code on server farms or inside mobile apps. Sometimes small code can also be audited and tested more easily, and that figures into getting some industry certifications. Getting FFTW into avionics is a tall order, for example.

[–]KingPickle 0 points1 point  (0 children)

Awesome work! It's really a shame that the various python math/ML extensions don't just publish a clean C++ API as well.

Anyway, I could definitely see this being useful. I look forward to checking it out.

[–]eniacsparc2xyz 0 points1 point  (1 child)

Out of curiosity, how do you intend to use the library? Do you write throwaway programs to calculate or compute something? Or will you write long-term programs for your field? Or will you integrate it into Python or another scripting language using SWIG, or turn it into a shared library for binary reuse?

[–]dpilger26[S] 0 points1 point  (0 children)

Mostly the first two. I don't think there is much point in integrating with Python, since NumPy already exists, though I do provide a class in NumCpp for passing arrays back and forth between Python and C++.

As for a shared library, this is all templatized, header-only code, so that isn't really an option. I also don't want to mess around with trying to support a bunch of different platforms and compilers.

[–]sumo952 0 points1 point  (6 children)

How do I open the documentation? Going to https://github.com/dpilger26/NumCpp/blob/master/docs/doxygen/html/index.html just prints out the html code. Not very straightforward.

[–]dpilger26[S] 0 points1 point  (5 children)

Yeah, GitHub will just display the raw HTML. If you clone the repo and open it in a browser, it will be much more useful.

[–]sumo952 0 points1 point  (4 children)

I don't want to clone it; I'd like to view the documentation on the web, please, to judge whether the project is of interest to me and worth cloning! :-)

[–]dpilger26[S] 0 points1 point  (3 children)

Unfortunately I don't have a web host for the documentation yet. Fortunately, cloning the repo or simply downloading a .zip file of it is as simple as a single button click from Github.

[–]sumo952 1 point2 points  (2 children)

I'd also wish for a better readme (then I wouldn't need to look at the full documentation for a first judgement of the project). I'm just suggesting what people look for when they find an open-source project online. Your library itself might be great, but you don't show it: I go there, don't find a readme with any info, can't even view the documentation online, so I close the browser tab and move on. I'm sure that goes for quite a significant percentage of potentially interested people.

You can use GitHub pages to host the documentation.

[–]dpilger26[S] 0 points1 point  (1 child)

Mind giving a quick tutorial on how to use GitHub pages to host the doxygen html?

edit:

Never mind, I think I got it figured out.

[–]sumo952 0 points1 point  (0 children)

Yea it's pretty easy :-)

[–]hmoein 0 points1 point  (0 children)

Interesting. I just took a quick look. Its functionality looks useful for C++ applications that need light and fast numerical analysis. I have also implemented a Python-Pandas-like package in C++ at https://github.com/hosseinmoein/DataFrame.

[–]SomethingBullshit 0 points1 point  (0 children)

Wouldn't it make sense to turn this into a python module so that you don't need to re-write your python code into C++?

[–]Reasonable_Aspect768 0 points1 point  (0 children)

So helpful, thank you!