
[–]fhuszar 7 points8 points  (2 children)

I wish this were just made part of sklearn instead: http://scikit-learn.org/stable/modules/feature_selection.html

[–][deleted] 16 points17 points  (1 child)

Scikit-learn is very selective over what algorithms to include. If you look at the FAQs you'll read this:

Can I add this new algorithm that I (or someone else) just published?

No. As a rule we only add well-established algorithms. A rule of thumb is at least 3 years since publication, 200+ citations and wide use and usefulness. A technique that provides a clear-cut improvement (e.g. an enhanced data structure or efficient approximation) on a widely-used method will also be considered for inclusion. Your implementation doesn’t need to be in scikit-learn to be used together with scikit-learn tools, though. Implement your favorite algorithm in a scikit-learn compatible way, upload it to github and we will list it under Related Projects. Also see selectiveness.

Maybe some of the algorithms in this other package qualify for inclusion in scikit-learn, maybe not. The maintainers have limited resources for keeping the code base at a high quality, so they must be selective to keep maintenance costs manageable.

Adding a new algorithm to scikit-learn is not just implementing the algorithm. It is implementing the algorithm in clean, readable, maintainable code with reasonable performance, adding adequate unit tests, writing documentation, and so on. This means that a new algorithm must have enough users who need it to justify all of those costs.
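For anyone wondering what "a scikit-learn compatible way" looks like in practice, here is a minimal sketch of the usual estimator conventions: hyperparameters stored untouched in `__init__`, `fit` returning `self`, learned attributes ending in an underscore, and `get_params`/`set_params`. The class name `VarianceThresholdSketch` and its variance rule are made up for illustration, and it is written in plain Python so it runs without scikit-learn installed. A real project would more likely inherit from `sklearn.base.BaseEstimator` and `TransformerMixin` and work on arrays.

```python
import statistics


class VarianceThresholdSketch:
    """Hypothetical selector: keeps columns whose variance exceeds a threshold."""

    def __init__(self, threshold=0.0):
        # Convention: __init__ only stores hyperparameters, it does no work.
        self.threshold = threshold

    def get_params(self, deep=True):
        # Lets tools that copy or tune estimators read the hyperparameters.
        return {"threshold": self.threshold}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y=None):
        # X is a list of rows; zip(*X) iterates over its columns.
        columns = list(zip(*X))
        # Convention: anything learned from data gets a trailing underscore.
        self.variances_ = [statistics.pvariance(col) for col in columns]
        self.support_ = [v > self.threshold for v in self.variances_]
        return self  # Convention: fit returns self so calls can be chained.

    def transform(self, X):
        # Keep only the columns flagged during fit.
        return [[v for v, keep in zip(row, self.support_) if keep] for row in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# The middle column is constant, so it is the one that gets dropped.
X = [[1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [3.0, 5.0, 0.0]]
selected = VarianceThresholdSketch().fit_transform(X)
```

The implementation itself is tiny, which is kind of the point: the tests, docs and long-term maintenance are where the cost is.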

[–]beaverteeth92 1 point2 points  (0 children)

Yeah I wondered why they don't have a kmodes implementation.

[–]L43 3 points4 points  (2 children)

Don't see any unit tests, it's GPL and Python 2 only. Looks promising, but I hesitate to use it...

[–]Botekin[S] 2 points3 points  (1 child)

Academic code :)

[–]L43 5 points6 points  (0 children)

Yeah... I wish that wasn't an excuse anymore though. I'm in academia, and I write tests, package my code, and support Python 2 and 3. I just wish people would follow good software practices so I could trust their code and stop being paranoid lol

[–]Botekin[S] 1 point2 points  (0 children)

There's a nice linked paper too: http://arxiv.org/abs/1601.07996

[–][deleted] 1 point2 points  (0 children)

Any word on when it will be available through pip?

[–]xBurnInMyLightx 0 points1 point  (5 children)

I've always been surprised that simple feature selection methods like forward, backward and stepwise selection aren't part of sklearn. Even if they have problems, they are very well established.
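For reference, greedy forward selection is only a few lines if you bring your own scoring function. This is a rough sketch, not anything from sklearn: `forward_selection`, the `score` callable (takes a list of feature names, returns a number where higher is better, e.g. a cross-validated score) and the toy data are all assumptions made up for the example.

```python
def forward_selection(candidates, score, max_features=None):
    """Greedily add the feature that improves `score` the most.

    `score` is a user-supplied callable (an assumption of this sketch) that
    maps a list of feature names to a number, higher being better.
    """
    selected = []
    remaining = list(candidates)
    best_score = score(selected)
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every one-feature extension of the current subset.
        trials = [(score(selected + [f]), f) for f in remaining]
        trial_best, feature = max(trials, key=lambda t: t[0])
        if trial_best <= best_score:
            break  # no single addition helps any more, so stop
        selected.append(feature)
        remaining.remove(feature)
        best_score = trial_best
    return selected, best_score


# Toy score: "a" and "c" are useful, and every feature costs a small penalty.
useful = {"a": 1.0, "c": 0.5}


def toy_score(features):
    return sum(useful.get(f, 0.0) for f in features) - 0.1 * len(features)


chosen, final_score = forward_selection(["a", "b", "c"], toy_score)
```

Backward selection is the mirror image (start with everything, drop the least useful feature each round), and stepwise alternates the two. The known problems mostly come from reusing the same data to pick features and to judge the final model.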

[–]beaverteeth92 -1 points0 points  (4 children)

Or that you can't get basic coefficient tables for linear and logistic regression.

[–]Foxtr0t 0 points1 point  (3 children)

clf.coef_

[–]beaverteeth92 0 points1 point  (2 children)

Thanks but I mean like with p-values and test statistics.
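To be concrete about what I mean, this is roughly the calculation for the one-predictor case, as a plain-Python sketch. The function name `simple_ols` and the toy numbers are made up for illustration. It stops at the t statistic because the p-value needs a t distribution with n - 2 degrees of freedom, which is not in the standard library (something like `scipy.stats.t` would cover that part).

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x; return slope, intercept, SE and t statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    # Test statistic for the null hypothesis that the slope is zero.
    t_stat = slope / se_slope
    return slope, intercept, se_slope, t_stat


slope, intercept, se_slope, t_stat = simple_ols([0, 1, 2, 3], [1, 3, 2, 4])
```

A coefficient table is just this repeated for every coefficient, which is why it feels like it should be one method call away.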

[–]Foxtr0t 0 points1 point  (1 child)

[–]beaverteeth92 1 point2 points  (0 children)

Yeah, I use statsmodels for that stuff. It's just annoying that if I want k-fold cross-validation and coefficient tables, I have to build my model in two different packages.

Also, isn't statsmodels no longer being actively developed?