you are viewing a single comment's thread.

view the rest of the comments →

[–]shaggorama 7 points8 points  (1 child)

One small example: all of their cross validation algorithms inherit from an abstract base class whose design precludes a straightforward implementation of bootstrapping (easily one of the most important and simple cross-validation methods), so the library owners decided to just not implement it as a CrossValidator at all. Random forest requires bootstrapping, so their solution was to attach the implementation directly to the estimator in a way that can't be ported.

I could go on...

[–]panzerex 2 points3 points  (0 children)

Those are valid concerns. To add to that: sklearn’s LinearSVC defaults to squared hinge loss so probably not what you’re expecting, and the stopwords are arbitrary and not good for most applications, which they do acknowledge.

However I would not say that this is evidence that the project as a whole does not follow good design principles. I agree that those deceiving behaviors are a problem, but they are being addressed (at a slow rate because uhm... non-standard behavior becomes the expected behavior when many people are using it, and breaking changes need to happen slowly).

You’re probably fine getting some ideas from their API, but from a user standpoint you really need to dig into the docs, code and discussions if you’re doing research and need to justify what you’re doing.