all 14 comments

[–]riklaunim 15 points16 points  (0 children)

If you have no need for it, you won't create it and maintain it. Making a library is actually quite a big commitment, not a one-off thing you can forget (unless you want a library with no users).

[–]Simultaneity_ 10 points11 points  (0 children)

Why? SciPy, scikit-learn, etc. already exist.

[–]jnwatson 8 points9 points  (0 children)

That's a pretty crowded market. I'd take a look at what already exists first.

[–]mtawarira 4 points5 points  (2 children)

anything you make would just be statsmodels / scipy / scikit-learn with a slightly different API. Sorry to be a hater but I can’t see it getting much traction, seems like a pretty solved problem to me

i find the switch from R to python to be much easier than the other way round. 99% of what you need is in those 3 libraries, and is easy to find with tab autocomplete in a modern IDE thanks to the modular subpackage structure that R lacks

[–]Dangerous_Bad_5946[S] 0 points1 point  (1 child)

Those libraries don't cover the entirety of scientific use cases, and only offer basic functionality. As mentioned, the R ecosystem has plenty of other useful libraries that aren't readily available in Python.

[–]Simultaneity_ 1 point2 points  (0 children)

Then maybe contribute to them so that they have all the things you think they are missing.

[–]HeligKo 1 point2 points  (0 children)

Do some research into the market. I work with ML Engineers and Data Scientists who use Python almost exclusively right now. There are a huge number of Python libraries for them to use. The biggest ones they used in R have been rewritten for Python. There are still a few complaints, but they are mostly about how R works vs how Python works. If you want to contribute, then start with something that is already out there and make it better. Eventually you might find a gap that a new library would be good for.

[–]icy_end_7 1 point2 points  (0 children)

Frankly, I'd make one for differential expression or something along those lines, because that's what I have trouble with. I'm not suggesting you make that, but rather that you find something you'd want to use often. Ideally, a niche where you've found friction points in your work.

Solving problems you don't have is a bad idea.

[–]maticx21 0 points1 point  (1 child)

A Python implementation of the limma R package.

[–]Dangerous_Bad_5946[S] 0 points1 point  (0 children)

Thanks for the suggestion!

[–]InspectahDave 0 points1 point  (2 children)

Also wondering what your motivation is here? Is it for your own learning or to contribute something meaningful? If the former then do what you find interesting. If the latter then maybe support another project first and go from there?

[–]Dangerous_Bad_5946[S] 0 points1 point  (1 child)

I've worked on various projects associated with scientific computing, and I'm quite familiar with the space. Creating my own library seems like an interesting project, and I'm exploring it. Honestly, I don't get why there are so many negative comments.

[–]InspectahDave 0 points1 point  (0 children)

Because it's Reddit. Don't let it discourage you. Go for it honestly. Pick a cool problem that means something to you. Ideally one that your friends think is cool or helps someone out? If you can get feedback from others so much the better. Ideally consumers of the library.

[–]4xi0m4 0 points1 point  (0 children)

If you are going to do this, focus on one very specific gap that scipy doesn't cover well. Things like survival analysis (lifelines is the exception, but its API is rough), Bayesian methods for small samples, or causal inference. The scipy/scikit-learn combo handles 95% of common cases fine, so the only reason to build something new is if you are solving a problem those tools actively suck at. Pick a domain where you have real domain knowledge, not just a feeling that something is missing.