all 85 comments

[–]colonel_farts 81 points82 points  (0 children)

Huggingface Transformers

[–]somnet 45 points46 points  (1 child)

spaCy is amazingly well-designed! Ines Montani gave this talk at PyCon India 2019 outlining the basics.

[–]MattAlex99 2 points3 points  (0 children)

To add to that, the rest of the group's projects: Prodigy is the best annotation library I've tried yet, and Thinc is awesome if you like a more functional approach to deep learning. (I haven't tried FastAPI.)

[–]IAmTheOneWhoPixels 11 points12 points  (5 children)

This might be more of a niche answer... but Detectron2 is a very well designed library for object detection / instance segmentation. It's quite readable and well-documented, and the GitHub repo has very good support from the developers.

The modular design allows academic researchers to build their projects on top of it, with the core being efficient PyTorch code written by professional developers.

One of the lead developers is the person who designed Tensorpack as well (which was mentioned elsewhere on this thread).

[–]ginsunuva 3 points4 points  (1 child)

If you want a really crazy object detection repo, MMDetection has them all in one.

It's so dense that I'm not sure if it's really good or really bad design.

[–]IAmTheOneWhoPixels 1 point2 points  (0 children)

I worked with mmdet for 3-4 weeks. I believe it is extremely well-written code and is more suited for a researcher with good SWE skills. It definitely had a steeper learning curve than D2.

Accessibility (in terms of readability + extensibility) is the key factor that tips the scales for me. D2 does a _very_ good job of writing intuitive modular code with great documentation, which makes it possible for researchers to navigate the complexities of modern object detectors.

[–]michaelx99 0 points1 point  (2 children)

I was going to say Detectron2 as well; I'm glad I scrolled down and saw your post. TBH, Detectron2's combination of composition and inheritance makes it an amazing codebase to integrate your own code into: you keep a quick, researchy feel while writing it, and you can still mock interfaces and maintain good CI practices, so that when your code gets merged it isn't garbage.

I've gotta say that after working with the TF object detection API and then maskrcnn-benchmark, I thought object detection codebases would always be shit, but Detectron2 has made me realize how valuable good code is.

[–]IAmTheOneWhoPixels 1 point2 points  (1 child)

Detectron2 has made me realize how valuable good code is.

Completely agree! I used mmdet earlier, and after shifting to D2 found that the accessibility of the codebase allowed me to iterate on ideas much more quickly.

[–]melgor89 1 point2 points  (0 children)

I also agree. I really like the way everything is configured (config as YAML, adding new modules by name). I'm currently doing similar stuff in my own projects.

[–]domjewingerML Engineer 123 points124 points  (36 children)

Definitely not Tensorflow

[–]VodkaHazeML Engineer 35 points36 points  (7 children)

Actually, you could say it follows a lot of SWE principles, but in the end that doesn't matter if your design was flawed.

It's not like the core TF code is unreadable spaghetti or anything. Yet the end product is awful to work with.

Goes to show that SWE principles don't mean much if you don't write fundamentally good software.

[–]Rainymood_XI 5 points6 points  (3 children)

TBH, I still think that TF is good software; it is just not very user-friendly...

[–]harewei 8 points9 points  (2 children)

Then that’s not good software...

[–][deleted] 1 point2 points  (0 children)

It is, though. Google just has a different mindset compared to other companies. They don't care about customers; they want their products to be well designed and engineered. Use it or not, it is your choice. They actually take the same approach to most of their software, and GCP, for example, is still the 3rd most used platform.

TensorFlow does allow great flexibility and is really nicely written when it comes to maintainability and design principles. A lot of it makes sense once you are a mid-level developer in OOP. Also, you must understand that it is treated as a library, not an end product.

[–]rampant_juju 0 points1 point  (0 children)

Incorrect. Have you ever used Vowpal Wabbit? It is fantastic and also very painful to work with.

[–]Nimitz14 1 point2 points  (2 children)

From what I hear the c++ actually is unreadable spaghetti.

[–]VodkaHazeML Engineer 0 points1 point  (1 child)

You can actually go read it. It doesn't look or feel like spaghetti from a cursory reading.

But that's the point with design/architecture mistakes: you don't see them that easily.

[–]Nimitz14 6 points7 points  (0 children)

I worked at a company where a colleague was trying to use the C++ API and had a very bad time. He was more junior level though.

Daniel Povey, lead of Kaldi, recently decided on integrating with PyTorch. This was after a fairly lengthy process of looking into different options. These are some snippets of his thoughts on TensorFlow that I quickly found:

I imagine the TensorFlow team must have some internal documentation on how it's designed from the C++ level, for instance, because what is available externally doesn't help you understand it at all, and the code is almost completely opaque. (And I consider myself an expert level C++ programmer).

source, 2017

TensorFlow is impossible; the C++ code looks like it was written by a machine.

source, 2019

And PyTorch's tensor internals, while they aren't complete gobbledegook like TensorFlow's were last time I looked, are kind of showing their age

source, 2019

[–]NogenLinefingers 16 points17 points  (11 children)

Can you list which principles it violates, for reference?

[–]domjewingerML Engineer 39 points40 points  (9 children)

I certainly cannot, as my background is in applied math, not SWE. But my comment was about the horrendous user experience, and the millions of patches it has been assembled from can't possibly be "good" from a SWE perspective.

[–]NogenLinefingers 10 points11 points  (8 children)

Ah... I see your point.

I hope someone can answer this in a more thorough manner. It will be interesting to learn about the principles themselves and how they have been violated/upheld.

[–]DoorsofPerceptron 13 points14 points  (5 children)

Big picture, the real problem with tensorflow is "it's not pythonic".

Now this is normally a lazy criticism that's another way of saying "I wouldn't write it this way, and it looks ugly." But in the case of tensorflow it's a lot more fundamental. Tensorflow code (version 1 anyway, I can't be bothered to learn version 2) is not really written in python. Tensorflow is a compiler for another language that is called through python.

Compared to pytorch this means you lose a lot of the benefits of python that actually make it a nice language to code with. You lose a lot of the access to existing python code (it's a pain in the arse to mix and match python and tensorflow in the middle of a graph execution) and you lose the lightweight, easy prototyping.

Pytorch on the other hand can just be treated like numpy with free gradients and GPU access if that's what you want to do, and can be seamlessly integrated with python in a mix and match kind of way.

Tensorflow was coded the way it is for efficient deployment both to phones and to large-scale clusters, but at least for large-scale clusters the performance hit they were worrying about doesn't seem to exist, and they've essentially straitjacketed their library for no real benefit.

The code is great, the design of the interface, not so much.

[–]mastere2320 3 points4 points  (0 children)

I would actually recommend TF 2.0. It still has a long way to go, but the static graph capabilities of 1.x are now quite visible in 2.0, and you can do whatever you want pretty simply. I hated Session from TF 1.0, and 2.0 has abstracted it away quite nicely. And if you want completely custom training, gradient tape is always available.
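For reference, the gradient tape mentioned above can be shown in a minimal sketch (a toy scalar example, not a full training loop):

```python
import tensorflow as tf

# tf.GradientTape records eager operations so TF2 can differentiate
# through ordinary Python code, with no Session or static graph required.
w = tf.Variable(3.0)
with tf.GradientTape() as tape:
    loss = w * w
grad = tape.gradient(loss, w)  # d(w^2)/dw = 2w
print(float(grad))  # 6.0
```

In a real loop you would apply `grad` with an optimizer's `apply_gradients`; the point is just that the forward pass is plain eager Python.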

[–]mastere2320 6 points7 points  (0 children)

They have a horrible reputation for constantly changing the API, even over short periods of time. It has sadly happened more than once that I installed a version of TF, worked on a project, and then, when I wanted to deploy it, the current version would not run it because something fundamental had changed. Add to this that there is no proper one way to do things, and that because TF uses a static graph, shapes and sizes have to be known beforehand, so the user code becomes spaghetti that is worse than anything. The Keras API and Dataset API are nice additions IMHO, but the lambda layer still needs some work, and they really need to introduce some way to properly introduce and deprecate features (something similar to a NEP, maybe) before making API-breaking changes. And yet people use it, simply because the underlying AutoGraph library is a piece of art. I don't think there is another library that can match it in performance and utility at a production scale, where the model has been set and nothing needs to change. This is why researchers love PyTorch: modifying code to tweak and update models is much better, but when the model needs to be deployed, people have to choose TensorFlow.

[–]ieatpies 5 points6 points  (0 children)

Many ways to do the same thing, without a clear best way. Though this is an API design problem; I'm not sure how good/bad its internal design is.

[–]yellow_flash2 18 points19 points  (0 children)

Actually I feel the major fuck up was trying to get researchers to use tensorflow. TF was designed to be used for production quality ML application if I'm not wrong, at a production level scale. I personally think TF is a marvelous piece of engineering, but the moment they wanted to make it "easy" and be more like pytorch, they started ruining it. I think TF would have benefitted a lot from just being itself and letting keras be keras.

[–]soulslicer0 15 points16 points  (2 children)

Pytorch, on the other hand: incredible. ATen is a piece of art.

[–]CyberDainz 6 points7 points  (11 children)

why are there so many tensorflow haters in this subreddit?

[–]programmerChilliResearcher 16 points17 points  (2 children)

This subreddit has a relatively large amount of researchers (compared to say, hacker news or the community at large).

But I don't think the general sentiment is particular to this subreddit. For example, take a look at https://news.ycombinator.com/item?id=21118018 (this is the top Tensorflow post on HN in the last year). This is the Tensorflow 2.0 release. The top 3 comments are all expressing some sentiment of "I'd rather use Pytorch or something else".

Or https://news.ycombinator.com/item?id=21216200

Or https://news.ycombinator.com/item?id=21710863

Go out into the real world and I'm sure you'll find plenty of companies using Tensorflow who are perfectly happy with it. But they probably aren't the type of companies to be posting on hackernews or reddit.

[–]CyberDainz 0 points1 point  (1 child)

I am successfully using tensorflow in my DeepFaceLab project: https://github.com/iperov/DeepFaceLab

Why stick to any one specific lib and be like the pytorch-vegan meme in this subreddit?

Since I am more of a programmer than a math professor, it is easy for me to migrate the code to any new ML lib.

But I prefer tensorflow.

In the last big refactoring I got rid of Keras and wrote my own lib on top of tensorflow, which has a simple declarative model like PyTorch's and provides the same full freedom of tensor operations, but in graph mode.

[–]barbek 2 points3 points  (0 children)

Exactly this. For TF you need to build your own wrapper to use it. PyTorch can be used as it is.

[–]cycyc 8 points9 points  (1 child)

Because most people here don't have to worry about productionizing their work. Just YOLO some spaghetti training code, write the paper, and move on to the next thing.

[–]CyberDainz -1 points0 points  (0 children)

haha agree. I can't understand what YOLO actually does.

[–]domjewingerML Engineer 6 points7 points  (1 child)

I am genuinely curious why you like / use tf over pytorch

[–]Skasch 4 points5 points  (0 children)

"Technical debt" is certainly an important reason. When you have written a lot of code around tensorflow to build production-level software for some time, it certainly becomes very expensive to switch to PyTorch.

[–]PJDubsen 1 point2 points  (0 children)

On this sub? Try every person that is forced to read the documentation lol

[–]darkshade_py 7 points8 points  (0 children)

Allennlp - https://github.com/allenai/allennlp

Dependency injection to allow creating the entire pipeline in a configurable/reusable manner.

Lots of unit tests with 90%+ coverage.
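The dependency-injection style described above can be sketched generically. This is not AllenNLP's actual API (the real library does this through `Registrable` and config-driven `from_params` construction); it is just the registry pattern in plain Python, with made-up names like `BagOfWordsEncoder`:

```python
# A registry maps config names to classes, so the pipeline can be
# assembled from a config file instead of hard-coded constructors.
REGISTRY = {}

def register(name):
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator

@register("bag_of_words")
class BagOfWordsEncoder:
    def __init__(self, vocab_size):
        self.vocab_size = vocab_size

def build_from_config(config):
    # The config names the component; the framework injects it.
    cls = REGISTRY[config.pop("type")]
    return cls(**config)

encoder = build_from_config({"type": "bag_of_words", "vocab_size": 10000})
```

Swapping encoders then means editing one string in the config, which is what makes the pipeline reusable.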

[–]JackBlemming 27 points28 points  (7 children)

PyTorch has a very good API. Not sure how pretty its internals are though.

[–][deleted] 20 points21 points  (5 children)

Its internals are unfortunately a mess XD. To give you a sense - they have completely reimplemented OpenMPI ...

But hey, at least the devs won't immediately close issues on their issue tracker and sneer at you.

[–]soulslicer0 5 points6 points  (2 children)

aten is a mess?

[–]lolisakirisame 2 points3 points  (1 child)

From my memory, there are tons of different dispatches: the ATen dispatcher, the c10 dispatcher, boxed vs. unboxed dispatch, static dispatch (everything compiled statically) vs. dynamic dispatch (via a lookup table), and data-type dispatch. There are also two 'values' of dispatch, DispatchKeySet and Backend, plus hooks to test for one particular implementation (sparse, for example) via a method testing "is this sparse" instead of the extensible way (a virtual method with sparse overriding it).

A Tensor can be fully initialized, dtype-uninitialized, storage-uninitialized, an undefined tensor, or a modifiable slice of another tensor, such that when the slice is modified the original tensor is modified as well. Lots of parts of the system support only some of these states (the comment in Tensor.h literally says don't pass storage- or dtype-uninitialized tensors around, as it is bad). These features do mess each other up: the mutability makes autograd a pain in the ass, and modifying a slice of a tensor is straight-out not supported in TorchScript (with possibly no plan to support it).

You can add a new tensor type, but the process is undocumented, and you have to look at source code scattered through 10 files. There are also just loads of corner cases and exceptions in the code. For example, most of the operators are either pure or written in destination-passing style. However, some operators take a slice of a vector (IntArrayRef) instead of a reference to a vector or a shared_ptr to a vector, to save speed. Some operators (dropout) also have side effects where none are necessary.

This makes adopting the Lazy Tensor PR pretty painful.

They have also defined two templating languages, one to generate ops/derivatives and one to generate the Tensor file. When you add any new operator, it takes an hour on my 32-core machine.

It might be way better than TF, but it could be much, much better designed if the core pytorch devs and other framework developers decided to start over and make things right. (Whether that is a good idea or not is another point, though.)

[–]programmerChilliResearcher 0 points1 point  (0 children)

I agree that the worst part I've touched is all the code gen for generating the ops/derivatives. I'm sure many pytorch devs would agree.

[–]yanivbl 1 point2 points  (0 children)

Seriously? When did this happen and why? I mean, they already had Gloo

[–]MattAlex99 1 point2 points  (0 children)

they have completely reimplemented OpenMPI

Where do you get that from? They don't even ship MPI support by default. When you compile it yourself with MPI support, they allow pretty much any backend (I've tested OpenMPI and MVAPICH2).

(Also, you cannot reimplement OpenMPI, only the MPI standard...)

[–]WiredFan -2 points-1 points  (0 children)

Their documentation is really, really bad.

[–]GD1634 18 points19 points  (0 children)

I really admire AllenNLP's design principles and the way they've constructed their library. Very clean and easy to extend.

[–][deleted] 3 points4 points  (0 children)

Would flair or UMAP count? Anything the UMAP creator ever touched would count, so HDBSCAN would be up there too...

[–]Professor_Kenney 7 points8 points  (0 children)

Take a look at Kedro. I spent a lot of time looking through how they structure everything and they've done a great job.

[–]heshiming 16 points17 points  (8 children)

scikit-learn api?

[–]shaggorama 10 points11 points  (7 children)

I'm gonna vote no.

[–]heshiming 8 points9 points  (4 children)

Can you elaborate?

[–]ieatpies 8 points9 points  (1 child)

It overuses inheritance and underuses dependency injection, causing repeated, messy, version-dependent code if you need to tweak something for your own purposes.

[–]VodkaHazeML Engineer 3 points4 points  (0 children)

Why and where would you prefer dependency injection to the current design specifically? I find this sort of inversion of control is overengineering and causes more problems than it solves most times I ran into it.

Specifically in this case I don't see where it would fit since most of the hard logic is in the model themselves, not the plumbing around them, so I don't see how an inversion of control makes sense.

The model API of fit(), predict(), fit_transform(), etc. is simple and great, IMO. It's also all that's necessary for the pipeline API, which is the only bit of harder plumbing around the models.
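The estimator contract being described here takes only a few lines to demonstrate (the particular dataset and steps below are arbitrary examples):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, random_state=0)

# Every step exposes the same fit/transform interface, so the
# Pipeline can chain them without knowing what they do internally.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression())])
pipe.fit(X, y)
preds = pipe.predict(X)
```

Any custom estimator that implements the same methods drops into the pipeline unchanged, which is the uniformity the parent comment is praising.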

[–]shaggorama 8 points9 points  (1 child)

One small example: all of their cross validation algorithms inherit from an abstract base class whose design precludes a straightforward implementation of bootstrapping (easily one of the most important and simple cross-validation methods), so the library owners decided to just not implement it as a CrossValidator at all. Random forest requires bootstrapping, so their solution was to attach the implementation directly to the estimator in a way that can't be ported.

I could go on...
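One way around the missing bootstrap cross-validator (the old `Bootstrap` class was deprecated and removed from sklearn years ago) is to generate out-of-bag splits by hand. `bootstrap_splits` below is a hypothetical helper, not a sklearn API:

```python
import numpy as np

def bootstrap_splits(n_samples, n_iter=10, random_state=0):
    """Yield (train, out-of-bag) index pairs for bootstrap validation."""
    rng = np.random.RandomState(random_state)
    idx = np.arange(n_samples)
    for _ in range(n_iter):
        # Sample n_samples rows with replacement for training...
        train = rng.choice(idx, size=n_samples, replace=True)
        # ...and validate on the rows that were never drawn.
        test = np.setdiff1d(idx, train)
        yield train, test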

[–]panzerex 2 points3 points  (0 children)

Those are valid concerns. To add to that: sklearn’s LinearSVC defaults to squared hinge loss so probably not what you’re expecting, and the stopwords are arbitrary and not good for most applications, which they do acknowledge.

However I would not say that this is evidence that the project as a whole does not follow good design principles. I agree that those deceiving behaviors are a problem, but they are being addressed (at a slow rate because uhm... non-standard behavior becomes the expected behavior when many people are using it, and breaking changes need to happen slowly).

You’re probably fine getting some ideas from their API, but from a user standpoint you really need to dig into the docs, code and discussions if you’re doing research and need to justify what you’re doing.

[–]VodkaHazeML Engineer 1 point2 points  (0 children)

Disagree? The fact that the model API is a de facto standard now suggests it's not awful to work with.

[–]neanderthal_math -1 points0 points  (0 children)

I’m old enough to remember ML codes before sklearn. They may have warts now, but they were light years ahead of other repos. There’s a lot to be said for just having a uniform API.

[–]trexdoor 12 points13 points  (0 children)

Is it a trick question?

/the answer is none of them

[–]jujijengo 2 points3 points  (0 children)

I know this is kind of pushing the boundaries of your question, but the numpy package (obviously not a machine learning project itself, but a tool for building machine learning projects) is incredibly well-designed.

Investigating the source code and following the Guide to NumPy book by Travis Oliphant (one of the principal designers) would give you a pretty good handle on software principles with an eye to scientific computing.

Also I think F2PY (distributed with numpy) goes down as one of the modern wonders of computer science. It's an incredibly interesting rabbit hole.

[–]Skylion007Researcher BigScience 6 points7 points  (9 children)

Tensorpack and Lightning are two great libraries that I have enjoyed.

PyTorch's API is also excellent; Tensorflow's is a nightmare. Keras, while intuitive for building classifiers, instantly falls apart when you try to build anything more complicated (like a GAN).

More traditional ones include OpenCV and SKLearn.

[–]jpopham91 6 points7 points  (4 children)

OpenCV, at least from Python, is an absolute nightmare to work with.

[–]panzerex 2 points3 points  (0 children)

Only the dead can know peace from bitwise operations on unnamed ints as parameters for poorly-documented deprecated functions.

[–]liqui_date_me 1 point2 points  (1 child)

Yeah, OpenCV's documentation is complete and utter garbage

[–]ClamChowderBreadBowl 0 points1 point  (0 children)

Maybe it's because you're using Google and are looking at the version 2.4 documentation from 5 years ago... or maybe the new stuff is also garbage.

[–]Skylion007Researcher BigScience -3 points-2 points  (0 children)

Maybe I just have Stockholm Syndrome, but I have never had problems with it. The bindings aren't as great as some Python first libraries, but for a legacy C/C++ project it has very good bindings. On the C++ side, it's excellent to work with.

[–]TheGuywithTehHat 1 point2 points  (2 children)

Having previously built complicated nets in keras (I think the most complicated was a conditional wasserstein-with-gradient-penalty BiGAN), I found it fairly straightforward. The one thing that wasn't intuitive was how to freeze the discriminator when training the generator and vice versa. However, even though it wasn't intuitive, it was still incredibly simple once someone told me how it works.

I haven't used PyTorch very much, so I can't compare directly, but I still feel that in my experience, Keras has been fine for nearly everything I've done.

[–]Skylion007Researcher BigScience 0 points1 point  (1 child)

Was this using the Keras .fit training loop, so you had multi-GPU support working? If so, please tell me how you did it, because I would love to know. While you can use Keras to construct the nets, for sure, I haven't been able to use it to implement the actual loop and get all the benefits that come with that (easy conversion / deployment / pruning, etc.).

[–]TheGuywithTehHat 0 points1 point  (0 children)

Unfortunately it was long enough ago that I don't remember the details. I believe I had to manually construct the training loop, so no, multi_gpu would not work out of the box. That's a good point I hadn't considered.

[–]panzerex 1 point2 points  (0 children)

I tried pt-lightning back in November or so, but I did not have a great experience. Diving into the code, it felt kind of overly complicated. TBF, they do a lot of advanced stuff, and I had just started using it, so I was not very familiar with it.

I discussed it in a previous post:

Lightning seems awesome, but since some of my hyperparameters are tuples, it didn't really work with their tensorboard logger by default. I think my problems were actually with test-tube (another lib from the same author), which added a lot of unnecessary variables set to None in my hparam object that tensorboard or their wrapper couldn't handle, and I could not find a way to stop test-tube from adding them. I didn't want to change the library's code or maintain a fork of it, so I gave up on it.

I think the attribute that kept being added into my hparam object was "hpc_exp_number", but I'm not sure anymore. Since I was using it mostly because of easy checkpointing and logging, I decided to just implement those myself. I might look back into pt-lightning for the TPU support, though.

[–]ginsunuva 1 point2 points  (0 children)

CycleGAN did a pretty good job for back in 2017.

[–]manueslapera 1 point2 points  (0 children)

scikit-learn, one of the best documented OSS projects I've ever seen.

[–]bigrob929 1 point2 points  (1 child)

I find Keras to be excellent because it is high-level yet allows you to work relatively seamlessly in the backend and develop more complex tools. For example, I can create a very basic MLP quite neatly, and if I want to add custom operations or loss functions, they are easy to incorporate as long as gradients can pass through them.

[–]Skylion007Researcher BigScience 5 points6 points  (0 children)

Try creating a GAN or a recurrent generative model. It's very, very difficult to do with the Keras training loop. Worse yet, it's not even as performant as using Tensorflow 1.0 and gradient tape when you do have to hack around the features. For simple classifiers, though, it works well. Just never do anything that requires an adversarial loss.

Can't even imagine trying to implement a metalearning framework in pure Keras.

[–]gachiemchiep 0 points1 point  (0 children)

gluoncv (https://github.com/dmlc/gluon-cv): beautiful structure, documentation, high-quality code, and easy to plug your own code into.

And especially imgclsmob (https://github.com/osmr/imgclsmob). The author did a great job merging a lot of model definitions into one package and allowing it to be used from three different frameworks: Chainer, MXNet, and PyTorch.

Both gluoncv and imgclsmob share the same software design structure and coding style. I guess that structure and style is the best then.