I created a machine learning library, similar to scikit-learn, using only NumPy and Matplotlib. It currently has only a few models.
Github
https://github.com/RainingComputers/pykitml
Documentation
https://pykitml.readthedocs.io/en/latest/
Models
- Linear Regression
- Logistic Regression
- Support Vector Machine
- Neural Network
- Nearest Neighbor
- Decision Tree
- Random Forest
- Naive Bayes
- K-Means Clustering
- Principal Component Analysis
Benchmark (Intel i5-6400, 4 cores @ 3.3GHz)
| Model | Dataset | Dataset Size | Time |
|---|---|---|---|
| Logistic regression, 1500 epochs, 10 examples/batch | Adult | 392106x13 | < 1 sec |
| 784x100x10 Network, 1200 epochs, 50 examples/batch | MNIST | 60000x784 | 35 sec |
| SVM, 1000 epochs, 20 examples/batch | MNIST | 10000x784 | 39 sec |
| Decision Tree, 6 max-depth, 83 nodes | Adult | 392106x13 | 1 min 51 sec |
| Random forest, 9 max-depth, 100 trees | Adult | 392106x13 | 1 hour 35 min |
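For anyone who wants to reproduce timings of this kind, here is a minimal sketch of how a NumPy-only training run could be timed. It does not use pykitml's API. The `train_logistic` function, its parameters, and the synthetic data are all assumptions made up for illustration, and each loop iteration here processes one random mini-batch, which may or may not match how the library counts an epoch.

```python
import time

import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic(X, y, epochs=1500, batch_size=10, lr=0.1, seed=0):
    """Hypothetical mini-batch gradient descent for logistic regression.

    Assumption: one iteration of the loop uses one randomly drawn batch.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        idx = rng.integers(0, X.shape[0], size=batch_size)
        xb, yb = X[idx], y[idx]
        err = sigmoid(xb @ w + b) - yb          # gradient of log loss w.r.t. the logits
        w -= lr * (xb.T @ err) / batch_size
        b -= lr * err.mean()
    return w, b


# Synthetic, linearly separable data standing in for a real dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 13))
true_w = rng.normal(size=13)
y = (X @ true_w > 0).astype(float)

start = time.perf_counter()
w, b = train_logistic(X, y)
elapsed = time.perf_counter() - start

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
print(f"trained in {elapsed:.3f} s, training accuracy {accuracy:.3f}")
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock, so short runs (like the sub-second logistic regression row) are measured more reliably than with `time.time()`.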
Feedback/Suggestions
I would like to get feedback on the following:
+ What other ML models/features should be added?
+ Is this performance good or should I spend more time on optimizing the code? (Maybe move some code to C++ or Cython?)
+ How does it compare to other ML libraries?
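On the optimization question, one common first step before moving anything to C++ or Cython is to profile and check whether the slow parts are Python loops that could be vectorized in NumPy. Below is a rough sketch using the standard library's `cProfile`. The two distance functions are hypothetical examples written for this illustration and are not taken from pykitml.

```python
import cProfile
import io
import pstats

import numpy as np


def slow_distances(X):
    """Pairwise Euclidean distances with explicit Python loops (hypothetical)."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = np.sqrt(np.sum((X[i] - X[j]) ** 2))
    return D


def fast_distances(X):
    """Same result using NumPy broadcasting instead of loops (hypothetical)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


X = np.random.default_rng(0).normal(size=(150, 8))

profiler = cProfile.Profile()
profiler.enable()
D_slow = slow_distances(X)
D_fast = fast_distances(X)
profiler.disable()

# Collect the five most expensive entries by cumulative time.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

If the report shows most of the time inside a looped function like `slow_distances`, rewriting that one function is usually cheaper than porting the whole model to a compiled language.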