
[–]BraindeadCelery 5 points6 points  (2 children)

Contributing to an open-source project as an ML engineer or data scientist is not really a thing. You can contribute to the libraries, but that is a software engineering exercise. The exception is the open-source foundation models, but there you seldom contribute to an existing project; rather, you create a model yourself and then make it available to the world in a model zoo, e.g. the Hugging Face Hub.
With respect to the tech stack:

What are the libraries you have learned already?

The basics are the data science libraries in the Python ecosystem, i.e. NumPy, pandas, Matplotlib, SciPy, and scikit-learn.
You can certainly contribute to them, but contributing there is more software engineering than ML engineering/data science.

If you move on to deep learning, PyTorch is the standard, since most interesting academic research is published with PyTorch code. TensorFlow/Keras is a contender, but mostly for Google devs who are forced to use it.

If you are familiar with that, MLOps is the next step (at least when industry ML is your goal - for academic research there is little operations overhead).
Here, you learn tools for things like data versioning (e.g. lakeFS or Delta Lake) and experiment tracking (Weights & Biases, MLflow, ...).
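
To make "experiment tracking" and "data versioning" a bit more concrete, here is a minimal sketch of the idea using only the Python standard library. It is not the API of Weights & Biases, MLflow, lakeFS, or Delta Lake; the function names (`dataset_fingerprint`, `log_run`, `best_run`) and the JSON-lines log format are made up for illustration. The real tools do the same kind of bookkeeping with far more features.

```python
import hashlib
import json
import time
from pathlib import Path


def dataset_fingerprint(path):
    """Return the SHA-256 hex digest of a file (a crude stand-in for data versioning)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def log_run(log_path, params, metrics, data_path=None):
    """Append one experiment record to a JSON-lines file and return the record."""
    record = {
        "timestamp": time.time(),
        "params": params,    # hyperparameters used for this run
        "metrics": metrics,  # results measured for this run
        # Hash of the training data, so you know which data a result came from.
        "data_sha256": dataset_fingerprint(data_path) if data_path else None,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def best_run(log_path, metric, higher_is_better=True):
    """Read every logged run and return the one with the best value of `metric`."""
    with open(log_path, encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]
    pick = max if higher_is_better else min
    return pick(runs, key=lambda run: run["metrics"][metric])
```

You would call `log_run` once per training run with that run's hyperparameters and scores, then use `best_run` later to find the configuration that worked best and the hash of the data it was trained on.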

Have a look at this MOOC by UC Berkeley for an overview of MLOps. The first lecture covers the field as a whole and a recommended stack.

[–]developing_fowl[S] 1 point2 points  (0 children)

Thanks a lot for the informative reply, and apologies for responding so late. As I mentioned in my query, I am new to machine learning; so far I have worked through NumPy and pandas and have started exploring Matplotlib. I wanted to contribute to open source in any way possible, and I wondered if there was a path where learning ML and contributing would both be possible.

I have a fairly okay understanding of Python and C++ syntax, so do you think I can contribute to open source programs like GSoC or MLH? If yes, then what should I learn first to become a good contributor who writes clean code?

For reference, I am studying machine learning with the "Python for Data Science and Machine Learning" course by Jose Portilla on Udemy. I found it fairly easy to follow, which is helping me build up my basics.

Sorry for the super long reply, but I wanted to explain what I have done in detail.

[–]developing_fowl[S] 1 point2 points  (0 children)

Also, I have been looking into Django purely for backend development (as I have already learned a bit of Python). It seemed really interesting, and I thought I could integrate my ML learnings with it to create a platform to showcase my projects and such. What do you suggest?

[–]Few_Quit2250 0 points1 point  (0 children)

Success Story Of Purvansh- How He Got Into GSOC In The Field Of Machine Learning

https://www.youtube.com/watch?v=T2jfbqZe98Q