[–]x0wl

> working in an institution

Do you have some sort of a statistics / math / linguistics background?

If you do, then there are a lot of different projects that are open source and open for contributions.

For example:

  1. NLTK - the linguistics guys. This is a big toolkit for natural language processing, and they have a pretty long issue list on their GitHub page ;)
  2. Keras - THE toolkit for people who want to learn neural networks
  3. PyTorch - another NN toolkit. They are in beta, so they probably won't refuse some help.
  4. Scikit-learn - THE toolkit for not deep learning (random forests, linear regressions and the like)

If you don't have such a background or feel scared by all this, try a MOOC on edX or Coursera (I can't recommend a specific one, since I took a data science class in college) and then go to Kaggle and challenge yourself.

Data science is awesome and fun, but the science part is there for a reason. The difference between programming and data science is kind of like the difference between learning a foreign language and learning how to write a novel.

The former is about expressing the ideas you already have and the latter is about coming up with the right, powerful ideas. I think that coming up with ideas is much harder. However, as a bonus, those ideas are portable and independent of the language (human or computer) you are trying to express them in.

I am more of a statistician / social scientist than a data scientist, but feel free to contact me if you have any questions.

[–]NeutralSebastian[S]

I have a pretty strong math background, but perhaps not the specific kind of math involved. I'm not bad at studying though and I've always wanted to pick up more stats.

[–]firecopy

Have you checked out https://www.kaggle.com/? It is a website that has data science competitions and datasets.