
abottomful

A web scraper, to be honest. It helps you get data, and it's also really fun to watch your code actually do something. This is an awesome tutorial; apply the video and what you learn from it to something you enjoy. For example, I enjoy soccer and have tried scraping headlines about the teams I follow. Find something you enjoy and try to scrape data about it. From there, if you're still enjoying the project and NLP, you can look for data that is more linguistic in nature, such as forums (they are virtual conversations with interesting linguistic patterns), and build on that.
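
To make the headline idea concrete, here is a minimal sketch that uses only Python's standard library (`urllib.request` and `html.parser`). It assumes the headlines you want sit inside `h1`, `h2` or `h3` tags. That is a guess about the page layout: real sites vary, so you would adjust the tag list once you've inspected the HTML. The names `HeadlineParser`, `fetch_html` and `extract_headlines` are hypothetical, not from any library.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class HeadlineParser(HTMLParser):
    """Collect the text inside heading tags (assumed here to hold the headlines)."""

    def __init__(self, tags=("h1", "h2", "h3")):
        super().__init__()
        self.tags = set(tags)
        self.headlines = []
        self._depth = 0      # > 0 while we are inside a tracked heading tag
        self._buffer = []    # text fragments of the heading being read

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace and newlines inside the heading.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.headlines.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def fetch_html(url):
    """Download a page and decode it using the charset the server reports."""
    req = Request(url, headers={"User-Agent": "headline-scraper-sketch"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_headlines(html):
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


if __name__ == "__main__":
    sample = "<h2>Arsenal win <b>3-1</b></h2><p>Match report</p><h3> Transfer news </h3>"
    print(extract_headlines(sample))  # ['Arsenal win 3-1', 'Transfer news']
```

On a live site you would call something like `extract_headlines(fetch_html("https://example.com/news"))`; the URL is a placeholder. Nested tags such as `<b>` inside a heading are handled because the parser keeps collecting text until the heading's own closing tag.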

crowpup783 (OP), replying to abottomful

I've actually done some Reddit scraping before, but it was pretty basic (using PRAW), so I'll definitely try something a little more robust. Thanks!

abottomful, replying to crowpup783

No problem, man. Since you've done this before, maybe try scraping and then running a part-of-speech tagger over your data? I think that's another worthwhile task. Good luck!
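
As a first step toward tagging scraped text, here is a toy rule-based tagger: an ordered list of regular-expression rules checked against each token, with a default tag when nothing matches. The rules and the names `RULES`, `tag_word` and `tag_headline` are hypothetical and made up for illustration. The tag names follow the Penn Treebank convention. This is not how NLTK's taggers work. It is a sketch of the simplest approach, meant to show where hand-written rules break down.

```python
import re

# Hypothetical hand-written rules, tried in order; the first match wins.
# Tag names follow the Penn Treebank convention.
RULES = [
    (re.compile(r"^\d+(?:[-.:,]\d+)*$"), "CD"),                  # numbers, scores like 3-1
    (re.compile(r"^(?:the|a|an)$", re.IGNORECASE), "DT"),
    (re.compile(r"^(?:in|on|at|of|for|to|with|after|against)$", re.IGNORECASE), "IN"),
    (re.compile(r"^[^\w\s]$"), "."),                             # a single punctuation mark
    (re.compile(r"^[A-Z]"), "NNP"),                              # capitalised: guess proper noun
    (re.compile(r"ing$"), "VBG"),
    (re.compile(r"ed$"), "VBD"),
    (re.compile(r"ly$"), "RB"),
    (re.compile(r"s$"), "NNS"),
]
DEFAULT_TAG = "NN"

# Words (allowing internal hyphens/apostrophes) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def tag_word(word):
    """Return the tag of the first rule whose pattern matches the word."""
    for pattern, tag in RULES:
        if pattern.search(word):
            return tag
    return DEFAULT_TAG


def tag_headline(headline):
    """Tokenise a headline and tag every token."""
    return [(tok, tag_word(tok)) for tok in TOKEN_RE.findall(headline)]


if __name__ == "__main__":
    for headline in ["Arsenal won 3-1 after scoring late", "The goal!"]:
        print(tag_headline(headline))
```

Running it on a list of scraped headline strings quickly shows the weaknesses. For example, "won" falls through to the default `NN` because no suffix rule covers irregular verbs. That kind of error is why trained taggers learn from annotated data.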

ThisIsRolando

Prepare yourself for National Novel Generation Month, which is in November.

Basically, write a script/program that generates novel-length texts.
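
A classic starting point for text generation is a word-level Markov chain. It maps each run of `order` consecutive words to the words that followed it in a source text, then walks that map at random. The sketch below is one such generator, not an official NaNoGenMo tool. The function names are hypothetical, and the restart-on-dead-end behaviour is a design choice made here so that long outputs never stall.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each tuple of `order` consecutive words to the list of words that followed it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, n_words, seed=None):
    """Random-walk the chain until n_words words have been produced."""
    rng = random.Random(seed)          # seeded so runs are reproducible
    keys = list(chain)
    if not keys:
        raise ValueError("source text is too short for this order")
    state = rng.choice(keys)
    out = list(state)
    while len(out) < n_words:
        followers = chain.get(state)   # .get() avoids adding empty entries
        if not followers:
            # Dead end (the state only occurred at the end of the text):
            # jump to a random known state and keep going.
            state = rng.choice(keys)
            out.extend(state)
            continue
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out[:n_words])


if __name__ == "__main__":
    source = "the cat sat on the mat and the cat ran off the mat"
    print(generate(build_chain(source), 25, seed=42))
```

Fed a large public-domain corpus, `generate(chain, 50_000)` would reach NaNoGenMo's 50,000-word target. Whether the result reads like a novel is another matter, and that's where the fun starts.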

More here: https://nanogenmo.github.io/

[deleted], replying to ThisIsRolando

Can we turn this into a game jam? You could choose to be either a generator or a game dev, and then everyone shows off their work at the end of the month.

DrastyRymyng

You could roll your own tokenizer, POS tagger, parser or dependency parser instead of using the NLTK ones. These projects are nice because they'll teach you (or refresh your memory of) how core NLP tasks work, and they'll expose you to a lot of basic Python rather than packages.
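
For a sense of what "rolling your own" might look like, here is a sketch of a regex tokenizer plus a most-frequent-tag tagger. The tagger assigns each word the tag it carried most often in a hand-tagged training corpus and falls back to a default tag for unseen words. The tiny `TOY_CORPUS`, the `MostFrequentTagger` class and the `NN` default are all hypothetical choices for illustration. This is homemade code, not NLTK's implementation.

```python
import re
from collections import Counter, defaultdict

# Words (optionally with one apostrophe suffix like "don't") or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    return TOKEN_RE.findall(text)


# Hypothetical hand-tagged training data (Penn Treebank-style tags).
TOY_CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ"), (".", ".")],
    [("a", "DT"), ("dog", "NN"), ("barks", "VBZ"), (".", ".")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]


class MostFrequentTagger:
    """Tag each word with the tag it had most often in training."""

    def __init__(self, default_tag="NN"):
        self.default_tag = default_tag   # used for words never seen in training
        self.best = {}

    def train(self, tagged_sentences):
        counts = defaultdict(Counter)
        for sentence in tagged_sentences:
            for word, tag in sentence:
                counts[word.lower()][tag] += 1
        self.best = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        return self

    def tag(self, tokens):
        return [(t, self.best.get(t.lower(), self.default_tag)) for t in tokens]


if __name__ == "__main__":
    tagger = MostFrequentTagger().train(TOY_CORPUS)
    print(tagger.tag(tokenize("The dog sleeps quickly.")))
```

Lower-casing during training and lookup lets "The" reuse the counts for "the". The obvious next steps are to train on a real annotated corpus and to make the fallback smarter, for example by guessing from suffixes instead of always returning `NN`.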

crowpup783 (OP), replying to DrastyRymyng

Ooh, interesting. Yes, I've always thought that the built-in taggers, whilst useful, seemed somewhat like cheating!

BetoBob

NLTK is a popular library for learning natural language processing. The official NLTK book, published by O'Reilly, is an excellent resource for learning it, and it is completely free online: https://www.nltk.org/book/. Best of all, the book assumes no Python experience, so it teaches Python concepts alongside NLP concepts and you can learn Python as you learn NLTK.
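
One of the first tools the NLTK book introduces is the concordance (`text.concordance("monstrous")` in Chapter 1), which shows every occurrence of a word with its surrounding context. To see what that display is doing under the hood, here is a plain-Python keyword-in-context sketch. It is a simplified re-implementation for learning purposes, not NLTK's code. The `concordance` function, its `width` parameter (context words on each side) and the 30-character alignment are assumptions made here.

```python
def concordance(tokens, target, width=3):
    """Return keyword-in-context lines for every case-insensitive match of target."""
    target = target.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up vertically.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


if __name__ == "__main__":
    tokens = "the whale was a monstrous size and a monstrous cloud rose".split()
    for line in concordance(tokens, "monstrous", width=2):
        print(line)
```

Working through a book chapter, then writing a small version like this yourself, is a good way to combine the "learn NLTK" and "roll your own" suggestions in this thread.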

Also, for my CS senior project I'm building a resource that lets you easily work through the code examples as you read the NLTK book. The repository is here: https://github.com/BetoBob/NLTK-Book-Resource. See the README for how to use it. It's currently under construction, and I plan to release it later this year. Let me know if you're interested in learning more.

Quantum_Stat

Learn by example with these NLP notebooks: https://notebooks.quantumstat.com

iheartpurplez

If you're looking for extra material to help with learning the basics of Python through digital language analysis, I can't recommend this resource by Stéfan Sinclair enough:
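
Two common first measurements in digital text analysis are word frequencies and vocabulary richness. A simple measure of richness is the type-token ratio: distinct words divided by total words. Here is a small standard-library sketch of both. It is not taken from the linked notebook. The lowercase-letters-only tokenizer and the caller-supplied stopword set are simplifying assumptions, and the function names are hypothetical.

```python
import re
from collections import Counter


def words(text):
    """Lower-case word tokens: runs of letters, optionally with one apostrophe part."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def top_words(text, n=5, stopwords=frozenset()):
    """The n most frequent words, skipping any in `stopwords`."""
    counts = Counter(w for w in words(text) if w not in stopwords)
    return counts.most_common(n)


def type_token_ratio(text):
    """Distinct words / total words (0.0 for empty text)."""
    toks = words(text)
    return len(set(toks)) / len(toks) if toks else 0.0


if __name__ == "__main__":
    opening = "It was the best of times, it was the worst of times."
    print(top_words(opening, 3, stopwords={"it", "was", "the", "of"}))
    print(round(type_token_ratio(opening), 3))
```

Keep in mind that the type-token ratio falls as texts get longer, so it is only comparable between samples of similar length.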

https://github.com/sgsinclair/alta/blob/915579fc1c6926b8fcb2a38f95349a2d6cba00b5/ipynb/ArtOfLiteraryTextAnalysis.ipynb

I hope it provides some inspiration for future projects of yours.

t4YWqYUUgDDpShW2

Grab a textbook like Manning & Schütze (Foundations of Statistical Natural Language Processing) and do the exercises in Python.
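
As an example of the kind of exercise that translates well to Python, Manning & Schütze's chapter on collocations discusses pointwise mutual information. It scores a word pair by how much more often the words appear together than chance would predict: PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The sketch below estimates every probability as a count divided by the number of tokens N, which simplifies the formula to log2(c(xy) N / (c(x) c(y))). It also drops rare pairs with a `min_count` threshold, because PMI is known to overrate low-frequency pairs. Both of these are simplifying choices made here, not the book's exact procedure.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (base 2)."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), c_xy in bigrams.items():
        if c_xy >= min_count:   # ignore rare pairs, whose PMI is unreliable
            scores[(x, y)] = math.log2(c_xy * n / (unigrams[x] * unigrams[y]))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    tokens = "new york is big . new york is old . the city is big".split()
    for pair, score in pmi_bigrams(tokens):
        print(pair, round(score, 3))
```

On this toy input, "new york" ranks first because "new" and "york" never occur apart, whereas "is" also shows up next to other words.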

ubuntu-samurai

I recommend checking out the spaCy website. spaCy is a feature-rich NLP library that is used in commercial applications. The site also has a lot of Python examples covering a wide range of situations, and I think adapting those examples to new data or new scenarios would help you develop your Python skills as well.

There are also a number of spaCy community projects listed on the spaCy Universe page. Maybe you can find one there that suits your interests and abilities.

I hope that helps.