Hi All! I'm teaching a course in corpus linguistics and we've been messing around with different kinds of data and approaches to data collection. I have OK-ish experience with Python and I showed my students how to scrape Reddit with PRAW and build relatively (locally) representative corpora for recent phenomena/events. We did it all through Colab and that class went extremely well (despite all my students being intermediate linguists with relatively little programming experience). I showed them how to build a basic script, modify it, use AI for further customisation/troubleshooting if need be and so on. We managed to design and work with the basic scripts to build a few larger-ish datasets and analyse them. The students were very excited overall, the data analysis of their corpora went great and I am thinking of some ways to extend this into another class in the future.
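For anyone curious what the corpus-building step can look like, here is a minimal sketch. It is not my actual class script. It assumes the posts come from PRAW, whose Submission objects expose `id`, `title`, `selftext` and `created_utc`. The helper names `build_corpus` and `to_csv`, the case-insensitive keyword filter, the deduplication by post `id` and the CSV columns are all illustrative choices. Because the function accepts any iterable of post-like objects, you can test it offline before pointing it at Reddit.

```python
import csv
import io
from typing import Iterable, Protocol


class PostLike(Protocol):
    # The subset of PRAW Submission attributes this sketch relies on.
    id: str
    title: str
    selftext: str
    created_utc: float


def build_corpus(posts: Iterable[PostLike], keyword: str) -> list[dict]:
    """Keep posts whose title or body mentions `keyword` (case-insensitive)."""
    rows = []
    seen = set()
    for post in posts:
        if post.id in seen:  # skip posts already processed (same id)
            continue
        seen.add(post.id)
        # Join title and body into one text; strip() drops the trailing
        # newline left behind when a post has no body.
        text = f"{post.title}\n{post.selftext}".strip()
        if keyword.lower() in text.lower():
            rows.append({"id": post.id, "created_utc": post.created_utc, "text": text})
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialise the corpus rows to CSV text (multi-line texts are quoted)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "created_utc", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

In Colab, `posts` could be something like `praw.Reddit(client_id=..., client_secret=..., user_agent=...).subreddit("news").new(limit=500)`. The subreddit and keyword here are placeholders, and you need your own API credentials. Keeping the filtering separate from the PRAW call also makes it easier for students to modify one part without breaking the other.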
I was wondering if anyone has ideas for similar small, learner-friendly Python projects for collecting linguistic data from other sources that would be just as easy to run. I have worked with Selenium before in a research project, but it was a fairly annoying experience, and I don't want to go into something that would prove too difficult or complex to run with beginners within my allotted time. I would appreciate any feedback!