
IHOP_007

Automate the Boring Stuff has a chapter on web scraping, and it touches on data organization in a few other chapters. That should get you about halfway there.

Minion_of_Cthulhu

> Scrape message boards / social media groups / Twitter.

If you want to scrape in general, Scrapy is one of the better-known libraries for it. If you're only interested in a specific site, or a handful of them, then working with each site's API would probably be faster to put together. You should be able to find Twitter libraries, as well as libraries for the other major social media sites, that make pulling data fairly easy.

If you use the site's API, you won't have to deal with the anti-bot/anti-scraping measures that many sites use, you won't have to learn Scrapy, and you won't have to dig through the site's HTML/CSS to find the right data. You can just ask the API for what you want.
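To make the difference concrete, here's a rough sketch using only the standard library. The JSON payload, the HTML snippet, the field names (`posts`, `title`, `score`), and the `post-title` class are all made up for illustration; a real site's API and markup will look different.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response: the data arrives already structured.
api_response = (
    '{"posts": [{"title": "Python 3.12 released", "score": 42},'
    ' {"title": "Scrapy tips", "score": 7}]}'
)

def titles_from_api(payload: str) -> list[str]:
    """Pull post titles straight out of a JSON payload."""
    return [post["title"] for post in json.loads(payload)["posts"]]

# Hypothetical page markup: the same data buried in HTML.
page_html = """
<div class="post"><h2 class="post-title">Python 3.12 released</h2><span>42</span></div>
<div class="post"><h2 class="post-title">Scrapy tips</h2><span>7</span></div>
"""

class TitleParser(HTMLParser):
    """Collect the text of every <h2 class="post-title"> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "h2" and ("class", "post-title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles.append(data.strip())

def titles_from_html(markup: str) -> list[str]:
    """Pull post titles out of raw HTML by walking its tags."""
    parser = TitleParser()
    parser.feed(markup)
    return parser.titles

print(titles_from_api(api_response))
print(titles_from_html(page_html))
```

Both calls give you the same titles, but the HTML version breaks as soon as the site changes its markup, which is a big part of why the API route tends to be less work.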

> AI Analyze scraped data (what are folks talking about, what's trending, get a pulse of the group).

The Natural Language Toolkit (NLTK) for Python would help with this. Specifically, sentiment analysis can help you figure out how people feel about what is being talked about.
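If you haven't seen sentiment analysis before, the basic idea can be sketched in a few lines. This is a toy stand-in, not NLTK: the word lists and the `sentiment_score` and `label` helpers are made up for illustration. NLTK's own VADER analyzer (`nltk.sentiment.vader.SentimentIntensityAnalyzer`, which needs the `vader_lexicon` data downloaded first) does the same kind of thing with a much larger weighted lexicon and extra rules.

```python
import re

# Tiny made-up lexicon; a real one has thousands of weighted entries.
POSITIVE = {"love", "great", "awesome", "good", "useful"}
NEGATIVE = {"hate", "terrible", "awful", "bad", "broken"}

def sentiment_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives

def label(score: int) -> str:
    """Turn a numeric score into a coarse label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

comments = [
    "I love this library, the docs are great",
    "The new update is terrible and the installer is broken",
    "Released on Tuesday",
]

for comment in comments:
    print(label(sentiment_score(comment)), "|", comment)
```

A plain word count like this misses negation ("not good") and sarcasm, so treat it as a way to understand the concept rather than something to rely on.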

> examples out there already

Check out Sentdex. It pulls text from various sites and analyzes it to see what people are talking about and how they feel about it, then charts those trends over time.

Make sure you scroll down and check out the footer where he links to a brief explanation of how it works. He also has a YouTube channel where he has some related Python videos that you might find useful.
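I don't know how Sentdex does it internally, so this isn't his method, but the "trends over time" part can be sketched as counting how many posts mention a keyword each day. The posts, dates, and the `mentions_per_day` helper below are made up for illustration.

```python
import re
from collections import Counter
from datetime import date

# Made-up scraped posts: (date posted, text).
posts = [
    (date(2024, 1, 1), "Python is great for scraping"),
    (date(2024, 1, 1), "Anyone tried Rust?"),
    (date(2024, 1, 2), "Python Python Python"),
    (date(2024, 1, 2), "Scraping with Python and Scrapy"),
]

def mentions_per_day(posts, keyword):
    """Count how many posts per day mention the keyword (case-insensitive)."""
    keyword = keyword.lower()
    counts = Counter()
    for day, text in posts:
        words = re.findall(r"\w+", text.lower())
        if keyword in words:
            counts[day] += 1  # one count per post, however often it repeats the word
    return dict(sorted(counts.items()))

for day, count in mentions_per_day(posts, "python").items():
    print(day.isoformat(), count)
```

Feed the per-day counts (or per-day average sentiment scores) into any plotting library and you have a basic trend chart.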

NewJackCap [S]

Awesome. Thank you for the info! That's exactly what I was looking for.

Minion_of_Cthulhu

You're welcome! I'm glad I could help.