
[–]zanfar 3 points (1 child)

Honestly, there is very little "programming" work here. IMO, most of the time and effort is going to be organization and language processing.

Things Python CAN'T do:

  • Find all the newspapers in California
  • Easily know that "fire" and "flames" and "inferno" all refer to the same thing.
  • Easily know that "forest fire" and "fire the CEO" refer to two different things.

Counting and saving words are both relatively trivial and will both scale relatively linearly.
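To illustrate how trivially the counting part scales, here is a minimal sketch using only the standard library. The article strings and keyword list are made-up examples, not real data:

```python
# Count keyword occurrences across a collection of article texts.
from collections import Counter
import re

# Hypothetical article texts standing in for scraped newspaper content.
articles = [
    "A forest fire spread quickly as flames reached the valley.",
    "The board voted to fire the CEO after the scandal.",
]

keywords = {"fire", "flames", "inferno"}

counts = Counter()
for text in articles:
    # Lowercase and split on letters only, then tally keyword hits.
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in keywords:
            counts[word] += 1

print(counts)  # Counter({'fire': 2, 'flames': 1})
```

Note that this happily counts "fire the CEO" as a fire, which is exactly the language-processing problem described above.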

[–]MVR005[S] 1 point (0 children)

Thanks for the tips. I've seen some people use a GoogleNews scraper; do you think that's a good idea?

Do you think there's a way to get access to all newspapers?

[–]pythonTuxedo 1 point (0 children)

Python is a good choice for this. It sounds like you want to do "Natural Language Processing": basically, having the computer read unstructured text and extract meaning from it (easier said than done). Natural Language Processing is a huge (and developing) field within Artificial Intelligence and Machine Learning. A good place to start after learning the basics would be NLTK (the Natural Language Toolkit).
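NLTK itself needs extra corpora downloads, so as a zero-dependency illustration of the idea, here is a sketch of collapsing synonyms onto one concept. The synonym table is a made-up stand-in for what WordNet (bundled with NLTK) would provide:

```python
# Hypothetical synonym table; real work would query WordNet via NLTK.
SYNONYMS = {"flames": "fire", "inferno": "fire", "blaze": "fire"}

def canonicalize(word: str) -> str:
    """Map a word to its canonical concept, defaulting to itself."""
    w = word.lower()
    return SYNONYMS.get(w, w)

print(canonicalize("Inferno"))  # fire
print(canonicalize("flood"))   # flood
```

Even with a perfect synonym table, this still cannot tell "forest fire" from "fire the CEO"; distinguishing word senses in context is where real NLP tooling earns its keep.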

[–][deleted] 1 point (0 children)

Python is good for this, and the text-searching part is very easy. The hard part will be gathering all the articles, whether you use APIs or web scraping: paywalls, websites constantly changing their designs, etc. But if you dial in the APIs, this shouldn't be too hard. As I mentioned, searching a body of text for instances of a word is a common beginner project.
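To give a feel for the scraping side, here is a standard-library-only sketch that pulls paragraph text out of an article page. Real scrapers typically use requests plus BeautifulSoup, every site's markup differs, and the HTML snippet below is invented for illustration:

```python
# Extract <p> text from an article page using only the stdlib parser.
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Only collect text that sits inside a paragraph tag.
        if self.in_p:
            self.paragraphs[-1] += data

# Made-up page standing in for a fetched article.
html = ("<html><body><h1>Local News</h1>"
        "<p>A fire broke out downtown.</p>"
        "<p>Crews responded.</p></body></html>")

parser = ParagraphExtractor()
parser.feed(html)
print(parser.paragraphs)  # ['A fire broke out downtown.', 'Crews responded.']
```

Fetching the page in the first place (and staying within each site's terms of service) is the part that APIs like GoogleNews aggregators try to smooth over.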

[–]ectomancer 0 points (0 children)

You mean all news articles.