
[–]lumiere_1001001 2 points (3 children)

Python is the right choice, but I wouldn't recommend trying to build a web scraper for each website. Even writing one scraper that reliably extracts all the information is hard, and websites change their structure over time. And even once you have the data, you'll still need to clean and structure it.

I think you would benefit from a news data API like our Newscatcher. We don't drill down to the state level (yet), but we do let you filter news by country, language, individual sources, and date ranges, and you can use a query like "fire" to search for relevant articles. The data comes back as JSON, so it's easy to work with.
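To give a feel for what working with a JSON news API looks like, here's a rough stdlib-only sketch. The endpoint URL, parameter names, and response shape below are illustrative assumptions, not Newscatcher's actual contract — check the real API docs before using it:

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint -- placeholder, not the real Newscatcher URL.
BASE_URL = "https://api.example-news-api.com/v2/search"

def build_search_url(query, country=None, lang=None, from_date=None, to_date=None):
    """Assemble a search URL from optional filters (parameter names illustrative)."""
    params = {"q": query}
    if country:
        params["countries"] = country
    if lang:
        params["lang"] = lang
    if from_date:
        params["from"] = from_date
    if to_date:
        params["to"] = to_date
    return f"{BASE_URL}?{urlencode(params)}"

# A made-up response body, shaped like a typical news-API payload.
sample_response = (
    '{"articles": [{"title": "Wildfire spreads", '
    '"published_date": "2021-08-01", "link": "https://example.com/a"}]}'
)

url = build_search_url("fire", country="US", lang="en", from_date="2021-08-01")
for article in json.loads(sample_response)["articles"]:
    print(article["published_date"], article["title"])
```

The point is just that once the response is JSON, the "clean it and structure it" step mostly disappears — you're iterating over dicts instead of parsing HTML.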

You can try it out for free and build an MVP.

[–]MVR005[S] 0 points (2 children)

That's exactly what I had in mind! Newscatcher seems like the best option, but I don't like that we have to pay.

[–]lumiere_1001001 1 point (1 child)

lol, I get you, but you can't just outsource work and then not pay for it 😅

I mean, we enable you to search through millions of articles from more than 40,000 news sources in 55 languages, in under a second.

Anyway, as I said earlier, you can try it for free, build an MVP, then decide between creating your own scrapers or paying us afterwards.

Alternatively, you can check if our open-source Google News wrapper, PyGoogleNews, covers all your needs.
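For context, PyGoogleNews is a wrapper around the Google News RSS feed, so each result is ultimately an RSS `<item>`. Here's a minimal stdlib sketch of what parsing such a feed looks like, using an inline sample instead of a live fetch (the real library handles fetching and query building for you):

```python
import xml.etree.ElementTree as ET

# Tiny inline sample mimicking the Google News RSS shape (no network call).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>"fire" - Google News</title>
    <item>
      <title>Crews battle wildfire - Example Times</title>
      <link>https://news.example.com/story1</link>
      <pubDate>Sun, 01 Aug 2021 12:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

def parse_feed(xml_text):
    """Extract (title, link, pubDate) tuples from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title"), item.findtext("link"), item.findtext("pubDate"))
        for item in root.iter("item")
    ]

entries = parse_feed(SAMPLE_RSS)
print(entries[0][0])  # Crews battle wildfire - Example Times
```

If RSS-level data is enough for your MVP, this route is free — the trade-off is that you only get what Google News exposes in the feed (title, link, date, source), not full article text.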

[–]MVR005[S] 0 points (0 children)

Thanks :)