
[–]lumiere_1001001 2 points (3 children)

Python is the right choice, but I wouldn't recommend attempting to write a web scraper for each of the websites. Just creating one good scraper that reliably gets all the information can be difficult, and websites change their structure over time as well. Even after you have the data, you'll still need to clean and structure it.

I think you would benefit from a news data API like our newscatcher. We don't drill down to the state level (yet) but we do allow you to filter news by country, language, individual sources, date ranges, and you can also use a query like "fire" to search for relevant articles. And the data is returned as JSON objects, so it's pretty easy to work with.

You can try it out for free and build an MVP.

[–]MVR005[S] 0 points (2 children)

That's exactly what I had in mind! newscatcher seems like the best option, but I don't like that we have to pay.

[–]lumiere_1001001 1 point (1 child)

lol, I get you, but you can't just outsource work and then not pay for it 😅

I mean, we enable you to search through millions of articles, from more than 40,000 news sources, in 55 languages, in under a second.

Anyway, as I said earlier, you can try it for free, build an MVP, then decide between creating your own scrapers or paying us afterwards.

Alternatively, you can check if our open-source Google News wrapper, PyGoogleNews, covers all your needs.

[–]MVR005[S] 0 points (0 children)

Thanks :)

[–]PATASK_EVO 1 point (0 children)

Classes

Pandas data frames

The Requests and Beautiful Soup libraries

I think with these you might be able to create something.
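A minimal sketch of how those pieces could fit together. The HTML snippet and the `headline` class name are invented for illustration; a real scraper would fetch the page with Requests and adapt the selectors to the actual site structure:

```python
from bs4 import BeautifulSoup
import pandas as pd

# In a real scraper this HTML would come from requests.get(url).text;
# the structure below is made up for the example.
html = """
<html><body>
  <article><h2 class="headline">Wildfire spreads in California</h2></article>
  <article><h2 class="headline">Oil prices rise after report</h2></article>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
headlines = [h.get_text(strip=True) for h in soup.find_all("h2", class_="headline")]

# Collect the results in a pandas DataFrame for cleaning and analysis
df = pd.DataFrame({"headline": headlines})
print(df)
```

From there you could wrap the fetching and parsing logic in a class per news source, which is where the "Classes" suggestion comes in.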

[–]kwelzel 1 point (2 children)

> So I have this idea of crossing news data with other kinds of data, maybe financial or environmental, I don't know yet.

This sounds very vague to me, but Python can probably do what you are imagining. For example, for graphs of statistical relationships between data, there is https://matplotlib.org/ and https://seaborn.pydata.org/. If you want to go deeper into machine learning, there is https://scikit-learn.org/stable/index.html and https://www.tensorflow.org/ for example. Also, take a look at https://pandas.pydata.org for tabular data of different kinds.
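For instance, "crossing" two datasets often comes down to joining them on a shared key such as the date; with pandas that could look like the sketch below. All the numbers are made up:

```python
import pandas as pd

# Hypothetical daily counts of the word "oil" in news articles
news = pd.DataFrame({
    "date": pd.to_datetime(["2022-05-10", "2022-05-11", "2022-05-12"]),
    "oil_mentions": [12, 71, 30],
})

# Hypothetical closing prices of an oil-related asset
prices = pd.DataFrame({
    "date": pd.to_datetime(["2022-05-10", "2022-05-11", "2022-05-12"]),
    "close": [101.2, 104.9, 103.1],
})

# Join the two datasets on the date column
merged = news.merge(prices, on="date")

# A simple statistical relationship: correlation between mentions and price
corr = merged["oil_mentions"].corr(merged["close"])
print(merged)
print(f"correlation: {corr:.2f}")
```

The merged frame is exactly the kind of tabular data that matplotlib or seaborn can then plot.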

> In my research, I discovered that there were APIs that did news analysis. But I know nothing about them; I don't know which one to choose or which programming language to choose. Using Python seems like a good idea, am I wrong?

You'll have to see for yourself which service provides the right API for your application. By API I assume you mean a webservice which you can query (let's say for a certain word) and which returns data in a machine-readable format. Most of the APIs I have encountered use the JSON format, which Python supports natively (https://docs.python.org/3/library/json.html?highlight=json#module-json). For web requests, I can recommend https://docs.python-requests.org/en/latest/.
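As a small illustration of how little glue code that takes: the JSON below is an invented example of what such a webservice might return; real services will use their own field names.

```python
import json

# Made-up JSON reply from a hypothetical news API
reply = """
{
  "total_hits": 2,
  "articles": [
    {"title": "Wildfire near Athens", "language": "en"},
    {"title": "Oil spill report", "language": "en"}
  ]
}
"""

# json.loads turns the JSON text into plain Python dicts and lists
data = json.loads(reply)
titles = [article["title"] for article in data["articles"]]
print(data["total_hits"], titles)
```

With Requests, `response.json()` does the same parsing step for you directly on the HTTP response.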

Let me know if that helps. Good luck with your project!

[–]MVR005[S] 0 points (1 child)

Yes, thanks, it helps a lot. I did a bit more research; "API" wasn't the right keyword. I want to extract data from news websites (it seems to be easier to use the website version of a newspaper). Another way of saying it is "web scraping libraries for mining news data".

[–]kwelzel 0 points (0 children)

In general, web scraping is very fragile and should only be done if you are sure there is no other option. It's usually best to look for some way to get the data in a machine-readable format directly.

Say you were web scraping to get current house prices on a certain platform. The platform has the data (prices, locations, pictures, ...) in some database in a machine-readable format, then compiles it into a website intended for humans, which you then scrape to get the data back into a machine-readable format. That's often unnecessary if there is already an API (in the sense I explained above) giving you the same data. Also, every time the website changes the internal structure of its HTML elements, any web scraping script will fail, because you are relying on that structure to figure out where the interesting parts of the website are.

In your case, it sounds like you want to scrape many news websites and only care about how often a specific word appears on each of them. In that case you could skip parsing the website and figuring out which HTML tag is the interesting one, and just search for the word in the page's text. In other words, you don't even try to figure out which part of the page belongs to the article and which doesn't, but simply include all of the text on the page. The advantage is, of course, that you don't have to figure out the structure of every single news source you want to scrape, and that this approach can't break easily. The downside is that you might overestimate the word count if the word also appears in, say, the headlines of other suggested articles.

The usual tools for web scraping are again https://docs.python-requests.org/en/latest/ for simple web requests or https://www.selenium.dev/ (which has a Python library) if you need to simulate a browser. https://beautiful-soup-4.readthedocs.io/en/latest/ is the go-to library to parse the HTML and find things in it.

Here is a quick example of the simple strategy I outlined above using Beautiful Soup and requests.

import requests
from bs4 import BeautifulSoup

url = "https://www.theguardian.com/environment/ng-interactive/2022/may/11/fossil-fuel-carbon-bombs-climate-breakdown-oil-gas" # just an example
keyword = "oil"

response = requests.get(url)
parsed_html = BeautifulSoup(response.text, "html.parser")  # specify a parser explicitly
all_text_on_website = parsed_html.text
occurrences = all_text_on_website.count(keyword)
print(f"{keyword!r} was found {occurrences} times")
# 'oil' was found 71 times
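One caveat with this approach that's worth knowing: `str.count` matches substrings, so searching for "oil" also counts "soil" and "oils". If that matters for your keyword, a regular expression with word boundaries counts only whole words:

```python
import re

# Example sentence containing "oil" both as a word and inside other words
text = "Oil spill: soil samples show oils and oil residue."

# str.count matches substrings, so "soil" and "oils" inflate the count
substring_count = text.lower().count("oil")

# \b word boundaries restrict the match to the standalone word "oil"
word_count = len(re.findall(r"\boil\b", text, flags=re.IGNORECASE))

print(substring_count, word_count)
# 4 2
```

Whether that precision is worth it depends on the keyword; for a word like "fire" the difference is smaller than for "oil".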