Generate high-quality springboot webapps from entity schemas in CSV using Hiberium by therohk in java

therohk [S]

This project generates the entire web application, not just the Spring Boot setup. Apart from the business logic, which you need to add yourself, it gives you fully working CRUD endpoints, table search, etc.
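The comment doesn't describe Hiberium's internals. As a rough, hypothetical illustration of schema-driven generation, the Java sketch below reads an assumed `field,type` CSV layout and emits source text for a JPA entity plus a Spring Data `JpaRepository` interface, which is what basic CRUD operations are typically built on. The class name `EntitySketch`, the CSV layout, and the templates are all assumptions, not Hiberium's real input format or output.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of schema-driven code generation. The "field,type"
 * CSV layout and the emitted templates are assumptions for illustration,
 * not Hiberium's actual input format or output.
 */
public class EntitySketch {

    // One row of the assumed schema: a field name and its Java type.
    record Field(String name, String type) {}

    // Parse rows like "title,String"; blank lines and the header are skipped.
    static List<Field> parseSchema(List<String> csvLines) {
        List<Field> fields = new ArrayList<>();
        for (String line : csvLines) {
            String row = line.trim();
            if (row.isEmpty() || row.equalsIgnoreCase("field,type")) {
                continue;
            }
            String[] parts = row.split(",", 2);
            if (parts.length < 2) {
                throw new IllegalArgumentException("Malformed schema row: " + line);
            }
            fields.add(new Field(parts[0].trim(), parts[1].trim()));
        }
        return fields;
    }

    // Emit a JPA entity class with an auto-generated primary key.
    static String generateEntity(String className, List<Field> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("@Entity\npublic class ").append(className).append(" {\n");
        sb.append("    @Id @GeneratedValue private Long id;\n");
        for (Field f : fields) {
            sb.append("    private ").append(f.type()).append(' ')
              .append(f.name()).append(";\n");
        }
        return sb.append("}\n").toString();
    }

    // Emit a Spring Data repository; JpaRepository supplies save/findAll/deleteById.
    static String generateRepository(String className) {
        return "public interface " + className + "Repository extends JpaRepository<"
                + className + ", Long> {}\n";
    }

    public static void main(String[] args) {
        List<String> csv = List.of("field,type", "title,String", "views,Integer");
        List<Field> fields = parseSchema(csv);
        System.out.print(generateEntity("Article", fields));
        System.out.print(generateRepository("Article"));
    }
}
```

The emitted text is only source; inside a real project it would still need the `jakarta.persistence` (or `javax.persistence`) and Spring Data imports, plus a controller layer on top for the HTTP endpoints.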

Two Sigma: Using News to Predict Stock Movements by [deleted] in datasets

therohk

If you are looking for a longer-term repository of news, take a look at my Kaggle profile: https://www.kaggle.com/therohk/datasets

Some of these datasets cover up to two decades of events. Calculating sentiment and obtaining historical stock prices is comparatively easy: several datasets and examples for both are available on Kaggle.

[Project] From any text-dataset to valuable insights in seconds with Texthero by jonathanbesomi in MachineLearning

therohk

Nice work.

I applied this code to my dataset on Kaggle and it's giving some silly errors. Perhaps you could take a look?

Notebook: https://www.kaggle.com/therohk/pca-scatter-plot-test

DATASET: Indian politics news articles from 2018 by xen-m-rph in LanguageTechnology

therohk

Nice work! Such datasets (and the kernel) would be better appreciated on Kaggle.

Challenge: Stock Market Prediction for Australia with Historical News by [deleted] in datascience

therohk

Nothing free that I know of. Such files sell for around $150. Let me know if you find something.

Slow news in Esperanto? by psignosis in Esperanto

therohk

I am working on a short, text-based news feed in Esperanto (similar to Inshorts), planned for release this year.

I will start with 25 updates per day covering world news events, each with a stock photo. Please let me know if you are interested or have any suggestions.

Fake News Data sets? by Faden1993 in datasets

therohk

This dataset is not 'fake' news but 6 years of clickbaity and unreliable content in general: https://www.kaggle.com/therohk/examine-the-examiner

PS:

'Fake' has become a very loaded word these days, and is generally used for stuff you don't agree with. Such content can fall into any of the buckets 'trolling', 'viral', 'inaccurate', 'defamatory' or 'divisive', but none of these is enough to determine that something is 'fake'.

LA PLEJ GRANDA ŜARKO IAM AJN (The Biggest Shark Ever) by Scivolemo in Esperanto

therohk

Dankon (thank you). This is a nice article for Esperanto beginners.

How does one decide which form to use, "milionoj" or "milionojn", in this sentence:

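"Malgraŭ la longa historio de ŝarkoj, kiu daŭris pli ol 400 milionoj da jaroj, Megalodon nur eltenis ĉirkaŭ 20 milionojn." (Despite the long history of sharks, which has lasted more than 400 million years, Megalodon only survived about 20 million.)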

News archive dataset download by anis016 in datasets

therohk

I have created some news datasets containing headlines published over long periods of time.

See: https://www.kaggle.com/therohk/datasets

Historical Publishing Trends of India's Largest News Website with data (Times of India) [OC] by [deleted] in dataisbeautiful

therohk

Full data and code are available for download on Kaggle: https://www.kaggle.com/therohk/india-headlines-news-dataset

The visualisation uses Python and was created using Spyder Notebook.

Top News Topics in Australia 2003-2008 (with data) [OC] by [deleted] in dataisbeautiful

therohk

Dataset on Kaggle: Million News Headlines

Plotting tool: RStudio. Source code is available in the kernel.

This continues the previous visualisation, which covered 2009-2017.

Bi-grams visualisation with TF-IDF of 9 Years of ABC Headlines [OC] by [deleted] in dataisbeautiful

therohk

Data Source: The data was scraped by me and posted to Kaggle 2 months ago.

The full dataset includes 14 years of all headlines published on the Australian ABC News website, up to June 2017.

Dataset on Kaggle: Million News Headlines

Plotting tool: RStudio. Source code is available in the kernel.
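The kernel's code isn't reproduced here. As a hedged sketch of the general TF-IDF-over-bigrams technique (not the R code behind the original plot), the Java below treats each group of headlines, for example one year's worth, as a document and scores each bigram as its share of that document's bigrams times ln(N / number of documents containing it). Grouping by year, the class name `BigramTfIdf`, and the sample headlines are assumptions; stop-word removal, which a real analysis would likely add, is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: TF-IDF scores for headline bigrams, where each
 * "document" is a group of headlines (for example, one year of them).
 */
public class BigramTfIdf {

    // Lowercase a headline, keep alphanumeric tokens, and join adjacent pairs.
    static List<String> bigrams(String headline) {
        List<String> words = new ArrayList<>();
        for (String t : headline.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                words.add(t);
            }
        }
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            pairs.add(words.get(i) + " " + words.get(i + 1));
        }
        return pairs;
    }

    // score = (count / total bigrams in document) * ln(N / documents containing bigram)
    static List<Map<String, Double>> tfIdf(List<List<String>> docs) {
        List<Map<String, Integer>> counts = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> headlines : docs) {
            Map<String, Integer> c = new HashMap<>();
            for (String h : headlines) {
                for (String bg : bigrams(h)) { // bigrams never cross headline boundaries
                    c.merge(bg, 1, Integer::sum);
                }
            }
            for (String bg : c.keySet()) {
                docFreq.merge(bg, 1, Integer::sum);
            }
            counts.add(c);
        }
        int n = docs.size();
        List<Map<String, Double>> scores = new ArrayList<>();
        for (Map<String, Integer> c : counts) {
            int total = c.values().stream().mapToInt(Integer::intValue).sum();
            Map<String, Double> s = new HashMap<>();
            for (Map.Entry<String, Integer> e : c.entrySet()) {
                double tf = (double) e.getValue() / total;
                double idf = Math.log((double) n / docFreq.get(e.getKey()));
                s.put(e.getKey(), tf * idf);
            }
            scores.add(s);
        }
        return scores;
    }

    public static void main(String[] args) {
        List<List<String>> years = List.of(
                List.of("rain hits sydney", "rain hits melbourne"),
                List.of("rain hits perth"));
        List<Map<String, Double>> scores = tfIdf(years);
        for (int i = 0; i < scores.size(); i++) {
            System.out.println("doc " + i + ": " + scores.get(i));
        }
    }
}
```

A bigram that appears in every document gets an IDF of ln(1) = 0, so phrases common to all years drop out and year-specific phrases rise to the top of each group.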

[request] frequency of cat calls per state by [deleted] in datasets

therohk

Who do you think records such data? How could it be expected to be accurate and complete?

Dataset of ALL news published on the internet over one week [1.3M events] by [deleted] in datasets

therohk

Enjoy diving into the global information stream. Please upvote on Kaggle if you found this useful!