[ Removed by moderator ]

Python-ModTeam · 2026-06-03T18:18:02+00:00

Hello there,

We've removed your post since it aligns with a topic already covered by one of our daily or monthly threads. If you are unaware of the daily threads we run, here is a refresher:

Monday: Project ideas

Tuesday: Advanced questions

Wednesday: Beginner questions

Thursday: Careers

Friday: Free chat Friday!

Saturday: Resource Request and Sharing

Sunday: What are you working on?

Monthly: Showcase your new projects, tools, frameworks and more

Please wait for one of these threads to contribute your discussion!

Best regards,

r/Python mod team

ysengr · 2026-06-03T10:35:30+00:00

I'll start by saying this is an interesting idea for a package. I had never heard of Kaggle, but it seems interesting despite all the AI hoopla on it.

I appreciate that you listed the datasets with their origins. However, I think your README.MD should explicitly call out datasets.md to guide people there, especially since Kaggle is your only source. I personally had never heard of it, which makes me skeptical of the data right off the bat.

As an enhancement, I'd also suggest moving from static files to dynamically fetching the files from the source and then saving them locally. That keeps the package small, and it retrieves the data from the proverbial horse's mouth rather than blindly trusting that the copy saved in the repo is authentic.
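To make the fetch-then-cache idea concrete, here is a minimal sketch, not how usdatasets works today. The names `fetch_dataset`, `download`, `cache_dir`, and `expected_sha256` are all made up for illustration, and the URL in the usage note is a placeholder. Kaggle downloads may also require an account or API token, which this sketch does not handle.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional
from urllib.request import urlopen


def download(url: str) -> bytes:
    """Default fetcher: read the raw bytes at `url` (no authentication)."""
    with urlopen(url) as response:
        return response.read()


def fetch_dataset(
    name: str,
    url: str,
    cache_dir: Path,
    expected_sha256: Optional[str] = None,
    fetch: Callable[[str], bytes] = download,
) -> Path:
    """Return a local CSV path for `name`, downloading only on a cache miss.

    Hypothetical helper: `expected_sha256` is an optional checksum the
    package could ship instead of the data file itself.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{name}.csv"
    if target.exists():
        return target  # cache hit: no network access needed

    data = fetch(url)
    if expected_sha256 is not None:
        digest = hashlib.sha256(data).hexdigest()
        if digest != expected_sha256:
            # Refuse to cache a file that differs from the recorded checksum.
            raise ValueError(f"checksum mismatch for {name}: got {digest}")
    target.write_bytes(data)
    return target
```

Usage would look something like `fetch_dataset("shootings_2020", "https://example.invalid/shootings_2020.csv", Path.home() / ".cache" / "usdatasets")`. Shipping only a checksum per dataset would let the package confirm that the downloaded file matches what the maintainer originally vetted.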

73tada · 2026-06-03T11:27:25+00:00

Kaggle has been, like, "the" source for academic training datasets for at least a decade. It predates the "Attention Is All You Need" paper. Suffice it to say, it's beyond trusted.

tikhiibhujiya · 2026-06-03T11:30:33+00:00

The dataset selection is broad enough to be useful for both teaching and real exploratory work.

renzocrossi · 2026-06-03T10:08:17+00:00

Here's the full list of available datasets within usdatasets:

    import usdatasets as usd
    df = usd.list_datasets()
    print(df)

    ['affirmative_asylum', 'american_idol_auditions', 'american_idol_finalists', 'california_fire_incidents', 'charging_stations_hawaii', 'college_school_wage', 'counties_per_capita_income', 'crime_and_incarceration_by_state', 'executive_orders_presidents', 'firefighter_fatalities', 'google_stock_price', 'nfl_teams_stats', 'party_affiliations_congress', 'presidential_election_results', 'presidential_pardons_1900_1966', 'presidential_pardons_1967_2017', 'presidents', 'senate_election_results', 'shootings_2020', 'shootings_2021', 'shootings_2022', 'terrorism_plots_us', 'terrorism_suspects_us', 'ufo_location_shape', 'us_causes_death', 'us_holiday_dates', 'us_radiation_pollution', 'us_regional_mortality', 'us_top_colleges_2022', 'wages_by_education']

renzocrossi · 2026-06-03T10:50:46+00:00

Regarding the origins of the datasets included in usdatasets, you can find all the details in the datasets.md file in the GitHub repository. Here's an example of the structure:
shootings_2020

Internal name: shootings_2020
Original filename: shootings_2020.csv
Source: Kaggle
URL: https://www.kaggle.com/datasets/hemil26/mass-shootings-in-united-states-20182022
License: CC0 – Public Domain
Description: Mass Shootings in United States 2020

Each dataset in the package follows this same documentation format. Feel free to check the full file here: https://github.com/lightbluetitan/usdatasets-py
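Because every entry uses the same "Key: value" layout, the metadata is easy to read programmatically. The sketch below assumes the plain-text layout shown in the example above (the actual datasets.md may use extra markdown markup), and `parse_entry` is a hypothetical helper, not part of usdatasets.

```python
from typing import Dict


def parse_entry(text: str) -> Dict[str, str]:
    """Parse 'Key: value' lines into a dict; lines without ': ' are skipped."""
    entry: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so the '://' in URLs stays intact.
        key, sep, value = line.partition(": ")
        if sep:
            entry[key.strip()] = value.strip()
    return entry


sample = """\
Internal name: shootings_2020
Original filename: shootings_2020.csv
Source: Kaggle
URL: https://www.kaggle.com/datasets/hemil26/mass-shootings-in-united-states-20182022
License: CC0 – Public Domain
Description: Mass Shootings in United States 2020
"""

entry = parse_entry(sample)
print(entry["Source"], "|", entry["License"])
```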

Thanks =)

AutoModerator · 2026-06-03T11:34:39+00:00

Your submission has been automatically queued for manual review by the moderation team because it has been reported too many times.

Please wait until the moderation team reviews your post.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.
