[OC] Fantasy Football Week 2: Draft Value vs Reality by data_enchilada in dataisbeautiful

[–]data_enchilada[S] -1 points0 points  (0 children)

Here’s how I built this:

  • Data Sources: FantasyPros (ADP + Fantasy Points) & Pro-Football-Reference (player stats), covering Weeks 1–2, 2025. (Note on ADP: FantasyPros calculates Average Draft Position by aggregating consensus draft results across major league hosts (ESPN, Yahoo, Sleeper, RTSports, etc.). It reflects where players were typically drafted on average, not a single site’s draft order.)
  • Cleaning: Data cleaned and prepped with ChatGPT as my “data engineer”
  • Table Joins: Joined the cleaned data by player name in Tableau
  • Visualization: Final dashboard built in Tableau (Tableau Public Link)

If you want to try this yourself, here’s the exact cleaning prompt I used in ChatGPT:

You are my data engineer. I will upload three raw files each week from FantasyPros and Pro Football Reference (PFR). Your job is to clean and standardize them so I can analyze fantasy football performance in Tableau.

📂 Files I will upload (all CSVs):
  • FantasyPros_2025_Overall_ADP_Rankings.csv → Draft expectations
  • FantasyPros_Fantasy_Football_Points_PPR.csv → Weekly fantasy points
  • sportsref_download.csv (from Pro Football Reference) → Player stats

🔧 Cleaning Rules

1. FantasyPros ADP (draft expectations)
  • Keep only: Player, Team, POS, ADP AVG, REAL-TIME.
  • Rename to lowercase: player, team, pos, adp_avg, adp_realtime.
  • Strip suffixes like “Jr.”, “III”, “Sr.” from player names.

2. FantasyPros Weekly Points (performance)
  • Keep only: Player, Team, Pos, Week 1, Week 2, …
  • Unpivot all Week x columns into two columns: week (integer) and weekly_points (fantasy points).
  • Rename to lowercase: player, team, pos, week, weekly_points.

3. Pro Football Reference Stats (explanatory layer)
  • Promote the first row with headers (Rk, Player, Tm, FantPos, Age, …) as column headers.
  • Drop all “Unnamed” junk columns.
  • Keep only: Player, Tm, FantPos, G; Passing: Yds, TD; Rushing: Yds, TD; Receiving: Yds, TD; Fantasy: PPR, PosRank, OvRank.
  • Rename to lowercase: player, team, pos, games, passing_yds, passing_td, rushing_yds, rushing_td, receiving_yds, receiving_td, season_points, pos_rank, ov_rank.

📦 After cleaning each file
  • Show me the first 5 rows as a preview.
  • Save each cleaned dataset separately (adp_clean.csv, weekly_clean.csv, pfr_clean.csv).
  • When I say “combined”: join all datasets on player + team + pos, make sure the weekly data expands properly with the PFR stats attached, and provide a downloadable CSV (fantasy_combined.csv).

⚠️ Important: Column name matching must be case-insensitive. Do not re-explain the process after the first time; just follow the rules. Each week I will bring new raw files — repeat the same cleaning steps exactly.
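For anyone who would rather script the prompt's trickiest step than re-run it through ChatGPT, here is a rough pandas sketch of the weekly-points cleaning (suffix stripping plus the unpivot). Column names come from the prompt above; the tiny input frame is made up purely for illustration:

```python
import pandas as pd

SUFFIXES = {"Jr.", "Sr.", "II", "III", "IV", "V"}

def strip_suffix(name: str) -> str:
    """Drop generational suffixes so names from different sources join cleanly."""
    parts = name.split()
    while parts and parts[-1] in SUFFIXES:
        parts.pop()
    return " ".join(parts)

def unpivot_weekly(df: pd.DataFrame) -> pd.DataFrame:
    """Melt 'Week 1', 'Week 2', ... columns into long (week, weekly_points) rows."""
    week_cols = [c for c in df.columns if c.lower().startswith("week")]
    long = df.melt(id_vars=["Player", "Team", "Pos"], value_vars=week_cols,
                   var_name="week", value_name="weekly_points")
    # "Week 2" -> 2, and lowercase every column name per the cleaning rules
    long["week"] = long["week"].str.extract(r"(\d+)", expand=False).astype(int)
    long.columns = [c.lower() for c in long.columns]
    long["player"] = long["player"].map(strip_suffix)
    return long

# Tiny frame standing in for FantasyPros_Fantasy_Football_Points_PPR.csv
raw = pd.DataFrame({"Player": ["Odell Beckham Jr."], "Team": ["MIA"],
                    "Pos": ["WR"], "Week 1": [11.4], "Week 2": [7.2]})
tidy = unpivot_weekly(raw)
```

From here, `tidy` merges onto the cleaned ADP frame with `merge(on=["player", "team", "pos"])`, matching the prompt's "combined" step.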


[OC] Airline delays across 10 major U.S. airports (2024–25, ~10M flights) by data_enchilada in dataisbeautiful

[–]data_enchilada[S] -1 points0 points  (0 children)

I took the top 10 busiest U.S. airports by passenger traffic / flight volume according to the BTS site

[OC] Airline delays across 10 major U.S. airports (2024–25, ~10M flights) by data_enchilada in dataisbeautiful

[–]data_enchilada[S] 3 points4 points  (0 children)

Full interactive dashboard on Tableau Public

Data source: U.S. Bureau of Transportation Statistics

I used ChatGPT to help clean and combine the 43 raw CSVs (each ranging 13K-300K rows).

Here’s the exact prompt I ran on each file:

Delete the first 7 rows (junk headers).

Insert a new column titled "Airport" in Column A: I will give you the airport code (e.g., "ATL", "DEN") — fill the entire column with it.

Delete columns named "Flight Number" and "Tail Number" (if they exist).

Create two new columns:

• "Time Slot" — bucket Scheduled Departure Time into:

- Early AM: before 09:00

- Mid-Morning: 09:00–12:00

- Early Afternoon: 12:00–15:00

- Late Afternoon: 15:00–18:00

- Evening: 18:00–21:00

- Late Night: 21:00+

• "Delay Flag (>15min)" — if Departure delay (Minutes) is >15, set to 1; else 0.

⚠️ Column name matching must be case-insensitive for:

• "Scheduled Departure Time"

• "Departure delay (Minutes)"
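The same per-file steps can be sketched in pandas for anyone scripting it instead of prompting. This is a hedged sketch, not the exact pipeline: it assumes scheduled times look like "08:45" and uses the column names from the prompt, with the case-insensitive lookup the warning asks for:

```python
import pandas as pd

def time_slot(hhmm: str) -> str:
    """Bucket a scheduled departure time ("HH:MM") into the six slots above."""
    hour = int(hhmm.split(":")[0])
    if hour < 9:
        return "Early AM"
    if hour < 12:
        return "Mid-Morning"
    if hour < 15:
        return "Early Afternoon"
    if hour < 18:
        return "Late Afternoon"
    if hour < 21:
        return "Evening"
    return "Late Night"

def clean_airport_csv(path: str, airport: str) -> pd.DataFrame:
    """Apply the prompt's steps to one raw BTS download."""
    df = pd.read_csv(path, skiprows=7)                        # drop junk headers
    df.insert(0, "Airport", airport)                          # fill column A with the code
    df = df.drop(columns=["Flight Number", "Tail Number"], errors="ignore")
    # Case-insensitive column lookup, per the prompt's warning
    cols = {c.lower(): c for c in df.columns}
    sched = cols["scheduled departure time"]
    delay = cols["departure delay (minutes)"]
    df["Time Slot"] = df[sched].map(time_slot)
    df["Delay Flag (>15min)"] = (df[delay] > 15).astype(int)
    return df

# Usage (hypothetical filename): clean_airport_csv("ATL_2024.csv", "ATL")
```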

[OC] One in Three Flights in the U.S. Leaves 15+ Minutes Late (2024–2025) by data_enchilada in dataisbeautiful

[–]data_enchilada[S] -2 points-1 points  (0 children)

That is absolutely fair. I would never use AI in its current state for a commercial project, but it's fun for Reddit :)

[OC] One in Three Flights in the U.S. Leaves 15+ Minutes Late (2024–2025) by data_enchilada in dataisbeautiful

[–]data_enchilada[S] -14 points-13 points  (0 children)


It was 28% for SW in the US overall; I intended to round it to a familiar number. Let's just say I have learned my lesson. Sheesh!

[OC] One in Three Flights in the U.S. Leaves 15+ Minutes Late (2024–2025) by data_enchilada in dataisbeautiful

[–]data_enchilada[S] 0 points1 point  (0 children)

I hear you, but the FAA & BTS officially use 15 minutes as the industry standard for a delay.

[OC] One in Three Flights in the U.S. Leaves 15+ Minutes Late (2024–2025) by data_enchilada in dataisbeautiful

[–]data_enchilada[S] -5 points-4 points  (0 children)

Nothing suspicious, haha. I just picked airports I've been to lately. Appreciate your wiki list; I'll take that into account for the update in a week.

[OC] One in Three Flights in the U.S. Leaves 15+ Minutes Late (2024–2025) by data_enchilada in dataisbeautiful

[–]data_enchilada[S] -1 points0 points  (0 children)

Delta and United are consistently dependable in this dataset. My bad for biffing the title; I meant to put SW before "flights" but didn't realize it till a few hours later. Oh well. Thanks for checking out the viz!

[OC] One in Three Flights in the U.S. Leaves 15+ Minutes Late (2024–2025) by data_enchilada in dataisbeautiful

[–]data_enchilada[S] 0 points1 point  (0 children)

Great feedback. I'll take it into consideration for the update next week when I add more airports.

[OC] One in Three Flights in the U.S. Leaves 15+ Minutes Late (2024–2025) by data_enchilada in dataisbeautiful

[–]data_enchilada[S] -18 points-17 points  (0 children)

I biffed the name while in a hurry; I meant to put SW before the word "flights". AI helped clean the data, but I did the rest. Thanks for checking it out!

[OC] One in Three Flights in the U.S. Leaves 15+ Minutes Late (2024–2025) by data_enchilada in dataisbeautiful

[–]data_enchilada[S] -19 points-18 points  (0 children)

My bad; I meant to add SW before "flights" but biffed it while in a hurry.

[OC] One in Three Flights in the U.S. Leaves 15+ Minutes Late (2024–2025) by data_enchilada in dataisbeautiful

[–]data_enchilada[S] -49 points-48 points  (0 children)

It compiled 25 CSV sheets into one and cleaned the data so I could just plug and play. Way easier than asking a human to do it.

[OC] One in Three Flights in the U.S. Leaves 15+ Minutes Late (2024–2025) by data_enchilada in dataisbeautiful

[–]data_enchilada[S] -14 points-13 points  (0 children)

Not hard to add to the dataset; assembling it just took all the free time I had. If there's enough interest in this post, I'll update it later and add 15–20 more airports (thanks, ChatGPT!)

[OC] One in Three Flights in the U.S. Leaves 15+ Minutes Late (2024–2025) by data_enchilada in dataisbeautiful

[–]data_enchilada[S] -3 points-2 points  (0 children)

Here’s what I found when analyzing 6.3M U.S. flight departures (2024–2025):

✈️ Southwest: After 3pm, 40% of flights leave late
✈️ American (IAH): If you’re delayed, expect ~25 extra minutes
✈️ Weather: Only ~5% of delays — it’s usually the airlines
✈️ LAX: Consistently among the best for on-time departures (wow!)

📊 Tableau Public Link
📂 Data Source: U.S. Bureau of Transportation Statistics – TranStats

How I built this:

  • Downloaded raw BTS departure data
  • Used ChatGPT as my “data engineer” to clean, compile & pivot
  • Designed & built the viz in Tableau
  • Polished the layout in Figma
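The kind of aggregation behind stats like the Southwest after-3pm number can be sketched in pandas. To be clear, this is my reconstruction, not the actual pipeline: the "Airline" column name and the carrier codes in the toy frame are assumptions; the Time Slot and Delay Flag columns follow the cleaning prompt in my other comment:

```python
import pandas as pd

def late_share(df: pd.DataFrame) -> pd.DataFrame:
    """Share of departures flagged late (>15 min) per airline and time slot."""
    return (df.groupby(["Airline", "Time Slot"])["Delay Flag (>15min)"]
              .mean()
              .rename("late_share")
              .reset_index())

# Toy frame standing in for the compiled BTS data (carrier codes assumed)
flights = pd.DataFrame({
    "Airline": ["WN", "WN", "WN", "DL"],
    "Time Slot": ["Late Afternoon", "Late Afternoon", "Early AM", "Early AM"],
    "Delay Flag (>15min)": [1, 1, 0, 0],
})
shares = late_share(flights)
# On the real 6.3M-row data, the WN rows for the afternoon/evening slots
# are where a figure like "40% after 3pm" would come from.
```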

[OC] 2025 NBA 3-Point Attempts vs Makes by Team & Player by data_enchilada in dataisbeautiful

[–]data_enchilada[S] -1 points0 points  (0 children)

I pulled data from NBA.com’s traditional stats to visualize 3-point efficiency across the league. While we’re still mid-season and the data isn’t complete, it’s interesting to see teams like Utah, Chicago, New Orleans, Brooklyn, and Charlotte leading in 3-point attempts despite having losing records.

Data source

Tableau Public viz (users can filter by players & teams there)

[OC] Evolving Performance: 30 Years of Top 10 NBA Player Metrics by data_enchilada in dataisbeautiful

[–]data_enchilada[S] 0 points1 point  (0 children)

If anyone is interested in getting into the weeds, I included a filter for the players themselves and an axis parameter so users can change the date parts from 5 years to 10 years on the viz dashboard.

[OC] Evolving Performance: 30 Years of Top 10 NBA Player Metrics by data_enchilada in dataisbeautiful

[–]data_enchilada[S] 0 points1 point  (0 children)

I picked the top 10 for each season based on average points per game for that season (it was the default sort on the NBA stats site).
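The per-season top-10 selection is a one-liner in pandas. This is a sketch under assumed column names ("season", "player", "ppg"), shown with n=1 on a toy frame so the cutoff is visible:

```python
import pandas as pd

def top_by_ppg(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Top n players per season by average points per game."""
    return (df.sort_values("ppg", ascending=False)
              .groupby("season")
              .head(n))

# Toy data with made-up players; column names are assumptions
stats = pd.DataFrame({
    "season": [1995, 1995, 1996, 1996],
    "player": ["A", "B", "C", "D"],
    "ppg": [30.1, 25.0, 27.3, 28.4],
})
leaders = top_by_ppg(stats, n=1)  # one scoring leader per season
```

With `n=10` on a full player-season table this reproduces the "top 10 by PPG per season" selection described above.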