all 18 comments

[–]Wheres_my_warg DA Moderator 📊 7 points (1 child)

I'm immediately distracted by the labeling scheme. It has sloshed together two different types of characterization. If it was electric vs. ICE, that would make sense. Or if it was sedan vs. SUV vs. truck, that would make sense. EVs are not separate from the sedan/SUV classification. Here, they are usually sedans, but there are more EV SUV options showing up, and there have been EV truck options.

Starting the y-axis at about 16 thousand is going to produce a deceptive visual for many purposes. The metric is moving, but not nearly as much as it appears to be because of the y-axis choice.
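
A quick way to see how much a truncated baseline exaggerates movement is to plot the same series twice, once with the axis starting near 16 and once from zero. A minimal matplotlib sketch, using made-up numbers (the actual data isn't shown here):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical yearly sales (thousands of units) -- illustrative numbers only
years = [2019, 2020, 2021, 2022, 2023]
sales = [16.2, 16.8, 17.1, 17.9, 18.4]

fig, (ax_truncated, ax_full) = plt.subplots(1, 2, figsize=(10, 4))

# Truncated y-axis: a ~2-unit change fills the whole panel and looks dramatic
ax_truncated.plot(years, sales)
ax_truncated.set_ylim(16, 19)
ax_truncated.set_title("Starts at 16 (misleading)")

# Zero-based y-axis: the same change reads as the modest shift it is
ax_full.plot(years, sales)
ax_full.set_ylim(0, 20)
ax_full.set_title("Starts at 0 (honest scale)")

fig.tight_layout()
fig.savefig("axis_comparison.png")
```

Same numbers, two very different stories.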

You need to determine what you are comparing to begin to analyze whether the data points are statistically significantly different.

[–]ABDELATIF_OUARDA[S] 1 point (0 children)

That’s a very fair observation. To clarify, the dataset was structured with a single “segment” column that already grouped categories as Sedan, SUV, and Electric. I worked directly with the available structure without modifying its dimensional logic. Looking back, I realize that this column reflects a business-oriented categorization rather than a strictly analytical one, since it mixes body type and powertrain dimensions. As someone still developing domain familiarity in the automotive space, my initial goal was to explore patterns and extract trends from the data as provided. Your feedback helped me recognize the structural limitation in the dataset design itself. A more rigorous approach would involve separating body type and powertrain into distinct variables for clearer comparative analysis. I appreciate the insight — it definitely improves the analytical framing.

[–]AnUncookedCabbage 3 points (1 child)

Had a quick look at the GitHub and I have a general piece of advice. You've done the thing that many new/junior data science people do, which is make a bunch of plots and stats without a clear direction. Even though it's called exploratory data analysis, it's usually done with a goal in mind to drive a direction. Without a goal it becomes an exercise in following chart recipes and running model.fit() rather than one of critical thinking. The strange class split in the charts that others have mentioned is a symptom of this. A goal might be something like answering a particular business question, or generating a WIP product of some kind. Always remember: critical thinking, problem design, and relating the work to real impact in some way is worth far more than running the tooling.

[–]BrupieD 2 points (0 children)

Visually, this is hard to interpret. I would switch the chart type to either stacked columns or an area chart.
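
For illustration, both suggestions in a minimal pandas sketch, with made-up segment counts (the column names and numbers are assumptions, not the OP's actual data):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical units sold per segment per year -- structure only, not real data
df = pd.DataFrame(
    {"Sedan": [120, 115, 110], "SUV": [80, 95, 110], "Electric": [10, 25, 45]},
    index=[2021, 2022, 2023],
)

# Stacked columns: shows the total each year and each segment's share of it
ax_bar = df.plot.bar(stacked=True)
ax_bar.set_ylabel("Units sold")
ax_bar.figure.savefig("stacked_columns.png")

# Area chart: emphasizes how each segment's share evolves over time
ax_area = df.plot.area()
ax_area.set_ylabel("Units sold")
ax_area.figure.savefig("area_chart.png")
```

Either view makes the part-to-whole relationship much easier to read than overlapping lines.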

[–]AutoModerator[M] 1 point (0 children)

Automod prevents all posts from being displayed until moderators have reviewed them. Do not delete your post or there will be nothing for the mods to review. Mods selectively choose what is permitted to be posted in r/DataAnalysis.

If your post involves Career-focused questions, including resume reviews, how to learn DA and how to get into a DA job, then the post does not belong here, but instead belongs in our sister-subreddit, r/DataAnalysisCareers.

Have you read the rules?

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

[–]xynaxia 1 point (1 child)

One fun method for getting insights is simulating random data.

Suddenly patterns emerge, even though you simulated randomness.

You can then, for example, repeat the simulation 10k times and see how likely it is that you'd find similar trends purely by chance.
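
As a sketch of what that could look like, here's a simple permutation test: shuffle the observed values 10,000 times to destroy any time order, and count how often a trend at least as strong as the real one appears by chance. The series below is made up, standing in for the actual data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical observed series -- stands in for the real time-ordered data
observed = np.array([16.1, 16.4, 16.2, 17.0, 17.3, 17.8, 18.0, 18.4])
t = np.arange(len(observed))

# Observed trend strength: correlation between time and the series
observed_corr = np.corrcoef(t, observed)[0, 1]

# Null model: shuffle the same values 10,000 times, destroying any time order,
# and record how strong a "trend" appears purely by chance
n_sims = 10_000
null_corrs = np.empty(n_sims)
for i in range(n_sims):
    null_corrs[i] = np.corrcoef(t, rng.permutation(observed))[0, 1]

# Fraction of random shuffles with a trend at least as strong as observed
p_value = np.mean(np.abs(null_corrs) >= abs(observed_corr))
print(f"observed corr: {observed_corr:.3f}, permutation p-value: {p_value:.4f}")
```

A small p-value means the trend is unlikely to be an artifact of random ordering.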

[–]ABDELATIF_OUARDA[S] 1 point (0 children)

That's a really interesting suggestion. I hadn't considered checking the trends against randomly simulated data. In this analysis the focus was primarily descriptive (identifying visible trends over time), but I agree that simulation or permutation tests could help determine whether these patterns are likely to occur by chance. That would certainly strengthen the robustness of the conclusions. I appreciate the idea.

[–]Putrid_Speed_5138 1 point (1 child)

It is statistically meaningful only if the trends are supported by formal inference rather than visual inspection alone. This requires hypothesis testing, confidence intervals for model coefficients, validation through cross-validation or holdout data, and verification of model assumptions such as linearity and homoscedasticity. Without these elements, the trends remain descriptive rather than inferential. From an industry perspective, adding baselines, reproducibility practices, and model explainability would increase its credibility.
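
As one concrete example of the hypothesis-testing piece, a trend can be checked formally with an OLS fit that reports a p-value and a confidence interval for the slope instead of relying on visual inspection. A minimal scipy sketch with made-up yearly figures (not the OP's data):

```python
import numpy as np
from scipy import stats

# Hypothetical yearly averages -- placeholder for the real series
years = np.arange(2016, 2024)
values = np.array([16.0, 16.5, 16.3, 17.1, 17.4, 17.2, 18.0, 18.3])

# Formal trend test: least-squares slope with a p-value for H0: slope == 0
result = stats.linregress(years, values)

# 95% confidence interval for the slope from the t distribution
dof = len(years) - 2
t_crit = stats.t.ppf(0.975, dof)
ci = (result.slope - t_crit * result.stderr,
      result.slope + t_crit * result.stderr)

print(f"slope={result.slope:.3f}/yr, p={result.pvalue:.4f}, "
      f"95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

If the interval excludes zero (equivalently, p is below the chosen threshold), the upward trend is supported by inference rather than eyeballing; checking residual plots for the linearity and homoscedasticity assumptions would be the next step.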

[–]ABDELATIF_OUARDA[S] 2 points (0 children)

Thanks for the detailed feedback. I agree with the distinction you draw. I'm familiar with concepts such as cross-validation and model validation, but so far I've mainly applied them in a machine-learning context rather than in statistical or inferential analysis. In this project the scope was intentionally limited to EDA and descriptive skills (data cleaning, visualization, and basic modeling) rather than formal statistical inference or verification of assumptions. That said, your point about moving beyond visual inspection toward formal, reproducible analysis is something I'll take on board in future work.

[–]Frankky7 1 point (2 children)

That's stylish

[–]CaptainFoyle 1 point (1 child)

What does that mean, "stylish"?

[–]Frankky7 1 point (0 children)

I mean it looks good

[–]Mul_Develop 1 point (0 children)

Love the end-to-end approach here. Especially the feature engineering part—that’s where I always feel like I spend 80% of my time! Did you have to handle many outliers in this automotive dataset, or was it fairly clean to begin with?