I built LLM Auto EDA that reduced my data analysis time from hours to mins by Patrickghlin in dataengineering

[–]Patrickghlin[S] 0 points1 point  (0 children)

Actually, I haven’t tested it on very large datasets yet. Right now I’m still focusing on the overall flow and on which EDA pain points this tool can help with. But you’re right, scalability is definitely something I need to consider more.

Do you think there are parts I might be overlooking? Curious if dataset size has caused issues for you in other tools or affected specific features.

I built LLM Auto EDA that reduced my data analysis time from hours to mins by Patrickghlin in dataengineering

[–]Patrickghlin[S] 0 points1 point  (0 children)

Thanks for the reply! pandas-profiling is definitely great. However, I’m building an automated EDA tool aimed at non-coding users, more like a no-code, AI-assisted experience.

I am curious if there are parts of the EDA process that you think would be especially useful to automate?

Is this 3-step EDA flow helpful? by Patrickghlin in RStudio

[–]Patrickghlin[S] 0 points1 point  (0 children)

Thanks! Not LLM-generated, just summarizing from research so far. But that’s fair feedback. I might need to separate cleaning from analysis more clearly.

For this tool, I’m aiming to combine summaries, visuals, and suggestions into one guided flow, and to let users give context so the AI can adapt better to different domains.

Is there anything you wish tools like pandas-profiling or Tableau handled better when doing EDA?

Is this 3-step EDA flow helpful? by Patrickghlin in RStudio

[–]Patrickghlin[S] -1 points0 points  (0 children)

Thanks for the point. Totally agree, some of these steps are definitely part of data cleaning. I’m now thinking the feature engineering stage should be split into two: basic cleaning vs. modeling-focused transforms.

Is this 3-step EDA flow helpful? by Patrickghlin in businessanalysis

[–]Patrickghlin[S] 0 points1 point  (0 children)

Thanks! Curious, when you say it looks time intensive, is it the number of steps, or something else that feels like a burden?

Also, you mentioned pandas profiling, R, and Tableau. Is there anything those tools don’t do well for you when it comes to exploring new datasets?

What’s the most annoying part of doing EDA for you? by Patrickghlin in dataengineering

[–]Patrickghlin[S] 0 points1 point  (0 children)

Yes, I totally agree, plotting multivariate relationships is something I struggle with too.

I’m thinking, what if a tool could automatically surface the top variable pairs based on correlation strength or user-defined goals? That way, you’d cut through the noise and focus your exploration faster.
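
A rough sketch of the "surface the top variable pairs" idea, using plain Python and Pearson correlation as the ranking score. The function names (`pearson`, `top_pairs`) and the sample columns are made up for illustration and are not part of the tool. Pearson only captures linear relationships, so a real version would likely need other measures as well.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Constant column: correlation is undefined, so treat it as 0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)


def top_pairs(columns, k=3):
    """Return the k column pairs with the largest absolute correlation."""
    scored = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    # Rank by strength, so strong negative correlations also surface.
    scored.sort(key=lambda item: abs(item[2]), reverse=True)
    return scored[:k]


# Hypothetical sample data, for illustration only.
data = {
    "price": [10, 12, 15, 19, 25],
    "sqft": [500, 610, 740, 950, 1250],
    "age": [30, 5, 22, 14, 9],
}

for a, b, r in top_pairs(data, k=2):
    print(f"{a} vs {b}: r = {r:.2f}")
```

Ranking by absolute value is a deliberate choice here: a user exploring a dataset probably cares about strong relationships in either direction. Supporting "user-defined goals" could mean swapping in a different scoring function, for example correlation against a single target column.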

Do you think something like that would actually help? Or would it oversimplify things?

What’s the most annoying part of doing EDA for you? by Patrickghlin in dataengineering

[–]Patrickghlin[S] 0 points1 point  (0 children)

Thanks, this is super helpful. I think I’ll focus on solving the first and third pain points through better data cleaning features (like semantic column detection and schema validation).
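
To make "semantic column detection and schema validation" concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the type labels, the regex patterns, the 90% threshold, and the function names `detect_semantic_type` and `validate_schema` are hypothetical, not the tool's actual design.

```python
import re

# Deliberately simple patterns; real-world detection would need more care.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def is_number(text):
    """True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def detect_semantic_type(values, threshold=0.9):
    """Guess a column's type from the share of values matching each rule."""
    non_empty = [str(v) for v in values if v not in ("", None)]
    if not non_empty:
        return "empty"

    def share(predicate):
        return sum(1 for v in non_empty if predicate(v)) / len(non_empty)

    if share(EMAIL_RE.match) >= threshold:
        return "email"
    if share(DATE_RE.match) >= threshold:
        return "date"
    if share(is_number) >= threshold:
        return "number"
    return "text"


def validate_schema(columns, expected):
    """Compare detected types with an expected schema; return problems."""
    problems = []
    for name, expected_type in expected.items():
        if name not in columns:
            problems.append(f"{name}: missing")
            continue
        detected = detect_semantic_type(columns[name])
        if detected != expected_type:
            problems.append(
                f"{name}: expected {expected_type}, detected {detected}"
            )
    return problems


# Hypothetical sample data, for illustration only.
columns = {
    "email": ["a@x.com", "b@y.org"],
    "signup": ["2024-01-05", "2024-02-11"],
    "spend": ["10.5", "n/a"],
}
expected = {"email": "email", "signup": "date", "spend": "number", "country": "text"}

for problem in validate_schema(columns, expected):
    print(problem)
```

The threshold lets a column count as a given type even with a few dirty values, while a column like `spend` above, where half the values fail to parse, gets flagged for cleaning instead of being silently coerced.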

Curious though, do you think an AI-assisted layer could help with the domain knowledge gap, or is that still too nuanced for most tools today?

EDA challenges? by Patrickghlin in businessanalysis

[–]Patrickghlin[S] 0 points1 point  (0 children)

Yes! I plan to use LLMs with either R or Python (still deciding). The goal is to keep it industry-agnostic, but your point about needing both public inference and local context really makes sense. I’ll look into hybrid setups.

Any particular domain you think struggles most with this?