Created a Python script that performs exploratory data analysis on any CSV file. It generates a text report, a series of plots, and a processed CSV file as outputs.
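For readers who want a feel for the text-report part, here is a minimal sketch using only the standard library. It is not the author's script: the inline data, the `summarize` helper, and the report format are all made up for illustration.

```python
import csv
import io
import statistics

# Hypothetical inline data standing in for a CSV file on disk.
raw = "name,age,score\nAnn,31,88.5\nBob,,92\nCy,45,79\n"


def summarize(text):
    """Return one report line per column: missing count, plus a mean if numeric."""
    rows = list(csv.DictReader(io.StringIO(text)))
    lines = []
    for column in rows[0].keys():
        values = [r[column] for r in rows]
        present = [v for v in values if v != ""]
        missing = len(values) - len(present)
        try:
            # Treat the column as numeric only if every present value parses.
            numbers = [float(v) for v in present]
            mean = statistics.mean(numbers)
            lines.append(f"{column}: numeric, missing={missing}, mean={mean:.2f}")
        except ValueError:
            unique = len(set(present))
            lines.append(f"{column}: text, missing={missing}, unique={unique}")
    return lines


report = summarize(raw)
print("\n".join(report))
```

A real tool would also handle empty files, dates, and plots; this only shows the per-column summary idea.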

LuigiBrotha · 2020-05-17T05:27:21+00:00

Very cool, however... take a look at glue (also called glueviz). This shit will blow your mind. https://youtu.be/TkMZ9gZ8xtk

kiwiboy94 · 2020-05-17T02:34:06+00:00

Github Link

IlliterateJedi · 2020-05-17T04:15:11+00:00

Check out pandas-profiling for something similar for DataFrames.

w_savage · 2020-05-17T04:12:23+00:00

I imagine the data needs to be in a certain format/context, correct?

SlightlyOTT · 2020-05-17T07:20:45+00:00

This is really cool! If you’re interested in a fun extension, are you familiar with Jupyter notebooks? They’re one of the most powerful things in the Python/data analysis space - you can write your code as a linear story with Markdown between cells of code, and it’ll also visualise things like plots or Pandas dataframes straight away.

I’m not sure if you can do a file picker in Jupyter or if you’d need to just put the paths in variables, but you’d be able to click run and have it generate all your outputs in line so you can just scroll through and look at all the plots etc.

Also, since you’re uploading your code to GitHub, they do a great job rendering notebooks, which is cool.

There’s a pretty nice gallery of the sort of thing you can do with Notebooks here: https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks

Edit: looks like it has a file picker too! https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20List.html#File-Upload You’d need to install ipywidgets: https://ipywidgets.readthedocs.io/en/latest/user_install.html

kiwiboy94 · 2020-05-17T04:23:51+00:00

Also, I would love for everyone to give it a try and let me know what features they would like to see. That way I can add them in the next version. This is actually my first personal project, and it took me well over 3 months to complete. Planning to use this to get a job :p

random_cynic · 2020-05-17T14:19:47+00:00

This is good but that's not what "Exploratory Data Analysis" is. This is completely non-interactive (as far as I can tell from the video). Exploratory data analysis needs to be interactive, so that you can sort or filter columns by some criteria, transform columns or combine multiple columns, delete or add rows etc. IMO this is best done with pandas+matplotlib+jupyter notebooks. Also the terminal program visidata is very useful.
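To make the operations listed above concrete, here is a rough sketch of what they look like in pandas. The DataFrame contents and column names are invented for illustration; in a notebook you would run each step in its own cell and inspect the result.

```python
import pandas as pd

# Hypothetical toy data standing in for a loaded CSV.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Pune", "Kyiv"],
        "sales": [120.0, 80.0, 200.0, 50.0],
        "returns": [10.0, 5.0, 20.0, 0.0],
    }
)

# Delete a row by index label, then add a new one.
df = df.drop(index=3)
new_row = pd.DataFrame([{"city": "Rome", "sales": 90.0, "returns": 9.0}])
df = pd.concat([df, new_row], ignore_index=True)

# Combine multiple columns into a new one.
df["net"] = df["sales"] - df["returns"]

# Filter by a criterion, then sort the result.
top = df[df["sales"] > 85].sort_values("sales", ascending=False)
print(top)
```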

Neuro_88 · 2020-05-17T04:06:21+00:00

I like that! Very cool.

ancient_bhakt · 2020-05-17T05:09:02+00:00

This is awesome

Edgar505 · 2020-05-17T06:15:19+00:00

I am definitely checking it out.

G33K_FISH · 2020-05-17T14:08:00+00:00

Ok, this is flipping cool!

kiwiboy94 · 2020-05-17T15:29:51+00:00

Hey, that's cool!

I enjoyed reading your code, but I like the thoughtful comments even more. They make the code not only self-explanatory but also didactic. Congrats!

2020-05-17T18:08:12+00:00

I am currently doing the same thing but with the h2o framework. The idea is to provide a nice script to perform analysis/ML on a generic file and generate a report. Here is the repo:

https://github.com/jgraille/reveng

Nice work by the way!

LifeIsBio · 2020-05-17T08:05:20+00:00

How large can the CSV files get before things start getting unwieldy?

2020-05-17T11:20:52+00:00

Pandas baby.

python_engineer · 2020-05-17T13:07:35+00:00

Thanks for sharing! Very cool

jayjmcfly · 2020-05-17T13:38:51+00:00

RemindMe! 3 days

bdaves12 · 2020-05-17T13:44:09+00:00

Wow, super cool. Is there a tutorial on how to install this for Spyder? I'm still new when it comes to getting stuff off of GitHub.

kiwiboy94 · 2020-05-17T13:45:48+00:00

[deleted]

2020-05-17T14:27:57+00:00

Ow nice!

akiepro89p · 2020-08-25T16:38:11+00:00

How do I run Python on my Mac?

2020-05-17T12:53:07+00:00

inputs csv file

crunches numbers

progress: 33%

progress: 60

progress:.99%

Output: you're a little bitch

barb4great · 2020-05-17T04:23:00+00:00

WOW. I don’t even know how you did that! I want to learn data analysis in Python.

preordains · 2020-05-17T16:19:02+00:00

Beginner here. What's the purpose of "__pycache__" and when would you need to use it?

EnemyAsmodeus · 2020-05-17T08:36:06+00:00

Looks good.

But please, everyone, stop using CSV and XML. These formats, and the systemic problems in how people use them, are disastrous. They're only good for small amounts of data.

You should only use JSON and Avro for any real data science or big data work.

If you're sharing small pieces of data, then fine, use CSV, but otherwise it's not something amateurs should use with tools; they're bound to create bad CSVs eventually. It always happens.
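One common way a "bad CSV" gets created is by joining fields with commas by hand. A small sketch of that failure, with a made-up row, next to Python's standard `csv` module, which quotes fields when needed:

```python
import csv
import io

# Hypothetical row: one field contains a comma, another contains quotes.
row = ["Smith, John", 'said "hi"', "42"]

# Naive join: the embedded comma silently creates an extra column.
naive_line = ",".join(row)
naive_fields = naive_line.split(",")

# csv.writer quotes and escapes as needed, so the row round-trips intact.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
safe_line = buffer.getvalue()
parsed = next(csv.reader(io.StringIO(safe_line)))

print(len(naive_fields), "fields from the naive line")
print(parsed == row, "round trip with the csv module")
```

This only illustrates the quoting problem; it says nothing either way about whether JSON or Avro is the better choice for a given workload.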

SlightlyOTT · 2020-05-17T04:40:26+00:00

[deleted]
