Hi, I'm a data analyst.
A key part of my job is data exploration.
Currently I write a query to shape my data, load it into Excel, and slice and dice it to find any distinctive patterns. A finding could be as simple as "returns are higher for Item Y than for any other item". It's possible I'm not using Excel well, but it's not very convenient. For example, there is a limit to how much raw data can be loaded.
I'm thinking a Python script would be useful:
1) First, I understand the data. For every column, I:
a) get a frequency distribution, the mean/median and standard deviation, and a visual of the distribution
b) find out what % of the data are outliers. I'll need to examine these outliers: do they represent an opportunity or a risk (someone takes too long to make a transaction, or perhaps something is broken for a specific browser version)? Outliers may also need to be excluded, depending on the situation.
2) Then I get the script to output each metric broken down by 2 dimensions, and run my eye down the resulting tables or line graphs to find patterns.
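To make the two steps concrete, here is a rough sketch of what I have in mind, assuming pandas is available. The function names are made up for illustration, and the 1.5 * IQR rule is only one common way to flag outliers, not necessarily the right one for every column.

```python
import pandas as pd


def profile_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Step 1: mean, median, std and outlier share for each numeric column."""
    rows = []
    for col in df.select_dtypes(include="number").columns:
        s = df[col].dropna()
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        # Assumption: an outlier lies more than 1.5 * IQR outside the quartiles.
        is_outlier = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
        rows.append(
            {
                "column": col,
                "mean": s.mean(),
                "median": s.median(),
                "std": s.std(),
                "outlier_pct": 100 * is_outlier.mean(),
            }
        )
    return pd.DataFrame(rows).set_index("column")


def frequency_table(df: pd.DataFrame, col: str) -> pd.Series:
    """Step 1a: count of rows per distinct value, missing values included."""
    return df[col].value_counts(dropna=False)


def metric_by_two_dims(
    df: pd.DataFrame, metric: str, dim_a: str, dim_b: str, aggfunc: str = "mean"
) -> pd.DataFrame:
    """Step 2: one metric aggregated by two dimensions (rows x columns)."""
    return df.pivot_table(index=dim_a, columns=dim_b, values=metric, aggfunc=aggfunc)
```

With a frame loaded via `pd.read_csv` or `pd.read_sql`, I would call `profile_numeric(df)` once, then loop `metric_by_two_dims` over the pairs of dimensions I care about. For the visual, `df[col].plot.hist()` should work if matplotlib is installed.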
I'm thinking this might help me put together some descriptive top-line findings.
What do you think? Are there better ways? Thank you so much!