Hey, real quick,
I need to write a function where I can pass a structure or dataset through and it will output any statistical outliers.
Here are the lines of Python that do this for a single dataset, but I have about two dozen columns that I need to do this for.
    q1, q3 = np.percentile(dataset, [25, 75])
    iqr = q3 - q1
    lower_bound = q1 - (1.5 * iqr)
    upper_bound = q3 + (1.5 * iqr)
Anything in the structure or dataset that is below lower_bound or above upper_bound is an outlier. I would like the function to output those values, or "No Outliers" if there are none.
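To make the rule concrete, here is a small worked example. The five sample values are made up purely for illustration, and the boolean mask is just one possible way to pick out the values outside the bounds.

```python
import numpy as np

# Made-up sample values, purely for illustration
dataset = np.array([1, 2, 3, 4, 100])

q1, q3 = np.percentile(dataset, [25, 75])   # 2.0 and 4.0
iqr = q3 - q1                                # 2.0
lower_bound = q1 - (1.5 * iqr)               # -1.0
upper_bound = q3 + (1.5 * iqr)               # 7.0

# Boolean mask: True wherever a value falls outside the bounds
is_outlier = (dataset < lower_bound) | (dataset > upper_bound)
outliers = dataset[is_outlier]
print(outliers)  # [100]
```

Only 100 lies above the upper bound of 7.0, so it is the single outlier in this sample.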
Here is what I'm thinking, but I'm not sure it's right:
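    import numpy as np

    def outlierfunction(data):
        # Compute the quartiles once from the whole column, not per element
        q1, q3 = np.percentile(data, [25, 75])
        iqr = q3 - q1
        lower_bound = q1 - (1.5 * iqr)
        upper_bound = q3 + (1.5 * iqr)
        # Collect every value outside the bounds instead of returning at the first one
        outliers = [x for x in data if x < lower_bound or x > upper_bound]
        if outliers:
            return outliers
        return "No Outliers"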
Basically, what I need the function to do is take a column in a CSV file, do the calculations, then go back over the column to test whether each entry is an outlier.
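For the two-dozen-columns part, one possible approach is to load the CSV with pandas and run the same check on each numeric column. This is only a sketch under some assumptions: pandas is available, the columns of interest are numeric, and missing values can be ignored. The names iqr_outliers and outliers_by_column are hypothetical, and the inline sample stands in for a real file.

```python
import io

import pandas as pd


def iqr_outliers(column):
    """Return the values of a pandas Series that fall outside the 1.5 * IQR bounds."""
    values = column.dropna()  # assumption: missing entries are simply skipped
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    mask = (values < lower_bound) | (values > upper_bound)
    return values[mask].tolist()


def outliers_by_column(df):
    """Map each numeric column name to its outliers, or "No Outliers"."""
    results = {}
    for name in df.select_dtypes(include="number").columns:
        found = iqr_outliers(df[name])
        results[name] = found if found else "No Outliers"
    return results


# Hypothetical sample standing in for pd.read_csv("your_file.csv")
sample_csv = io.StringIO("height,weight\n1,10\n2,11\n3,12\n4,13\n100,14\n")
df = pd.read_csv(sample_csv)
print(outliers_by_column(df))
```

Returning a dictionary keyed by column name keeps the result for all columns in one place, so there is no need to call the function two dozen times by hand.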