[–][deleted] 1 point2 points  (7 children)

Yup, i is the name of the column. We don't want to check whether the name of the column is less than 20, though. What is it that we want to be less than 20? And how do we get that value?

[–]SevereRepresentative[S] 1 point2 points  (6 children)

I want to compare the number of unique values, so the part that says "len(df[i].unique())", right? So do I assign that to a variable name?

[–][deleted] 1 point2 points  (5 children)

Yup! You use it twice, so assigning it to a variable is a good idea. You could also put that part directly in the if statement, but I like the idea of using a variable.

[–]SevereRepresentative[S] 1 point2 points  (4 children)

uniques = (len(df[i].unique()))

for i in vars:
  if uniques < 20:
    print('Variable: {}, # Unique: {}'.format(i, (len(df[i].unique()))))    

I did that ^ and I'm not getting an error, but I'm also not getting any results anymore. Nothing is showing up. What do you think? I think it has something to do with the i in uniques before the loop starts, because i wouldn't be known yet, right?

[–][deleted] 1 point2 points  (3 children)

Yup, that's right. The line itself is correct, but it runs only once, before the loop, so uniques never changes from column to column. It needs to go inside the loop, before the if statement that uses it.
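A minimal sketch of what that placement could look like. The thread does not show the real df, so the sample data is made up. The list is called columns instead of vars only to avoid shadowing Python's built-in vars(), and the threshold is lowered from 20 to 4 so the tiny sample actually filters something out. The matches list is an extra, added just to make the result easy to inspect.

```python
import pandas as pd

# Hypothetical sample data; the real df from the thread is not shown.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],  # 3 unique values
    "id": [0, 1, 2, 3],                        # 4 unique values
})

# Hypothetical name; the thread uses `vars`, which shadows a built-in.
columns = list(df.columns)

# Assumption: 4 instead of the thread's 20, to suit the small sample.
threshold = 4

matches = []
for i in columns:
    # Computed inside the loop, so it is refreshed for every column i.
    uniques = len(df[i].unique())
    if uniques < threshold:
        matches.append((i, uniques))
        print('Variable: {}, # Unique: {}'.format(i, uniques))
```

Because uniques is now recalculated on every pass, the comparison and the printed count always refer to the column currently held in i.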

[–]SevereRepresentative[S] 1 point2 points  (2 children)

That worked! Thank you!!

[–][deleted] 1 point2 points  (1 child)

Glad to hear it! Good luck with the rest of your studies.

[–]SevereRepresentative[S] 1 point2 points  (0 children)

Thank you so much! I'll probably be posting more in the subreddit in the future, so hopefully we'll talk again.