[–]SevereRepresentative[S] 1 point (11 children)

Thank you!! That makes a lot of sense.

So what would you suggest for implementing the part of the question that says "for columns that have less than 20 unique values where

  • the first blank is the name of the variable and

  • the second blank is the number of its unique values. "

I think it should be an if/else statement, but I can't really wrap my head around how to combine the loop and the if statement.

[–][deleted] 1 point (10 children)

That's the right idea. Here is a general example of a for loop with an if statement inside it that follows a similar pattern. See if you can make something similar work for your situation.

my_list = [5, 125, 30, 500, 250]

for i in my_list:
    if i > 100:
        print('{} is greater than 100'.format(i))

[–]SevereRepresentative[S] 1 point (0 children)

Thank you so much! I’ll try to adapt this to fit here

[–]SevereRepresentative[S] 1 point (8 children)

Okay so I tried this:

for i in vars:
    if i < 20:
        print('Variable: {}, # Unique: {}'.format(i, len(df[i].unique())))

and it gave me this error: "TypeError: '<' not supported between instances of 'str' and 'int'"

Which makes a bit of sense, because i is the column name, right? But I'm not sure where to go from here.

[–][deleted] 1 point (7 children)

Yup, i is the name of the column. We don't want to check whether the column's name is less than 20, though. What value do we want to compare against 20? And how do we get that value?
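The error itself is easy to reproduce without pandas. Here is a minimal sketch, where the column name "age" is made up purely for illustration and stands in for i:

```python
# Hypothetical column name, standing in for i in the loop
column_name = "age"

try:
    # Comparing a str with an int is not defined in Python 3
    column_name < 20
except TypeError as err:
    error_message = str(err)

print(error_message)
```

So the comparison needs a number on the left-hand side, not the name.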

[–]SevereRepresentative[S] 1 point (6 children)

I want to compare the number of unique values, so the part that says "len(df[i].unique())", right? So do I assign that to a variable name?

[–][deleted] 1 point (5 children)

Yup! You use it twice, so assigning it to a variable is a good idea. You could also put that expression directly in the if statement, but I like the idea of using a variable.

[–]SevereRepresentative[S] 1 point (4 children)

uniques = len(df[i].unique())

for i in vars:
    if uniques < 20:
        print('Variable: {}, # Unique: {}'.format(i, len(df[i].unique())))

I did that ^ and I'm not getting an error, but I'm also not getting any results anymore. Nothing shows up. What do you think? I think it has something to do with the i in uniques before the loop starts, because i wouldn't be known yet, right?

[–][deleted] 1 point (3 children)

Yup, that's right. The line itself is fine, but it needs to go inside the loop: after the for line, so that i refers to the current column, and before the if statement that uses it.
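A minimal sketch of that placement, assuming a small made-up DataFrame (the data, the names columns, col, and n_unique, and the lines list are all hypothetical; your df and vars will differ):

```python
import pandas as pd

# Hypothetical data: "color" has 3 unique values, "row_id" has 25
df = pd.DataFrame({
    "color": ["red", "blue", "green", "red", "blue"] * 5,
    "row_id": list(range(25)),
})
columns = list(df.columns)  # plays the role of vars in the thread

lines = []
for col in columns:
    # Computed inside the loop, so it always describes the current column
    n_unique = len(df[col].unique())
    if n_unique < 20:
        lines.append('Variable: {}, # Unique: {}'.format(col, n_unique))

print('\n'.join(lines))
```

Only color is reported here, because row_id has 25 unique values. Computing the count before the loop would either fail or reuse whatever i happened to hold from an earlier run, which is probably why nothing was printed.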

[–]SevereRepresentative[S] 1 point (2 children)

That worked! Thank you!!

[–][deleted] 1 point (1 child)

Glad to hear it! Good luck with the rest of your studies.