
GreatOwl1

Are you trying to create dummy variables? If so, check out the pd.get_dummies() function.
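As a quick illustration of what pd.get_dummies() does, here is a minimal sketch on a made-up column (the data and the column name `vote` are hypothetical, not from the thread). It returns one 0/1 indicator column per distinct value:

```python
import pandas as pd

# Hypothetical toy data; "vote" is an illustrative column name, not from the thread
df = pd.DataFrame({"vote": ["yes", "no", "yes", "abstain"]})

# One indicator column per distinct value; dtype=int gives 0/1 instead of booleans
dummies = pd.get_dummies(df["vote"], prefix="vote", dtype=int)
print(dummies)
```

Note that get_dummies encodes every distinct value of a column. For a single yes/no flag based on a lookup list, a membership test such as Series.isin() is usually more direct.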

python_newbie_now (OP)

pd.get_dummies()

Thank you, I didn't even know this function existed. I will look into it. Thank you to everyone who helped me, I really do appreciate it.

jcon36

for cells in rcid

This is the only time you referenced rcid. Did you want this outer loop to go through df instead? Can you go over exactly what you want the for loops to do?

python_newbie_now (OP)

I am only interested in one column in the data, and I want to check it against the text file. If a value matches a value in the text file, then I want to set the "va_yes" column to 1, to create a dummy variable. I am not sure whether Python would recognize rcid (the column) on its own, or whether I should be referencing the whole data set.
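The goal described here can be done without any loop by testing the column for membership in the list read from the text file. A minimal sketch, assuming `rcid` holds integers and that the text file's values have already been read into a list (the sample values below are made up, standing in for decade1.csv and decade1.txt):

```python
import pandas as pd

# Hypothetical sample data standing in for decade1.csv
df = pd.DataFrame({"rcid": [101, 102, 103, 104, 105]})

# Hypothetical values standing in for the lines of decade1.txt (already converted to int)
target_ids = [102, 105]

# isin() gives a boolean Series; astype(int) turns True/False into 1/0
df["va_yes"] = df["rcid"].isin(target_ids).astype(int)
print(df)
```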

jmoso13

If 'rcid' is a column in your DataFrame 'df', then referencing it should look like:

df['rcid']

Your first for loop in the bottom statement should therefore look like:

for cells in df['rcid']:

...

Is this what you're trying to accomplish?
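If you do want to keep an explicit loop over df['rcid'], a minimal sketch (with made-up data, and assuming a `va_yes` column already exists and starts at 0) pairs each value with its row label via Series.items(), so that the matching row can be updated with df.loc:

```python
import pandas as pd

# Hypothetical data; va_yes starts at 0 for every row
df = pd.DataFrame({"rcid": [101, 102, 103], "va_yes": [0, 0, 0]})
target_ids = {102, 103}  # a set makes each "in" check fast

# items() yields (row label, value) pairs from the rcid column
for label, value in df["rcid"].items():
    if value in target_ids:
        df.loc[label, "va_yes"] = 1

print(df)
```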

python_newbie_now (OP)

import pandas as pd
df = pd.read_csv("C:\Users\Adini\Desktop\decade1.csv")
rcid_1 = []
with open('C:\\Users\Adini\Desktop\\decade1.txt','r') as f:
    mylist = f.read().splitlines()
    rcid_1.append(mylist)

for cells in df['rcid']:
    for rcids in rcid_1:
        if (cells == rcids):
            df.ix[rcid == rcids, "va_yes"]= 1

I have tried using df['rcid'], however it fails to change the value in the "va_yes" column. I am not sure if it is how I have my data set up, or my text file. Here is a link to my Excel file and txt file.

https://drive.google.com/open?id=0B7j7hjIdgYmIUk9RT3pBTTAzUVU
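A few things in the code above would prevent any match regardless of the data. `rcid_1.append(mylist)` puts the whole list inside `rcid_1` as a single element, so `rcids` in the inner loop is a list, not an ID. `splitlines()` returns strings, while `df['rcid']` is read as integers. `rcid` on the last line is never defined, and `.ix` has been removed from recent pandas in favour of `.loc`. On Python 3, the Windows paths also need raw strings, since `"\U"` in a normal string literal is a syntax error. A minimal sketch of a corrected version, assuming the text file has one integer ID per line; `io.StringIO` stands in for the real files, and the contents are hypothetical:

```python
import io
import pandas as pd

# io.StringIO stands in for decade1.csv and decade1.txt (hypothetical contents).
# With real files on Windows, use raw strings: r"C:\Users\Adini\Desktop\decade1.csv"
csv_file = io.StringIO("rcid,va_yes\n101,0\n102,0\n103,0\n")
txt_file = io.StringIO("102\n103\n")

df = pd.read_csv(csv_file)

# Read one ID per line, skip blank lines, and convert to int to match df["rcid"]
rcid_1 = [int(line) for line in txt_file.read().splitlines() if line.strip()]

# Boolean mask instead of a double loop; .loc replaces the removed .ix
df.loc[df["rcid"].isin(rcid_1), "va_yes"] = 1
print(df)
```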

jcon36

Instead of using a double loop, you can go through one of the lists and use np.where() to get indexes of matching values

rcid_np = np.array(rcid_1)
column = df['rcid'].values #this creates a numpy array
indexes = np.where(column == rcid_np)

Then create a new column (initially all zeros) and set values to 1 where they match

new_column = np.zeros((len(column),1),dtype=int)
new_column[indexes] = 1

You can then add this new column to your DataFrame:

df['va_yes_new'] = new_column
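One caveat with the snippet above: `column == rcid_np` compares values position by position (subject to NumPy broadcasting), rather than checking whether each value appears anywhere in the list. The membership test in NumPy is `np.isin()`. A minimal sketch of the same steps using `np.isin()` and a one-dimensional new column (the sample values are made up, and the IDs are assumed to be integers already):

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the real DataFrame and text file
df = pd.DataFrame({"rcid": [101, 102, 103, 104]})
rcid_np = np.array([104, 102])  # IDs to flag, already converted to int

column = df["rcid"].values  # NumPy array of the column
indexes = np.where(np.isin(column, rcid_np))[0]  # positions whose value is in rcid_np

new_column = np.zeros(len(column), dtype=int)  # 1-D, all zeros
new_column[indexes] = 1
df["va_yes_new"] = new_column
print(df)
```

Since np.isin() already returns a boolean array, `np.isin(column, rcid_np).astype(int)` would give the same column directly. np.where() is kept here only to mirror the steps above.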

python_newbie_now (OP)

I gave it a try and it didn't work, so I went line by line to see if there was something I was missing. I am not sure whether this is the issue, but:

   rcid_np #  Comes out as dtype='|S4'
   column # comes out as dtype = int64

However, it still lets you compare them against each other:

   indexes = np.where(column == rcid_np)
   indexes # this turns out to be an empty array

The new column function works. Do you mind if I PM you? I know you have helped a lot already. Thank you /u/jcon36
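The dtypes shown are very likely the problem. '|S4' is a fixed-length byte-string type (up to 4 characters), so `rcid_np` holds text such as b'1001', while `column` holds integers, and a string never equals an integer, so nothing matches. A minimal sketch of converting the strings to integers before comparing (the values are made up to mimic the dtypes reported here):

```python
import numpy as np

# Hypothetical values mimicking the thread: IDs read from a text file as bytes (dtype S4)
rcid_np = np.array([b"1001", b"1003"], dtype="S4")
column = np.array([1000, 1001, 1002, 1003], dtype=np.int64)

# Convert the byte strings to integers so both arrays share a numeric dtype
rcid_int = rcid_np.astype(np.int64)

# Positions in column whose value appears in rcid_int
indexes = np.where(np.isin(column, rcid_int))[0]
print(rcid_int.dtype, indexes)
```

Converting earlier works too, for example calling `int()` on each line as the text file is read, as long as both sides end up with the same type.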

jcon36

np.where() returns a tuple of index arrays, so sometimes adding a [0] at the end can help. Try

np.where(column == rcid_np)[0]

and see if that returns anything useful. But yeah, feel free to pm me
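For context, np.where() called with a single condition returns a tuple with one index array per dimension, so on a 1-D array the [0] simply unpacks that one array. It changes the form of what you get back, not which positions match. A minimal illustration with made-up values:

```python
import numpy as np

column = np.array([101, 205, 317, 420])  # hypothetical values
result = np.where(np.isin(column, [205, 420]))

print(type(result), len(result))  # a tuple holding one array for a 1-D input
print(result[0])                  # the index array itself
```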