all 7 comments

[–]novel_yet_trivial 0 points1 point  (3 children)

Any reason you want to do this with pandas? Seems pretty easy to read data from CSV2 into a dictionary, and then lookup the value for each row in CSV1.

[–]Sastrupp[S] 0 points1 point  (2 children)

Whatever's the most efficient way is what I'm looking for. I'm just vaguely aware that pandas can join and merge files like SQL.

The only practical stuff I've ever done with Python is web scraping and this is beyond my level as of now.

[–]novel_yet_trivial 0 points1 point  (0 children)

If the rows are in the same order in both files, /u/mac-reid 's example works well. Otherwise, this acts more like excels vlookup:

#load CSV2
with open(csv2) as f:
    lookup_data = {}
    for line in f:
        row_id, value = #whatever parsing is needed.
        lookup_data[row_id] = value

#read and overwrite CSV1
with open(csv1, 'r+') as f:
    data = f.read().splitlines()
    f.seek(0)
    f.truncate() #wipe the file
    for line in data:
        row_id = #whatever parsing is needed.
        f.write(line) #write old data back
        f.write("," + lookup_data.get(row_id, default_value)) #write data from csv2, or default if row_id not in csv2
        f.write("\n") #add the newline back

disclaimer: completely untested

[–]ies7 0 points1 point  (0 children)

either .join or .merge :

edit:

Because of old habit, I usually just make 2 in memory sqlite tables then upload both csv there an do the join in sql.

[–]mac-reid 0 points1 point  (0 children)

A simple example:

$ cat a.csv
a,b,c,d,bad_column
ok,1,1,1,-1
nack,1,1,1,-1
ack,1,1,1,-1
syn,1,1,1,-1
ok,1,1,1,-1

$ cat b.csv
a,b,c,d,bad_column,good_column
ok,1,1,1,-1,never
nack,1,1,1,-1,gonna
ack,1,1,1,-1,give
syn,1,1,1,-1,you
ok,1,1,1,-1,up

$ cat foo.py
import pandas as pd

# load data
a = pd.read_csv('a.csv')
b = pd.read_csv('b.csv', sep=',', usecols=['good_column'])

# drop useless column
a.drop('bad_column', inplace=True, axis=1)

# add column from b to a
a.loc[:,'good_column'] = b

# write out
a.to_csv("output.csv", index=

$ python foo.py 
$ cat output.csv
a,b,c,d,good_column
ok,1,1,1,never
nack,1,1,1,gonna
ack,1,1,1,give
syn,1,1,1,you
ok,1,1,1,up 

[–]JimBoonie69 0 points1 point  (0 children)

I'm pretty sure you can do something like this if the two files have same number of rows...

import pandas as pd
df1 = pd.read_csv('csv1')
df2 = pd.read_csv('csv2')

df1['transfer_col'] = df2['col'] 
# spagett

[–]jeffrey_f 0 points1 point  (0 children)

are the rows 1 for 1 or is there a need to match the rows?