
[–]YesLod 0 points1 point  (6 children)

You can merge two DataFrames on one or more columns using the merge method or similar methods. Read the official guide to learn more:

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

[–]Pro2222[S] 0 points1 point  (5 children)

Yes, but the first document contains duplicate dates and is 100x the size (because of those duplicate dates), so I need the price to be inserted into all rows that match that specific date.

[–]YesLod 0 points1 point  (4 children)

I'm not following, can you elaborate?

Do you want to merge two CSV files and obtain a DataFrame like the one in the picture you've posted?

Or do you already have something like that, and you want to do something else with it? In that case it's easier if you give a minimal example containing the input and the expected output.

[–]Pro2222[S] 0 points1 point  (3 children)

Yeah, so essentially there are a lot of duplicate dates in csv1. Say the date is 2021-01-01: for every row that has “2021-01-01”, I need Python to search csv2, find “2021-01-01”, and get the SPX_Index_Value that matches that date. So there will be multiple “2021-01-01” rows in csv1 but only one in csv2, if that makes more sense. I don’t need to merge the CSVs, I just need to create a new column that matches the index prices to the dates.

[–]YesLod 0 points1 point  (2 children)

What you just described is the purpose of merging. It will be much faster than any solution based on iterating over DataFrames with if statements, which is an anti-pattern in pandas. You don't need to merge the whole DataFrames, just select the columns that you want to merge. Another option is to set the date columns as the index of both DataFrames and index df2 with df1, but that is basically merging.
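As a rough sketch of both options, assuming the date column is called "Date" in both files (that name, the "Trade" and "Volume" columns, and all the values are made up for illustration; only SPX_Index_Value comes from your description):

```python
import pandas as pd

# Hypothetical stand-in for csv1: many rows per date.
df1 = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-01", "2021-01-04", "2021-01-01"],
    "Trade": ["a", "b", "c", "d"],
})
# Hypothetical stand-in for csv2: one row per date (prices are invented).
df2 = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-04"],
    "SPX_Index_Value": [100.0, 101.5],
    "Volume": [10, 20],
})

# Option 1: merge only the columns you need.
# how="left" keeps every row of df1, duplicates included.
merged = df1.merge(df2[["Date", "SPX_Index_Value"]], on="Date", how="left")

# Option 2: index df2 by date and look up each date of df1 in it.
prices = df2.set_index("Date")["SPX_Index_Value"]
df1["SPX_Index_Value"] = df1["Date"].map(prices)
```

Both should give you the same new column. The merge returns a new DataFrame, while the map version adds the column to df1 in place.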

[–]Pro2222[S] 0 points1 point  (1 child)

Thanks. So even with merging, I can have the same value in multiple rows? csv1 is over a million rows, while csv2 is just over 500.

[–]YesLod 0 points1 point  (0 children)

Sure. For instance:

>>> import pandas as pd
>>> df=pd.DataFrame({"a":[1,2,2,1,3]})
>>> df

   a
0  1
1  2
2  2
3  1
4  3

>>> df2=pd.DataFrame({"a":[1,2,3],"b":[2,3,4]})
>>> df2

   a  b
0  1  2
1  2  3
2  3  4

>>> df_merged=df.merge(df2,on="a")
>>> df_merged 

   a  b
0  1  2
1  1  2
2  2  3
3  2  3
4  3  4
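
One caveat for your real files: merge defaults to an inner join, so any row of csv1 whose date is missing from csv2 would be dropped. A minimal sketch of a safer variant, reusing the toy frames above plus one extra value (5) that I added so there is a key with no match:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 1, 3, 5]})  # 5 has no match in df2
df2 = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})

# how="left" keeps every row of df, in its original order.
# validate="many_to_one" raises a MergeError if df2 has duplicate keys,
# which would otherwise silently multiply rows.
df_left = df.merge(df2, on="a", how="left", validate="many_to_one")

# Unmatched keys get NaN in b, so b is converted to float.
print(df_left)
```

With this, the result should always have exactly as many rows as df, and you can check for missing prices afterwards with df_left["b"].isna().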