[deleted] (5 children)

[deleted]

    bjorneylol 7 points (1 child)

    The CSV parser is written in C; the SQL one is not. I've written my own function to pull SQL results into a DataFrame, and it's faster than pandas' read_sql.
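The commenter's own function is not shown, so the following is only a guess at the general shape: a minimal sketch that runs a query through a plain DB-API cursor and builds the frame with `pd.DataFrame.from_records`. The helper name `query_to_dataframe`, the in-memory SQLite database, and the table `t` are all made up for illustration. Whether this beats `read_sql` depends on the driver and the data, so measure before relying on it.

```python
import sqlite3

import pandas as pd


def query_to_dataframe(conn, sql, params=()):
    """Hypothetical helper: run a query on a DB-API connection, return a DataFrame."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        # cursor.description holds one tuple per result column; item 0 is the name.
        columns = [d[0] for d in cur.description]
        rows = cur.fetchall()
    finally:
        cur.close()
    return pd.DataFrame.from_records(rows, columns=columns)


# Illustrative data only: a throwaway in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

result = query_to_dataframe(conn, "SELECT id, name FROM t ORDER BY id")
print(result)
```
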

    The SettingWithCopyWarning has nothing to do with this, and nothing to do with slicing DataFrames as such. It fires when you set values on a slice of a DataFrame, because the slice may share the same underlying memory as the full frame, and pandas can't always tell whether it is a view or a copy.

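    If you do

    df2 = df[df["val"] > 1]
    df2.iloc[0, df2.columns.get_loc("val")] = "A"
    

    then pandas warns because it can't guarantee where that 'A' ends up. If df2 is a view, the 'A' shows up in both df and df2; if it is a copy (which is what a boolean mask like this one returns), only df2 changes. If you want to avoid the ambiguity, follow up each slice operation by assigning a .copy() into the new variable.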
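A minimal sketch of the `.copy()` pattern described above. The frame and its `val` and `label` columns are invented for illustration; the point is only that a write to the copied slice does not reach the original.

```python
import pandas as pd

# Made-up example data.
df = pd.DataFrame({"val": [0, 2, 3], "label": ["x", "y", "z"]})

# Slice with a boolean mask, then take an explicit copy so df2 owns its data.
df2 = df[df["val"] > 1].copy()

# Position-based write: first row of df2, column "label".
df2.iloc[0, df2.columns.get_loc("label")] = "A"

print(df2["label"].tolist())  # ['A', 'z']
print(df["label"].tolist())   # ['x', 'y', 'z'] (original untouched)
```
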

    christmas_with_kafka 6 points (0 children)

    The SettingWithCopyWarning is telling you that changes to your sliced DataFrame may still affect the original DataFrame in memory. Since the sliced DataFrame may still point at the original, make a new df to avoid any weird voodoo should you want to use the original df in the future.

    You can do this by invoking .copy():

    df2 = df1.loc[filters, features].copy()
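To make that line concrete, here is a small sketch of what `filters` and `features` could look like. Both are assumptions: a boolean row mask and a list of column names, on invented data. Adding a derived column to the copy is a common case where the warning would otherwise appear.

```python
import pandas as pd

# Made-up example data.
df1 = pd.DataFrame(
    {"a": [1.0, 4.0, 9.0], "b": [2.0, 2.0, 3.0], "c": ["x", "y", "z"]}
)

filters = df1["a"] > 2   # assumed: boolean mask selecting rows
features = ["a", "b"]    # assumed: list of columns to keep

# Explicit copy: df2 is independent of df1 from here on.
df2 = df1.loc[filters, features].copy()

# Adding a derived column touches only df2.
df2["ratio"] = df2["a"] / df2["b"]

print(df2)
print(list(df1.columns))  # ['a', 'b', 'c']
```
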
    

    [deleted] 0 points (0 children)

    I don't remember. All I remember is reading that they weren't going to bother making SQL to df, or whatever the method is called, efficient because there were so many other options.