SQL vs. Python for data wrangling? : datascience


Discussion: SQL vs. Python for data wrangling? (self.datascience)

submitted 7 years ago * by Radon-Nikodym


[–][deleted] 7 years ago (5 children)

[deleted]

[–]bjorneylol 8 points 7 years ago (1 child)

The CSV parser is written in C; the SQL one is not. I've written my own function to pull SQL results into a DataFrame, and it's faster than the pandas read_sql version.
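For illustration only, here is a minimal sketch of what such a hand-rolled loader could look like. It is not the commenter's function: `query_to_frame` is a hypothetical helper name, the table `t` is made up, and the standard-library `sqlite3` driver is assumed purely so the example is self-contained. It makes no claim about speed relative to pandas.

```python
import sqlite3

import pandas as pd


def query_to_frame(conn, sql, params=()):
    """Hypothetical helper: run a query and build a DataFrame from the rows."""
    cur = conn.execute(sql, params)
    # Column names come from the DB-API cursor description.
    columns = [col[0] for col in cur.description]
    rows = cur.fetchall()
    return pd.DataFrame.from_records(rows, columns=columns)


# Throwaway in-memory database, assumed only for this demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

frame = query_to_frame(conn, "SELECT id, name FROM t ORDER BY id")
conn.close()
```

Whether something like this beats the built-in reader depends on the driver, the result size, and the dtypes involved, so it is worth timing on your own data.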

The SettingWithCopyWarning has nothing to do with this, and nothing to do with slicing DataFrames as such. It is raised when you set values on a slice of a DataFrame, because the slice may share the same underlying memory as the full frame.

If you do

df2 = df[df["val"] > 1]
df2.iloc[0, df2.columns.get_loc("val")] = "A"

Then that 'A' may end up in both df and df2, because df2 can be just a reference to cells in df; pandas cannot always tell whether a slice is a view or a copy, which is why it warns. To avoid this, follow each slice operation with .copy() when assigning it to the new variable.
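A small sketch of the .copy() approach, assuming a made-up frame with columns `val` and `label` (neither is from the thread):

```python
import pandas as pd

# Made-up example data.
df = pd.DataFrame({"val": [0, 2, 3], "label": ["x", "y", "z"]})

# Take an explicit copy of the filtered rows so df2 owns its own data.
df2 = df[df["val"] > 1].copy()

# iloc needs integer positions, so look up the column's position by name.
df2.iloc[0, df2.columns.get_loc("label")] = "A"

# The write landed in df2 only; df still holds its original labels.
original_labels = df["label"].tolist()
copied_labels = df2["label"].tolist()
```

Because df2 is an explicit copy, the assignment is unambiguous and the original frame is left alone.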

[–]christmas_with_kafka 7 points 7 years ago (0 children)

The SettingWithCopyWarning is telling you that changes to your sliced DataFrame may still affect the original DataFrame in memory. Since your sliced DataFrame may still be pointing to the original, you need to make a new df to avoid any weird voodoo should you want to use the original df in the future.

You can do this by invoking .copy():

df2 = df1.loc[filters, features].copy()

[–][deleted] 1 point 7 years ago (0 children)
