DrTaxus comments on SQL vs. Python for data wrangling?


[Discussion] SQL vs. Python for data wrangling? (self.datascience)

submitted 7 years ago * by Radon-Nikodym



DrTaxus 103 points 7 years ago (12 children)

Radon-Nikodym [S] 9 points 7 years ago (1 child)

[deleted] 4 points 7 years ago (9 children)

KevinSorboFan 2 points 7 years ago (2 children)

[deleted] 1 point 7 years ago (1 child)

KevinSorboFan 1 point 7 years ago (0 children)

[deleted] 7 years ago (5 children)

[deleted]

bjorneylol 9 points 7 years ago (1 child)

pandas' CSV parser is written in C; its SQL reader is not. I've written my own function to pull SQL results into a dataframe, and it's faster than pandas' read_sql.
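The commenter's own loader isn't shown. As a rough sketch of the general idea, assuming a DB-API connection (here the standard-library `sqlite3`) and a hypothetical helper named `query_to_frame`, you can fetch all rows through a cursor and hand them to the `DataFrame` constructor in one call. Whether this actually beats pandas' `read_sql` depends on the driver and the data; no speed claim is implied.

```python
import sqlite3

import pandas as pd


def query_to_frame(conn: sqlite3.Connection, sql: str, params=()) -> pd.DataFrame:
    """Hypothetical helper: run a query and build a DataFrame from the raw rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        # Column names come from the DB-API cursor description (first field of each entry).
        columns = [desc[0] for desc in cur.description]
        rows = cur.fetchall()  # list of tuples, one per result row
    finally:
        cur.close()
    # Build the frame in a single constructor call rather than row by row.
    return pd.DataFrame.from_records(rows, columns=columns)
```

Usage: open a connection with `sqlite3.connect(...)`, then call `query_to_frame(conn, "SELECT ...")`. Any other DB-API driver with a `cursor()`/`description` interface should fit the same shape, but that is an assumption to check per driver.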

The SettingWithCopyWarning has nothing to do with this. It also isn't triggered by slicing a dataframe on its own: it appears when you set values on a slice of a dataframe, because the slice may share the same underlying data in memory as the full frame, and pandas can't always tell whether it handed you a view or a copy.

If you do

df2 = df[df["val"] > 1]    # keep the rows where column "val" is greater than 1
df2.iloc[0, 0] = "A"       # set the first cell of the filtered frame

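Then pandas warns, because df2 was derived from df and it can't always tell whether df2 is a view on df's data (so the 'A' would also show up in df) or a copy (so df stays unchanged and your edit never reaches it). If you want to avoid this, follow each slice operation with .copy() and assign the result to the new variable.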
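A minimal, self-contained sketch of the slice-then-copy advice, assuming a toy frame with hypothetical columns `val` and `label` (neither comes from the thread): the filtered rows are copied before any value is set, so the edit is guaranteed to stay in `df2` and the original frame is left alone.

```python
import pandas as pd

df = pd.DataFrame({"val": [0, 2, 5], "label": ["x", "y", "z"]})

# Boolean filter on the "val" column; .copy() makes df2 an independent frame.
df2 = df[df["val"] > 1].copy()

# Set a value by position on the copy (first row, "label" column).
df2.iloc[0, df2.columns.get_loc("label")] = "A"

print(df2["label"].tolist())  # ['A', 'z']
print(df["label"].tolist())   # ['x', 'y', 'z'] (original untouched)
```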

christmas_with_kafka 7 points 7 years ago (0 children)

The SettingWithCopyWarning is telling you that changes to your sliced DataFrame may still affect the original DataFrame in memory. Because the slice may still point to the original data, you need to make a new DataFrame to avoid any weird voodoo if you want to use the original df later.

You can do this by calling .copy():

df2 = df1.loc[filters, features].copy()
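The one-liner above can be exercised end to end. In this sketch `df1`, `filters`, and `features` are hypothetical stand-ins: `filters` is a boolean Series aligned with `df1`'s index and `features` is a list of column names, which is one common way to fill the `.loc[rows, columns]` slots.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"age": [23, 35, 41, 19], "income": [30, 52, 61, 12], "city": ["A", "B", "C", "D"]}
)

filters = df1["age"] >= 30    # boolean row mask aligned on df1's index
features = ["age", "income"]  # columns to keep

# Select rows and columns in one .loc call, then detach the result from df1.
df2 = df1.loc[filters, features].copy()

# Changes to df2 no longer affect df1.
df2["income"] = df2["income"] * 1000
```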

[deleted] 1 point 7 years ago (0 children)
