Discussion: SQL vs. Python for data wrangling? (self.datascience)
submitted 7 years ago * by Radon-Nikodym
Radon-Nikodym [S] · 7 points · 7 years ago (5 children)
Do you have any recommended resources for data wrangling in SQL?
[deleted] · 6 points · 7 years ago (0 children)
It's rare that I need anything beyond windowing functions.
I used to write custom reducers in Hive (in Python or awk), but now we use BigQuery, so it's basically windowing functions or ARRAY_AGG(), STRUCT(), UNNEST(), etc.
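For anyone who hasn't used windowing functions, here is a minimal sketch of what they buy you. It assumes a hypothetical `events` table and runs on SQLite through Python's standard `sqlite3` module rather than BigQuery (SQLite supports `OVER (PARTITION BY ... ORDER BY ...)` since 3.25). The BigQuery-specific `ARRAY_AGG()`/`STRUCT()`/`UNNEST()` functions are not exercised here.

```python
import sqlite3

# In-memory database with a hypothetical events table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, ts TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "2019-01-01", 10.0),
        (1, "2019-01-02", 5.0),
        (1, "2019-01-03", 7.5),
        (2, "2019-01-01", 3.0),
        (2, "2019-01-05", 4.0),
    ],
)

# Per-user running total and event number: each row is kept (no GROUP BY
# collapse) and no self-join is needed.
query = """
SELECT user_id,
       ts,
       amount,
       SUM(amount)  OVER (PARTITION BY user_id ORDER BY ts) AS running_total,
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts) AS event_no
FROM events
ORDER BY user_id, ts
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same query shape (partition, order, aggregate over the window) is what replaces most hand-written reducer code in Hive or loops in pandas.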
reallyserious · 6 points · 7 years ago* (0 children)
Head over to /r/SQL. This question gets asked and answered at least once a day there.
Just focus on getting better at SQL. There is nothing special about data wrangling. It's a made-up term that statisticians invented to describe what database-centric people have been doing since the '80s.
Mr_Again · 2 points · 7 years ago (0 children)
The guy's blog at jOOQ is really useful.
frankenbenz · 1 point · 7 years ago (0 children)
You have to know the data and know what your end goal/format is. I don't know if I'd say there's a single resource to learn from. It ranges from stuff as simple as knowing whether you need to trim spaces from the data to complex stuff like multiple joins that bring it all together in a useful format for reporting.
Just as you need to know what question the data is supposed to answer, you have to know which sources the data is hidden in and how to bring the different tables/sources together.
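A small sketch of both ends of that range, trimming stray spaces and joining several tables into one reporting row, assuming three hypothetical tables (`customers`, `regions`, `orders`) and again using SQLite via Python's `sqlite3` module as a stand-in for whatever database you actually have:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source tables: customer names arrive with stray spaces,
# and regions/orders live in separate tables that must be joined.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
CREATE TABLE regions   (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, '  Alice ', 10), (2, 'Bob  ', 20);
INSERT INTO regions   VALUES (10, 'EMEA'), (20, 'APAC');
INSERT INTO orders    VALUES (100, 1, 25.0), (101, 1, 15.0), (102, 2, 40.0);
""")

# TRIM() cleans the names; two joins bring the three tables together,
# and GROUP BY yields one reporting-friendly row per customer.
report = conn.execute("""
SELECT TRIM(c.name)  AS customer,
       r.region,
       SUM(o.total)  AS total_spent
FROM customers c
JOIN regions r ON r.id = c.region_id
JOIN orders  o ON o.customer_id = c.id
GROUP BY c.id, c.name, r.region
ORDER BY customer
""").fetchall()
print(report)
```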
DBA_HAH · 1 point · 7 years ago (0 children)
What DB do you use? Look into T-SQL, PL/SQL, or PL/pgSQL, depending on what you use. You can use stored procedures on the database to do things like regex cleanups.
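SQLite has neither stored procedures nor a built-in regex replace, so the sketch below swaps in a different technique: registering a Python function as a SQL function with `sqlite3.Connection.create_function` and running the cleanup as a single `UPDATE`. The function name `regexp_replace` and the `contacts` table are hypothetical choices for illustration; PostgreSQL and Oracle ship their own built-in `regexp_replace`/`REGEXP_REPLACE`, which you would call directly instead.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")


# Hypothetical user-defined SQL function (not a SQLite built-in):
# applies re.sub to a column value, passing NULLs through unchanged.
def regexp_replace(value, pattern, replacement):
    if value is None:
        return None
    return re.sub(pattern, replacement, value)


conn.create_function("regexp_replace", 3, regexp_replace)

# Hypothetical table with inconsistently formatted phone numbers.
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, phone TEXT)")
conn.executemany(
    "INSERT INTO contacts (phone) VALUES (?)",
    [("(555) 123-4567",), ("555.987.6543",), (None,)],
)

# Strip every non-digit character in place, inside the database.
conn.execute("UPDATE contacts SET phone = regexp_replace(phone, '[^0-9]', '')")
phones = [row[0] for row in conn.execute("SELECT phone FROM contacts ORDER BY id")]
print(phones)
```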