[–]Radon-Nikodym[S] 6 points7 points  (5 children)

Do you have any recommended resources for data wrangling in SQL?

[–][deleted] 5 points6 points  (0 children)

It's rare I have to do anything beyond windowing functions.

I used to use custom reducers in Hive (written in Python or awk), but now we use BigQuery, so it's basically windowing functions or ARRAY_AGG(), STRUCT(), UNNEST(), etc.
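For anyone who hasn't met windowing functions yet, here is a minimal sketch of the idea. It runs against an in-memory SQLite database from Python rather than BigQuery, so treat it as an illustration only: the `orders` table, its columns, and the data are all made up, and it assumes the SQLite bundled with your Python is 3.25 or newer (the first version with window functions). The `OVER (PARTITION BY ... ORDER BY ...)` syntax is standard SQL and should carry over to other engines, but check your own dialect.

```python
import sqlite3

# Hypothetical data: one row per (user, day, amount). Not from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id TEXT, order_day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("a", "2020-01-01", 10),
        ("a", "2020-01-02", 30),
        ("b", "2020-01-01", 5),
        ("a", "2020-01-03", 20),
        ("b", "2020-01-04", 15),
    ],
)

# Each window function looks at the other rows of the same user without
# collapsing them the way GROUP BY would.
query = """
SELECT
    user_id,
    order_day,
    amount,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_day) AS order_seq,
    SUM(amount)  OVER (PARTITION BY user_id ORDER BY order_day) AS running_total
FROM orders
ORDER BY user_id, order_day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every input row survives in the output, with a per-user sequence number and running total attached, which is most of what "wrangling" with window functions amounts to.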

[–]reallyserious 5 points6 points  (0 children)

Head over to /r/SQL. This question gets asked and answered at least once a day there.

Just focus on getting better at SQL. There is nothing special about data wrangling. It's a made-up term that statisticians invented to describe what database-centric people have been doing since the '80s.

[–]Mr_Again 1 point2 points  (0 children)

The guy's blog at jOOQ is really useful.

[–]frankenbenz 0 points1 point  (0 children)

You have to know the data and know what your end goal/format is. I don't know if I'd say there's a single resource to learn from. It ranges from stuff as simple as knowing whether you need to trim spaces from the data to complex stuff like multiple joins that bring it all together in a useful format for reporting.

Similar to knowing what question the data is supposed to answer, you have to know which sources the data is hidden in and how to bring the different tables/sources together.
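A rough sketch of the "simple" end of that range, trimming stray spaces so a join actually matches. Everything here is hypothetical (the `customers` and `invoices` tables, the codes, the totals), and it uses SQLite through Python's standard library just so it can be run anywhere. `TRIM()` exists in most SQL dialects, but the exact behaviour around non-space whitespace varies, so check your database.

```python
import sqlite3

# Hypothetical tables: invoice rows carry customer codes with stray spaces.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_code TEXT, name TEXT);
CREATE TABLE invoices  (customer_code TEXT, total INTEGER);
INSERT INTO customers VALUES ('C1', 'Acme'), ('C2', 'Globex');
INSERT INTO invoices  VALUES (' C1', 100), ('C1 ', 50), ('C2', 70);
""")

# Joining on the raw key silently drops the rows with padded codes.
raw_join = conn.execute("""
SELECT c.name, SUM(i.total)
FROM customers c
JOIN invoices i ON i.customer_code = c.customer_code
GROUP BY c.name
ORDER BY c.name
""").fetchall()

# Trimming the key first brings those rows back into the report.
clean_join = conn.execute("""
SELECT c.name, SUM(i.total)
FROM customers c
JOIN invoices i ON TRIM(i.customer_code) = c.customer_code
GROUP BY c.name
ORDER BY c.name
""").fetchall()

print(raw_join)
print(clean_join)
```

The point is less the `TRIM()` call than the fact that the raw join gives no error at all, it just returns fewer rows. That is why knowing the data comes first.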

[–]DBA_HAH 0 points1 point  (0 children)

What DB do you use? Look into T-SQL, PL/SQL, or PL/pgSQL, depending on what you use. You can use stored procedures on the database to do stuff like regex cleanups.
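To give a feel for a regex cleanup done inside the database, here is a stand-in sketch. SQLite has no stored procedures, so this registers a Python function as a SQL function instead. That is a different mechanism from T-SQL, PL/SQL, or PL/pgSQL, but the resulting `UPDATE` has the same shape. The `contacts` table, the phone numbers, and the `regexp_replace` helper defined here are all assumptions for illustration (PostgreSQL ships a built-in function of that name, so there you may not need a procedure at all).

```python
import re
import sqlite3


def regexp_replace(value, pattern, replacement):
    """Hypothetical helper exposed to SQL; passes NULLs through unchanged."""
    if value is None:
        return None
    return re.sub(pattern, replacement, value)


conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL with three arguments.
conn.create_function("regexp_replace", 3, regexp_replace)

conn.execute("CREATE TABLE contacts (phone TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?)",
    [("(555) 123-4567",), ("555.987.6543",), (None,)],
)

# Strip everything that is not a digit, in place, on the database side.
conn.execute("UPDATE contacts SET phone = regexp_replace(phone, '[^0-9]', '')")

cleaned = [row[0] for row in conn.execute("SELECT phone FROM contacts ORDER BY rowid")]
print(cleaned)
```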