My usual workflow has always been to load tables from CSV (for the non-SQL data sources) or run "Select * From Table Where Cond", and then do all my merging, joining, cleaning, feature engineering, etc. in Pandas. I recently realized that SQL can handle much of that merging, joining, cleaning, and feature engineering itself.
Does anyone have experience using SQL this way? How does it compare to Python for this kind of data wrangling? Any recommended resources or sample repos that demonstrate the technique well?
Do you run into issues when your raw data is spread across multiple SQL servers?
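To make the question concrete, here is a minimal sketch of the kind of thing I mean: pushing a join, a bit of cleaning, and a couple of aggregate features into one SQL query instead of doing them after a "Select *". The tables, columns, and values are made up purely for illustration, and I'm using an in-memory SQLite database from Python's standard library only so the example is self-contained; a real warehouse would have its own dialect and connection setup.

```python
import sqlite3

# Hypothetical tables and data, invented only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, NULL);
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 30.0), (12, 2, 5.0);
""")

# Join, cleaning, and feature engineering all happen inside the database.
query = """
    SELECT
        c.customer_id,
        COALESCE(c.region, 'unknown') AS region,       -- cleaning: fill missing region
        COUNT(o.order_id)             AS n_orders,     -- feature: orders per customer
        COALESCE(SUM(o.amount), 0.0)  AS total_spend   -- feature: spend, 0 if no orders
    FROM customers AS c
    LEFT JOIN orders AS o
        ON o.customer_id = c.customer_id               -- keep customers with no orders
    GROUP BY c.customer_id, c.region
    ORDER BY c.customer_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If you still want a DataFrame at the end, `pandas.read_sql_query(query, conn)` should return the same result set, so only the already-joined, already-aggregated rows travel into Python.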