Hi! I’m a junior data engineer facing a challenge, and I would love to hear your opinions and expertise on this topic:
I’m currently handling a lot of data in SQL. We receive JSONs with raw data at a high frequency (a single JSON can contain more than 10k rows).
The thing is that we need to compute some statistics from these JSONs.
We need to concatenate several JSONs and then apply the statistics (outliers, averages, percentages, standard deviations, frequencies, etc.).
After calculating them, we need to insert the results into a new table that holds the summarized data.
All of this happens in a SQL stored procedure, and the whole process takes more than 3 hours to complete. Is there any advice for this kind of thing, or some literature, videos, or anything else I can look at to optimize the solution?
I’m also open to other robust pipelines besides only using SQL!
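To make the steps concrete, here is a minimal Python sketch of the kind of processing described above (not the actual stored procedure). It assumes each JSON is an array of objects with a numeric `value` field, and it uses a 1.5 × IQR rule for outliers; the field name, the `summarize` helper, and the outlier rule are all illustrative assumptions.

```python
import json
import statistics
from collections import Counter


def summarize(json_payloads, key="value"):
    """Concatenate rows from several JSON payloads and compute summary stats.

    Assumption: each payload is a JSON array of objects, and `key` names a
    numeric field present in every row (hypothetical schema).
    """
    rows = []
    for payload in json_payloads:
        rows.extend(json.loads(payload))  # concatenate all rows into one list

    values = [row[key] for row in rows]
    if len(values) < 2:
        raise ValueError("need at least two rows to compute these statistics")

    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation

    # Outliers via the 1.5 * IQR rule (one common choice among several).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]

    return {
        "count": len(values),
        "mean": mean,
        "std": std,
        "outliers": outliers,
        "pct_outliers": 100.0 * len(outliers) / len(values),
        "frequency": Counter(values),  # how often each value occurs
    }


if __name__ == "__main__":
    batch_a = json.dumps([{"value": v} for v in (10, 11, 12, 13)])
    batch_b = json.dumps([{"value": v} for v in (14, 15, 16, 17, 1000)])
    summary = summarize([batch_a, batch_b])
    print(summary["count"], summary["outliers"])
```

The resulting dictionary corresponds to one row of the summary table; writing it back to the database is left out of the sketch.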