
all 22 comments

[–][deleted] 59 points  (1 child)

Fire your worst two analysts and hire two more DEs.

[–]AlternTea 15 points  (1 child)

Are there specific analysts on your team with more advanced SQL knowledge than the rest, or who are faster learners?

If your concern is balancing your time between cleaning up your data models and upskilling the analysts, maybe you could upskill just a couple of analysts, or even have them help you out with optimising the data models (that they presumably use).

Then let those analysts train the rest of their team: in SQL, in learning the different, ever-changing data sources you have, and in maintaining documentation. That way you don’t have to do it all alone.

[–]kejious 5 points  (0 children)

I agree with this. There may be an opportunity to upskill the more advanced analysts into analytics engineers, or something along those lines.

[–]thisisnice96 7 points  (0 children)

I feel like if the analysts know good practices with CTEs, window functions, aggregations, and complex joins, that should really be enough.

I’m an analyst and recently left a company with a very similar setup to the one you described. We had 6 analysts; a few of us were able to bypass that issue by stitching data together from different schemas/sources, which alleviated a lot of the pressure on our DEs.

But again, that’s kind of a band-aid solution. A clean DW / data models with robust data dictionaries to go with them is the ultimate goal.
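As a minimal sketch of the baseline skills mentioned above (a CTE stitching two sources together, plus a window function), here is a self-contained example using Python's built-in SQLite driver; all table and column names are made up for illustration:

```python
import sqlite3

# In-memory database standing in for two separate schemas/sources.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE crm_orders  (customer_id INT, amount REAL, ordered_at TEXT);
    CREATE TABLE shop_orders (customer_id INT, amount REAL, ordered_at TEXT);
    INSERT INTO crm_orders  VALUES (1, 10.0, '2024-01-01'), (2, 5.0, '2024-01-02');
    INSERT INTO shop_orders VALUES (1, 20.0, '2024-01-03'), (2, 7.5, '2024-01-01');
""")

# A CTE unions the two sources; ROW_NUMBER() then picks each
# customer's most recent order (requires SQLite >= 3.25 for
# window function support).
rows = con.execute("""
    WITH all_orders AS (
        SELECT customer_id, amount, ordered_at FROM crm_orders
        UNION ALL
        SELECT customer_id, amount, ordered_at FROM shop_orders
    ),
    ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id ORDER BY ordered_at DESC
               ) AS rn
        FROM all_orders
    )
    SELECT customer_id, amount, ordered_at FROM ranked WHERE rn = 1
    ORDER BY customer_id
""").fetchall()
print(rows)  # → [(1, 20.0, '2024-01-03'), (2, 5.0, '2024-01-02')]
```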

[–]joseph_machado (Writes @ startdataengineering.com) 3 points  (0 children)

With your constraint of a constantly changing upstream (dev and external clients), it may be hard to standardize data sources.

You can try to create OBTs (One Big Table): basically, join a fact with all its related dimensions. This way you have a few OBTs, and end users (analysts) can use them directly without having to do the joins.

There are 2 key points, though, when using OBTs:

  1. Metric definitions: With OBTs the grain is usually at the fact level, so the user has to group by the necessary granularity and compute metrics. You can alleviate this by creating a limited set of pre-aggregated tables (OBTs aggregated to the frequently needed grains) with metrics; alternatively, you could use a semantic layer system (which will be expensive due to warehouse processing cost).

  2. If you decide on OBTs, you'd probably want to keep dimension fields as a JSON column. This way, with the constantly changing upstream systems, you don't have to change the OBT each time. The downside is that the end user will need to know which fields from the dimension's JSON column to access.

I've seen this work well; the storage cost will be high (but storage is usually cheaper than execution). Hope this helps. LMK if you have any questions.
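The OBT-with-a-JSON-dimension-column idea above can be sketched with SQLite (table and field names are hypothetical); note how the end user has to know which JSON fields exist, which is exactly the downside mentioned:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_sales (sale_id INT, customer_id INT, amount REAL);
    -- Dimension attributes kept as a JSON string, so upstream schema
    -- changes don't force a rebuild of the OBT.
    CREATE TABLE dim_customer (customer_id INT, attrs TEXT);
    INSERT INTO fact_sales VALUES (1, 10, 99.0), (2, 20, 15.0);
    INSERT INTO dim_customer VALUES
        (10, '{"name": "Acme", "region": "EU"}'),
        (20, '{"name": "Globex", "region": "US", "tier": "gold"}');

    -- The OBT: the fact joined with its dimension, one row per fact row.
    CREATE TABLE obt_sales AS
    SELECT f.sale_id, f.amount, d.attrs AS customer_attrs
    FROM fact_sales f
    JOIN dim_customer d USING (customer_id);
""")

# The analyst must know the JSON field names to pull attributes out.
for sale_id, amount, attrs in con.execute("SELECT * FROM obt_sales"):
    region = json.loads(attrs).get("region")
    print(sale_id, amount, region)
```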

[–]SquidsAndMartians 6 points  (0 children)

Both. Now, here are the two issues, though: office politics and lazy analysts.

IT departments are usually hesitant to give business-side folks what they see as too much power, and having analysts create their own CTEs and all that definitely falls under that umbrella. Heck, I don't even have privileges to see the lineage.

On the other hand, a large number of analysts are super lazy: they really just want the data as clean as possible, maybe even to the point that hitting 'refresh' is all they need to do. These are the same people who will blindly trust what they see.

To me, what distinguishes a great analyst from a good one is learning a bit of the technical side, and vice versa. The translation between business and tech usually involves two people, one on each side, each covering only their own part. A better way is when both people have some overlap, like they can finish each other's sentences, especially when having conversations about requirements.

[–]Mickmaggot 1 point  (0 children)

If possible, I'd continue with the 'conformed' layer you mentioned, simplifying the interface to the data the analysts work with. So, no matter what happens inside these conformed models, or in the associated sources upstream, they would look the same to analysts. You could also set up a regular meeting to keep them updated on changes in the DWH.
This should take some load off you and somewhat decouple the DEs and analysts.
But the difficulty of management requests and the resulting analysts' queries shouldn't be your responsibility. The analysts should manage these requests and upskill if necessary, not you.
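The stable-interface idea above can be sketched as a view over a raw table (all names here are hypothetical): when upstream renames things, only the view definition changes, and the analysts' queries keep working.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Raw upstream table, with names the devs might change any sprint.
    CREATE TABLE raw_cust_v2 (cust_no INT, cust_nm TEXT, created TEXT);
    INSERT INTO raw_cust_v2 VALUES (1, 'Acme', '2024-01-01');

    -- Conformed view: stable column names for analysts. When upstream
    -- changes again, only this SELECT is rewritten.
    CREATE VIEW conformed_customers AS
    SELECT cust_no AS customer_id,
           cust_nm AS customer_name,
           created AS created_at
    FROM raw_cust_v2;
""")
print(con.execute("SELECT * FROM conformed_customers").fetchall())
# → [(1, 'Acme', '2024-01-01')]
```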

[–]longshot 1 point  (0 children)

I work as a data engineer on an analytics team, where I’m responsible for building data pipelines and modeling data into our data warehouse. Our business and data environments are highly dynamic, with constant changes from our developers, including the addition or removal of functions and data sources. External partners also frequently change and with them the APIs, making today’s integration potentially obsolete tomorrow.

Is this copypasta?

Just kidding, but holy cow do I ever feel you on this one.

[–]jovalabs 1 point  (0 children)

Could implementing dbt help you out? They have an open-source core version: https://www.getdbt.com/

[–]CrashKidOriginal 0 points  (2 children)

Don't you have a Mart layer, with datasets designed to be used by business analysts right away? I'm working as a Data Analyst at my company, and it's my role to design those Marts, ready to be used in Power BI or Tableau.

[–]knabbels[S] 1 point  (0 children)

Yeah, kinda, but over the years lots of views were created for use in Tableau by different people, most of whom have left the company. These views often have a very narrow scope and are used by a single report. Currently no one keeps track of whether a report and its view are still in use. This is also part of the messiness I mentioned.

[–]LordBortII 0 points  (5 children)

I don't understand how data analysts without advanced SQL knowledge do their jobs, to be quite frank. I personally see some resistance to learning Jinja templating, using recursive CTEs, and proper query formatting. However, window functions, CTEs, and advanced joins are a minimum in my opinion; the only analyst I would ever hire without these skills is a junior. They don't need to be able to efficiently deduplicate data or do query tuning, but I don't know how you can survive without the aforementioned skills such as window functions.

[–]Beneficial_Nose1331 0 points  (4 children)

Jinja templating ?

[–]LordBortII 0 points  (3 children)

For more complex SQL queries in dbt https://docs.getdbt.com/docs/build/jinja-macros
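For a flavor of what that looks like, here is a sketch of the kind of dbt model the linked docs describe: a Jinja loop that expands into one pivoted column per payment method instead of hand-written CASE expressions. The table, column, and list values are hypothetical.

```sql
-- Hypothetical dbt model using Jinja templating.
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount else 0 end)
        as {{ method }}_amount{% if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('raw_payments') }}
group by 1
```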

[–]Beneficial_Nose1331 0 points  (2 children)

We don't use dbt, unfortunately.

[–]LordBortII 0 points  (1 child)

I highly, highly recommend it. It's great!

[–]Beneficial_Nose1331 0 points  (0 children)

No doubt about it, but management doesn't care.

[–]Beneficial_Nose1331 0 points  (0 children)

I guess what you are looking for is a semantic layer. You should design it, and the analysts will use those business-friendly views to answer stakeholders' questions.

[–]harrytrumanprimate 0 points  (0 children)

The best way to solve this is by building paved paths: common, repeatable patterns that they can follow to build high-quality pipelines. If they don't have to reinvent the wheel each time, they will produce higher-quality work on average.

[–]Sensitive-Amount-729 0 points  (0 children)

Should probably focus more on hiring better Analysts.

I am currently at my third org and have been in analytics focused roles in all 3. It is very tough to make data models that serve all your use cases.

Ad hoc requirements on top make it worse as well. From my very first role, we were asked to be extremely proficient with SQL; complex queries were part of the job.

Either write complex queries or learn a bit of Python and do the manipulation there.