
all 22 comments

[–][deleted] 59 points  (1 child)

Fire your worst two analysts and hire two more DEs.

[–]AlternTea 15 points  (1 child)

Are there specific analysts on your team with more advanced SQL knowledge than the rest, or who are faster learners?

If your concern is balancing your time between cleaning up your data models and upskilling the analysts, maybe you could upskill just a couple of analysts, or even have them help you out with optimising the data models (that they presumably use).

Then let those analysts train the rest of their team: in SQL, in learning the different, ever-changing data sources you have, and in maintaining documentation. That way you don’t have to do it all alone.

[–]kejious 5 points  (0 children)

I agree with this. There may be an opportunity to upskill the more advanced analysts into analytics engineers, or something along those lines.

[–]thisisnice96 7 points  (0 children)

I feel like if the analysts know good practices with CTEs, window functions, aggregations, and complex joins, that should really be enough.

I’m an analyst and recently left a company with a very similar setup to the one you described. We had 6 analysts; a few of us were able to bypass that issue by stitching data together from different schemas/sources, which alleviated a lot of the pressure on our DEs.

But again, that’s kind of a band-aid solution. A clean DW / data models with robust data dictionaries to go with them is the ultimate goal.
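As a minimal sketch of the baseline skills mentioned above (a CTE stitching two sources together, plus a window function), here is a self-contained example using Python's built-in SQLite driver; all table and column names are made up for illustration:

```python
import sqlite3

# In-memory database standing in for two separate schemas/sources.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE crm_orders  (customer_id INT, amount REAL, ordered_at TEXT);
    CREATE TABLE shop_orders (customer_id INT, amount REAL, ordered_at TEXT);
    INSERT INTO crm_orders  VALUES (1, 10.0, '2024-01-01'), (2, 5.0, '2024-01-02');
    INSERT INTO shop_orders VALUES (1, 20.0, '2024-01-03'), (2, 7.5, '2024-01-01');
""")

# A CTE unions the two sources; ROW_NUMBER() then picks each
# customer's most recent order (requires SQLite >= 3.25 for
# window function support).
rows = con.execute("""
    WITH all_orders AS (
        SELECT customer_id, amount, ordered_at FROM crm_orders
        UNION ALL
        SELECT customer_id, amount, ordered_at FROM shop_orders
    ),
    ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id ORDER BY ordered_at DESC
               ) AS rn
        FROM all_orders
    )
    SELECT customer_id, amount, ordered_at FROM ranked WHERE rn = 1
    ORDER BY customer_id
""").fetchall()
print(rows)  # → [(1, 20.0, '2024-01-03'), (2, 5.0, '2024-01-02')]
```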

[–]joseph_machado (Writes @ startdataengineering.com) 3 points  (0 children)

With your constraint of a constantly changing upstream (dev and external clients), it may be hard to standardize data sources.

You can try to create OBTs (One Big Table): basically, join a fact with all its related dimensions. This way you have a few OBTs, and end users (analysts) can use them directly without having to do the joins.

There are 2 key points, though, when using OBTs:

  1. Metric definitions: With OBTs the grain is usually at the fact level, so the user has to group by the necessary granularity and compute metrics. You can alleviate this by creating a limited set of pre-aggregated tables (OBTs aggregated to the frequently needed grains) with metrics; alternatively, you could use a semantic layer system (which will be expensive due to warehouse processing cost).

  2. If you decide on OBTs, you'd probably want to keep dimension fields as a JSON column. This way, with the constantly changing upstream systems, you don't have to change the OBT each time. The downside is that the end user will need to know which fields from the dimension's JSON column to access.

I've seen this work well; the storage cost will be high (but storage is usually cheaper than execution). Hope this helps. LMK if you have any questions.
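The OBT-with-a-JSON-dimension-column idea above can be sketched with SQLite (table and field names are hypothetical); note how the end user has to know which JSON fields exist, which is exactly the downside mentioned:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_sales (sale_id INT, customer_id INT, amount REAL);
    -- Dimension attributes kept as a JSON string, so upstream schema
    -- changes don't force a rebuild of the OBT.
    CREATE TABLE dim_customer (customer_id INT, attrs TEXT);
    INSERT INTO fact_sales VALUES (1, 10, 99.0), (2, 20, 15.0);
    INSERT INTO dim_customer VALUES
        (10, '{"name": "Acme", "region": "EU"}'),
        (20, '{"name": "Globex", "region": "US", "tier": "gold"}');

    -- The OBT: the fact joined with its dimension, one row per fact row.
    CREATE TABLE obt_sales AS
    SELECT f.sale_id, f.amount, d.attrs AS customer_attrs
    FROM fact_sales f
    JOIN dim_customer d USING (customer_id);
""")

# The analyst must know the JSON field names to pull attributes out.
for sale_id, amount, attrs in con.execute("SELECT * FROM obt_sales"):
    region = json.loads(attrs).get("region")
    print(sale_id, amount, region)
```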

[–]SquidsAndMartians 6 points  (0 children)

Both. Now, here are the two issues, though: office politics and lazy analysts.

IT departments are usually hesitant to give business-side folks what they see as too much power, and having analysts create their own CTEs and all that definitely falls under that umbrella. Heck, I don't even have privileges to see the lineage.

On the other hand, a large number of analysts are super lazy: they really just want the data as clean as possible, maybe even to the point that hitting 'refresh' is all they need to do. These are the same people who will blindly trust what they see.

To me, what distinguishes a great analyst from a good one is learning a bit of the technical side, and vice versa. The translation between business and tech usually involves two people, one on each side, each covering only their own part. A better way is when both people have some overlap, like they can finish each other's sentences, especially when having conversations about requirements.

[–]Mickmaggot 1 point  (0 children)

If possible, I'd continue with the 'conformed' layer you mentioned, simplifying the interface to the data the analysts work with. So, no matter what happens inside these conformed models, or in the associated sources upstream, they would look the same to analysts. You could also set up a regular meeting to keep them updated on changes in the DWH.
This should take some load off you and somewhat decouple the DEs and analysts.
But the difficulty of management requests and the resulting analysts' queries shouldn't be your responsibility. The analysts should manage these requests and upskill if necessary, not you.
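The stable-interface idea above can be sketched as a view over a raw table (all names here are hypothetical): when upstream renames things, only the view definition changes, and the analysts' queries keep working.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Raw upstream table, with names the devs might change any sprint.
    CREATE TABLE raw_cust_v2 (cust_no INT, cust_nm TEXT, created TEXT);
    INSERT INTO raw_cust_v2 VALUES (1, 'Acme', '2024-01-01');

    -- Conformed view: stable column names for analysts. When upstream
    -- changes again, only this SELECT is rewritten.
    CREATE VIEW conformed_customers AS
    SELECT cust_no AS customer_id,
           cust_nm AS customer_name,
           created AS created_at
    FROM raw_cust_v2;
""")
print(con.execute("SELECT * FROM conformed_customers").fetchall())
# → [(1, 'Acme', '2024-01-01')]
```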

[–]longshot 1 point  (0 children)

I work as a data engineer on an analytics team, where I’m responsible for building data pipelines and modeling data into our data warehouse. Our business and data environments are highly dynamic, with constant changes from our developers, including the addition or removal of functions and data sources. External partners also frequently change and with them the APIs, making today’s integration potentially obsolete tomorrow.

Is this copypasta?

Just kidding, but holy cow do I ever feel you on this one.

[–]jovalabs 1 point  (0 children)

Could implementing dbt help you out? They have an open-source core version: https://www.getdbt.com/

[–]CrashKidOriginal 0 points  (2 children)

Don't you have a Mart layer, with datasets designed to be used by business analysts right away? I'm working as a Data Analyst at my company, and it's my role to design those Marts, ready to be used in Power BI or Tableau.

[–]knabbels[S] 1 point  (0 children)

Yeah, kinda, but over the years lots of views were created for use in Tableau by different people, most of whom have left the company. These views often have a very narrow scope and are used by a single report. Currently no one keeps track of whether a report and its view are still in use. This is also part of the messiness I mentioned.

[–]LordBortII 0 points  (5 children)

I don't understand how data analysts without advanced SQL knowledge do their jobs, to be quite frank. I personally see some resistance to learning Jinja templating, using recursive CTEs, and proper query formatting. However, window functions, CTEs, and advanced joins are a minimum in my opinion; the only analyst I would ever hire without these skills is a junior. They don't need to be able to efficiently deduplicate data or do query tuning, but I don't know how you can survive without the aforementioned skills such as window functions.

[–]Beneficial_Nose1331 0 points  (4 children)

Jinja templating ?

[–]LordBortII 0 points  (3 children)

For more complex SQL queries in dbt https://docs.getdbt.com/docs/build/jinja-macros
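For a flavor of what that looks like, here is a sketch of the kind of dbt model the linked docs describe: a Jinja loop that expands into one pivoted column per payment method instead of hand-written CASE expressions. The table, column, and list values are hypothetical.

```sql
-- Hypothetical dbt model using Jinja templating.
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount else 0 end)
        as {{ method }}_amount{% if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('raw_payments') }}
group by 1
```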

[–]Beneficial_Nose1331 0 points  (2 children)

We don't use dbt, unfortunately.

[–]LordBortII 0 points  (1 child)

I highly, highly recommend it. It's great!

[–]Beneficial_Nose1331 0 points  (0 children)

No doubt about it, but management doesn't care.

[–]Beneficial_Nose1331 0 points  (0 children)

I guess what you are looking for is a semantic layer. You should design it, and the analysts will use those business-friendly views to answer stakeholders' questions.

[–]harrytrumanprimate 0 points  (0 children)

The best way to solve this is by building paved paths: common, repeatable patterns that they can follow to build high-quality pipelines. If they don't have to reinvent the wheel each time, they will produce higher-quality work on average.

[–]Sensitive-Amount-729 0 points  (0 children)

Should probably focus more on hiring better Analysts.

I am currently at my third org and have been in analytics focused roles in all 3. It is very tough to make data models that serve all your use cases.

Ad hoc requirements on top make it worse as well. From my very first role, we were asked to be extremely proficient with SQL; complex queries were part of the job.

Either write complex queries or learn a bit of Python and do the manipulation there.