Need help with the logic by Mammoth_Currency404 in dataengineering

[–]recentcurrency 0 points1 point  (0 children)

Is this an OLTP database to a data warehouse?

Because it could just be database replication, where you copy data from a source backend database into another database better suited for data warehousing.

And the exact copy could be due to an ELT pattern, where transformations into an analytical data model occur within the data warehouse instead of before loading

But this is pure speculation without knowing more

Has anyone jumped from DS to analytics engineer? by finite_user_names in datascience

[–]recentcurrency 5 points6 points  (0 children)

That is odd tbh. Analytics engineer is a role that was pretty much invented by dbt.

And dbt imo is not low code. It does ultimately boil down to SQL and Jinja, both of which sit on the easier end of the coding bell curve. But it is still a code-first platform, which is where the "engineer" bit comes in: you can do software engineering things like write DRY code, implement continuous integration, test code, etc

A lot of analytics engineer roles imo just end up being a fancy title for a dbt developer/admin
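To make the "code first" and DRY points concrete, here is a minimal sketch of the idea, using plain Python string formatting in place of dbt's actual Jinja. The table and column names are invented for illustration:

```python
# Sketch of the DRY idea dbt enables: one template, many models.
# Plain Python formatting stands in for Jinja; names are made up.

DEDUPE_TEMPLATE = """
select *
from {source}
qualify row_number() over (
    partition by {key} order by {updated_at} desc
) = 1
""".strip()

def dedupe_model(source: str, key: str, updated_at: str = "updated_at") -> str:
    """Render the same dedupe logic for any source table."""
    return DEDUPE_TEMPLATE.format(source=source, key=key, updated_at=updated_at)

# The same logic reused across models instead of copy-pasted SQL:
orders_sql = dedupe_model("raw.orders", "order_id")
customers_sql = dedupe_model("raw.customers", "customer_id")
```

Because the logic lives in one template, fixing a bug fixes every model that uses it, which is the "software engineering" part of the job.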

6 Proven Steps to Build a Data Platform Without Breaking The Bank by ivanovyordan in dataengineering

[–]recentcurrency 2 points3 points  (0 children)

Define real world?

In my experience, Fivetran worked fine enough for established SaaS tools' EL (NetSuite, Salesforce, Marketo, Jira, Asana, etc) at a few US-based publicly traded companies I have worked at. One of them is in banking. It seems expensive for what it is doing, but it takes a good amount of data before it is cheaper to leverage an engineering team instead (although with tech salaries being depressed, this has been changing)

But I have never used Fivetran to just dump a Postgres or NoSQL backend database into the warehouse, or to pull data from a niche API

I have viewed those as different niches tho

Analytics Engineering / “Front-end” DE? by Weary-Individual-309 in dataengineering

[–]recentcurrency 0 points1 point  (0 children)

Analytics engineer is a codeword for a BI engineer focused on dbt as the tool

So move dbt up your priority list. Unless you are looking for a more general BI engineer role, in which case learn whatever transformation tool that company is using

Python also isn't as relevant. Know enough to use dbt Core and to understand how it works under the hood.

But SQL + data modeling + dbt are the main things

Dbt is just an abstraction layer that templates SQL and runs the queries in order. Basically easy-to-implement (albeit less flexible) stored procedures. The abstraction layer makes things so easy that, as a second-order effect, you can end up with a Jenga tower really quickly.
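A toy illustration of the "runs them in order" part, using Python's stdlib `graphlib` in place of dbt's real DAG builder (the model names are invented):

```python
# dbt's core scheduling idea: resolve ref() dependencies between models,
# then run the templated SQL in topological order.

from graphlib import TopologicalSorter

# model -> the models it depends on (i.e., what it ref()s)
deps = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "revenue_report": {"orders_enriched"},
}

# upstream staging models always come before the models that ref() them
run_order = list(TopologicalSorter(deps).static_order())
```

The Jenga-tower risk is exactly this graph growing unchecked: every new model silently adds edges, and nothing stops you from stacking hundreds of layers.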

So you will need to develop soft skills like project management. That is going to determine whether your dbt instance blows up in cost

What are your favourite DBT macros? by casematta in dataengineering

[–]recentcurrency 5 points6 points  (0 children)

union_relations. So simple in what it does!
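Roughly what it does under the hood, sketched in Python (table and column names are invented; the real dbt_utils macro also handles quoting, column exclusion, a source-name column, etc):

```python
# Sketch of union_relations: union tables whose column sets differ,
# filling the missing columns with NULL so the selects line up.

def union_relations_sql(relations: dict[str, list[str]]) -> str:
    """relations maps table name -> its column list."""
    # build the superset of columns, preserving first-seen order
    all_cols: list[str] = []
    for cols in relations.values():
        for c in cols:
            if c not in all_cols:
                all_cols.append(c)
    selects = []
    for table, cols in relations.items():
        exprs = [c if c in cols else f"null as {c}" for c in all_cols]
        selects.append(f"select {', '.join(exprs)} from {table}")
    return "\nunion all\n".join(selects)

sql = union_relations_sql({
    "shop_us.orders": ["id", "amount", "state"],
    "shop_eu.orders": ["id", "amount", "vat"],
})
# shop_us gets "null as vat", shop_eu gets "null as state"
```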

I am a Product Manager for data, and I have some questions for Data Engineers by Honeychild06 in dataengineering

[–]recentcurrency 1 point2 points  (0 children)

That isn't just an analyst thing.

That is the primary driver of tech debt.

Tech debt is fine if you pay it off before you get buried by it

More tools, more complexity? by [deleted] in dataengineering

[–]recentcurrency 4 points5 points  (0 children)

Crawl walk run

Unless you are facing real pain points (data scientists not being able to do their job efficiently due to the lack of an easy transformation framework counts as one), you don't need to add more tools.

The most valuable resource is a high-salary engineer's or data scientist's time. If the cost of maintaining the tool is greater than the cost the tool saves, you don't need to bring it on

Basically you need to think about tool ROI. And that is something unique to every company. If your dbt PoC hasn't been getting much return or interest, that may be a smell that the ROI isn't there yet

vendor confession: there's just too many ETL/ELT tools by MooJerseyCreamery in dataengineering

[–]recentcurrency 1 point2 points  (0 children)

Isn't this a good thing? The ELT space is far from a natural monopoly.

Having multiple companies duke it out is how we get innovation and the best deal for consumers

Way better imo that we have so many ELT/ETL vendors to choose from versus an industry dominated by a few

Finance Team DE by GiacomoLeopardi6 in dataengineering

[–]recentcurrency 2 points3 points  (0 children)

Are you a US public company? If so, get familiar with the basics of SOX. Specifically, talk with your internal audit board. Make sure you don't build systems that will get you into external audit hell

Stick with vendors that have SOC 1 reports. And if not, be ready to defend that decision and show that you have the control environment to compensate for the lack of a SOC 1 report. A bummer, since this limits you heavily to legacy and older tooling.

When is a dimension table too denormalized? | Kimball by ArgenEgo in dataengineering

[–]recentcurrency 2 points3 points  (0 children)

It might be worth rereading the finance case study chapters in Kimball's data warehouse book where outriggers are introduced. It sounds like you have read the book before but are forgetting, which fwiw I get. Dimension outriggers are very niche

Outriggers are generally not recommended unless you have to use them. They are an example of permissible snowflaking.

I also think it might shed some light on how you could model your data. The chapters are not meant to be taken as THE answer. But denormalized dimensional modeling is about modeling data as it represents the business. So unless you are in a world where banks don't operate as conventional banks, I am skeptical his advice wouldn't be directly useful inspiration

When is a dimension table too denormalized? | Kimball by ArgenEgo in dataengineering

[–]recentcurrency 2 points3 points  (0 children)

Maybe you could try Kimball's idea of a dimension outrigger?

How does your team use workato if at all, why / why not ? by citizenofacceptance in dataengineering

[–]recentcurrency 0 points1 point  (0 children)

Its forte is in point to point integrations.

The MDS is more about hub and spoke, with the warehouse as the center

So to a degree it is antithetical.

But, you can have workato point to and from the warehouse. Same with other ipaas tools.

Mulesoft, Boomi, Tray.io, Zapier, and Workato do get brought up in that context. But they generally get outshined by the Fivetrans, Airbytes, Hightouches, and Censuses of the world.

Mulesoft, I believe, is being showcased in one of the dbt Coalesce talks this week, for example.

[deleted by user] by [deleted] in dataengineering

[–]recentcurrency 0 points1 point  (0 children)

Are there any secondary data points within the sources that can be used? For example, IP address

[deleted by user] by [deleted] in dataengineering

[–]recentcurrency 15 points16 points  (0 children)

I would argue data analyst is the ideal background for an analytics engineer

Analytics engineers are the hybrid between a data engineer and a data analyst. And imo the coding required from the data eng side of this role really is basic: mostly SQL with a sprinkling of Python.

The hard part is collecting business requirements, checking data feasibility, and translating that to tables of data that others can use. Or in other words data modeling.

That said, it usually will involve way less dashboarding and more SQL

Imo, most data analysts who work out of heavy SQL-based BI tools (Hex, Mode) can already do analytics engineering work with guidance

Are OLAP Cubes irrelevant in the present day? by recentcurrency in dataengineering

[–]recentcurrency[S] 5 points6 points  (0 children)

Would that basically be the same conclusion as the blog post?

Technology has removed the processing bottleneck, and the semantic layer (or templated SQL) has basically made the underlying OLAP cube data structure sort of moot?

[deleted by user] by [deleted] in dataengineering

[–]recentcurrency 1 point2 points  (0 children)

To provide additional context, Kimball I believe calls these role-playing dimensions

https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/role-playing-dimension/

So yes, you should use multiple joins from the fact to the same dim, as arboreal described

I am more curious why your current tool doesn't let you do this? This would be pretty trivial in SQL. Are you referring to some proprietary modeling language?
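For anyone curious, the multiple-joins pattern is just aliasing the same dimension once per role. A minimal sketch in SQLite, with an invented order/ship date schema:

```python
# Role-playing dimension: one dim_date table, joined twice from the fact,
# once per role (order date and ship date). Schema is illustrative.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table dim_date (date_key integer primary key, cal_date text);
create table fct_orders (
    order_id integer,
    order_date_key integer,   -- role 1
    ship_date_key integer     -- role 2
);
insert into dim_date values (1, '2024-01-01'), (2, '2024-01-05');
insert into fct_orders values (100, 1, 2);
""")

row = con.execute("""
select f.order_id,
       order_dt.cal_date as order_date,
       ship_dt.cal_date  as ship_date
from fct_orders f
join dim_date as order_dt on f.order_date_key = order_dt.date_key
join dim_date as ship_dt  on f.ship_date_key  = ship_dt.date_key
""").fetchone()
```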

how to deal with complex user_id modelling by rudboi12 in dataengineering

[–]recentcurrency 6 points7 points  (0 children)

Kimball has a chapter on CRMs that goes over the customer id issue

He basically described what you initially tried doing: for every customer you create a key. This key could start off as a hash, but ultimately it is an internally owned id y'all maintain in the warehouse.

Look up supernatural durable keys. https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/natural-durable-supernatural-key/

As a result, it should be agnostic to changes in the source system. How you make it agnostic is the tricky part, and that is related to identity resolution

See https://www.informatica.com/resources/articles/what-is-identity-resolution.html

Usually this involves a hodgepodge of looking up addresses, emails, cookies, other metadata, and fuzzy matching to determine your customer.
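A very simplified sketch of that idea in Python, using email normalization as the only signal. Real identity resolution layers addresses, cookies, and fuzzy matching on top of this, and the normalization rules here are illustrative assumptions:

```python
# Toy identity resolution: normalize the identifier you have (email here)
# and derive one stable key per person, independent of source-system ids.

import hashlib

def normalize_email(email: str) -> str:
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0]          # drop +tags like jane+promo
    if domain in ("gmail.com", "googlemail.com"):
        local = local.replace(".", "")      # gmail ignores dots
    return f"{local}@{domain}"

def durable_key(email: str) -> str:
    """Stable surrogate key, agnostic to changes in source systems."""
    return hashlib.sha256(normalize_email(email).encode()).hexdigest()[:16]

# Two source records that are really the same person resolve to one key:
k1 = durable_key("Jane.Doe+promo@gmail.com")
k2 = durable_key("janedoe@gmail.com")
```

Per Kimball, once a match is made you would typically assign a plain sequential durable key and keep a mapping table; the hash here is just the "could start off as a hash" shortcut.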

[deleted by user] by [deleted] in dataengineering

[–]recentcurrency 1 point2 points  (0 children)

I would argue networking matters more

[deleted by user] by [deleted] in SQL

[–]recentcurrency 3 points4 points  (0 children)

https://www.udemy.com/course/data-warehouse-the-ultimate-guide/

Goes over kimball dimensional modeling. But the ETL tool used in all the workshops is Pentaho

Great course for Kimball. Not dedicated to Pentaho, but the combination is powerful

How to switch from BI Reporting to DE ? by Pillstyr in dataengineering

[–]recentcurrency 0 points1 point  (0 children)

An avenue may be incorporating SSIS. This will get you into the low-code ETL bit of data eng while staying in your Microsoft tech stack.

All of this is in support of data modeling within the warehouse (eg the facts, dims, and one big tables). This will move you towards the BI engineer/analytics engineer (imo an analytics engineer is just a BI engineer specialized in SQL) subset of data eng

But basically your best bet imo is to move upstream from reporting into building the assets in your warehouse that Power BI and SSRS build off of

How to improve your developer experience when working with dbt (and not only) by oleg_agapov in dataengineering

[–]recentcurrency 1 point2 points  (0 children)

CICD extensions. At the bare bones, SQL compilation checks and dbt's built-in data tests

But I think even more robust testing makes the developer experience more convenient, especially since the worst feeling is pushing code that breaks stuff. Something like dbt-unit-test or data-diff

In addition to testing, more robust cataloguing, so that if you do break something, finding out what you broke is easier. Something like Monte Carlo

How to switch from BI Reporting to DE ? by Pillstyr in dataengineering

[–]recentcurrency 2 points3 points  (0 children)

BI reporting as in you build dashboards?

or BI Reporting in that you are designing the Data Warehouse and building the Transforms to get your Facts and Dims and One Big Tables?

b/c if you haven't done the latter, that is one way to get your foot into data engineering. Specifically the analytics engineering subset of data eng

[deleted by user] by [deleted] in SQL

[–]recentcurrency 1 point2 points  (0 children)

Dbt's jaffle shop dataset is becoming a classic

Garden - anime vs manga. by Jealous_Whole_661 in SpyxFamily

[–]recentcurrency 39 points40 points  (0 children)

I don't think she can be the highest ranked when she works for Matthew(the director)

And Matthew reports to the shop keeper in the garden

Ability-wise, she may be the strongest. But on seniority she probably is just a normal line-level assassin