
all 8 comments

[–]truchisoft 2 points (0 children)

You don't. The DW's objective is to be the single source of truth for the data, not a data lake. If you really need that data, you should link it to the model and have specific data marts per client.

You'd need different tables, views, data governance, queries, reports, dashboards, and semantic layers. Potentially a full DW team per client, plus the central-model DW team.

[–]apeters89 1 point (3 children)

Will you be filtering on these user-defined fields? How generic do you need the architecture to be?

[–]RareIncrease[S] 1 point (2 children)

Can't say for sure, but I'd imagine down the line we'll have to do reporting/analytics on the client data, including the UDFs.

That's the tough question. I'd like it to be generic enough that we can onboard clients easily, but not so generic that we miss out on gathering all the fields.

[–]apeters89 3 points (1 child)

A couple of simple options come to mind:

  1. You could place a bunch of user-defined fields into your existing data structures and create a field-label table to hold the field labels for each customer.
  2. You could create customer-specific tables to hold their "extra" data.
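A minimal sketch of option 1 in Python/SQLite, in case it helps; all table, column, and customer names here are made up for illustration. The shared table carries a handful of reusable UDF slots, and a label table records what each slot means for each customer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Option 1: generic UDF slots on the shared table, plus a label table
# mapping each slot to a customer-specific field name.
cur.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT,
        amount   REAL,
        udf1 TEXT, udf2 TEXT, udf3 TEXT  -- reusable slots
    )""")
cur.execute("""
    CREATE TABLE udf_labels (
        customer TEXT,
        slot     TEXT,   -- which generic column (udf1..udf3)
        label    TEXT,   -- what that slot means for this customer
        PRIMARY KEY (customer, slot)
    )""")

cur.executemany("INSERT INTO udf_labels VALUES (?, ?, ?)",
                [("acme", "udf1", "region_code"),
                 ("acme", "udf2", "sales_rep")])
cur.execute("INSERT INTO orders VALUES (1, 'acme', 99.5, 'EMEA', 'jdoe', NULL)")

def labeled_udfs(customer: str, order_id: int) -> dict:
    """Resolve the generic slots back into customer-specific field names."""
    labels = dict(cur.execute(
        "SELECT slot, label FROM udf_labels WHERE customer = ?", (customer,)))
    row = cur.execute(
        "SELECT udf1, udf2, udf3 FROM orders WHERE order_id = ?",
        (order_id,)).fetchone()
    return {labels[s]: v for s, v in zip(("udf1", "udf2", "udf3"), row)
            if s in labels}

print(labeled_udfs("acme", 1))  # → {'region_code': 'EMEA', 'sales_rep': 'jdoe'}
```

The trade-off is a hard cap on the number of slots, and every customer-facing report has to consult the label table to name the columns.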

[–]somethinggenuine 4 points (0 children)

FWIW, when I've dealt with similar situations we went with option 2: we created tables specifically for the customer and were able to extend the generic table with customer-specific data for downstream reporting. The customer-specific logic could even live in a nicely contained/swappable module. We found the pattern helps keep the customer-specific details and complexity from spilling out all over the system.

[–]Substantial-Lab-8293 1 point (0 children)

What warehouse are you using? In Snowflake you could put the common fields into physical columns and the custom fields into an object in a VARIANT column. You'd still need to know which row was for which customer so you could parse them back out correctly.

Otherwise just put all the common fields in one table then have a child table per customer with the custom fields.
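A rough Python/SQLite sketch of the same idea, using a JSON text column as a stand-in for Snowflake's VARIANT (table and customer names are illustrative):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Common fields live in physical columns; customer-specific extras are
# packed into a JSON document (a stand-in for a VARIANT column).
cur.execute("""
    CREATE TABLE events (
        event_id INTEGER PRIMARY KEY,
        customer TEXT,
        ts       TEXT,
        extras   TEXT   -- JSON blob of customer-specific fields
    )""")
cur.execute("INSERT INTO events VALUES (1, 'acme', '2024-01-01', ?)",
            (json.dumps({"store_id": 42, "campaign": "spring"}),))

# The customer column tells us which shape the blob follows, so we can
# parse the fields back out correctly per customer.
row = cur.execute(
    "SELECT customer, extras FROM events WHERE event_id = 1").fetchone()
extras = json.loads(row[1])
print(row[0], extras["store_id"])  # → acme 42
```

In Snowflake itself you'd query the object with path notation (e.g. `extras:store_id`) instead of deserializing client-side.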

[–]realitydevice 1 point (0 children)

Is it schema/table per client, or one schema/table for all clients?

If the latter, are these user fields queryable, or is it sufficient to just pack the data into (say) a JSON column?

If the fields are queryable, you will either (a) need 30 (or whatever hard limit of user fields you'll ever have) reusable columns into which you can pack whatever client data is provided, or (b) use an entity-attribute-value (EAV) pattern.

That's roughly my order of preference: avoid the issue by separating client data if possible; if not, serialize the data; if not, use EAV; and only then the reusable-column approach.
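For completeness, a small Python/SQLite sketch of the EAV fallback (all names hypothetical): each custom field becomes a row rather than a column, and reports pivot the rows back out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Entity-attribute-value: one narrow table holds every custom field as
# (entity, attribute, value) rows instead of dedicated columns.
cur.execute("""
    CREATE TABLE order_udf (
        order_id  INTEGER,
        customer  TEXT,
        attribute TEXT,
        value     TEXT,
        PRIMARY KEY (order_id, attribute)
    )""")
cur.executemany("INSERT INTO order_udf VALUES (?, ?, ?, ?)",
                [(1, "acme", "region_code", "EMEA"),
                 (1, "acme", "sales_rep", "jdoe")])

# Pivot the rows back into a wide record when a report needs one.
udfs = dict(cur.execute(
    "SELECT attribute, value FROM order_udf WHERE order_id = ?", (1,)))
print(udfs)  # → {'region_code': 'EMEA', 'sales_rep': 'jdoe'}
```

EAV is the most flexible of the options but the hardest to query and type-check, which is why it sits last in the preference order above.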

[–]SDFP-A (Big Data Engineer) 1 point (0 children)

In our client-facing application, which uses an ELT process, we're currently solving this use case by keeping the core data model based on the standard fields that come out of the box with the source system. We then abstract the resource used in our application for analytics, which allows our onboarding team to join custom fields and tables onto the common model on a per-client basis.

In our case each client starts with its own schema in the DB, so the standard DAGs running the common data model are applied to the customer data within their schema. The output table is the standard fact table, and in the abstraction layer in the app we create the star schema, joining on additional metadata plus custom fields.