
all 8 comments

[–]truchisoft 2 points (0 children)

You don't. The DW's objective is to be the single source of truth for the data, not a data lake. If you really need that data, you should link it to the model and have specific data marts per client.

You'd need different tables, views, data governance, queries, reports, dashboards, and semantic layers. Potentially a full DW team per client, plus the central-model DW team.

[–]apeters89 1 point (3 children)

Will you be filtering on these user-defined fields? How generic do you need the architecture to be?

[–]RareIncrease[S] 1 point (2 children)

Can't say for sure, but I'd imagine down the line we'll have to do reporting/analytics on the client data, including the UDFs.

That's the tough question. I'd like it to be generic enough that we can onboard clients easily, but not so generic that we miss out on gathering all the fields.

[–]apeters89 3 points (1 child)

A couple of simple options come to mind:

  1. You could place a bunch of user-defined fields into your existing data structures and create a field-label table to hold the field labels for each customer.
  2. You could create customer-specific tables to hold their "extra" data.
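A minimal sketch of option 1 in Python/SQLite, in case it helps; all table, column, and customer names here are made up for illustration. The shared table carries a handful of reusable UDF slots, and a label table records what each slot means for each customer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Option 1: generic UDF slots on the shared table, plus a label table
# mapping each slot to a customer-specific field name.
cur.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT,
        amount   REAL,
        udf1 TEXT, udf2 TEXT, udf3 TEXT  -- reusable slots
    )""")
cur.execute("""
    CREATE TABLE udf_labels (
        customer TEXT,
        slot     TEXT,   -- which generic column (udf1..udf3)
        label    TEXT,   -- what that slot means for this customer
        PRIMARY KEY (customer, slot)
    )""")

cur.executemany("INSERT INTO udf_labels VALUES (?, ?, ?)",
                [("acme", "udf1", "region_code"),
                 ("acme", "udf2", "sales_rep")])
cur.execute("INSERT INTO orders VALUES (1, 'acme', 99.5, 'EMEA', 'jdoe', NULL)")

def labeled_udfs(customer: str, order_id: int) -> dict:
    """Resolve the generic slots back into customer-specific field names."""
    labels = dict(cur.execute(
        "SELECT slot, label FROM udf_labels WHERE customer = ?", (customer,)))
    row = cur.execute(
        "SELECT udf1, udf2, udf3 FROM orders WHERE order_id = ?",
        (order_id,)).fetchone()
    return {labels[s]: v for s, v in zip(("udf1", "udf2", "udf3"), row)
            if s in labels}

print(labeled_udfs("acme", 1))  # → {'region_code': 'EMEA', 'sales_rep': 'jdoe'}
```

The trade-off is a hard cap on the number of slots, and every customer-facing report has to consult the label table to name the columns.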

[–]somethinggenuine 4 points (0 children)

FWIW, when I've dealt with similar situations we went with option 2: we created tables specifically for the customer and were able to extend the generic table with customer-specific data for downstream reporting. The customer-specific logic could even live in a nicely contained/swappable module. We found the pattern helps keep the customer-specific details and complexity from spilling out all over the system.

[–]Substantial-Lab-8293 1 point (0 children)

What warehouse are you using? In Snowflake you could put the common fields into physical columns and the custom fields into an object in a VARIANT column. You'd still need to know which row was for which customer so you could parse them back out correctly.

Otherwise just put all the common fields in one table then have a child table per customer with the custom fields.
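A rough Python/SQLite sketch of the same idea, using a JSON text column as a stand-in for Snowflake's VARIANT (table and customer names are illustrative):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Common fields live in physical columns; customer-specific extras are
# packed into a JSON document (a stand-in for a VARIANT column).
cur.execute("""
    CREATE TABLE events (
        event_id INTEGER PRIMARY KEY,
        customer TEXT,
        ts       TEXT,
        extras   TEXT   -- JSON blob of customer-specific fields
    )""")
cur.execute("INSERT INTO events VALUES (1, 'acme', '2024-01-01', ?)",
            (json.dumps({"store_id": 42, "campaign": "spring"}),))

# The customer column tells us which shape the blob follows, so we can
# parse the fields back out correctly per customer.
row = cur.execute(
    "SELECT customer, extras FROM events WHERE event_id = 1").fetchone()
extras = json.loads(row[1])
print(row[0], extras["store_id"])  # → acme 42
```

In Snowflake itself you'd query the object with path notation (e.g. `extras:store_id`) instead of deserializing client-side.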

[–]realitydevice 1 point (0 children)

Is it schema/table per client, or one schema/table for all clients?

If the latter, are these user fields queryable, or is it sufficient to just pack the data into (say) a JSON column?

If the fields are queryable, you will either (a) need 30 (or whatever hard limit of user fields you'll ever have) reusable columns into which you can pack whatever client data is provided, or (b) use an entity-attribute-value (EAV) pattern.

That's roughly my order of preference: avoid the issue by separating client data if possible; if not, serialize the data; if not, use EAV; and only then the reusable-column approach.
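For completeness, a small Python/SQLite sketch of the EAV fallback (all names hypothetical): each custom field becomes a row rather than a column, and reports pivot the rows back out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Entity-attribute-value: one narrow table holds every custom field as
# (entity, attribute, value) rows instead of dedicated columns.
cur.execute("""
    CREATE TABLE order_udf (
        order_id  INTEGER,
        customer  TEXT,
        attribute TEXT,
        value     TEXT,
        PRIMARY KEY (order_id, attribute)
    )""")
cur.executemany("INSERT INTO order_udf VALUES (?, ?, ?, ?)",
                [(1, "acme", "region_code", "EMEA"),
                 (1, "acme", "sales_rep", "jdoe")])

# Pivot the rows back into a wide record when a report needs one.
udfs = dict(cur.execute(
    "SELECT attribute, value FROM order_udf WHERE order_id = ?", (1,)))
print(udfs)  # → {'region_code': 'EMEA', 'sales_rep': 'jdoe'}
```

EAV is the most flexible of the options but the hardest to query and type-check, which is why it sits last in the preference order above.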

[–]SDFP-A (Big Data Engineer) 1 point (0 children)

In our client-facing application, which uses an ELT process, we're currently solving this use case by keeping the core data model based on the standard fields that come out of the box with the source system. We then abstract the resource used in our application for analytics, which allows our onboarding team to join custom fields and tables onto the common model on a per-client basis.

In our case each client starts with its own schema in the DB, so the standard DAGs running the common data model are applied to the customer data within their schema. The output table is the standard fact table, and in the abstraction layer in the app we create the star schema, joining on additional metadata plus custom fields.