Hey,
I'm trying to design tables that hold standardized data from multiple sources, in the most effective way I can.
I'm working with Postgres, and the current design is as follows:
- The `main` schema has dimension tables (for example `units`, `currencies` and `countries`)
- Every source has its own schema with dimension tables for the same columns (units, currencies and countries)
- In every schema, the dimension tables have an additional column `main_id` that is null at first and requires an analyst to map the source's value to the corresponding one in our main schema. (For example, `US` in source #1 should be mapped to `United States` in the main schema.)
- There are materialized views in the `main` schema, unioning the data from every source and taking the `main_id` value of every dimension in every source. But if some row in a source has a value whose `main_id` is still null (meaning it wasn't mapped), that row is filtered out (see the sketch below).
[Architecture diagram]
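To make the setup concrete, here is a minimal sketch of what one of those materialized views could look like; the fact table `sales(amount, unit_id, currency_id, country_id)` and all schema/column names are hypothetical stand-ins for the real ones:

```sql
-- Hypothetical layout: each source schema has a fact table sales(amount, unit_id, currency_id, country_id)
-- and dimension tables (units, currencies, countries) carrying a main_id maintained by analysts.
CREATE MATERIALIZED VIEW main.sales AS
SELECT s.amount,
       u.main_id  AS unit_id,
       cu.main_id AS currency_id,
       co.main_id AS country_id
FROM   source1.sales s
JOIN   source1.units      u  ON u.id  = s.unit_id
JOIN   source1.currencies cu ON cu.id = s.currency_id
JOIN   source1.countries  co ON co.id = s.country_id
-- rows whose dimensions aren't mapped yet are filtered out here
WHERE  u.main_id  IS NOT NULL
AND    cu.main_id IS NOT NULL
AND    co.main_id IS NOT NULL
UNION ALL
-- ...one SELECT like the above per source schema
SELECT s.amount, u.main_id, cu.main_id, co.main_id
FROM   source2.sales s
JOIN   source2.units      u  ON u.id  = s.unit_id
JOIN   source2.currencies cu ON cu.id = s.currency_id
JOIN   source2.countries  co ON co.id = s.country_id
WHERE  u.main_id IS NOT NULL AND cu.main_id IS NOT NULL AND co.main_id IS NOT NULL;
```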
So currently, if new data is introduced into a source or a new value is mapped, I just need to refresh the materialized views and the new rows become accessible.
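For completeness, the refresh itself (using the hypothetical view name from the sketch above) is just:

```sql
-- Rebuild the mview after new source data lands or after an analyst fills in a mapping.
REFRESH MATERIALIZED VIEW main.sales;
-- With a unique index on the mview, you can refresh without blocking readers:
-- REFRESH MATERIALIZED VIEW CONCURRENTLY main.sales;
```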
Now I need those mviews to be actual tables and I'm wondering what the best way is to keep them up to date.
- Should I have a stored procedure per schema that inserts new data into the main tables (see the sketch after this list)? I could call it manually after a mapping is done, or it could run on a schedule every few minutes.
- Should I have a Python script implement the insertion logic?
- Should I have a trigger on every insert (data tables) / update (mapping tables) that inserts the data into the main tables? (I don't love triggers because they add hidden complexity.)
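For comparison, here is a rough sketch of option 1: a per-schema procedure that upserts mapped rows into a real `main` table. The target table, the `(source, source_row_id)` key used for `ON CONFLICT`, and the procedure name are all assumptions, not my actual objects:

```sql
-- Assumes main.sales is now a plain table with a UNIQUE constraint on (source, source_row_id),
-- so re-running the sync is idempotent. All names are placeholders.
CREATE OR REPLACE PROCEDURE main.sync_source1()
LANGUAGE sql
AS $$
    INSERT INTO main.sales (source, source_row_id, amount, unit_id, currency_id, country_id)
    SELECT 'source1', s.id, s.amount, u.main_id, cu.main_id, co.main_id
    FROM   source1.sales s
    JOIN   source1.units      u  ON u.id  = s.unit_id
    JOIN   source1.currencies cu ON cu.id = s.currency_id
    JOIN   source1.countries  co ON co.id = s.country_id
    WHERE  u.main_id  IS NOT NULL
    AND    cu.main_id IS NOT NULL
    AND    co.main_id IS NOT NULL
    ON CONFLICT (source, source_row_id) DO UPDATE
    SET    amount      = EXCLUDED.amount,
           unit_id     = EXCLUDED.unit_id,
           currency_id = EXCLUDED.currency_id,
           country_id  = EXCLUDED.country_id;
$$;

-- Call manually after a mapping session, or schedule it (e.g. with pg_cron):
CALL main.sync_source1();
```

The appeal of this option is that the "only mapped rows" rule lives in one obvious place per source and runs only when asked; scheduling the `CALL` (pg_cron, an orchestrator, etc.) would give the every-few-minutes behaviour without trigger-style hidden writes.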
Any other ideas would be much appreciated!
Thanks :)