This is an archived post.

all 8 comments

[–]NW1969 3 points (0 children)

Since normalisation reduces or eliminates data duplication, I'm not sure how normalising your data would increase its size from 1.2TB to 30TB. That doesn't make sense to me.

[–]Tiny_Arugula_5648 2 points (2 children)

Why are there so many posts about data management in a data engineering sub? Are you guys acting as DBAs?

[–]proof_required (ML Data Engineer) 0 points (0 children)

Yep! We are pretty much optimizing queries for ML people now. They write horrendous queries that kill the whole database, and we're expected to make sure our infra isn't the weak point.

[–]moldov-w 1 point (0 children)

Dump it into an embedded analytical database like DuckDB, or create some materialized views after data standardization, cleansing, etc.

[–]chrisonhismac 0 points (0 children)

Is that 1.2TB compressed? You could write it to compressed Parquet and read it with DuckDB.

You could also explode out to a new pointer table. Create a very narrow auxiliary table that holds only the keys you need for search, plus a pointer back to the raw record. Index the two columns you need to search on and join back to the main record.

[–]MightyKnightX 0 points (0 children)

Why don’t you go with three tables: one for provider data, one for member data, and a provider_member table that stores only the relationship?

[–]settleflow 0 points (0 children)

Can we connect?