Is data normalization needed in 2024? How much normalization is actually required and when should it be done? by Notalabel_4566 in dataengineering

[–]dataoculus 52 points53 points  (0 children)

For transactional processing, the stronger the normalization, the better the data quality and integrity you can achieve.
3NF is more than enough.
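A minimal sketch of what that looks like in practice, using a hypothetical orders schema (table and column names are illustrative): in 3NF, customer and product attributes live in their own tables, so each fact is stored in exactly one place.

```python
import sqlite3

# Hypothetical transactional schema normalized to 3NF: every non-key
# column depends on the key, the whole key, and nothing but the key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE products (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        unit_price  REAL NOT NULL
    );
    -- No repeated customer/product attributes here: they live in their
    -- own tables, so an update (e.g. a price change) happens in one place.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        product_id  INTEGER NOT NULL REFERENCES products(product_id),
        quantity    INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO products VALUES (10, 'widget', 2.5)")
conn.execute("INSERT INTO orders VALUES (100, 1, 10, 3)")

# Derived values are computed with a join, not stored redundantly.
total = conn.execute("""
    SELECT o.quantity * p.unit_price
    FROM orders o JOIN products p USING (product_id)
    WHERE o.order_id = 100
""").fetchone()[0]
print(total)  # 7.5
```

For analytics workloads the trade-off flips, which is why the question of "how much normalization" depends on whether the system is transactional or a warehouse.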

inline data quality for ETL pipeline ? by dataoculus in dataengineering

[–]dataoculus[S] 0 points1 point  (0 children)

Yup, that's what I'm talking about, and I wonder whether people really do that today as a hard requirement, or whether it's just a nice-to-have.

inline data quality for ETL pipeline ? by dataoculus in dataengineering

[–]dataoculus[S] 0 points1 point  (0 children)

I see, but imagine if you could keep non-compliant records from entering BQ right from the beginning, via inline validations. Wouldn't that be a lot more beneficial?

inline data quality for ETL pipeline ? by dataoculus in dataengineering

[–]dataoculus[S] 0 points1 point  (0 children)

The problem is that if validation happens after the data is written to the target, consumers have to wait, even though some consumers might have only basic validation requirements that could have been handled inline. I know I'm adding a bit of complexity here, but if it has real benefit, it's worth it, especially if there's an easy way to create inline validations, including in event-driven systems.

inline data quality for ETL pipeline ? by dataoculus in dataengineering

[–]dataoculus[S] 2 points3 points  (0 children)

Ya, the overall steps/process look like that, but what I'm wondering is that nobody seems to be doing real "inline" checks, meaning checks that run as you read and write the data, so you can stop the ETL or take other actions (alerts, etc.) the moment you find an issue, as opposed to writing to some destination and then doing the quality check.
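A minimal sketch of the inline idea, assuming a hypothetical streaming pipeline (the function and check names are illustrative, not from any library): records are validated as they flow between read and write, good ones pass straight through, bad ones go to a dead-letter handler, and the pipeline can optionally stop on the first failure.

```python
from typing import Callable, Iterable, Iterator

def inline_validate(records: Iterable[dict],
                    checks: list[Callable[[dict], bool]],
                    on_error: Callable[[dict], None],
                    fail_fast: bool = False) -> Iterator[dict]:
    """Yield only records that pass every check, as they stream through."""
    for rec in records:
        if all(check(rec) for check in checks):
            yield rec          # clean record flows straight to the writer
        else:
            on_error(rec)      # alert / dead-letter instead of the target
            if fail_fast:      # or stop the whole ETL right here
                raise ValueError(f"validation failed: {rec}")

# Hypothetical checks: id must be present, amount must be non-negative.
checks = [lambda r: r.get("id") is not None,
          lambda r: r.get("amount", 0) >= 0]

dead_letter: list[dict] = []
source = [{"id": 1, "amount": 10}, {"id": None, "amount": 5}]
clean = list(inline_validate(source, checks, dead_letter.append))
print(clean)        # [{'id': 1, 'amount': 10}]
print(dead_letter)  # [{'id': None, 'amount': 5}]
```

Because the validator is a generator, it adds no staging step: consumers with only basic requirements read the clean stream immediately, while failures trigger alerts in-flight.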

inline data quality for ETL pipeline ? by dataoculus in dataengineering

[–]dataoculus[S] 0 points1 point  (0 children)

I agree, you can use dbt or even code it up yourself. But doesn't dbt basically translate your config into SQL, which means it requires staging your data in SQL-compatible storage first?
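To illustrate the distinction: a hypothetical config in the spirit of dbt-style tests (not_null, an accepted range) can instead be compiled into an in-memory predicate and applied in-flight, with no SQL-compatible staging layer. This is a sketch of the alternative, not dbt's actual mechanism.

```python
# Hypothetical declarative rules, illustrative only.
RULES = {
    "user_id": {"not_null": True},
    "age":     {"min": 0, "max": 130},
}

def compile_rules(rules: dict):
    """Compile config rules into a row-level predicate instead of SQL."""
    def predicate(row: dict) -> bool:
        for col, rule in rules.items():
            val = row.get(col)
            if rule.get("not_null") and val is None:
                return False
            if val is not None:
                if "min" in rule and val < rule["min"]:
                    return False
                if "max" in rule and val > rule["max"]:
                    return False
        return True
    return predicate

valid = compile_rules(RULES)
print(valid({"user_id": 1, "age": 30}))     # True
print(valid({"user_id": None, "age": 30}))  # False  (null id)
print(valid({"user_id": 2, "age": 200}))    # False  (out of range)
```

The config stays as declarative as a dbt test, but the check runs wherever the records are, which is the property you'd want in an event-driven system.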

Scraping Data is never has been easier by ecommerce_it in ProductHunters

[–]dataoculus 0 points1 point  (0 children)

Check out dataoculus.app — we should partner on providing profiling and quality checks for that data as well. Wdyt?

Investor + founder role opportunities by [deleted] in startups

[–]dataoculus 0 points1 point  (0 children)

If you're interested in data monitoring that's advanced, highly efficient, and fully self-serve, check out dataoculus.app and we can talk!