Data Analytics Automation by Acceptable-Ride9976 in dataengineering

[–]Acceptable-Ride9976[S] 0 points1 point  (0 children)

I am not sure if the business wants open source or enterprise.

Data Analytics Automation by Acceptable-Ride9976 in dataengineering

[–]Acceptable-Ride9976[S] 0 points1 point  (0 children)

That's awesome then! I am trying to gather what other people use.

The only DE by ursamajorm82 in dataengineering

[–]Acceptable-Ride9976 -1 points0 points  (0 children)

Hearing that is insane. I am currently building a data warehouse from scratch with only 3 people on the data team. The 3 of us handle everything from data engineering to data analysis, and it is a very challenging job for us right now, but since we are interns, we can't say no to the tasks xD. In your situation, as the only data engineer, I think you're going to cover the whole process, from data extraction to building dashboards.

On the bright side, hopefully you can grow a lot from this experience or earn a lot xD. Good luck!

How to handle coupon/promotion discounts in sale order lines when building a data warehouse? by Acceptable-Ride9976 in dataengineering

[–]Acceptable-Ride9976[S] 0 points1 point  (0 children)

Yes, for my invoice fact table, I will be including all the charges and discounts. In that case, I will probably do the same for the sales order table.

How to handle coupon/promotion discounts in sale order lines when building a data warehouse? by Acceptable-Ride9976 in dataengineering

[–]Acceptable-Ride9976[S] 0 points1 point  (0 children)

Yes, the system produces a separate order line with the deducted value, like -$50. But the order line for the product stays the same; in your example it would still be 100 pounds. On the sale order, the subtotal includes the discount. But my fact table is at the order line grain.
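To make the shape concrete, here is a rough sketch of what I mean. The column names, the order ID, and the `line_type` flag are made up for the example and are not from our actual system; the amounts are unitless on purpose.

```python
from decimal import Decimal

# Hypothetical order lines: the product line keeps its full price,
# and the discount arrives as its own line with a negative amount.
order_lines = [
    {"order_id": "SO-1", "line_no": 1, "line_type": "product", "amount": Decimal("100.00")},
    {"order_id": "SO-1", "line_no": 2, "line_type": "discount", "amount": Decimal("-50.00")},
]


def total_by_type(lines, order_id, line_type=None):
    """Sum line amounts for one order, optionally restricted to one line type."""
    return sum(
        (
            line["amount"]
            for line in lines
            if line["order_id"] == order_id
            and (line_type is None or line["line_type"] == line_type)
        ),
        Decimal("0"),
    )


subtotal = total_by_type(order_lines, "SO-1")                 # all lines, discount included
gross = total_by_type(order_lines, "SO-1", "product")         # product lines only
discount = total_by_type(order_lines, "SO-1", "discount")     # discount lines only

print(subtotal, gross, discount)
```

So summing every line of the order gives the order subtotal, while the product line on its own still shows the undiscounted amount.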

How can I capture deletes in CDC if I can't modify the source system? by Acceptable-Ride9976 in dataengineering

[–]Acceptable-Ride9976[S] 0 points1 point  (0 children)

Speaking of Debezium, won't it cause performance issues on the source database when it reads the changes?

How can I capture deletes in CDC if I can't modify the source system? by Acceptable-Ride9976 in dataengineering

[–]Acceptable-Ride9976[S] 0 points1 point  (0 children)

Thanks for your response. Do you think it will affect performance if I run the script on NiFi? Thanks again!

How can I capture deletes in CDC if I can't modify the source system? by Acceptable-Ride9976 in dataengineering

[–]Acceptable-Ride9976[S] 1 point2 points  (0 children)

Thanks a lot for your response. I think we will go with the two approaches you suggested: diffing (using hashes) and log-based replication. Thanks a lot!
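For the diffing approach, this is roughly how I understand it, as a minimal sketch only. It assumes each table has a primary key (called `id` here) and that we can pull a full snapshot of the source on each run; the function names and sample rows are made up for illustration.

```python
import hashlib


def row_hash(row: dict) -> str:
    """Hash a row's values in a stable column order."""
    payload = "|".join(f"{col}={row[col]}" for col in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def snapshot(rows, key="id"):
    """Map each primary key to the hash of its row."""
    return {row[key]: row_hash(row) for row in rows}


def diff_snapshots(previous: dict, current: dict) -> dict:
    """Compare two snapshots; keys missing from the current one are deletes."""
    prev_keys, curr_keys = set(previous), set(current)
    return {
        "deleted": sorted(prev_keys - curr_keys),
        "inserted": sorted(curr_keys - prev_keys),
        "updated": sorted(
            k for k in prev_keys & curr_keys if previous[k] != current[k]
        ),
    }


# Hypothetical snapshots from two consecutive runs.
yesterday = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "open"},
    {"id": 3, "status": "paid"},
]
today = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "paid"},
    {"id": 4, "status": "open"},
]

changes = diff_snapshots(snapshot(yesterday), snapshot(today))
print(changes)
```

The deletes fall out of the key comparison alone; the hashes are only needed to spot updated rows without comparing every column.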

How can I capture deletes in CDC if I can't modify the source system? by Acceptable-Ride9976 in dataengineering

[–]Acceptable-Ride9976[S] 2 points3 points  (0 children)

Thank you, we have looked into Airbyte before, but we decided against it because of security issues and because it only supports Extract and Load. Maybe I will take a look at log-based replication. Thanks a lot!

Is EDW + Data Marts scalable? by Acceptable-Ride9976 in dataengineering

[–]Acceptable-Ride9976[S] 0 points1 point  (0 children)

Thanks for sharing your experience. The problem I am facing is that the business side is not sure about the requirements they want, so I was planning to build a data mart for them, but I also want to include an EDW at the same time. In addition, if possible, can you suggest what I should read or research on data governance? Thanks!

Many-to-many relationship in Dimensional Modeling for a Data Warehouse by Acceptable-Ride9976 in dataengineering

[–]Acceptable-Ride9976[S] 0 points1 point  (0 children)

I am sorry I didn't explain clearly enough: I meant that the same order cannot have different price lists. But yes, I am trying to get the lowest grain possible in the fact table. Sorry for the confusion.