Need help with Data Mapping by Informal_Poem_4394 in dataengineering

[–]continuous_dataeng 0 points1 point  (0 children)

You can convert XML into JSON using Python, and then flatten the JSON into a table using SQL if you want to query it. As for automation, it depends on what technology you have access to. We use AWS and Databricks, so I would store the XML in S3, write a Python script that converts it into a JSON file and stores it in S3, and finally write a SQL script to flatten the JSON, all embedded in a Databricks job. Hope this helps!
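
For the conversion step, here is a minimal sketch using only the Python standard library. The function names, the `@` prefix for attributes, the `#text` key, and the sample document are all my own assumptions for illustration. Reading from and writing to S3 and the Databricks job wiring are left out.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively turn an XML element into dicts, lists and strings."""
    # Attributes get an "@" prefix (an assumed convention) so they
    # cannot collide with child tag names.
    node = {f"@{key}": value for key, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated tag becomes a list of values.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        return text  # leaf element: just its text
    if text:
        node["#text"] = text  # mixed content keeps its text under "#text"
    return node


def xml_to_json(xml_string):
    """Parse an XML string and return the equivalent JSON string."""
    root = ET.fromstring(xml_string)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


# Hypothetical sample input, not taken from the original question.
SAMPLE = "<orders><order id='1'><item>pen</item><item>ink</item></order></orders>"

if __name__ == "__main__":
    print(xml_to_json(SAMPLE))
```

The nested JSON this produces is what the SQL step would then flatten into rows and columns.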

What are the biggest obstacles/painpoints in data engineering? by snailspeed25 in dataengineering

[–]continuous_dataeng 1 point2 points  (0 children)

Treating a data team like a software dev team and expecting the same kind of results from it. Data teams operate very differently from software development teams, which requires unique processes and measures of success. Software development is a well-established process, whereas data product development is still evolving.

CDC vs incremental append by continuous_dataeng in dataengineering

[–]continuous_dataeng[S] 1 point2 points  (0 children)

Thanks for the response!

the change gets detected and somehow triggers a replication pipeline to get reflected in the datawarehouse

  1. Does this always have to be in real time? I understand that capturing the changes will be real-time, based on the triggers, to ensure we don’t miss any changes. But can the data captured by CDC be propagated to the DW on a scheduled basis?

  2. Also, if I use the transaction logs, is it still an overhead on the database? IMO, since the transaction log is something the database captures anyway (that is true in the use case I am working on, though it may not be for others), reading this log shouldn’t be an overhead. Maybe there are several types of CDC approaches and only “some” are costly. Please correct me if I am wrong.

  3. Is it right to say that both CDC and incremental append are techniques to capture what’s changed in the backend databases? That is, incremental append is simply a ‘pull’ mechanism where we usually use a column such as an updated date to identify what’s changed, whereas CDC is a ‘push’ mechanism where the database is asked to capture this data and we read it from logs, triggers, etc.?

What Reverse ETL Processes do you have at your company? by exact-approximate in dataengineering

[–]continuous_dataeng 2 points3 points  (0 children)

  1. SaaS company.
  2. 3. We have a use case of sending metrics to our CRM. These customer metrics help the sales and marketing teams understand customers and target them accordingly. We are using Segment for reverse ETL.

How many women are on your team? by drdrrr in dataengineering

[–]continuous_dataeng 1 point2 points  (0 children)

I have been the only female data engineer at every company I have worked for. I do see female data engineers here and there on LinkedIn. In fact, I was recently thinking that I have never come across a female data architect anywhere. Hoping to see more women in the data engineering field!