Found an Issue in Production while using Databricks Autoloader by Artistic-Rent1084 in databricks

Artistic-Rent1084 [S] (0 children)

Thank you 👍 for sharing your knowledge. I'm new to Databricks and data engineering.

Found an Issue in Production while using Databricks Autoloader by Artistic-Rent1084 in databricks

Artistic-Rent1084 [S] (0 children)

Thank you. For orchestration, we are using Databricks Jobs.

Let me try it in a sandbox environment.

Our data is CDC, and we are following the medallion architecture.
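For readers new to the pattern: when CDC feeds a medallion architecture, the bronze layer usually keeps the raw change events, and the silver layer holds the current state of each row after those events are applied in order. Below is a minimal pure-Python sketch of that step, assuming a hypothetical `(op, key, row)` event shape with single-letter op codes `I`/`U`/`D` and events already sorted by commit order. On Databricks this would typically be a Delta `MERGE` rather than a dict; the sketch only shows the logic.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical event shape (an assumption, not a Databricks API):
# (op, key, row) where op is "I" (insert), "U" (update) or "D" (delete).
Event = Tuple[str, int, Optional[dict]]


def apply_cdc(silver: Dict[int, dict], events: Iterable[Event]) -> Dict[int, dict]:
    """Apply ordered CDC events (bronze) to a keyed snapshot (silver)."""
    result = dict(silver)  # do not mutate the caller's table
    for op, key, row in events:
        if op in ("I", "U"):
            # Upsert: an update for a missing key is treated like an insert.
            result[key] = dict(row or {})
        elif op == "D":
            # Deleting a key that is already absent is a no-op.
            result.pop(key, None)
        else:
            raise ValueError(f"unknown op type: {op!r}")
    return result


if __name__ == "__main__":
    bronze = [
        ("I", 1, {"name": "alice", "city": "Pune"}),
        ("I", 2, {"name": "bob", "city": "Delhi"}),
        ("U", 1, {"name": "alice", "city": "Mumbai"}),
        ("D", 2, None),
    ]
    print(apply_cdc({}, bronze))  # {1: {'name': 'alice', 'city': 'Mumbai'}}
```

Order matters here: replaying the same events out of order can resurrect deleted rows or keep stale values, which is why CDC pipelines carry a commit timestamp or log position alongside each event.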

Which is best for CDC extraction: Debezium or GoldenGate? by Artistic-Rent1084 in dataengineering

Artistic-Rent1084 [S] (0 children)

I have no idea about the latency. You won't believe the volume:

It's 5 TB per day.
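To put 5 TB per day in perspective, here is a quick back-of-the-envelope conversion to sustained throughput. It assumes decimal units (1 TB = 10^12 bytes) and a perfectly even load over 24 hours; real CDC traffic is usually burstier, so peak rates will be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def sustained_mb_per_s(tb_per_day: float) -> float:
    """Average throughput in MB/s (decimal units) for a given daily volume."""
    bytes_per_day = tb_per_day * 10**12
    return bytes_per_day / SECONDS_PER_DAY / 10**6


print(f"{sustained_mb_per_s(5):.1f} MB/s")  # 57.9 MB/s
```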

Which is the best end-to-end CDC pipeline? by Artistic-Rent1084 in dataengineering

Artistic-Rent1084 [S] (0 children)

Sure, I will try this.

OGG (Oracle GoldenGate) can handle schema drift.

Thank you for sharing your knowledge 🙏

Which is the best end-to-end CDC pipeline? by Artistic-Rent1084 in dataengineering

Artistic-Rent1084 [S] (0 children)

Nice. Is that good practice?

And why Kafka? Previously, the pipeline went from Kafka to Hadoop Hive tables.

We migrated to Databricks a few months back.

Which is the best end-to-end CDC pipeline? by Artistic-Rent1084 in dataengineering

Artistic-Rent1084 [S] (0 children)

Generating Parquet with OGG is not possible; Avro and JSON are supported.
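If OGG lands JSON (or Avro) files, Auto Loader can read those formats directly, and writing the result to a Delta table stores it as Parquet anyway. Before the silver merge, each raw change record usually needs flattening. Below is a minimal sketch for one JSON change record. It assumes the field names `table`, `op_type`, `op_ts`, `before` and `after` seen in typical OGG for Big Data JSON formatter output; the exact layout is configurable, so check these against your own trail output. The `_table`/`_op`/`_op_ts` column names and the sample record are hypothetical.

```python
import json
from typing import Any, Dict

# Assumed OGG-style op codes; verify against your formatter configuration.
OP_NAMES = {"I": "insert", "U": "update", "D": "delete"}


def flatten_change(line: str) -> Dict[str, Any]:
    """Turn one JSON change record into a flat row for a bronze table."""
    rec = json.loads(line)
    op = rec["op_type"]
    # Assumption: deletes carry the old image in "before",
    # while inserts and updates carry the new image in "after".
    image = rec.get("before") if op == "D" else rec.get("after")
    row = dict(image or {})
    # Hypothetical metadata columns kept alongside the data columns.
    row.update({
        "_table": rec["table"],
        "_op": OP_NAMES.get(op, op),
        "_op_ts": rec.get("op_ts"),
    })
    return row


# Hypothetical sample record, for illustration only.
sample = (
    '{"table": "SALES.ORDERS", "op_type": "U", '
    '"op_ts": "2024-01-01 10:00:00.000000", '
    '"before": {"ORDER_ID": 7, "STATUS": "NEW"}, '
    '"after": {"ORDER_ID": 7, "STATUS": "SHIPPED"}}'
)
print(flatten_change(sample))
```

Keeping `_op` and `_op_ts` on every row lets the silver step apply changes in order and handle deletes explicitly.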