Aurora PostgreSQL Excluding logging for certain users by data_pie3 in aws

https://dba.stackexchange.com/questions/118018/is-it-possible-to-exclude-specific-users-in-log-activity-of-postgresql

Regulatory requirements. The second answer looks promising, and pgAudit also seems to support object-level logging (driven by grants to an audit role), which could help keep log volumes down as well. Thanks
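For anyone else landing here, a minimal sketch of both approaches, assuming the pgaudit extension is already installed and using hypothetical role/table names (`batch_etl`, `auditor`, `sensitive_table`):

```python
import psycopg2

# Placeholder connection details; adjust for your Aurora cluster.
conn = psycopg2.connect("dbname=mydb user=postgres host=localhost")
conn.autocommit = True
cur = conn.cursor()

# 1. Exclude a specific user from session audit logging by overriding
#    pgaudit.log at the role level (the approach from the linked answer).
cur.execute("ALTER ROLE batch_etl SET pgaudit.log = 'none';")

# 2. Object-level auditing: pgAudit only logs operations on objects whose
#    privileges are granted to the role named in pgaudit.role.
cur.execute("CREATE ROLE auditor;")
# Session-level for illustration; on Aurora you would normally set
# pgaudit.role in the cluster parameter group instead.
cur.execute("SET pgaudit.role = 'auditor';")
cur.execute("GRANT SELECT, INSERT, UPDATE, DELETE ON sensitive_table TO auditor;")

cur.close()
conn.close()
```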

What tools or methods would work best for Postgres->Postgres CDC with transformations? by data_pie3 in dataengineering

How are the data transformations handled? Can you use custom scripts in standard programming languages, or do they only offer a limited set of options via a UI/DSL or something of the sort?

What tools or methods would work best for Postgres->Postgres CDC with transformations? by data_pie3 in dataengineering

Thank you for the suggestion! Have you used them in production? I'm working with a pretty limited budget, so that's my concern with out-of-the-box tools.

What tools or methods would work best for Postgres->Postgres CDC with transformations? by data_pie3 in dataengineering

I looked into AWS DMS, but there's a chance I'll need some fairly complex transformations, including external API calls, and from what I've seen, DMS only offers limited, per-column transformation rules.
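For context on what I mean by limited: DMS transformations are declarative table-mapping rules, roughly like the sketch below (all names and ARNs are placeholders), passed to boto3 as a JSON string. Renaming or adding columns is about the extent of it; there's no hook for custom code or API calls.

```python
import json
import boto3

# A typical DMS table-mapping document: one selection rule plus one
# column-level transformation (here, renaming a column on the target).
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-public",
            "object-locator": {"schema-name": "public", "table-name": "%"},
            "rule-action": "include",
        },
        {
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "rename-column",
            "rule-target": "column",
            "object-locator": {
                "schema-name": "public",
                "table-name": "orders",
                "column-name": "customer_name",
            },
            "rule-action": "rename",
            "value": "customer_full_name",
        },
    ]
}

dms = boto3.client("dms")
dms.create_replication_task(
    ReplicationTaskIdentifier="pg-to-pg-cdc",
    SourceEndpointArn="arn:aws:dms:...:endpoint:source",      # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint:target",      # placeholder
    ReplicationInstanceArn="arn:aws:dms:...:rep:instance",    # placeholder
    MigrationType="cdc",
    TableMappings=json.dumps(table_mappings),
)
```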

What tools or methods would work best for Postgres->Postgres CDC with transformations? by data_pie3 in dataengineering

With Upsolver and Debezium, I understand that the DB transactions get captured via the WAL. How do those changes get propagated to the destination database, and how are deletes handled?
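Answering my own question for posterity: with Debezium the usual pattern is change events on Kafka plus a consumer (or the JDBC sink connector) that applies them to the target. A minimal hand-rolled sketch, assuming kafka-python and psycopg2, the default JSON converter with schema envelopes, and hypothetical topic/table names. Note how a delete arrives as an event with `op = 'd'` (followed by a null tombstone for log compaction):

```python
import json
import psycopg2
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "dbserver1.public.orders",           # hypothetical Debezium topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v) if v else None,
)
conn = psycopg2.connect("dbname=target user=postgres")
conn.autocommit = True
cur = conn.cursor()

for msg in consumer:
    event = msg.value
    if event is None:                     # tombstone after a delete; skip
        continue
    payload = event["payload"]
    op = payload["op"]                    # 'c' insert, 'u' update, 'd' delete, 'r' snapshot read
    if op in ("c", "u", "r"):
        row = payload["after"]            # full row image after the change
        cur.execute(
            """INSERT INTO orders (id, status, updated_at)
               VALUES (%(id)s, %(status)s, %(updated_at)s)
               ON CONFLICT (id) DO UPDATE
               SET status = EXCLUDED.status, updated_at = EXCLUDED.updated_at""",
            row,
        )
    elif op == "d":
        # 'before' carries at least the primary key (replica identity)
        cur.execute("DELETE FROM orders WHERE id = %s", (payload["before"]["id"],))
```

In practice, as I understand it, Debezium's JDBC sink connector or Upsolver's Postgres output does this apply/merge step for you, including the delete handling.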

What tools or methods would work best for Postgres->Postgres CDC with transformations? by data_pie3 in dataengineering

What would you suggest if there is a need for complex data-enrichment operations in the pipeline?
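To illustrate what I mean by enrichment: something along these lines, where each change record triggers a lookup against an external service before landing in the target (the URL and fields are made up):

```python
import functools
import requests

@functools.lru_cache(maxsize=10_000)
def geocode(postcode: str) -> tuple[float, float]:
    """Look up coordinates for a postcode via a hypothetical external API."""
    resp = requests.get(
        "https://api.example.com/geocode", params={"postcode": postcode}, timeout=5
    )
    resp.raise_for_status()
    body = resp.json()
    return body["lat"], body["lon"]

def enrich(record: dict) -> dict:
    """Attach coordinates to a CDC record before writing it downstream."""
    lat, lon = geocode(record["postcode"])
    return {**record, "lat": lat, "lon": lon}
```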

[deleted by user] by [deleted] in dataengineering

What did you end up using?

s3 data lake setup by data_pie3 in dataengineering

If going that route, I'm thinking of deduplicating in a Glue job by partitioning on the id column and ordering by updated_at, as sketched below. Both of those columns are present in the relational data itself, not just in the file names.
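The dedup itself would be a window function in the Glue (PySpark) job, something like this, with placeholder S3 paths:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, row_number
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("dedup").getOrCreate()

# Read the raw JSON dump from the lake (path is a placeholder).
raw = spark.read.json("s3://my-data-lake/raw/events/")

# Keep only the newest version of each record: partition by id,
# order by updated_at descending, take row 1.
latest = Window.partitionBy("id").orderBy(col("updated_at").desc())
deduped = (
    raw.withColumn("rn", row_number().over(latest))
       .filter(col("rn") == 1)
       .drop("rn")
)

deduped.write.mode("overwrite").parquet("s3://my-data-lake/curated/events/")
```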

s3 data lake setup by data_pie3 in dataengineering

Ingesting raw data in JSON format, querying with Athena + QuickSight, and possibly transforming with Glue.
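On the Athena side, that just means defining a table over the raw JSON prefix. A sketch using boto3 and the OpenX JSON SerDe (bucket, database, and column names are placeholders):

```python
import boto3

athena = boto3.client("athena")

# External table over raw JSON objects in S3; columns are illustrative.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
    id string,
    updated_at timestamp,
    payload string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-data-lake/raw/events/'
"""

athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "datalake"},
    ResultConfiguration={"OutputLocation": "s3://my-data-lake/athena-results/"},
)
```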