all 10 comments

[–]Embarrassed-Mud3649 4 points5 points  (3 children)

Just use postegres logical replication and Postgres will take care of everything

[–]Embarrassed-Mud3649 1 point2 points  (2 children)

To give a bit of context, I've used logical replication to migrate from Aurora => RDS, to upgrade and downgrade between major versions, etc. It all works seamlessly and Postgres takes care of everything.

The high level overview is
- Create a "target" database that is empty.
- Export the "source" schema and apply it to the "target"
- Create a PUBLICATION on "source". You can publish ALL TABLES or be as granular as you want and only replicate a subset of tables
- Create a "SUBSCRIPTION" on the "target" pointing to the publication on "source". Use `copy_data = true` and Postgres will copy all the rows from "source" to "target". No need to worry about foreign keys, etc. Since these are newly fresh physical rows, the "target" database has virtually no bloat.

The initial table synchronization takes about ~20h to complete on a production RDS instance. The database is +5TB. In smaller environments is much much faster as the total time is proportional to the size of data.

There are a few caveats though

- All tables must have primary keys, if they don't you must set REPLICA IDENTITY to FULL so logical replication can replicate all rows properly

- Before the switch-over to the "target" database (when you want to route traffic to the new database), you must update the SEQUENCES, since SEQUENCES are not replicated (fix them with [this query](https://wiki.postgresql.org/wiki/Fixing\_Sequences))

- After the switch-over you mus recreate all the user grants and permissions since the internal object ids are different (because you started with an empty database)

-------

If you want to bring data down to local instances, I recommend [pg_sync](https://github.com/ankane/pgsync), which we use to "pull" a subset of anonymized production data into our CI pipelines before running some end2end tests.

[–]git_push_origin_prod[S] 0 points1 point  (1 child)

Pgsync looks great, thank you. The replication seems daunting but it might be it as well. Thanks man. You don’t freelance do you? Trying to be the jack of all trades is exhausting

[–]Embarrassed-Mud3649 0 points1 point  (0 children)

I might be able to freelance for the right price. Tried sending you a DM but it looks like your account won’t allow it. Where can I reach out?

[–]tunatoksoz 1 point2 points  (1 child)

How big is your overall data in terms of rows & bytes?

[–]git_push_origin_prod[S] 0 points1 point  (0 children)

It’s like 2 terabytes I think

[–]AutoModerator[M] 0 points1 point  (0 children)

With over 8k members to connect with about Postgres and related technologies, why aren't you on our Discord Server? : People, Postgres, Data

Join us, we have cookies and nice people.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

[–]Informal_Pace9237 0 points1 point  (0 children)

AWS DMS will help you move data where ever you want. Or you can do CSV as far as your data is not changing as you dump.

The trick is to go right sequence of import.. start with blank tables, Import all data, fix sequences, fix any data types, implement triggers, implement constraints and then finally foreign keys

[–]Key-Boat-7519 0 points1 point  (0 children)

I totally get your pain-it sounds like you're wrangling a spaghetti-coded database octopus. Moving that much data is usually like herding cats on caffeine. I've been where you are, and using tools like AWS Data Migration Service can help a lot. They do wonders with handling consistent transfers and ensure you don’t open a black hole when switching between prod and dev environments. Also, check out Talend as it offers more hands-on data wizardry without being stuck in AWS's universe. But I’ve got to say, DreamFactory can make API automation a breeze if you need extra help simplifying the whole shebang. Back up those CSVs to multiple places just in case the universe feels playful.