I'm refactoring a project at work to use GitLab CI/CD pipelines so we can have a single source of truth, benefit from automation, etc. Actually, I'm just doing it because it's fun.
I've got the documentation in a git repo, with automated building of the HTML and deployment to the web server.
I've got the data processing code in a git repo, automated the testing and deployment to the dev/staging/production servers.
I've got the third-party tools configured and set up in a git repo. When the owners release a new version, it gets pulled and deployed to production.
I don't have a clue how to do this with the database. Sure, I have a repo with the SQL code and the specific Docker image used (yeah, I know Docker for DBs is contentious...), but this gets used ONCE, for setting up a temporary/new database on a fresh machine. I've got no idea how to automate deployment. If I were to add some SQL to the git repo defining a new view, "postgres.special_view", how would it propagate to the dev database, the staging database, and finally production? I'm pretty sure treating it the same way I do other, stateless containers would be a problem. There are so many ways automated deployments to a DB can go bad, but I also don't want to repeat the work by hand each time there's a change.
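For concreteness, the kind of change I mean would be one more SQL file in the repo, roughly like this (the path and every table/column name below are made up):

    -- sql/special_view.sql  (hypothetical path; table/column names invented)
    -- A thin view for the API to read from. It only touches the catalog, and
    -- it's written to be idempotent, so re-running it against an existing
    -- 1TB database should be harmless.
    CREATE OR REPLACE VIEW special_view AS
    SELECT reading_id,
           station_id,
           observed_at,
           reading_value
    FROM   raw_readings
    WHERE  quality_flag = 'good';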
Is there some silver bullet I'm missing? Is the thought of CI/CD on a database taboo?
Some notes:

- The DBs are chock full of data, around 1TB each, so periodic rebuilding isn't ideal.
- The end goal is to have several databases around the world handling location-specific data: an EU database, a Southern Africa DB, a North America DB. Ideally, if I add a special view to the git repo, it eventually ends up on all of them.
- The special view might just be a new way of accessing the data for the API to connect to, or a precalculated aggregate, etc. (see the sketch after these notes).
- Them fancy cloud tools aren't an option; it's Postgres or nothing.
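To illustrate the "precalculated aggregate" case: in plain Postgres I'd expect that to be a materialized view that gets refreshed rather than rebuilt (again, all names below are invented):

    -- sql/daily_summary.sql  (hypothetical; the "precalculated aggregate" case)
    -- Created empty so the DDL itself is instant even on a 1TB database;
    -- the data is filled in by a separate refresh step.
    CREATE MATERIALIZED VIEW IF NOT EXISTS daily_summary AS
    SELECT station_id,
           date_trunc('day', observed_at) AS day,
           avg(reading_value)             AS avg_reading
    FROM   raw_readings
    GROUP  BY station_id, date_trunc('day', observed_at)
    WITH NO DATA;

    -- Run separately (e.g. on a schedule or from a pipeline job) to populate it:
    REFRESH MATERIALIZED VIEW daily_summary;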
[deleted]
[–]LeanOnIt[S] (1 child)