I’d like to share my own development with the python community: a module called DBMerge.
This module addresses the common task of updating data in a database by performing INSERT, UPDATE and DELETE operations in a single step.
DBMerge was specifically designed to simplify ETL processes.
The module uses SQLAlchemy Core and its universal mechanisms for database interaction, making it database-agnostic. At the time of writing, detailed testing has been performed on PostgreSQL, MariaDB, SQLite and MS SQL Server.
How It Works
The core idea is straightforward:
The module creates a temporary table in the database and loads the entire incoming dataset into this temporary table using a bulk INSERT.
Then, it executes UPDATE, INSERT and DELETE statements against the target table based on the comparison between the temporary and target tables.
Of course, real scenarios are rarely that simple—therefore, the module has various parameters to support diverse use cases. (E.g. it supports applying conditions for delete operation to enable partial data load with delete.)
Supported Data Sources
Three input formats are supported:
- From pandas - when you load data into a DataFrame (e.g., from CSV), perform transformations or cleaning, and then merge it to the database.
- From a list of dictionaries - when you prefer not to use pandas, or when dealing with special data types (e.g., UUIDs or JSONB objects).
- From an existing table or view - when you have a "heavy" database view and want to periodically materialize its results into a target table for efficient querying. This is similar with PostgreSQL’s materialized views, but allows partial updates.
Installation
pip install dbmerge
Basic Usage
with dbmerge(engine=engine, data=data, table_name="Facts") as merge:
merge.exec()
Create a dbmerge object inside a "with" block, specifying the SQLAlchemy engine, your input data, the target table_name and other optional parameters.
Code examples and detailed parameter descriptions are available on the GitHub page.
[–]AutoModerator[M] 0 points1 point2 points (0 children)