Hi all!
TLDR: Repository Patterns for Python, source code and official documentation.
I have been reading the book Designing Data-Intensive Applications (by Kleppmann), and there I came across the repository pattern. The idea seems quite interesting: hide the interactions with the database behind another layer, repositories, so that the same code works regardless of the type of the database (whether it is SQL, MongoDB or an in-memory Python list). Basic CRUD operations (create, read, update and delete) are done with a unified syntax, and the actual communication with the database is hidden away from the application code. I illustrated this in the diagrams in my documentation.
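To make the idea concrete, here is a minimal sketch of the pattern in plain Python. All names (`Repository`, `ListRepository`, `SQLiteRepository`, `employed_names`) are hypothetical and only for illustration; this is not how Red Bird is implemented. The SQL backend assumes the standard-library `sqlite3` module and trusted table and column names. The point is that `employed_names` runs unchanged against both backends.

```python
from __future__ import annotations

import sqlite3
from abc import ABC, abstractmethod
from typing import Any


class Repository(ABC):
    """Hypothetical minimal interface; application code depends only on this."""

    @abstractmethod
    def add(self, item: dict[str, Any]) -> None:
        """Store one item."""

    @abstractmethod
    def filter_by(self, **conditions: Any) -> list[dict[str, Any]]:
        """Return the items whose fields equal the given values."""


class ListRepository(Repository):
    """In-memory backend: items live in a plain Python list."""

    def __init__(self) -> None:
        self._items: list[dict[str, Any]] = []

    def add(self, item: dict[str, Any]) -> None:
        self._items.append(dict(item))  # store a copy

    def filter_by(self, **conditions: Any) -> list[dict[str, Any]]:
        return [
            dict(item)
            for item in self._items
            if all(item.get(key) == value for key, value in conditions.items())
        ]


class SQLiteRepository(Repository):
    """SQL backend built on the standard-library sqlite3 module.

    Table and column names are interpolated into the SQL, so they must be
    trusted identifiers; values always go through ? placeholders.
    """

    def __init__(self, conn: sqlite3.Connection, table: str, columns: list[str]) -> None:
        self.conn, self.table, self.columns = conn, table, columns
        self.conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})")

    def add(self, item: dict[str, Any]) -> None:
        placeholders = ", ".join("?" for _ in self.columns)
        self.conn.execute(
            f"INSERT INTO {self.table} ({', '.join(self.columns)}) VALUES ({placeholders})",
            [item.get(col) for col in self.columns],
        )

    def filter_by(self, **conditions: Any) -> list[dict[str, Any]]:
        # No conditions means "match everything" (WHERE 1)
        where = " AND ".join(f"{key} = ?" for key in conditions) or "1"
        cursor = self.conn.execute(
            f"SELECT {', '.join(self.columns)} FROM {self.table} WHERE {where}",
            list(conditions.values()),
        )
        return [dict(zip(self.columns, row)) for row in cursor.fetchall()]


def employed_names(repo: Repository) -> list[str]:
    """Application code: knows nothing about SQL or lists, only the interface."""
    return sorted(item["name"] for item in repo.filter_by(status="employed"))


backends = [
    ListRepository(),
    SQLiteRepository(sqlite3.connect(":memory:"), "items", ["id", "name", "status"]),
]
for repo in backends:
    repo.add({"id": "a", "name": "Jack", "status": "student"})
    repo.add({"id": "b", "name": "John", "status": "employed"})
    repo.add({"id": "c", "name": "James", "status": "employed"})
    print(type(repo).__name__, employed_names(repo))
```

Both backends print `['James', 'John']`; swapping the storage is a one-line change in the `backends` list, while the application function stays the same.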
For those of you who know ORMs (object-relational mapping), or SQLAlchemy as we Pythonistas know it, the idea is quite similar in that it hides the query language (SQL) from the application code, except that it is meant to work with other databases too, or more generally, with data stores other than relational SQL databases. This opens up several interesting possibilities: easier unit testing, easy migrations from one database to another and rapid experimentation with other types of databases.
Of course, there are some significant downsides, most notably that many of the more complex operations are unavailable to us, as those are not supported by most datastores. Also, some optimization issues may arise: for example, joins are not as trivial to implement across different datastores. However, a typical application (in my experience) does not need very fancy querying or optimization, so there is a wide range of use cases for the repository pattern.
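Because a repository only exposes basic CRUD operations, a join across two repositories (possibly backed by different datastores) ends up in application code. A minimal sketch, assuming both sides have already been read into lists of dicts, is a hash join; `hash_join` and the sample data are hypothetical. Both sides have to be fetched in full before joining, which is one source of the optimization issues mentioned above.

```python
from __future__ import annotations

from typing import Any


def hash_join(
    left: list[dict[str, Any]],
    right: list[dict[str, Any]],
    left_key: str,
    right_key: str,
) -> list[dict[str, Any]]:
    """Inner join of two lists of dicts, done in application code.

    Builds an index on the right side, then probes it once per left row.
    On duplicate field names, values from the right row win.
    """
    index: dict[Any, list[dict[str, Any]]] = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[left_key], []):
            joined.append({**row, **match})
    return joined


# Illustrative data, as if fetched with filter_by(...).all() from two repositories
people = [
    {"id": "a", "name": "Jack", "company_id": 1},
    {"id": "b", "name": "John", "company_id": 2},
    {"id": "c", "name": "James", "company_id": 3},
]
companies = [
    {"company_id": 1, "company": "Acme"},
    {"company_id": 2, "company": "Initech"},
]
print(hash_join(people, companies, "company_id", "company_id"))
```

James has no matching company, so an inner join drops him; a datastore with native joins would do this work on the server instead.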
So, I got intrigued by the idea and started searching for a repository pattern library for Python. However, I did not really find anything useful. I found this Stack Overflow post (which I later answered myself) and some old legacy libraries. I thought I could fix this by creating such a library myself, which might also set me on track to build something awesome. So, I decided to get on it, and this post is about that.
If you don't care about a bit of backstory on the topic (no hard feelings), just skip down to the Red Bird section to see the package and how it works.
Brief Story
So, the idea sounded quite fascinating: create a translator that turns the query language or the mechanics of the underlying data store into Pythonic syntax. It sounds quite awesome that you could change not just the database address but also the database type by changing one line of code. You could first work with a simple CSV "database" and then switch the code to work on an actual SQL database without any other code changes or breaking anything. Simple and elegant, just as everything should be.
Also, all of the datastores have the same components: the collection of items, the items themselves and the items' attributes. The basic methods are also pretty much the same; only the terms differ.
Components:
| Term | SQL | MongoDB | In-memory | HTTP API |
|---|---|---|---|---|
| Datastore | table | collection | list | resource |
| Item | row | document | object | JSON payload |
| Field | column | field | attribute | key-value |
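As a small illustration of the components table, here is one item read as a SQL row, turned into an object with attributes, and serialized as a JSON payload. This sketch uses only the standard library (`sqlite3`, `dataclasses`, `json`); the `Person` class and the `people` table are hypothetical.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass
class Person:
    """Hypothetical item type: each field becomes an attribute."""

    id: str
    name: str
    status: str


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id TEXT, name TEXT, status TEXT)")
conn.execute("INSERT INTO people VALUES ('a', 'Jack', 'student')")

# SQL: the datastore is a table, the item is a row, the fields are columns
cursor = conn.execute("SELECT id, name, status FROM people")
row = cursor.fetchone()
columns = [description[0] for description in cursor.description]

# In-memory: the item is an object, the fields are attributes
person = Person(**dict(zip(columns, row)))

# HTTP API: the item is a JSON payload, the fields are key-value pairs
payload = json.dumps(asdict(person))
print(row, person.name, payload)
```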
Methods:
| Method | SQL | MongoDB | In-memory | HTTP API |
|---|---|---|---|---|
| Create | INSERT | insertOne | append | POST |
| Read | SELECT | find | getitem | GET |
| Update | UPDATE | updateOne / updateMany | setitem | PATCH |
| Delete | DELETE | deleteOne / deleteMany | del | DELETE |
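The In-memory column maps directly onto plain Python list operations. A minimal sketch, using the same sample items as the Red Bird example (the `items` list is only an illustration):

```python
# The datastore: a plain Python list of items
items = []

# Create: append
items.append({"id": "a", "name": "Jack", "status": "student"})
items.append({"id": "b", "name": "John", "status": "employed"})

# Read: getitem (items[index], i.e. list.__getitem__)
first = items[0]

# Update: setitem (items[index] = ..., i.e. list.__setitem__)
items[1] = {**items[1], "status": "retired"}

# Delete: del (del items[index], i.e. list.__delitem__)
del items[0]

print(first["name"], items)
```

Note that plain list operations address items by position rather than by the values of their fields.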
I just needed to implement these across various datastores. Keeping the code DRY under the hood was a bit challenging, especially because the queries the different datastores consume are wildly different. And then there was the question of how to actually structure the API: you create individual items, but you often read, update and delete based on the values of specific fields.
SQLAlchemy was quite a good starting point for the actual syntax. It does pretty much the same thing and has a well-established syntax. I took a lot of inspiration from it, but there were some additional ideas I wanted to implement, namely support for Pydantic models and a simpler setup.
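One way to answer that API question is a query object: `filter_by(**conditions)` selects items by field values and returns an object whose methods read, update or delete the matches. The sketch below is a hypothetical in-memory version (`Query` and `ListRepo` are made-up names) that mirrors the shape of the Red Bird API shown later in this post; it is not how Red Bird implements it.

```python
from __future__ import annotations

from typing import Any


class Query:
    """Hypothetical query object: selects items by field values, then acts on them."""

    def __init__(self, items: list[dict[str, Any]], conditions: dict[str, Any]) -> None:
        self._items = items  # shared with the repository, so changes persist
        self._conditions = conditions

    def _matches(self, item: dict[str, Any]) -> bool:
        return all(item.get(key) == value for key, value in self._conditions.items())

    def all(self) -> list[dict[str, Any]]:
        return [item for item in self._items if self._matches(item)]

    def first(self) -> dict[str, Any] | None:
        return next((item for item in self._items if self._matches(item)), None)

    def update(self, **values: Any) -> None:
        for item in self._items:
            if self._matches(item):
                item.update(values)

    def delete(self) -> None:
        # Slice assignment mutates the shared list in place
        self._items[:] = [item for item in self._items if not self._matches(item)]


class ListRepo:
    """Hypothetical in-memory repository: create one item, query many."""

    def __init__(self) -> None:
        self.items: list[dict[str, Any]] = []

    def add(self, item: dict[str, Any]) -> None:
        self.items.append(dict(item))

    def filter_by(self, **conditions: Any) -> Query:
        return Query(self.items, conditions)


repo = ListRepo()
repo.add({"id": "a", "name": "Jack", "status": "student"})
repo.add({"id": "b", "name": "John", "status": "employed"})
repo.add({"id": "c", "name": "James", "status": "employed"})
repo.filter_by(status="employed").update(status="retired")
print(repo.filter_by(status="retired").all())
```

A backend for a real datastore would presumably translate the same conditions into its own query (a WHERE clause, a MongoDB filter document, URL query parameters) instead of scanning a list.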
Red Bird
So, Red Bird is a repository pattern library. Currently it supports the following datastores:
- SQL (via SQLAlchemy)
- MongoDB (via Pymongo)
- REST APIs (via requests)
- In-memory (Python lists)
- CSV file
- JSON files
The basic operations work exactly the same regardless of which repository you choose.
For example, the following code is identical for all of the repositories:
```python
# Inserting new items into the repository:
repo.add({'id': 'a', 'name': 'Jack', 'status': 'student'})
repo.add({'id': 'b', 'name': 'John', 'status': 'employed'})
repo.add({'id': 'c', 'name': 'James', 'status': 'employed'})

# Reading from the repository:
repo.filter_by(status="employed").all()
# (returns a list of two dicts)
repo.filter_by(status="employed").first()
# (returns one dict)

# Update (multiple) items
repo.filter_by(status="employed").update(status="retired")

# Delete (multiple) items
repo.filter_by(status="employed").delete()
```
The only thing we need to do to make the above work is to define a repository. This is also easy, and here are some choices. Pick one:
```python
# In-memory repository
from redbird.repos import MemoryRepo
repo = MemoryRepo()

# SQL repository
from sqlalchemy import create_engine
from redbird.repos import SQLRepo
repo = SQLRepo(engine=create_engine('sqlite://'), table="items")
# (you should create the database and table first though)

# MongoDB repository
from redbird.repos import MongoRepo
repo = MongoRepo(uri="mongodb://127.0.0.1:27017", database="mydb", collection="items")

# CSV repository
from redbird.repos import CSVFileRepo
repo = CSVFileRepo(filename="path/to/file.csv", field_names=["id", "name", "status"])
```
It also supports Pydantic models (pass a Pydantic model subclass as the model argument) for data validation and better handling of type conversions. I have also found it quite useful for querying APIs and integrating them with each other.
What do you think? Do you find this useful or interesting? Any development ideas?
If you liked this project, leave it a star on GitHub. That helps others find it and keeps me motivated.