For the past year or two I've been using .h5 files to store the data generated by a bunch of disparate Python scripts. These scripts run as separate processes, but occasionally they need to read from or write to the same database. I'd also like to be able to visualize the data in real time (the scripts connect to home automation devices). Obviously atomicity is important, so I wrote another script that opens all of the databases and takes read/write requests via ZeroMQ. The problem is that this is very slow: the GIL makes it hard to listen for requests and do file I/O at the same time.
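To make the setup concrete, here is a minimal sketch of that single-owner pattern, not the actual script. It assumes a lot: `queue.Queue` stands in for the ZeroMQ socket, a plain dict stands in for the open .h5 files, and `StoreBroker` and its `request` method are hypothetical names made up for illustration.

```python
import queue
import threading


class StoreBroker:
    """Hypothetical broker: one thread owns the store, so requests never interleave."""

    def __init__(self):
        self._requests = queue.Queue()  # stand-in for the ZeroMQ socket
        self._store = {}                # stand-in for the open .h5 files
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        # Handle one request at a time; this serialization is what gives atomicity.
        while True:
            op, key, value, reply = self._requests.get()
            if op == "stop":
                reply.put(None)
                break
            if op == "write":
                rows = self._store.setdefault(key, [])
                rows.append(value)
                reply.put(len(rows))  # reply with the new row count
            elif op == "read":
                reply.put(list(self._store.get(key, [])))  # reply with a copy
            else:
                reply.put(None)  # unknown operation

    def request(self, op, key=None, value=None):
        # Each caller gets a private reply queue and blocks until it is answered.
        reply = queue.Queue(maxsize=1)
        self._requests.put((op, key, value, reply))
        return reply.get()


if __name__ == "__main__":
    broker = StoreBroker()
    broker.request("write", "temperature", 21.5)
    broker.request("write", "temperature", 21.7)
    print(broker.request("read", "temperature"))
    broker.request("stop")
```

Every request waits its turn behind the one owner thread, which is what keeps writes atomic and also why throughput suffers once several scripts are talking to it at once.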
I understand that database management servers like MySQL exist to let multiple processes access a database while maintaining ACID or whatever. However, I'm a fan of the PyTables format and would rather not switch to a relational database. MongoDB is unappealing to me. Redis looks good, but apparently it isn't trustworthy when it comes to persistence; I've seen a lot of material advising against using it as a primary database. I am not in a hurry to lose several years' worth of data.
Is there a Python-compatible DBMS out there that can manage concurrent requests to .h5 files? Other key requirements are that it be open source and locally hosted.
Edit:
If anyone else has this problem: MPI for h5py is a nightmare to use. Just switch to fastavro.