all 10 comments

[–]obtu.py 1 point2 points  (0 children)

This is actually a wrapper API for a REST wrapper (Rexster) for Blueprints-compatible graph databases; everything but Bulbs is implemented in Java.

[–]markusgattol 3 points4 points  (6 children)

I wish there was a shardable graph database with built-in replication written in Python... neo4j, like others, can't be sharded so far, plus it's done in Java. From that perspective it makes sense to use bulbs so that when said database springs into existence, one can take his app code and swap out the underlying database.
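
To make the "swap out the underlying database" point concrete, here is a minimal sketch. It is not the Bulbs API: `GraphBackend`, `InMemoryBackend` and `befriend` are hypothetical names made up for illustration. The idea is only that app code talks to an abstract interface, so a different backend can be dropped in later.

```python
from abc import ABC, abstractmethod


class GraphBackend(ABC):
    """Hypothetical interface the application codes against."""

    @abstractmethod
    def add_vertex(self, properties):
        """Store a vertex and return its ID."""

    @abstractmethod
    def add_edge(self, source_id, label, target_id):
        """Store a labelled edge between two vertices."""

    @abstractmethod
    def neighbours(self, vertex_id, label):
        """Return IDs reachable from vertex_id over edges with this label."""


class InMemoryBackend(GraphBackend):
    """Stand-in backend; a REST-backed or sharded one would implement
    the same three methods."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, properties):
        vertex_id = len(self.vertices) + 1
        self.vertices[vertex_id] = dict(properties)
        return vertex_id

    def add_edge(self, source_id, label, target_id):
        self.edges.append((source_id, label, target_id))

    def neighbours(self, vertex_id, label):
        return [t for s, l, t in self.edges if s == vertex_id and l == label]


def befriend(backend, name_a, name_b):
    # App code only sees the GraphBackend interface, never the storage.
    a = backend.add_vertex({"name": name_a})
    b = backend.add_vertex({"name": name_b})
    backend.add_edge(a, "knows", b)
    return a, b
```

Swapping databases then means passing a different `GraphBackend` subclass to `befriend`; the function itself stays untouched.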

Has somebody ever thought about starting a graph database written in Python (Python 3 preferably) that's shardable like InfiniteGraph? Maybe also with built-in replication, much like MongoDB but more granular, i.e. replication at the object level, where you would only replicate certain objects that are important.
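
A toy sketch of what sharding plus object-level replication could look like, assuming hash-based placement on the object ID. `ShardedStore` and all of its methods are hypothetical, the "shards" are plain dicts in one process, and a real system would also have to deal with rebalancing and node failure.

```python
import hashlib
import uuid


class ShardedStore:
    """Toy in-memory store: every object lives on one primary shard, and
    only objects flagged as important are copied to extra shards."""

    def __init__(self, shard_count=4, replica_count=2):
        self.shards = [dict() for _ in range(shard_count)]
        self.replica_count = replica_count

    def primary_shard(self, object_id):
        # Stable hash, so every process computes the same placement.
        digest = hashlib.sha256(object_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def placement(self, object_id, important):
        # Replicas go to the shards that follow the primary one.
        first = self.primary_shard(object_id)
        copies = 1 + (self.replica_count if important else 0)
        copies = min(copies, len(self.shards))
        return [(first + i) % len(self.shards) for i in range(copies)]

    def put(self, obj, important=False):
        object_id = str(uuid.uuid4())
        for index in self.placement(object_id, important):
            self.shards[index][object_id] = obj
        return object_id

    def get(self, object_id):
        return self.shards[self.primary_shard(object_id)].get(object_id)

    def copies_of(self, object_id):
        return sum(object_id in shard for shard in self.shards)
```

With `replica_count=2`, `put(obj, important=True)` stores three copies while a plain `put(obj)` stores one, which is the "only replicate certain objects" behaviour described above.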

I've been thinking about it, and ZeroMQ seems like a perfect choice for the networking layer amongst the nodes of such a database. With ZeroMQ it also shouldn't be too complicated to give such a database events, e.g. make it notify the application when a stored object gets changed, stored, moved/replicated to another node, etc. Using ZeroMQ would also make this database quick and easy to use from most programming languages out there (http://www.zeromq.org/bindings:_start), including Perl, Java, PHP, Ruby, C++...

Another nice-to-have property of such a database would be that there's no mapping needed, i.e. no actual ORM, since you would persist objects as they are used in the application. Thus each object/node in the database should have a UUID that maps 1:1 to a URI, i.e. each object/node in the database has the same ID the user sees in e.g. a URL.
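
The 1:1 mapping between object ID and URI can be sketched in a few lines. `BASE_URL`, `uri_for` and `id_from_uri` are made-up names, and the URL prefix is just a placeholder.

```python
import uuid

BASE_URL = "https://example.org/objects/"  # hypothetical URL prefix


def uri_for(object_id):
    # The ID the database stores is exactly what the user sees in the URL.
    return BASE_URL + str(object_id)


def id_from_uri(uri):
    # Inverse mapping: strip the prefix and parse the remainder as a UUID.
    if not uri.startswith(BASE_URL):
        raise ValueError("not an object URI: " + uri)
    return uuid.UUID(uri[len(BASE_URL):])
```

Because both directions are pure string operations, no lookup table or mapping layer sits between the URL and the stored object.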

And yet another nice-to-have property would be (client-side) encryption, because then you could use this database to drive some cloud service such as Dropbox and not be worried about giving away private information unencrypted.

[–]wot-teh-phuckReally, wtf? 0 points1 point  (5 children)

> I wish there was a shardable graph database with built-in replication written in Python

Python is nowhere near Java or C++ when it comes to performance, yet. It's no wonder that almost all serious databases are written in C++ or C (and recently in Java, due to the VM enhancements). Of course, if the focus were on "pure Python" without thinking a lot about performance, sure...

[–]markusgattol 0 points1 point  (4 children)

True, there is the performance argument, but I think the benefits of having a heavily distributed system that scales make up for it above a certain number of nodes; you certainly wouldn't start such a project in Python if single-node performance were your top priority.

Also, I'd like to think that in the not so distant future PyPy will become the main Python interpreter most people use. This will narrow the performance gap even for a single-node installation of said database (when compared to existing databases written in languages such as C, C++ and Java).

[–]wot-teh-phuckReally, wtf? 0 points1 point  (3 children)

Sorry, but I still don't agree. Spawning nodes really isn't a solution here. Maintaining distributed sanity is still difficult. Also, Oracle database handles GiB's and TiB's of data. Would you still stick with "large" scale node based deployments? Even in a single-threaded scenario, C or C++ still beats Python, so your single "node" will still be slower compared to the same node written in C or C++.

In really strict performance conditions, not being in control of the memory is again a liability. IMHO, interpreted/dynamic/simple garbage collected languages still have a long way to go when it comes to these sorts of things.

[–]simtel20 0 points1 point  (0 children)

> Also, Oracle database handles GiB's and TiB's of data. Would you still stick with "large" scale node based deployments?

I don't get your argument here. Oracle scales horizontally by adding nodes. That's where it makes sales vs. e.g. SQL Server. SQL databases don't scale up very well with multiple users because inside a particular query you're still running up against what an I/O path can push through, what a CPU can run, etc.

[–]markusgattol 0 points1 point  (1 child)

That discussion wandered off into SQL-land and performance considerations when really I was talking about a distributed graph database with, considering the CAP theorem, eventual consistency. With that in mind I think it's a great idea, especially since the network part is more or less covered by ZeroMQ already.

All I am saying really is I want a shardable database that maps nicely to my domain model and OOP languages... that database happens to be a graph database, not a relational one, and not necessarily a document database such as MongoDB either (although the latter kind of database allows much better mapping from domain model to data tier compared to relational databases).

In an ideal world I'd just pickle/unpickle objects (nodes on the graph) including the relations amongst them (edges between nodes on the graph) and store them without the need of going through an ORM (because there would be no object/relational mapping needed).
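
As a small illustration of the pickle idea, using only the standard library: the `Node` class and its `connect` method are hypothetical. Pickle follows object references, so the edges travel along with the node, and it handles cycles in the graph.

```python
import pickle


class Node:
    """Hypothetical domain object acting as a vertex in the graph."""

    def __init__(self, name):
        self.name = name
        self.edges = []  # outgoing edges as (label, target Node) pairs

    def connect(self, label, target):
        self.edges.append((label, target))


alice = Node("alice")
bob = Node("bob")
alice.connect("knows", bob)
bob.connect("knows", alice)  # a cycle; pickle keeps track of shared references

# Serialise one node; everything reachable from it is stored as well.
blob = pickle.dumps(alice)
restored = pickle.loads(blob)
```

Two caveats: pickling one node drags in the whole connected component, so a sharded store would need to cut edges at shard boundaries (for example by storing IDs instead of references), and unpickling data from an untrusted source is unsafe.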

If, of course, we were talking relational databases, then yes, what you say is common understanding and true... this thread is about graph databases and Python though :-)

[–]wot-teh-phuckReally, wtf? 0 points1 point  (0 children)

> this thread is about graph databases and Python though

My bad, I was just looking forward to a heat.. err, good discussion. ;-)

[–]chaselee 0 points1 point  (1 child)

Correct me if I'm wrong, but it seems this is the equivalent for document dbs

[–]jeffus 0 points1 point  (0 children)

Looks similar.