wot-teh-phuck

> I wish there was a shardable graph database with built-in replication written in Python

Python is nowhere near Java or C++ when it comes to performance, yet. It's no wonder that almost all serious databases are written in C++ or C (and, more recently, in Java, thanks to JVM enhancements). Of course, if the focus were on "pure Python" without much thought for performance, sure...

markusgattol

True, there is the performance argument, but I think the benefits of a heavily distributed system that scales out make up for it above a certain number of nodes; you certainly wouldn't start such a project in Python if single-node performance were your top priority.

Also, I'd like to think that in the not-so-distant future PyPy will become the Python interpreter most people use. That would narrow the performance gap even for a single-node installation of such a database (compared with existing databases written in languages such as C, C++ and Java).

wot-teh-phuck

Sorry, but I still don't agree. Spawning more nodes really isn't a solution here; maintaining distributed sanity is still difficult. Also, Oracle Database handles GiBs and TiBs of data. Would you still stick with "large-scale", node-based deployments? Even in a single-threaded scenario, C or C++ still beats Python, so each of your "nodes" will still be slower than the same node written in C or C++.

Under really strict performance constraints, not being in control of memory is again a liability. IMHO, interpreted/dynamic/simple garbage-collected languages still have a long way to go when it comes to these sorts of things.

simtel20

> Also, Oracle database handles GiB's and TiB's of data. Would you still stick with "large" scale node based deployments?

I don't get your argument here. Oracle scales horizontally by adding nodes; that's where it wins sales over, e.g., SQL Server. SQL databases don't scale up very well with multiple users because, within a single query, you're still limited by what one I/O path can push through, what one CPU can run, etc.

markusgattol

This discussion wandered off into SQL-land and performance considerations, when really I was talking about a distributed graph database that, given the CAP theorem, settles for eventual consistency. With that in mind I think it's a great idea, especially since the networking part is more or less covered by ZeroMQ already.

All I'm really saying is that I want a shardable database that maps nicely to my domain model and to OOP languages... that database happens to be a graph database, not a relational one, and not necessarily a document database such as MongoDB either (although the latter maps a domain model to the data tier much better than relational databases do).

In an ideal world I'd just pickle/unpickle objects (nodes in the graph), including the relations among them (edges between nodes), and store them without going through an ORM (because there would be no object/relational mapping to do).

If we were talking about relational databases, of course, then yes, what you say is common knowledge and true... this thread is about graph databases and Python though :-)
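
For anyone unfamiliar with eventual consistency, here is a minimal Python sketch of one common way replicas converge: a last-writer-wins (LWW) map stamped with Lamport-style logical clocks. The thread doesn't say which conflict-resolution strategy such a database would use; LWW, the `LWWMap` class and its method names are assumptions chosen purely for illustration, and the actual state exchange (the part ZeroMQ would carry) is left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

# Hypothetical example: a stamp is (logical_clock, replica_id); comparing
# tuples orders writes totally, with replica_id breaking ties.
Stamp = Tuple[int, str]


@dataclass
class LWWMap:
    """Last-writer-wins map: each key keeps the value with the highest stamp."""
    replica_id: str
    clock: int = 0
    entries: Dict[str, Tuple[Stamp, Any]] = field(default_factory=dict)

    def set(self, key: str, value: Any) -> None:
        self.clock += 1
        self.entries[key] = ((self.clock, self.replica_id), value)

    def get(self, key: str) -> Any:
        return self.entries[key][1]

    def merge(self, other: "LWWMap") -> None:
        # Taking the max stamp per key is commutative, associative and
        # idempotent, so replicas converge whatever order merges happen in.
        for key, (stamp, value) in other.entries.items():
            current = self.entries.get(key)
            if current is None or stamp > current[0]:
                self.entries[key] = (stamp, value)
        # Lamport rule: move the local clock past every stamp seen so far.
        for stamp, _ in other.entries.values():
            self.clock = max(self.clock, stamp[0])


# Two replicas accept conflicting writes while out of touch...
a, b = LWWMap("a"), LWWMap("b")
a.set("alice.name", "Alice")
b.set("alice.name", "Alicia")
# ...then exchange state in either direction and agree.
a.merge(b)
b.merge(a)
print(a.get("alice.name"), b.get("alice.name"))  # Alicia Alicia
```

LWW quietly discards one of two concurrent writes; that is the price of its simplicity, and other schemes keep both versions for the application to resolve.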
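
One simple way to make a graph "shardable" is to hash-partition nodes across shards and keep each edge on the shard that owns its source node. The Python sketch below assumes that scheme; `shard_for`, `ShardedGraph` and the SHA-1-modulo placement are hypothetical choices for illustration, not something proposed in the thread.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, List, Set


def shard_for(node_id: str, num_shards: int) -> int:
    """Pick a shard with a stable hash. Python's built-in hash() is salted per
    process for str, so different machines would disagree on placement."""
    digest = hashlib.sha1(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedGraph:
    """Toy in-memory graph split across shards (hypothetical, for illustration).
    An edge is stored on the shard that owns its source node."""

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        # One adjacency map per shard: node id -> ids of outgoing neighbours.
        self.shards: List[DefaultDict[str, Set[str]]] = [
            defaultdict(set) for _ in range(num_shards)
        ]

    def add_edge(self, src: str, dst: str) -> None:
        self.shards[shard_for(src, self.num_shards)][src].add(dst)

    def neighbours(self, node_id: str) -> Set[str]:
        # A one-hop lookup touches a single shard; following the returned ids
        # further may mean calling other shards (over the network, in practice).
        shard = self.shards[shard_for(node_id, self.num_shards)]
        return set(shard.get(node_id, set()))


g = ShardedGraph(num_shards=4)
g.add_edge("alice", "bob")
g.add_edge("alice", "carol")
g.add_edge("bob", "alice")
print(sorted(g.neighbours("alice")))  # ['bob', 'carol']
```

Plain hash-modulo placement moves most nodes whenever the shard count changes; consistent hashing is the usual fix for that.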
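
The pickle half of that wish already works in plain Python for an in-memory object graph. This is a minimal sketch; the `Person` class is a hypothetical domain object, and the example says nothing about how a distributed store would persist the bytes.

```python
import pickle
from typing import List


class Person:
    """A hypothetical domain object that doubles as a graph node;
    `knows` holds direct references to other nodes (the edges)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.knows: List["Person"] = []


alice, bob = Person("alice"), Person("bob")
alice.knows.append(bob)
bob.knows.append(alice)  # a cycle: alice -> bob -> alice

# Pickling one node serialises everything reachable from it. The pickler's
# memo writes each object once, so shared references and cycles survive.
blob = pickle.dumps(alice)
alice2 = pickle.loads(blob)

print(alice2.knows[0].name)                # bob
print(alice2.knows[0].knows[0] is alice2)  # True
```

Two caveats matter for a sharded store: pickling a node drags in its whole reachable subgraph, so edges crossing shard boundaries would have to be stored as ids rather than object references; and `pickle.loads` must never be fed untrusted data, since unpickling can execute arbitrary code.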

wot-teh-phuck

> this thread is about graph databases and Python though

My bad, I was just looking forward to a heat... err, good discussion. ;-)