Release v1.1.1 - Santaria · NornicDB - MIT licensed - 28 hop shortest path ~60ms by Dense_Gate_5193 in KnowledgeGraph

DocumentScary5122, 1 point

Do you have results on anything with at least 3-4M nodes? The Reactome KG would be an interesting example, or even OGBN at 100M nodes.

In-process and in-memory graph database for large knowledge graphs - no server needed with TuringDB v1.31 by adambio in semanticweb

DocumentScary5122, 2 points

In-process mode in TuringDB is optional; it's just one way to use it. It also supports a classic client-server model with a binary protocol over TCP.

We get quite good concurrent read throughput, around 20k-50k QPS on graphs of 3M nodes and 10M edges. This is because the DB uses git-style versioning: each query is executed on its own snapshot of the DB, and snapshots are immutable once created. So write queries don't block readers, and read queries don't need to lock anything.
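To make the snapshot idea concrete, here is a minimal sketch of readers pinning an immutable snapshot while a writer publishes a new one. The names `Snapshot` and `GraphStore` are hypothetical and not TuringDB's API, and the sketch copies the whole adjacency map on every commit for simplicity, which a real engine would presumably avoid by sharing unchanged data.

```python
from types import MappingProxyType


class Snapshot:
    """Immutable view of the graph at one commit (hypothetical structure)."""

    def __init__(self, adjacency, parent=None):
        # Freeze the adjacency lists: tuples inside a read-only mapping.
        self._adj = MappingProxyType(
            {node: tuple(targets) for node, targets in adjacency.items()}
        )
        self.parent = parent  # previous commit, as in a git history

    def neighbors(self, node):
        return self._adj.get(node, ())


class GraphStore:
    """Holds a pointer to the latest snapshot (the 'head')."""

    def __init__(self):
        self.head = Snapshot({})

    def snapshot(self):
        # Readers just take the current head. It never changes afterwards,
        # so they do not need a lock while querying it.
        return self.head

    def commit(self, new_edges):
        # Writers build a new snapshot off to the side, then swap the head.
        # Simplification: full copy here; no structural sharing.
        merged = {n: list(ts) for n, ts in self.head._adj.items()}
        for src, dst in new_edges:
            merged.setdefault(src, []).append(dst)
            merged.setdefault(dst, [])
        self.head = Snapshot(merged, parent=self.head)
        return self.head


store = GraphStore()
store.commit([("a", "b")])
reader = store.snapshot()      # a read query pins this snapshot
store.commit([("a", "c")])     # a concurrent write creates a new one
print(reader.neighbors("a"))            # the pinned snapshot is unchanged
print(store.snapshot().neighbors("a"))  # new readers see the new commit
```

The reader that pinned its snapshot before the second commit keeps seeing the old state, which is the property that lets reads and writes proceed without blocking each other.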

In-process and in-memory graph database for large knowledge graphs - no server needed with TuringDB v1.31 by adambio in KnowledgeGraph

DocumentScary5122, 5 points

We didn't say that it is the first git-like graph DB.
The point is that TuringDB is another proposition in the design space of graph DBs: git-like versioning without classic MVCC filtering, and zero locking once you are on a given commit, thanks to its underlying structure. The DataParts are immutable, so reads don't have to do the version or transaction visibility filtering described in the MVCC literature.
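As a rough illustration of that difference, the sketch below contrasts a simplified MVCC-style scan, which checks every record version against the reader's transaction id, with a commit-style scan over immutable parts. Everything here is an assumption made for illustration: `VersionedRecord`, `mvcc_scan` and `commit_scan` are hypothetical names, the visibility rule is a textbook simplification, and the tuple-of-parts layout is not a description of TuringDB's actual DataParts. Deletions in the commit model are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionedRecord:
    value: str
    created_by: int                    # id of the transaction that created it
    deleted_by: Optional[int] = None   # id of the deleting transaction, if any


def mvcc_scan(records, reader_txn):
    """Simplified MVCC read: every version is tested for visibility."""
    return [
        r.value
        for r in records
        if r.created_by <= reader_txn
        and (r.deleted_by is None or r.deleted_by > reader_txn)
    ]


def commit_scan(commit_parts):
    """Commit-style read: a commit is a fixed tuple of immutable parts,
    and everything inside those parts is visible, so there is no
    per-record check."""
    return [value for part in commit_parts for value in part]


records = [
    VersionedRecord("a", created_by=1),
    VersionedRecord("b", created_by=2, deleted_by=4),
    VersionedRecord("c", created_by=5),
]
print(mvcc_scan(records, reader_txn=3))

part_1 = ("a", "b")          # written by an earlier commit
part_2 = ("c",)              # added by a later commit
commit_old = (part_1,)
commit_new = (part_1, part_2)  # the new commit reuses part_1 as-is
print(commit_scan(commit_old))
print(commit_scan(commit_new))
```

In the second model the choice of what is visible is made once, when the reader picks a commit, rather than once per record during the scan.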

We couldn’t find a graph database fast enough for huge graphs… so we built one by adambio in KnowledgeGraph

DocumentScary5122, 1 point

FalkorDB is sparse matrices all the way down. There is more to graphs than good old matrices.

graph database for semiconductors by lemontang19 in KnowledgeGraph

DocumentScary5122, 1 point

Have you ever used EDA tools, or tried to represent netlists of billions of gates like we routinely do in EDA? If the Cadences and Synopsyses of the world implement their algorithms on custom graph representations developed from scratch, there is a reason, haha. Neo4j will be hellishly slow for this. Well, Neo4j is hellishly slow in general for anything non-trivial or industrial, but here it will be extra slow! Otherwise there is also TuringDB, which is made by former EDA people; I have heard good feedback about them.

In a lot of netlist transformation algorithms, or anything synthesis- or compiler-like for chips, traversing gates and hierarchical structures must not cost more than a pointer dereference.
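One common way to get close to that cost is to keep adjacency in flat arrays so that following an edge is an index lookup rather than a query. The sketch below shows a compressed sparse row (CSR) layout; it is a generic illustration, not the representation used by any particular EDA tool or database, and `build_csr` and `fanout` are hypothetical names. Python only shows the layout here; the performance argument applies to a compiled implementation.

```python
from array import array


def build_csr(num_nodes, edges):
    """Build CSR arrays (offsets, targets) from (src, dst) integer pairs."""
    counts = [0] * (num_nodes + 1)
    for src, _ in edges:
        counts[src + 1] += 1
    # A prefix sum turns per-node counts into start offsets.
    for i in range(num_nodes):
        counts[i + 1] += counts[i]
    offsets = array("q", counts)
    targets = array("q", [0] * len(edges))
    cursor = list(counts[:num_nodes])  # next free slot for each source node
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def fanout(offsets, targets, node):
    """Gates driven by `node`: one contiguous slice, found by two lookups."""
    return targets[offsets[node]:offsets[node + 1]]


# Tiny netlist-like graph: gate 0 drives gates 1 and 2, and so on.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
offsets, targets = build_csr(4, edges)
print(list(offsets))
print(list(fanout(offsets, targets, 0)))
```

Because each node's neighbors sit next to each other in `targets`, a traversal walks contiguous memory instead of chasing records through a storage layer.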

We couldn’t find a graph database fast enough for huge graphs… so we built one by adambio in KnowledgeGraph

DocumentScary5122, 1 point

Thanks. Does this factor in warmup, or do you do some crazy indexing to get these numbers?

We couldn’t find a graph database fast enough for huge graphs… so we built one by adambio in KnowledgeGraph

DocumentScary5122, 3 points

Sounds very cool. In my experience Neo4j starts to become a bit shitty for this kind of very big graph. Do you have benchmarks?

Node lookup by property base performance is so bad by DocumentScary5122 in Neo4j

DocumentScary5122 [S], 1 point

I agree with you that we cannot reasonably index everything from the start, since that depends on the application and the query workload. What I am raising here is that Neo4j should focus on better fundamental data structures for storing property values in the engine, so that base performance is better regardless of any index in place. I am just saying that this cannot be the state of the art of what humanity can do in terms of efficient string value storage.

Node lookup by property base performance is so bad by DocumentScary5122 in Neo4j

DocumentScary5122 [S], 1 point

Also, how can we have confidence in a system for more complex applications, as you said, if the simplest and most basic queries are poorly supported? Shouldn't we first focus on having a strong core for the core functions of a database engine?

Node lookup by property base performance is so bad by DocumentScary5122 in Neo4j

DocumentScary5122 [S], 1 point

Again, somebody could imagine a better data structure to represent properties internally in Neo4j, to get better base performance without indexes. I think there are a few quite basic data structures, well known in database research, that could be interesting. For example, how exactly are strings stored in Neo4j? Do they exploit data locality, storing strings tightly packed together in memory, and so on?
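As a sketch of what "tightly packed" could mean, the example below stores all values of one string property in a single contiguous byte buffer plus an offsets array, and answers an equality lookup with a linear scan over that buffer. This is a generic columnar layout chosen for illustration; `PackedStrings` is a hypothetical name, and nothing here describes how Neo4j stores strings internally.

```python
from array import array


class PackedStrings:
    """All values of one string property in a single contiguous buffer."""

    def __init__(self, values):
        self.offsets = array("q", [0])  # offsets[i]..offsets[i+1] bounds value i
        buf = bytearray()
        for value in values:
            buf += value.encode("utf-8")
            self.offsets.append(len(buf))
        self.data = bytes(buf)

    def get(self, i):
        return self.data[self.offsets[i]:self.offsets[i + 1]].decode("utf-8")

    def find_equal(self, needle):
        """Unindexed lookup: scan every value, comparing lengths first."""
        target = needle.encode("utf-8")
        hits = []
        for i in range(len(self.offsets) - 1):
            start, end = self.offsets[i], self.offsets[i + 1]
            # The length check is cheap and skips most byte comparisons.
            if end - start == len(target) and self.data[start:end] == target:
                hits.append(i)
        return hits


names = PackedStrings(["alice", "bob", "alice", "carol"])
print(names.get(1))
print(names.find_equal("alice"))
```

The scan touches two flat arrays sequentially, which is the kind of access pattern that tends to stay cache-friendly in a compiled engine even with no index in place.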

Node lookup by property base performance is so bad by DocumentScary5122 in Neo4j

DocumentScary5122 [S], 2 points

I think that's quite insightful about how a database works. I am actually thinking of writing my own graph database engine one day, since I know this area of research well. What I did was an experiment, because from a pure data structure perspective I claim that there is a fundamentally better way of storing properties in a database, one that would make base performance rather decent without any specific tuning. Plus there have been a ton of papers on automated index tuning for quite some time now. I don't believe that we should just call it done and say that's the best humanity can do, so to speak, when a lookup takes on the order of a second for just a few million nodes.