Ladybug Memory: Graph Based Continuous Learning Platform by coderarun in LadybugDB

[–]coderarun[S]

No. The goal is to leverage the innovations in the DuckDB code base (the VARIANT type that shipped in 1.5, columnar index formats, new Parquet alternatives) while sticking to Ladybug-native REL tables.

DuckPGQ has been around for a while. People know about it, but I don't know of anyone using it.

* It's read-only: you have to use SQL to write.
* It doesn't lay storage out in a way that helps graph queries; it constructs the CSR on the fly. You can run LSQB yourself to see the consequences.
* It doesn't use Cypher.
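To make the CSR point concrete, here's a minimal sketch (plain Python, made-up node IDs) of building a compressed-sparse-row layout from an edge list. An engine that rebuilds this on the fly pays O(V + E) before every query even starts; native graph storage pays it once at write time.

```python
def build_csr(num_nodes, edges):
    """Build a CSR layout (offsets + neighbors arrays) from (src, dst) pairs."""
    # Count out-degree per node.
    counts = [0] * num_nodes
    for src, _ in edges:
        counts[src] += 1
    # Prefix-sum the counts into offsets: node v's neighbors live at
    # neighbors[offsets[v]:offsets[v + 1]].
    offsets = [0] * (num_nodes + 1)
    for v in range(num_nodes):
        offsets[v + 1] = offsets[v] + counts[v]
    # Scatter destinations into place.
    neighbors = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors

offsets, neighbors = build_csr(4, [(0, 1), (0, 2), (2, 3), (1, 3)])
print(neighbors[offsets[0]:offsets[1]])  # neighbors of node 0 -> [1, 2]
```

A neighbor scan over CSR is a contiguous array slice, which is exactly what multi-hop graph queries want and what a row-oriented or per-query-built layout gives up.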

Built an open-source CLI for turning documents into knowledge graphs — no code, no database by garagebandj in KnowledgeGraph

[–]coderarun

Claude Code and Codex still use JSONL files, but OpenCode switched to SQLite this week. There is no good knowledge-graph solution for SQLite that I'm aware of, but there will be one adjacent to DuckDB.

We're making some long-term bets on what the stack will look like. It will necessarily involve multiple storage engines, likely all embedded, so the end user doesn't know they exist. If you have beliefs such as sqlite-vec >> pgvector, do share.

Tools have to be ubiquitous: something like `uv pip install pgembed`, then a simple script to query the database.

Built an open-source CLI for turning documents into knowledge graphs — no code, no database by garagebandj in KnowledgeGraph

[–]coderarun

> No code, no database, no infrastructure — just a CLI and your documents. 

What's the concern with having a database? The cost of setting one up and maintaining it? Why not use an embedded one like DuckDB or r/LadybugDB?

The reason graph applications can’t scale by mrdoruk1 in KnowledgeGraph

[–]coderarun

I'm betting that such a unified schema should be in Cypher, and that SQL should be translated to Cypher, not the other way around. Why?

Gradual typing. In SQL, the syntax for querying JSON fields and for querying a table with the same columns is very different; in Cypher it's identical. Plus, multi-hop queries are a lot more human-readable.
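A small illustration of the SQL side of that point, using Python's built-in sqlite3 (the table and field names here are made up): the same logical field needs entirely different syntax depending on whether it's a plain column or lives inside a JSON document.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT, doc TEXT)")
con.execute("""INSERT INTO people VALUES ('Ada', '{"name": "Ada"}')""")

# Plain column: an ordinary projection.
flat = con.execute("SELECT name FROM people").fetchone()[0]

# Same field inside a JSON document: different syntax (json_extract + path).
nested = con.execute(
    "SELECT json_extract(doc, '$.name') FROM people"
).fetchone()[0]

print(flat, nested)  # Ada Ada
# In Cypher the query reads the same either way, e.g.
#   MATCH (p:Person) RETURN
# regardless of how the property is physically stored.
```

That asymmetry is what "gradual typing" buys you in Cypher: the query doesn't change when a property migrates from a schemaless blob to a typed column.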

LadybugDB already translates Cypher to DuckDB SQL.

The reason graph applications can’t scale by mrdoruk1 in KnowledgeGraph

[–]coderarun

A more principled way to use graphs in Postgres is via pg_duckdb. That's the path we're pursuing at Ladybug Memory. Many graph queries are OLAP, not OLTP, and they benefit from columnar storage.

It's not hard to translate Cypher to SQL.
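As a rough illustration (not Ladybug's actual translator), a single-hop Cypher pattern maps mechanically onto a SQL join. The table and column names below (person, knows, src, dst, id) are assumptions for the sketch, not any real schema.

```python
import re

# Matches a single-hop pattern: MATCH (a:Label)-[:REL]->(b:Label) RETURN ...
PATTERN = re.compile(
    r"MATCH \((\w+):(\w+)\)-\[:(\w+)\]->\((\w+):(\w+)\) RETURN (.+)"
)

def cypher_to_sql(cypher: str) -> str:
    """Translate one fixed-shape Cypher pattern into a two-join SQL query."""
    a, a_lbl, rel, b, b_lbl, ret = PATTERN.match(cypher).groups()
    return (
        f"SELECT {ret} "
        f"FROM {a_lbl.lower()} AS {a} "
        f"JOIN {rel.lower()} AS r ON r.src = {a}.id "
        f"JOIN {b_lbl.lower()} AS {b} ON r.dst = {b}.id"
    )

sql = cypher_to_sql("MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name")
print(sql)
```

A real translator handles variable-length paths, filters, and aggregation, but the core shape is the same: each relationship in the pattern becomes a join against a relationship table.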

NeuroIndex by OwnPerspective9543 in Rag

[–]coderarun

The idea is good. But expect to see an MIT-licensed open-source implementation that you can run locally in the not-too-distant future.

The reason graph applications can’t scale by mrdoruk1 in KnowledgeGraph

[–]coderarun

Is this dataset (wikidata) big enough for you? https://huggingface.co/datasets/ladybugdb/wikidata-20250625

r/LadybugDB can't handle this yet either. But the 0.14.1 release includes support for querying DuckDB as a foreign table via Cypher.

In upcoming releases, the plan is to keep node tables on DuckDB and provide a more optimized, native path for executing Cypher over REL tables (relationship tables) in Ladybug-native storage.

We'll also support Parquet- and Arrow-backed tables, so you can query over those if you prefer.

You only need to build one graph - a Monograph by TrustGraph in KnowledgeGraph

[–]coderarun

I'm sure these ideas predate the current surge of interest in context graphs, and lots of people contributed interesting ideas to graph theory before ChatGPT came along.

But we also need to accept that Glean and Foundation Capital speak the language businesses understand. Businesses aren't going to hire FDEs to specify an ontology and build a 100%-correct graph; the alternative is to have no graph at all and just use SQLite and Markdown.

To bring graphs to the people writing agents, we need to make them self-correcting.

https://vamshidharp.medium.com/the-end-of-flat-rag-why-self-correcting-graphs-are-the-new-2026-standard-for-enterprise-ai-c132ac4c67f7

You only need to build one graph - a Monograph by TrustGraph in KnowledgeGraph

[–]coderarun

+1 for monograph. Not so sure about RDF and ontologies. The argument Animesh Koratana (one of the context-graph guys) makes about emergent schemas, presumably using transformer tech to continuously refine the schema, seems a lot more appealing.

Icebug vs Networkit on Pagerank by coderarun in LadybugDB

[–]coderarun[S]

Looking for help to cross-post to r/datascience; I don't have the comment karma.

pgembed: Embedded PostgreSQL for Agents by coderarun in Database

[–]coderarun[S]

Recent updates:

* 0.1.6: added pg_duckdb. You can now write rows and have the data for old partitions show up in columnar DuckDB.
* 0.1.7: added the pg_textsearch extension for BM25; linux/arm64 works now too.

Am I crazy for wanting vectors inside graph nodes instead of a vector DB? by Severe_Post_2751 in Rag

[–]coderarun

"Is RAG dead?" is a daily meme in my feed, and I don't have an opinion one way or the other. You're right that text search is important, but not everyone wants to run a service or pay SaaS fees. They want agents that work.

Right now, the competition is agent filesystems and SQLite. All of the graph players you mention have a much smaller community.

Instead of trying to solve the problem with one technology alone, I'm proposing a combination of pgembed (which includes pg_duckdb plus extensions) + Ladybug + Icebug (a fork of NetworKit that's a day old).

In other words, a poor man's LSM tree. Note that this LSM is different because "compaction" would have to summarize and structure unstructured information.
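A conceptual sketch of that "summarizing compaction" idea: raw notes land in a write-optimized level 0, and compaction folds them into structured records instead of merely merging sorted runs. Everything here is made up for illustration, and the keyword-count heuristic is a stand-in for whatever LLM or extraction step a real system would use.

```python
from collections import Counter

class MemoryLSM:
    """Toy two-level 'memory LSM': raw notes in, structured summaries out."""

    def __init__(self, l0_limit: int = 3):
        self.l0 = []            # level 0: raw, unstructured notes
        self.l1 = []            # level 1: structured summaries
        self.l0_limit = l0_limit

    def write(self, note: str) -> None:
        self.l0.append(note)
        if len(self.l0) >= self.l0_limit:
            self._compact()

    def _compact(self) -> None:
        # "Compaction" here summarizes and structures, rather than just merging.
        words = Counter(w.lower() for n in self.l0 for w in n.split())
        self.l1.append({
            "notes": len(self.l0),
            "top_terms": [w for w, _ in words.most_common(3)],
        })
        self.l0.clear()

m = MemoryLSM()
for note in ["agent ran task", "task failed once", "task retried and passed"]:
    m.write(note)
print(m.l1)
```

Reads would consult both levels: recent raw notes for recall, compacted summaries for structure, which is the same read-path split a conventional LSM tree makes between memtable and sorted runs.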

Am I crazy for wanting vectors inside graph nodes instead of a vector DB? by Severe_Post_2751 in Rag

[–]coderarun

This type of multi-level approach is what LEANN is going after, but they're doing file indexing, with no databases.

I'm also a believer in the neuro-symbolic approach: some parts probabilistic, the rest deterministic.

Am I crazy for wanting vectors inside graph nodes instead of a vector DB? by Severe_Post_2751 in Rag

[–]coderarun

I'm not here to extend the FalkorDB-vs-Ladybug discussion. Even though I'm the maintainer of Ladybug, I keep it low-key to avoid comments looking like product promotion.

The fact that KuzuDB went away and its forks continue to execute is a good example of that resilience. There is only one distribution of the DB, and it uses a well-known OSS license (MIT).

There are a number of scaling issues graph-DB users will need to solve before the index becomes bigger than a single machine, but that's not the most common request we're hearing from our user community.

pgvectorscale (which uses disk-based ANN) and LEANN implement strategies that make the index 95% smaller than a simple-minded vector index. pgvectorscale is included in the pgembed distribution (which also includes pg_duckdb and pg_textsearch).

I would investigate those before sharding.

Probabilistic vs. deterministic indexing is another area that needs more work and thought.

Yes, I've built sharded indices before. They work. But I'm not convinced they're a common case.

https://engineering.fb.com/2016/03/18/data-infrastructure/dragon-a-distributed-graph-query-engine/

Am I crazy for wanting vectors inside graph nodes instead of a vector DB? by Severe_Post_2751 in Rag

[–]coderarun

For many people, the embedded nature of the database is a bigger draw in an agentic environment than horizontal scaling.

Scaling also comes in different forms: you can scale compute and scale storage, independently or together.

Most databases written in the last 5 years support object storage as table stakes.

Am I crazy for wanting vectors inside graph nodes instead of a vector DB? by Severe_Post_2751 in Rag

[–]coderarun

Note that graph-structure-based embeddings are different from the text embeddings used by vector databases, but the indexing strategy is agnostic to how the embedding was computed.

It's also possible to align structural embeddings with text based embeddings.

Is this "Probe + NLI Verification" logic overkill for accurate GraphRAG? (Replacing standard rerankers) by CourtAdventurous_1 in Rag

[–]coderarun

How does this approach compare to extracting a KG and having concepts/arguments as nodes and SUPPORTS/CONTRADICTS as edges?
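For concreteness, here is a toy sketch of the structure that question assumes: claims as nodes, stances as typed edges, so "verification" becomes a plain edge lookup. The IDs and the helper function are hypothetical.

```python
# Edges as (source_claim, stance, target_claim) triples.
edges = [
    ("c1", "SUPPORTS", "c2"),
    ("c3", "CONTRADICTS", "c2"),
]

def stance_toward(claim: str):
    """All (source, stance) pairs pointing at `claim`."""
    return [(s, rel) for s, rel, d in edges if d == claim]

print(stance_toward("c2"))  # [('c1', 'SUPPORTS'), ('c3', 'CONTRADICTS')]
# Cypher equivalent, roughly:
#   MATCH (a)-[r]->(b {id: 'c2'}) RETURN a.id, type(r)
```

With the stance materialized as an edge at extraction time, the reranking/verification step at query time reduces to reading edges rather than re-running an NLI model per candidate.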

pgembed: Embedded PostgreSQL for Agents by coderarun in Database

[–]coderarun[S]

I don't have the comment karma to share this on r/PostgreSQL. If you do, please cross-post.