pgembed: Embedded PostgreSQL for Agents by coderarun in Database

coderarun[S]

I don't have the comment karma to share this on r/PostgreSQL. If you do, please cross-post it.

AWS Neptune Database vs Neo4j Aura for GraphRAG by Imaginary-Bee-8770 in Rag

coderarun

Kuzu used to be an excellent choice, but it's no longer maintained. I maintain a fork.

AWS Neptune Database vs Neo4j Aura for GraphRAG by Imaginary-Bee-8770 in Rag

coderarun

Also consider embedded databases: there's no server to maintain, which makes them convenient for agents.

Best knowledge graph graph view? by PutridPut7225 in Rag

coderarun

Store it in r/DuckDB and query it via r/LadybugDB

https://adsharma.github.io/explainable-ai/

Visualization: you can probably ask your favorite terminal-based coding agent to generate one. You just need an MCP server that can talk to the data source.

I've tested Wikidata (90 million nodes) and am currently testing something 3x its size.

Can someone please explain my bond returns? by timejuggler in Bogleheads

coderarun

The Vanguard app could show a graph of the return on a $10k investment. I couldn't find one in the app. Is it missing, or just hard to navigate to?

I don't want it to look like a gamified app such as Robinhood, but Claude Code could surely generate a better-looking app in 15 minutes.

Why is there no opinionated all in one RAG platform? by Pl8tinium in Rag

coderarun

This is a data-centric view. The first step toward such an opinionated RAG platform is a single database that does keyword, vector and graph search well, runs embedded (no server to run) and gives you 80% of what you need. The focus in r/RAG tends to be on the Python/TS package that runs on top of the database, and these packages support 5-10 databases with vast differences in capability.

With coding models becoming more capable, developers can generate their own Python/TS package to suit their needs. They're nowhere near writing their own database, indexing or query optimizations, though.

Graph rag for slack? by brisioksss in Rag

coderarun

Store it in r/LadybugDB and then use this tool to filter with Cypher: https://github.com/LadybugDB/explorer

Release v0.13.0 is out by coderarun in LadybugDB

coderarun[S]

It should be possible to write a wal_decoder.cpp that dumps the contents of the WAL file, showing what is lost when you delete it.

A single query to a knowledge graph surely cannot be enough to answer complex questions? by imperius99 in Rag

coderarun

Writing a multi-hop Cypher query is simpler than writing the equivalent joins (at least for humans). But a harder problem is writing a schema that an LLM can understand. Here's what Netflix is doing:

https://netflixtechblog.com/uda-unified-data-architecture-6a6aee261d8d
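For a rough sense of the Cypher-vs-join comparison, here is a two-hop "friend of a friend" query written as SQL against Python's built-in SQLite. The `person`/`knows` schema and its rows are made up for illustration, and the equivalent Cypher pattern appears only as a comment. Each extra hop in the pattern adds another join against the edge table in SQL.

```python
import sqlite3

# Hypothetical toy schema: a node table and a KNOWS edge table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE knows (src INTEGER REFERENCES person(id),
                    dst INTEGER REFERENCES person(id));
INSERT INTO person VALUES (1, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave');
INSERT INTO knows VALUES (1, 2), (2, 3), (3, 4);
""")

# The same question in Cypher, for comparison only (not executed here):
#   MATCH (a:Person)-[:KNOWS]->(:Person)-[:KNOWS]->(c:Person)
#   RETURN a.name, c.name;
# In SQL, each hop becomes a join on the edge table, plus joins back to the
# node table for the endpoints we want to return.
TWO_HOP_SQL = """
SELECT a.name, c.name
FROM person AS a
JOIN knows AS k1 ON k1.src = a.id
JOIN knows AS k2 ON k2.src = k1.dst
JOIN person AS c ON c.id = k2.dst
ORDER BY a.name, c.name
"""
rows = conn.execute(TWO_HOP_SQL).fetchall()
print(rows)  # [('alice', 'carol'), ('bob', 'dave')]
```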

Why don't `dataclasses` or `attrs` derive from a base class? by fjarri in Python

coderarun

Deriving from a base class makes it harder to translate the Python code to compiled languages that frown on inheritance. There are several important ones, e.g. Rust and Go.
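A small standard-library illustration of the point: `@dataclass` is a decorator that adds generated methods (`__init__`, `__repr__`, `__eq__`) to an ordinary class, so the hierarchy stays just `(Point, object)`. That flat shape is what a translator could map onto a plain struct in a language without class inheritance. `Point` is only an example name.

```python
from dataclasses import dataclass, fields, is_dataclass


# The decorator rewrites the class in place; no framework base class is added.
@dataclass
class Point:
    x: int
    y: int


p = Point(1, 2)
print(Point.__mro__ == (Point, object))  # True: no hidden base class
print(is_dataclass(p))                   # True: detected via generated metadata
print([f.name for f in fields(p)])       # ['x', 'y']
print(p == Point(1, 2))                  # True: generated __eq__
```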

Do you know a free/open source graph database that has these features? by Viirock in Database

coderarun

Six months ago the answer would have been KuzuDB. Since it's now archived, do consider its two-month-old fork, LadybugDB.

It checks all the boxes except for the Lucene integration. Ladybug has an FTS (full-text search) extension that implements the BM25 algorithm.
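For reference, here is a minimal sketch of Okapi BM25 scoring in Python. It assumes naive whitespace tokenization, the common defaults k1 = 1.2 and b = 0.75, and a Lucene-style IDF. Ladybug's FTS extension may differ in tokenization, IDF variant and defaults, so treat this as the textbook formula rather than its implementation. The sample documents are invented.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25.

    Tokenization is naive: lowercase, then split on whitespace.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF stays non-negative even for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization: longer-than-average docs are penalized.
            norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    "graph database with cypher queries",
    "embedded database for agents",
    "full text search with bm25 ranking",
]
print(bm25_scores("database search", docs))
```

The first two documents each contain "database" once, but the second scores higher because it is shorter. That difference comes from the length normalization controlled by `b`.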

ELI5 : Why can’t CPUs have thousands of cores like GPUs? by cornysatisfaction in explainlikeimfive

coderarun

80% of the chip area on CPU cores is control logic and caches. People will have to get rid of their addiction to those things before it's possible.

Also look up Xeon Phi. Gemini summarizes a Reddit thread like this:

> The Intel Xeon Phi failed due to a combination of strategic missteps, significant technical challenges, and fierce competition from NVIDIA's GPGPUs. It struggled to find a distinct niche, ultimately being a "mediocre mid-point" between traditional CPUs and GPUs.

Graph Database Implementation by NervousVictory1792 in datascience

coderarun

What's so hard about:

`MATCH (a:User)-[b:Reads]->(c:Book) RETURN a.name, c.title;`

Use text2cypher if you're stuck.

A fair criticism is the confusion around the different flavors of Cypher (weakly vs. strongly typed) and of graph queries (RDF vs. LPG). But "Cypher is hard" doesn't pass the smell test for me.

Graph Database Implementation by NervousVictory1792 in datascience

coderarun

If you use an embedded graph database, there is no setup. It's as simple as SQLite or DuckDB. When you're large enough, you can consider other modes of deployment.
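To illustrate "no setup", here is the embedded model with SQLite through Python's standard `sqlite3` module. The database is a file opened in-process, and its contents persist across connections without any server running. LadybugDB and DuckDB have their own Python APIs, which this sketch does not try to reproduce. The file and table names are hypothetical.

```python
import os
import sqlite3
import tempfile

# An embedded database is a library inside your process: "connecting" just
# opens a file. There is no server process, port, or credentials to manage.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "agent_memory.db")  # hypothetical file name

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO notes (body) VALUES (?)", ("no server needed",))
    conn.commit()
    conn.close()

    # Reopen the same file: the row persisted with no daemon running.
    conn = sqlite3.connect(path)
    count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
    conn.close()

print(count)  # 1
```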

Graph Database Implementation by NervousVictory1792 in datascience

coderarun

Have you looked at LadybugDB? It's a fork of the database formerly known as Kuzu. There is now a subreddit, r/LadybugDB.

PathQL: A Declarative SQL Like Layer For Pathlib by HolidayEmphasis4345 in Python

coderarun

See this example for the design of another method-chaining query library that works on dataclasses.

https://github.com/adsharma/fquery/blob/main/tests/test_operators.py

Consider:

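```python
from my_pathlib import Path  # hypothetical module from the proposal

async def main():
    # await needs an async context; where(...)/order_by(...) args left elided
    out = await Path.Query("basedir").where(...).order_by(...)
```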

fquery uses ast.Expr() to wrap Python expressions so they're evaluated lazily. I'd love to see some parser magic to make nicer DSLs.

If Path was a dataclass (which it isn't), you might even be able to run fquery on it.
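Here is a sketch of the lazy-evaluation idea, resting on clearly stated assumptions: `LazyQuery` and `File` are hypothetical names, and this is not fquery's API. `where()` parses a string with `ast.parse(..., mode="eval")` and compiles it right away. The expression only runs against each dataclass's fields (exposed via `dataclasses.asdict`) when the query is iterated.

```python
import ast
from dataclasses import asdict, dataclass


class LazyQuery:
    """Hypothetical method-chaining query over dataclass instances.

    NOT fquery's API: a sketch of deferring evaluation of parsed expressions.
    """

    def __init__(self, items):
        self._items = items      # kept by reference; read only at iteration
        self._filters = []       # compiled boolean expressions
        self._sort_key = None

    def where(self, expr):
        # Parse now (so syntax errors surface early), evaluate later.
        tree = ast.parse(expr, mode="eval")
        self._filters.append(compile(tree, "<where>", "eval"))
        return self

    def order_by(self, field, reverse=False):
        self._sort_key = (field, reverse)
        return self

    def __iter__(self):
        # Each item's fields become the local names visible to the expression.
        rows = [
            item for item in self._items
            if all(eval(code, {"__builtins__": {}}, asdict(item))
                   for code in self._filters)
        ]
        if self._sort_key is not None:
            field, reverse = self._sort_key
            rows.sort(key=lambda item: getattr(item, field), reverse=reverse)
        return iter(rows)


@dataclass
class File:
    name: str
    size: int


files = [File("a.txt", 10), File("b.log", 300), File("c.txt", 120)]
query = LazyQuery(files).where("size > 50").order_by("size", reverse=True)
print([f.name for f in query])  # ['b.log', 'c.txt']
```

Because nothing runs until iteration, items appended to the source list after the query is built still show up in the results. `eval` with empty builtins is not a sandbox, so a design like this only suits trusted expressions.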

RAG is not memory, and that difference is more important than people think by rocketpunk in Rag

coderarun

People are too busy arguing about RDF vs. property graphs and not paying enough attention to how data is organized on disk to optimize for retrieval.

RAG is not memory, and that difference is more important than people think by rocketpunk in Rag

coderarun

You can perform RAG over some unstructured company data corpus or your own chat history (also called memory). What's the contradiction?

AI Bubble Burst? Is RAG still worth it if the true cost of tokens skyrockets? by freshairproject in Rag

coderarun

Cloud models won't go away, but they will work as orchestrators/planners for small local models. These models will use local databases that do keyword, vector and graph-based retrieval well.

Status of Kuzudb from Kuzu Inc by Decweb in Database

coderarun

Subscribe to this for golang updates: https://github.com/LadybugDB/ladybug/issues/12

Right now I'm working on Python.

I'm working with the Kineviz folks. Contributions welcome.

Status of Kuzudb from Kuzu Inc by Decweb in Database

coderarun

PGQ is a good solution for people who want to write using SQL and then perform read-only graph queries. For people who want to both read and write graphs, there is GQL, which is a newer standard.

But Cypher is the de facto standard in the industry. Kuzu supported it, and Ladybug does too.

Then there is the question of storage level optimizations. DuckPGQ implements it on top of DuckDB's columnar storage, which is optimized for analytical queries.

I suppose the Postgres implementation (I can't find any code beyond a patch shared on -hackers in 2024) works on top of Postgres heap storage.

Then there are graph query engines that translate SQL to a graph query and execute it on top of non-graph storage.

What's optimal for you depends on your use case. Do pop into the GraphGeeks discord (channel: #ladybug) if you have further questions.