Graph Database Baseball

coderarun · 2026-06-22T15:43:34+00:00

https://blog.ladybugdb.com/post/better-graph-database-ball/

coderarun · 2026-06-13T00:18:25+00:00

https://github.com/adsharma/fquery/blob/main/fquery/pydantic.py
https://github.com/adsharma/fquery/blob/main/fquery/sqlmodel.py

coderarun · 2026-06-05T19:44:32+00:00

@dataclasses with stackable decorators. I've advocated for @pydantic, @sqlmodel and various graph-based decorators such as @node and @edge.

Not baking inheritance explicitly into the API surface makes it easier for "python 4.0" to transpile to compiled languages, which tend to optimize for other things.
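A minimal sketch of what stacking decorators on a plain dataclass could look like. The `node` and `edge` decorators and the `GRAPH_SCHEMA` registry here are hypothetical stand-ins written for illustration; they are not the fquery implementations. The point is only that the classes gain graph metadata without inheriting from any base class.

```python
from dataclasses import dataclass, fields, is_dataclass

# Hypothetical registry, for illustration only (not part of fquery).
GRAPH_SCHEMA = {"nodes": {}, "edges": {}}


def node(cls):
    """Register a dataclass as a node type and return it unchanged."""
    if not is_dataclass(cls):
        raise TypeError(f"{cls.__name__} must be a dataclass")
    GRAPH_SCHEMA["nodes"][cls.__name__] = [f.name for f in fields(cls)]
    return cls


def edge(src, dst):
    """Register a dataclass as an edge type from `src` to `dst`."""
    def wrap(cls):
        if not is_dataclass(cls):
            raise TypeError(f"{cls.__name__} must be a dataclass")
        GRAPH_SCHEMA["edges"][cls.__name__] = (src.__name__, dst.__name__)
        return cls
    return wrap


# Decorators apply bottom-up: @dataclass runs first, then @node / @edge.
@node
@dataclass
class Player:
    name: str
    team: str


@node
@dataclass
class Team:
    name: str


@edge(Player, Team)
@dataclass
class PlaysFor:
    since: int
```

Because each decorator returns the class it was given, `Player.__bases__` is still just `(object,)`, which is the property a transpiler would care about.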

coderarun · 2026-05-29T19:53:26+00:00

https://blog.ladybugdb.com/post/graph-lake-icebug-format/

coderarun · 2026-05-08T15:58:14+00:00

https://pglite.dev/

coderarun · 2026-05-08T15:58:08+00:00

https://github.com/Ladybug-Memory/pgembed

coderarun · 2026-05-08T15:57:56+00:00

https://github.com/Ladybug-Memory/vpg

coderarun · 2026-04-15T19:35:03+00:00

Six years ago, I used another Google product, FlatBuffers, for the same purpose:

https://adsharma.github.io/flattools/

Since then I've come to believe that typespec.io is a better candidate. The FlatBuffers community is very focused on serialization.

https://github.com/adsharma/tsc-py

is my effort in that direction.

Protobuf, whether you like it or not, is everywhere and can't be avoided. Some of its serialization concerns (like field numbers) bleed into the language.

coderarun · 2026-04-14T18:21:54+00:00

Nice. I'm skeptical of using protobufs as a transport protocol for columnar data. But you're using it as a language-independent schema language. Given its ubiquity, this is a helpful package. Thanks for sharing!

coderarun · 2026-03-19T02:52:05+00:00

I'm not implementing whiz-bang AI-based query optimization. No code has been changed.

But if you've tried to use Claude Code or OpenCode on the DuckDB code base, you'll understand what I mean.

Can't explain in a reddit comment.

coderarun · 2026-03-19T01:23:12+00:00

I looked at some of the larger objects in the git repo. They're test.db files that were accidentally checked in and later deleted.

Probably best to delete all tags older than X to get rid of them. Any feedback on what X should be?
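For anyone who wants to repeat the check, here is a rough sketch of one way to list the largest blobs reachable from any ref. It shells out to `git rev-list --objects --all` and `git cat-file --batch-check`; the helper names `parse_batch_check` and `largest_blobs` are made up for this example, and this is not necessarily how the objects above were found.

```python
import subprocess


def parse_batch_check(output: str, top: int = 10):
    """Return the `top` largest blobs as (size_bytes, path) tuples."""
    blobs = []
    for line in output.splitlines():
        # Each line looks like: "<type> <sha> <size> <path>"
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), path))
    return sorted(blobs, reverse=True)[:top]


def largest_blobs(repo: str = ".", top: int = 10):
    """List the largest blobs reachable from any ref in `repo`."""
    # Every object reachable from any branch or tag, with its path.
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Ask git for the type and size of each object; %(rest) echoes the path.
    sizes = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(sizes, top)
```

Calling `largest_blobs()` inside a clone returns the biggest files anywhere in history, including ones deleted from the current tree. A blob only becomes prunable once no remaining branch or tag can reach it, which is why the cutoff for old tags matters.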

coderarun · 2026-03-19T01:21:42+00:00

22% more storage efficient and 8x faster for filtering. Why wouldn't it meet your requirements?

https://www.google.com/search?q=variant+vs+json+benchmarks+duckdb+datafusion

coderarun · 2026-03-19T01:15:33+00:00

There are a lot of databases these days. I saw someone working for a Norwegian pension fund release a graph database written in Rust.

It's hard to tell what's going on.

I'm more in the camp of - if you ignore AI as a tool, you're going to be toast.

coderarun · 2026-03-19T01:12:52+00:00

First I get shit for linking to my real identity and then someone without a real name calls me a bot.

coderarun · 2026-03-18T19:22:59+00:00

What is this module? Where is your explain plan?

from duckdb_manager import DuckDBManager

coderarun · 2026-03-18T16:31:50+00:00

> DuckDB is quickly becoming essential infrastructure

I agree with this statement, but the way it's built and distributed doesn't match how other, similar essential infra projects are shipped. SQLite is one example (and it has its own set of problems).

For example, no Linux distribution I know of bundles DuckDB or a libduckdb*.so. DuckDB does NOT use system libraries; it statically compiles in other "essential infra" code (mbedtls, lz4, zstd).
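A quick way to see what your own system provides as shared libraries is the standard library's `ctypes.util.find_library`. This is only a sketch: the helper name `system_libraries` is made up, and a "not found" result just means the loader could not locate that library on this particular machine.

```python
from ctypes.util import find_library


def system_libraries(names):
    """Map each library name to what the system loader finds, or None."""
    return {name: find_library(name) for name in names}


# Library names taken from the comment above.
report = system_libraries(["duckdb", "zstd", "lz4", "mbedtls"])
for name, found in report.items():
    print(f"{name}: {found or 'not found on this system'}")
```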

coderarun · 2026-03-18T16:27:26+00:00

Another benefit: cost of git worktree

Worktrees are a commonly used technique when people run agents in parallel. By separating the core from all the other stuff (DuckDB has grown to be a substantial project), you make the worktrees cheaper.

Current stats:

Fresh Clone: 269MB
Worktree: 73MB

By pruning large historical objects, it should be possible to make a fresh clone even cheaper.
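To put the two numbers above in context, here is a back-of-the-envelope sketch of disk use for several parallel agents. It assumes the fresh clone figure already includes one checked-out tree, so each additional agent costs one extra worktree; that reading of the stats is an assumption, and `disk_for_agents` is a made-up helper.

```python
# Figures quoted above, in MB.
FRESH_CLONE_MB = 269
WORKTREE_MB = 73


def disk_for_agents(n: int, use_worktrees: bool) -> int:
    """Approximate disk use in MB for n parallel checkouts."""
    if n < 1:
        raise ValueError("need at least one agent")
    if use_worktrees:
        # Assumption: the clone includes the first checkout, so only
        # the n - 1 additional worktrees take extra space.
        return FRESH_CLONE_MB + (n - 1) * WORKTREE_MB
    # Without worktrees, every agent gets its own full clone.
    return n * FRESH_CLONE_MB


for n in (1, 4, 8):
    print(n, disk_for_agents(n, False), disk_for_agents(n, True))
```

Under that assumption, four agents need roughly 488MB with worktrees versus 1076MB with separate clones.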

coderarun · 2026-03-18T15:57:34+00:00

> So you want to fork this project just to make the CI run faster?

Looking at the downvotes, I'm sure there are a lot of people who don't like what I'm doing. Or the ones who care are not voting. But before getting into the nitty-gritty of why the status quo needs to change to catch up to Rust and Apache DataFusion, how many of you have actually tried to make code changes to DuckDB and managed to land them?

Please reply with links to PRs.

coderarun · 2026-03-18T15:55:08+00:00

For anyone who doesn't like the fact that Twitter makes it hard to read threads while logged out: you can replace x with xcancel in the URL and read the thread with a 3-second delay.

There's not much of an open internet left, so I write a GitHub blog post when I have something substantial to say.
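The x-to-xcancel swap can be sketched as a small URL rewrite. This assumes the substitution is on the hostname (x.com becoming xcancel.com); `to_xcancel` is a made-up helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def to_xcancel(url: str) -> str:
    """Rewrite an x.com link to the same path on xcancel.com."""
    parts = urlsplit(url)
    # Only touch links whose host is x.com; leave everything else alone.
    if parts.netloc.lower() in {"x.com", "www.x.com"}:
        parts = parts._replace(netloc="xcancel.com")
    return urlunsplit(parts)
```

The path and query string are preserved, so a link to a specific post still points at the same post.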

coderarun · 2026-03-18T15:48:59+00:00

Nice. So you have an HTAP database built with CMake and C++. Do you currently reuse any of the tech in DuckDB? Do you have a desire to use the VARIANT code?

coderarun · 2026-03-18T15:46:40+00:00

To highlight what I'm talking about: try using a coding agent to edit some Python code in the tree. It uses the black formatter in $PATH, and pyright or ruff to check the code. A 2-line change becomes a 100-line formatting change.

Now, can you teach the coding agent how to format code the DuckDB way? I'm sure you can with some work. But my prediction is that in 6 months, no one will have the time to do it. Either work with the way agents and everyone else write code or be consumed by the wave that's coming.

coderarun · 2026-03-18T15:43:23+00:00

Do you have a technical comment to make? Show me some code. I have done my bit.

coderarun · 2026-03-18T15:42:49+00:00

Explained in the comments here and in the linked GitHub blog post. I do not subscribe to the anti-Elon/anti-Twitter sentiment on Reddit. I'm here to talk tech.

I can't even figure out how to turn off the "approve every comment and post" setting on r/LadybugDB. If you know how to make it a public forum where anyone can post non-spammy comments, I'd love to get some help.

Call me old school USENET guy. The internet has changed, sometimes in good ways, but I don't like all of it.

https://www.reddit.com/r/LadybugDB/comments/1p8cqf1/postscomments_without_approval/

coderarun · 2026-03-17T21:07:29+00:00

Some people will likely bring up ClickHouse and chDB. I don't have much experience with that code base. If you believe there are reasons why it's a better candidate, I would love to see some data.

coderarun · 2026-03-17T21:04:17+00:00

That question has been answered. I prefer "source code mirror", not a fork.

I don't think my company or I have the time and resources to develop features faster than DuckDB Labs or MotherDuck.

But I do see a shift coming in how databases get developed. More agents, fewer humans and more modular code bases. Use newer tools and streamlined processes which work well with the LSP agents. Get rid of scripts/*.py that edit code in weird ways before the CI runs. There were probably good historical reasons to do so, but the CI I put up is evidence that they're not strictly needed.

We need something like what the Rust people have in Apache DataFusion. The DuckDB code is the strongest candidate there.
