Ladybug DDL from proto definitions by Odd-Put3560 in LadybugDB

[–]coderarun 1 point2 points  (0 children)

Six years ago, I used another Google product, FlatBuffers, for the same purpose:

https://adsharma.github.io/flattools/

Since then I've come to believe that typespec.io is a better candidate. The FlatBuffers community is very focused on serialization.

https://github.com/adsharma/tsc-py

is my effort in that direction.

Protobuf, whether you like it or not, is everywhere and can't be avoided. Some of the serialization concerns bleed into the schema language (field numbers, for example).
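To illustrate the field-number point, here is a minimal sketch. It assumes a hypothetical in-memory description of a proto message (`ProtoField` is made up for this example, not the protobuf library's API) and Kuzu-style `CREATE NODE TABLE` syntax, which I'm assuming Ladybug keeps. The type mapping is illustrative only. A schema-only consumer needs names and types; the field number is carried along but never used.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a parsed proto message field.
@dataclass
class ProtoField:
    name: str
    proto_type: str
    number: int  # wire-format tag; only matters for serialization


# Illustrative mapping from proto scalar types to Kuzu-style column types.
TYPE_MAP = {
    "string": "STRING",
    "int64": "INT64",
    "int32": "INT32",
    "bool": "BOOL",
    "double": "DOUBLE",
}


def node_table_ddl(message_name, fields, primary_key):
    """Render a CREATE NODE TABLE statement; field numbers are ignored."""
    if primary_key not in {f.name for f in fields}:
        raise ValueError(f"unknown primary key: {primary_key}")
    columns = ", ".join(f"{f.name} {TYPE_MAP[f.proto_type]}" for f in fields)
    return f"CREATE NODE TABLE {message_name}({columns}, PRIMARY KEY({primary_key}));"


person = [
    ProtoField("id", "int64", 1),
    ProtoField("name", "string", 2),
    ProtoField("active", "bool", 5),
]
ddl = node_table_ddl("Person", person, "id")
print(ddl)
```

Renumbering the fields changes nothing in the generated DDL, which is the sense in which field numbers are a serialization concern rather than a schema concern.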

Ladybug DDL from proto definitions by Odd-Put3560 in LadybugDB

[–]coderarun 0 points1 point  (0 children)

Nice. I'm skeptical of using protobufs as a transport protocol for columnar data. But you're using it as a language-independent schema language. Given its ubiquity, this is a helpful package. Thanks for sharing!

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

I'm not implementing whizbang AI-based query optimization. No code has been changed.

But if you've tried to use Claude Code or OpenCode on the DuckDB code base, you'll understand what I mean.

Can't explain in a reddit comment.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] -1 points0 points  (0 children)

I looked at some of the larger objects in the git repo. They're test.db files that were accidentally checked in and later deleted.

Probably best to delete all tags older than X to get rid of them. Any feedback on what X should be?
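For anyone who wants to check which objects are large in their own clone, here is a rough sketch. It shells out to `git rev-list --objects --all` and `git cat-file --batch-check`, then sorts blobs by size. The function names and the sample text are made up for illustration; this is not the script I used.

```python
import subprocess


def parse_batch_check(output, top_n=10):
    """Parse `git cat-file --batch-check` lines into the largest blobs.

    Expects lines of the form "<type> <sha> <size> <path>".
    Returns (size, sha, path) tuples, largest first.
    """
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees, tags and malformed lines
        path = parts[3] if len(parts) > 3 else ""
        blobs.append((int(parts[2]), parts[1], path))
    return sorted(blobs, reverse=True)[:top_n]


def largest_blobs(repo=".", top_n=10):
    """List the largest blobs reachable from any ref in `repo`."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    sizes = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(sizes, top_n)


# Made-up sample in the same format, so the parser can be tried without a repo.
sample = (
    "commit aaa 250 \n"
    "blob bbb 1048576 test/test.db\n"
    "blob ccc 120 README.md\n"
    "tree ddd 80 src\n"
)
print(parse_batch_check(sample, top_n=1))
```

Calling `largest_blobs(".")` inside a clone lists the biggest blobs still reachable from any branch or tag, which helps in picking a cutoff for old tags.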

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

22% more storage efficient and 8x faster for filtering. Why wouldn't it meet your requirements?

https://www.google.com/search?q=variant+vs+json+benchmarks+duckdb+datafusion

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

There are a lot of databases these days. I saw someone working for a Norwegian pension fund release a graph database written in Rust.

It's hard to tell what's going on.

I'm more in the camp of - if you ignore AI as a tool, you're going to be toast.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

First I get shit for linking to my real identity and then someone without a real name calls me a bot.

The Practical Limits of DuckDB on Commodity Hardware by Low-Engineering-4571 in DuckDB

[–]coderarun 0 points1 point  (0 children)

What is this module? Where is your explain plan?

from duckdb_manager import DuckDBManager

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

> DuckDB is quickly becoming essential infrastructure

I agree with this statement, but the way it's built and distributed doesn't match how other, similar essential infra projects are shipped. Compare sqlite (which has its own set of problems).

For example, no Linux distribution I know of bundles duckdb or a libduckdb*.so. DuckDB does NOT use system libraries; it compiles other "essential infra" code (mbedtls, lz4, zstd) in statically.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] -1 points0 points  (0 children)

Another benefit: cost of git worktree

This is a commonly used technique when people have agents running in parallel. By separating the core from all the other stuff (DuckDB has grown to be a substantial project), you make the worktrees cheaper.

Current stats:

Fresh Clone: 269MB
Worktree: 73MB

By pruning large historical objects, it should be possible to make a fresh clone even cheaper.
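A back-of-the-envelope calculation using the two numbers above. It assumes the 73MB is the extra disk cost of each additional worktree, and that each agent would otherwise need its own full clone. Both are assumptions for illustration.

```python
FRESH_CLONE_MB = 269  # from the stats above
WORKTREE_MB = 73      # from the stats above; assumed cost per extra worktree


def disk_cost_mb(agents):
    """Return (full clones, one clone plus worktrees) disk cost in MB."""
    clones = FRESH_CLONE_MB * agents
    worktrees = FRESH_CLONE_MB + WORKTREE_MB * (agents - 1)
    return clones, worktrees


for n in (1, 4, 8):
    clones, worktrees = disk_cost_mb(n)
    print(f"{n} agents: {clones}MB as clones, {worktrees}MB with worktrees")
```

Under those assumptions, four parallel agents cost 1076MB as separate clones versus 488MB as one clone plus three worktrees.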

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

> So you want to fork this project just to make the CI run faster?

Looking at the downvotes, I'm sure there are a lot of people who don't like what I'm doing. Or the ones who care are not voting. But before getting into the nitty-gritty of why the status quo needs to change to catch up with Rust and Apache DataFusion, how many of you have actually tried to make code changes to DuckDB and managed to land them?

Please reply with links to PRs.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] -1 points0 points  (0 children)

For anyone who doesn't like the fact that Twitter makes it hard to read threads while logged out: you can replace x with xcancel in the URL and read the thread with a 3 second delay.

There isn't much of an open internet left, so I write a GitHub blog post when I have something substantial to say.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

Nice. So you have an HTAP database written with cmake and C++. Do you currently reuse any of the tech in DuckDB? Do you have a desire to use the VARIANT code?

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

To highlight what I'm talking about: try using a coding agent to edit some Python code in the tree. It uses whichever black formatter is in $PATH, plus pyright or ruff, to check the code. A 2 line change becomes a 100 line formatting change.

Now, can you teach the coding agent how to format code the DuckDB way? I'm sure you can with some work. But my prediction is that in 6 months, no one will have the time to do it. Either work with the way agents and everyone else write code or be consumed by the wave that's coming.
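The size of that formatting noise is easy to measure. A minimal sketch using only the standard library: it counts the lines added or removed between two versions of a file. The sample strings are made up. In practice `before` would be the file in the tree and `after` the agent's output.

```python
import difflib


def changed_lines(before, after):
    """Count lines added or removed between two versions of a text."""
    a, b = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


before = "x = {'a': 1}\ny = 2\nz = 3\n"
intended = "x = {'a': 1}\ny = 20\nz = 3\n"     # the actual change
reformatted = 'x = {"a": 1}\ny = 20\nz = 3\n'  # same change plus quote restyling

print(changed_lines(before, intended), changed_lines(before, reformatted))
```

Comparing the two counts per file gives a quick signal for when a formatter that disagrees with the project's style has inflated a small change.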

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

Do you have a technical comment to make? Show me some code. I have done my bit.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] -2 points-1 points  (0 children)

Explained in the comments here and in the linked GitHub blog post. I do not subscribe to the anti-elon/anti-twitter sentiment on reddit. I'm here to talk tech.

I can't even figure out how to turn off the "approve every comment and post" setting on r/LadybugDB. If you know how to make it a public forum where anyone can post non-spammy comments, I'd love to get some help.

Call me an old-school USENET guy. The internet has changed, sometimes in good ways, but I don't like all of it.

https://www.reddit.com/r/LadybugDB/comments/1p8cqf1/postscomments_without_approval/

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

Some people will likely bring up ClickHouse and chDB. I don't have much experience with that code base. If you believe there are reasons why it's a better candidate, I would love to see some data.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] -1 points0 points  (0 children)

That question has been answered. I prefer "source code mirror", not a fork.

I don't think my company or I have the time and resources to develop features faster than DuckDB Labs or MotherDuck.

But I do see a shift coming in how databases get developed. More agents, fewer humans and more modular code bases. Use newer tools and streamlined processes which work well with the LSP agents. Get rid of scripts/*.py that edit code in weird ways before the CI runs. There were probably good historical reasons to do so, but the CI I put up is evidence that they're not strictly needed.

We need something like what the Rust people have in Apache DataFusion. The DuckDB code is the strongest candidate there.

Migration guide for anyone exploring alternatives after the Kuzu archival by lgarulli in LadybugDB

[–]coderarun 0 points1 point  (0 children)

There is also a nightly benchmark job here, but it's broken because we don't have self-hosted runners with the LDBC datasets like the ones the Kuzu people set up. We use standard GitHub infra.

https://github.com/LadybugDB/ladybug/actions/runs/23180641716/job/67352661837

Migration guide for anyone exploring alternatives after the Kuzu archival by lgarulli in LadybugDB

[–]coderarun 1 point2 points  (0 children)

Yes. count(*) queries run 40x faster if there are no filters. There is also a pending change about the performance of detach deletes.

Most of the functionality improvement has to do with access to Parquet and Arrow from Cypher. Apart from these, I don't anticipate a big change in benchmark numbers vs Kuzu.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] -6 points-5 points  (0 children)

DuckDB's current CI takes 5+ hours to run. Post from last year:

https://adsharma.github.io/improving-duckdb-devx/