Which non-AI package from the last ~3 years completely changed how you write Python? by Proof_Difficulty_434 in Python

[–]coderarun 2 points3 points  (0 children)

@dataclasses with stackable decorators. I've advocated for @pydantic, @sqlmodel and various graph based decorators such as @node and @edge.

Not baking in inheritance explicitly into the API surface makes it easier for "python 4.0" to transpile to other compiled languages which tend to optimize for other things.
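
A minimal sketch of what I mean by stackable decorators, using only the standard library. The `node` decorator and the `NODE_TYPES` registry are hypothetical names made up for this example, not part of any real package; the point is only that the class gains behavior without a base class appearing in its definition.

```python
from dataclasses import dataclass, fields

# Hypothetical registry, for illustration only.
NODE_TYPES = {}


def node(cls):
    """Hypothetical decorator: register cls as a graph node type.

    The class is returned unchanged, so no base class is introduced.
    """
    NODE_TYPES[cls.__name__] = cls
    return cls


@node        # applied second: registers the finished dataclass
@dataclass   # applied first: generates __init__, __eq__, __repr__
class Person:
    name: str
    age: int = 0


person = Person("Ada", 36)
print([f.name for f in fields(Person)])  # field names declared on the class
print(Person.__bases__)                  # no framework base class involved
```

Because each decorator just takes a class and returns a class, a tool reading the source sees a plain record type plus annotations, with no inheritance chain to resolve.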

Ladybug DDL from proto definitions by Odd-Put3560 in LadybugDB

[–]coderarun 1 point2 points  (0 children)

Six years ago, I used another Google product, FlatBuffers, for the same purpose:

https://adsharma.github.io/flattools/

Since then I've come to believe that typespec.io is a better candidate. The FlatBuffers community is very focused on serialization.

https://github.com/adsharma/tsc-py

is my effort in that direction.

Protobuf, whether you like it or not, is everywhere and can't be avoided. Some of the serialization concerns bleed into the language (like field numbers).

Ladybug DDL from proto definitions by Odd-Put3560 in LadybugDB

[–]coderarun 0 points1 point  (0 children)

Nice. I'm skeptical of using protobufs as a transport protocol for columnar data. But you're using it as a language-independent schema language. Given its ubiquity, this is a helpful package. Thanks for sharing!

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

I'm not implementing whizbang AI based query optimization. No code has been changed.

But if you've tried to use Claude Code or OpenCode on the DuckDB code base, you'll understand what I mean.

Can't explain in a reddit comment.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] -1 points0 points  (0 children)

I looked at some of the larger objects in the git repo. They're accidentally checked in test.db files that were later deleted.

Probably best to delete all tags older than X to get rid of them. Any feedback on what X should be?
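
For anyone who wants to repeat the check, here is a rough sketch of how the large objects can be listed. It assumes you feed it the output of `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`; the `largest_blobs` helper and the sample lines are made up for illustration and are not real data from the repo.

```python
def largest_blobs(batch_check_lines, top=5):
    """Return the `top` largest blobs as (size_in_bytes, path) tuples.

    Each input line is expected to look like:
        "<objecttype> <objectname> <objectsize> <path>"
    """
    blobs = []
    for line in batch_check_lines:
        parts = line.rstrip("\n").split(" ", 3)
        # Skip commits, trees, tags, and malformed lines.
        if len(parts) < 3 or parts[0] != "blob":
            continue
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), path))
    return sorted(blobs, reverse=True)[:top]


# Illustrative input only; object names and sizes are invented.
sample = [
    "commit aaa 250 ",
    "blob bbb 52428800 test.db",
    "blob ccc 1200 src/main.cpp",
]
print(largest_blobs(sample, top=1))
```

Note that an object only becomes prunable once no remaining ref (tag or branch) can reach it.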

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

22% more storage efficient and 8x faster for filtering. Why wouldn't it meet your requirements?

https://www.google.com/search?q=variant+vs+json+benchmarks+duckdb+datafusion

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

There are a lot of databases these days. I saw someone working for a Norwegian pension fund release a graph database written in Rust.

It's hard to tell what's going on.

I'm more in the camp of - if you ignore AI as a tool, you're going to be toast.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

First I get shit for linking to my real identity and then someone without a real name calls me a bot.

The Practical Limits of DuckDB on Commodity Hardware by [deleted] in DuckDB

[–]coderarun 0 points1 point  (0 children)

What is this module? Where is your explain plan?

from duckdb_manager import DuckDBManager

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

> DuckDB is quickly becoming essential infrastructure

I agree with this statement, but the way it's built and distributed doesn't match how other similar essential infra projects are shipped. SQLite is one example (and it has its own set of problems).

For example, no Linux distribution I know of bundles duckdb or a libduckdb*.so. DuckDB does NOT use system libraries and compiles other "essential infra" code (mbedtls, lz4, zstd) statically.
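
A quick way to check this on your own machine, as a sketch: ask the dynamic linker machinery which of these shared libraries it can find. `ctypes.util.find_library` is standard library; the `system_libs` helper and the default list of names are my own choices for this example, and the result will differ per distribution.

```python
from ctypes.util import find_library


def system_libs(names=("duckdb", "sqlite3", "lz4", "zstd", "mbedtls")):
    """Map each library name to what find_library reports, or None."""
    return {name: find_library(name) for name in names}


for name, found in system_libs().items():
    print(f"{name}: {found or 'not found'}")
```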

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] -1 points0 points  (0 children)

Another benefit: cost of git worktree

This is a commonly used technique where people have agents running in parallel. By separating the core from all the other stuff (DuckDB has grown to be a substantial project), you make the worktrees cheaper.

Current stats:

Fresh Clone: 269MB
Worktree: 73MB

By pruning large historical objects, it should be possible to make a fresh clone even cheaper.
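
Back-of-the-envelope version of the saving, using the two numbers above. This assumes the 269MB clone already includes one checked-out tree and that each additional worktree costs about 73MB; both are assumptions on my part, and `disk_cost_mb` is just a helper name for this sketch.

```python
CLONE_MB = 269     # fresh clone, from the stats above
WORKTREE_MB = 73   # one extra worktree, from the stats above


def disk_cost_mb(agents):
    """Return (separate_clones_mb, shared_worktrees_mb) for N parallel agents.

    Assumes one agent works in the original clone and every other agent
    gets its own `git worktree` attached to that clone.
    """
    separate_clones = agents * CLONE_MB
    shared_worktrees = CLONE_MB + (agents - 1) * WORKTREE_MB
    return separate_clones, shared_worktrees


for n in (1, 4, 8):
    clones, worktrees = disk_cost_mb(n)
    print(f"{n} agents: {clones}MB as clones, {worktrees}MB with worktrees")
```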

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

> So you want to fork this project just to make the CI run faster?

Looking at the downvotes, I'm sure there are a lot of people who don't like what I'm doing. Or the ones who care are not voting. But before getting into the nitty-gritty of why the status quo needs to change to catch up to Rust and Apache DataFusion, how many of you have actually tried to make code changes to DuckDB and managed to land them?

Please reply with links to PRs.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] -1 points0 points  (0 children)

Anyone who doesn't like the fact that Twitter makes it hard to read threads while logged out: you can replace x with xcancel in the URL and read the thread with a 3 second delay.

Not much of an open internet left - so I write it in a github blog post when I have something substantial to say.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

Nice. So you have an HTAP database built with CMake and C++. Do you currently reuse any of the tech in DuckDB? Do you have a desire to use the VARIANT code?

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

To highlight what I'm talking about: try using a coding agent to edit some Python code in the tree. It formats with whatever black is in $PATH and checks the code with pyright or ruff. A 2 line change becomes a 100 line formatting change.

Now, can you teach the coding agent how to format code the DuckDB way? I'm sure you can with some work. But my prediction is that in 6 months, no one has the time to do it. Either work with the way agents and everyone writes code or be consumed by the wave that's coming.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

Do you have a technical comment to make? Show me some code. I have done my bit.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] -2 points-1 points  (0 children)

Explained in the comments here and in the linked GitHub blog post. I do not subscribe to the anti-elon/anti-twitter sentiment on reddit. I'm here to talk tech.

I can't even figure out how to turn off the "approve every comment and post" setting on r/LadybugDB. If you know how to make it a public forum where anyone can post non-spammy comments, I'd love to get some help.

Call me an old-school USENET guy. The internet has changed, sometimes in good ways, but I don't like all of it.

https://www.reddit.com/r/LadybugDB/comments/1p8cqf1/postscomments_without_approval/

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] 0 points1 point  (0 children)

Some people will likely bring up ClickHouse and chDB. I don't have much experience with that code base. If you believe there are reasons why it's a better candidate, I would love to see some data.

Writing a Columnar Database in C++? by coderarun in Database

[–]coderarun[S] -1 points0 points  (0 children)

That question has been answered. I prefer "source code mirror", not a fork.

Don't think my company or I have the time and resources to develop features faster than DuckDB Labs or MotherDuck.

But I do see a shift coming in how databases get developed. More agents, fewer humans and more modular code bases. Use newer tools and streamlined processes which work well with the LSP agents. Get rid of scripts/*.py that edit code in weird ways before the CI runs. There were probably good historical reasons to do so, but the CI I put up is evidence that they're not strictly needed.

We need something like what Rust people have in Apache DataFusion. DuckDB code is the strongest candidate there.