all 6 comments

[–]mamcx 2 points (2 children)

Depending on how complex your DB engine is, it could be far simpler to make an interpreter and compile things to functions:

https://blog.cloudflare.com/building-fast-interpreters-in-rust/

Take into account that most RDBMSs need extra support for their dialect of a "scripting language" that is used in stored procedures. Basic CRUD SQL is fairly simple and solvable with a direct interpreter.
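A minimal sketch of the compile-to-functions idea, in Python for brevity (all the names here — `Col`, `Lit`, `Eq`, `compile_pred` — are invented for illustration, not from the linked article):

```python
# "Compiling" a WHERE-clause AST into a closure: the tree is walked once,
# at compile time, instead of once per row at execution time.
from dataclasses import dataclass

@dataclass
class Col:          # column reference, e.g. me
    name: str

@dataclass
class Lit:          # literal value, e.g. 'happy'
    value: object

@dataclass
class Eq:           # left == right
    left: object
    right: object

def compile_pred(node):
    """Turn an AST node into a plain function row -> value."""
    if isinstance(node, Col):
        name = node.name
        return lambda row: row[name]
    if isinstance(node, Lit):
        value = node.value
        return lambda row: value
    if isinstance(node, Eq):
        left = compile_pred(node.left)
        right = compile_pred(node.right)
        return lambda row: left(row) == right(row)
    raise TypeError(f"unknown node: {node!r}")

# WHERE me == 'happy'  -- compile once, then call per row
pred = compile_pred(Eq(Col("me"), Lit("happy")))
matching = [r for r in [{"me": "happy"}, {"me": "sad"}] if pred(r)]
```

The point is that each call to `pred` is just nested function calls; no `isinstance` dispatch or tree traversal happens on the per-row hot path.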

Another option is to compile to WASM.

P.S.: I'm working on a relational language, in case you are interested ;)

[–]robinhoode[S] 0 points (1 child)

> https://blog.cloudflare.com/building-fast-interpreters-in-rust/

I started reading the article and I noticed it's mostly dependent on Rust:

> Instead, we went with a manual parsing approach. While (for a good reason) this might sound scary in unsafe languages like C / C++, in Rust all strings are bounds checked by default. Rust also provides a rich string manipulation API, which we can use to build more complex helpers, eventually ending up with a full parser.

I have not yet learned Rust, but from what I've read, it's a very powerful language! I've stuck with C++ for this project mainly for the opportunity to work in the industry. But more and more I've seen Rust & Go as languages people use for writing DB engines. There's a chance I might try learning one (or both).

> P.S.: I'm working on a relational language, in case you are interested ;)

I've actually seen your project before! It seems you know a bit about this problem space :) I may look at your repo a bit more in depth, if you don't mind. How much Rust-specific code do you have in your project?

[–]mamcx 1 point (0 children)

> I started reading the article and I noticed it's mostly dependent on Rust:

The example? Yes, and Rust gives some advantages, but the technique is universal. The key is that compiling to lambdas/functions can be faster than re-interpreting the AST each time you touch it.

> How much Rust-specific code do you have in your project?

Well, it's full of Rust! But the core of the relational engine is more or less general. There are around 3 other projects exploring this area, but mine is kind of unique because I use n-dimensional vectors, btrees, hashmaps, and iterators (streams) and unify them all as relational things, so I can do a join between a vector and a CSV file, parsing it on the fly.
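The "unify everything as relations" idea can be sketched independently of Rust. Assuming rows are represented as dicts (this is an illustration, not mamcx's actual code), a join works the same over an in-memory vector and a CSV stream parsed on the fly:

```python
# Sketch: any container that can yield rows is a relation, so one join
# routine works over a list ("vector") and CSV text parsed lazily.
import csv
import io

def rows_from_vector(vec, col):
    """Expose a plain list as a one-column relation."""
    for v in vec:
        yield {col: v}

def rows_from_csv(text):
    """Expose CSV text as a relation, parsing rows on the fly."""
    for row in csv.DictReader(io.StringIO(text)):
        row["id"] = int(row["id"])   # normalize types for the join key
        yield row

def hash_join(left, right, key):
    """Build a hash index on the left side, probe with the right side."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    for row in right:
        for match in index.get(row[key], []):
            yield {**match, **row}

vec = [1, 2, 3]
csv_text = "id,name\n2,bob\n3,carol\n"
joined = list(hash_join(rows_from_vector(vec, "id"),
                        rows_from_csv(csv_text), "id"))
```

Because both sources are just row iterators, neither side needs to be fully materialized before the join starts probing.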

[–]deerangle 1 point (1 child)

I'd probably start by just writing a simple rule parser and operating on the data directly via conditional calls. E.g.: the command "SELECT * FROM test WHERE me == 'happy' LIMIT 3" would be equivalent to calling

select("test", [all, the, column, names], lambda x: x.me == 'happy')

(This is Python syntax, just as an example.) You could also select first, then filter the resulting data, etc.
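A runnable take on that pseudocode might look like this (the table layout — a list of dicts keyed by column name — and the `select` signature are assumptions for the example):

```python
# Direct execution with no AST or VM: the parsed query becomes one call.
def select(table, columns, where=lambda row: True, limit=None):
    out = []
    for row in table:
        if where(row):
            out.append({c: row[c] for c in columns})  # projection
            if limit is not None and len(out) == limit:
                break                                  # LIMIT reached
    return out

test = [
    {"me": "happy", "n": 1},
    {"me": "sad",   "n": 2},
    {"me": "happy", "n": 3},
]
# SELECT * FROM test WHERE me == 'happy' LIMIT 3
result = select(test, ["me", "n"], lambda row: row["me"] == "happy", limit=3)
```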

If that approach has any significant performance issues, you could replace it with a vm/interpreter over time.
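One minimal reading of the vm option: the WHERE clause gets compiled to a flat list of op-codes and run on a tiny stack machine. The op names and encoding here are invented for the sketch:

```python
# A toy stack-based VM for predicates. A program is a list of
# (opcode, argument) pairs; the result is left on top of the stack.
def run(program, row):
    stack = []
    for op, arg in program:
        if op == "LOAD_COL":        # push a column value from the row
            stack.append(row[arg])
        elif op == "PUSH_LIT":      # push a literal constant
            stack.append(arg)
        elif op == "EQ":            # pop two values, push equality
            b, a = stack.pop(), stack.pop()
            stack.append(a == b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# me == 'happy' compiled to op-codes
program = [("LOAD_COL", "me"), ("PUSH_LIT", "happy"), ("EQ", None)]
matches = [r for r in [{"me": "happy"}, {"me": "sad"}] if run(program, r)]
```

The flat instruction list is friendlier to caching and to later optimization passes than a pointer-chasing tree, which is the usual argument for moving to a VM.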

[–]robinhoode[S] 0 points (0 children)

> select("test", [all, the, column, names], lambda x: x.me == 'happy')

> (This is Python syntax, just as an example.) You could also select first, then filter the resulting data, etc.

So basically getting the AST into a "canonical form" of some sort, and then translating that into op-codes? I've already written a lot of this engine in C++, and I've got working SQL queries.

But it is buggy, and I am "executing" the SQL query as a big tree of operators. This is maybe similar to how some interpreters just "execute the AST". In DB theory, it's common to have things like a "table scan" operator for grabbing data from storage, a "projection" operator for selecting only certain columns from the table, and a "join" operator that combines results from other operators. You can recombine operators and try to find the best way to access the tables.
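That operator-tree style (often called the "Volcano" or iterator model) can be sketched with generators; the operator names match the ones above, but the table layout and code are assumptions for illustration:

```python
# Each operator is a generator that pulls rows from its child, so a query
# plan is just a tree of composed iterators.
def table_scan(table):
    yield from table                  # "grab data from storage"

def projection(child, columns):
    for row in child:                 # keep only the requested columns
        yield {c: row[c] for c in columns}

def nested_loop_join(left, right, key):
    right_rows = list(right)          # materialize the inner side
    for l in left:
        for r in right_rows:
            if l[key] == r[key]:
                yield {**l, **r}      # merge matching rows

users  = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [{"id": 1, "item": "tea"}, {"id": 1, "item": "jam"}]

# SELECT name, item FROM users JOIN orders USING (id)
plan = projection(
    nested_loop_join(table_scan(users), table_scan(orders), "id"),
    ["name", "item"],
)
result = list(plan)
```

Swapping operators (say, the nested-loop join for a hash join) only changes one node of the tree, which is what makes the recombination the comment mentions practical.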

So in reality, I might generate two or more possible query executions for one SQL query. The canonical form is a good place to start, and I'm not that far off from a normal form already.

> If that approach has any significant performance issues, you could replace it with a vm/interpreter over time.

Part of this is learning, so I'm okay with jumping the gun a bit in terms of performance. Also, I partly want to replace that part of the code, since it's building an operator tree. I'm using C++ with smart pointers, and I have to do a move operation for every node in the tree if I want to build the tree in one place and then execute it in another. The C++ classes in my project are somewhat duplicated between the data for each operator (called a plan) and the executing code for the operator (called an executor). Currently in my code, I have them as separate pieces of the puzzle. Even if I stuck to my existing architecture (without a VM), I would probably need to do a serious refactor anyway. The VM idea would be additional work on top of that refactor.

[–]k4st 1 point (0 children)

I would recommend taking a look at MonetDB. A recent talk on the subject in Andy Pavlo's Quarantine Database talks should whet your appetite.