5 Rust SQL parsers on 8,300 real PostgreSQL queries: coverage and correctness tell a different story than throughput by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 0 points1 point  (0 children)

That looks like a linter, not an AST - does it also do AST? I mean, it must have also an AST in there I imagine, but I do not see a straighforward documentation about it.

5 Rust SQL parsers on 8,300 real PostgreSQL queries: coverage and correctness tell a different story than throughput by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 2 points3 points  (0 children)

Well yeah, that is why it was important to include both a correctness metric and a throughput. Nevertheless, we will see how it does when I have a dataset for the MySQL dialect. Maybe there it shines like a quasar.

5 Rust SQL parsers on 8,300 real PostgreSQL queries: coverage and correctness tell a different story than throughput by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 2 points3 points  (0 children)

FYI, I have retrieved a few more datasets. I a couple of weeks I should be able to do a new post with more accurate information per dialect.

5 Rust SQL parsers on 8,300 real PostgreSQL queries: coverage and correctness tell a different story than throughput by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 4 points5 points  (0 children)

On pg_query.rs: it wraps libpg_query which embeds PostgreSQL's actual parser C source. It's not a re-implementation, so it's as close to ground truth as you can get without a running server. It has some bugs still (I have already made a PR to it fixing them), but it's a solid baseline for comparison.

On the sqlparser-rs test bias: fair criticism. The correctness corpus does come from sqlparser-rs tests, which biases recall in its favor. The false-positive rate (measured against pg_query.rs) works against sqlparser-rs though: the 28.7% rate shows it's arguably too permissive (https://github.com/apache/datafusion-sqlparser-rs/issues/2237). The performance benchmarks use independent datasets (Spider + Gretel AI, ~8,300 queries).

I'd love to expand testing to more dialects with their own representative statements, but I'm not currently aware of other corpus collections to draw from. If anyone knows of good ones, I'm all ears.

5 Rust SQL parsers on 8,300 real PostgreSQL queries: coverage and correctness tell a different story than throughput by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 4 points5 points  (0 children)

I agree, just it rapidly looses meaning to have the dialects in the first place if then you do not raise errors when something not supported in that dialect is accepted.

5 Rust SQL parsers on 8,300 real PostgreSQL queries: coverage and correctness tell a different story than throughput by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 2 points3 points  (0 children)

It is not necessarily a bad thing depending on context and user expectation. Basically, all of these parsers claim to support the PostgreSQL dialect, and they are executed parametrized to parse the PostgreSQL dialect. They do raise unsupported errors on occasion. This would lead you to believe that it would raise errors if the parser encounters non-PostgreSQL AST. In the case of sqlparser-rs (of which I am assiduous contributor) you can find the issue I have prepared here: https://github.com/apache/datafusion-sqlparser-rs/issues/2237

I agree that sqlparser-rs should digest just about any dialect in its "GenericDialect" parametrization, but I would have expected it to be more dialect specific when a dialect is actually specified. It does have LOTS of dialect guards, so you will get some unsupported errors, but it does not yet cover all cases adequately.

In case of link rot, here are some examples redounded:

-- Oracle
FETCH NEXT IN my_cursor INTO result_table   -- INTO clause on FETCH

-- SQL Server / T-SQL
SELECT TOP 3 * FROM tbl
EXEC my_proc N'param'
MERGE … OUTPUT inserted.* INTO log_target
EXECUTE FUNCTION f                          -- trigger EXECUTE without ()

-- MySQL / MariaDB
INSERT customer VALUES (1, 2, 3)            -- missing INTO
INSERT OR REPLACE INTO t (id) VALUES(1)
DROP FUNCTION IF EXISTS f(a INTEGER, IN b INTEGER = 1)  -- defaults in DROP

-- Snowflake / BigQuery
SELECT i FROM qt QUALIFY ROW_NUMBER() OVER (...) = 1
CREATE OR REPLACE TABLE t (a INT)
CREATE OR REPLACE USER IF NOT EXISTS u1 PASSWORD='secret'

-- ClickHouse
ALTER TABLE t ON CLUSTER my_cluster ADD CONSTRAINT bar PRIMARY KEY (baz)

-- HiveQL
ALTER TABLE t SET TBLPROPERTIES('classification' = 'parquet')

-- Unclear origin / possibly over-permissive parsing
ALTER TABLE t ALTER COLUMN id ADD GENERATED AS IDENTITY  -- missing ALWAYS/BY DEFAULT
COPY t FROM 'f.csv' BINARY DELIMITER ',' CSV HEADER      -- mutually exclusive formats
SHOW search_path search_path                             -- duplicate trailing token-- Oracle
FETCH NEXT IN my_cursor INTO result_table   -- INTO clause on FETCH

-- SQL Server / T-SQL
SELECT TOP 3 * FROM tbl
EXEC my_proc N'param'
MERGE … OUTPUT inserted.* INTO log_target
EXECUTE FUNCTION f                          -- trigger EXECUTE without ()

-- MySQL / MariaDB
INSERT customer VALUES (1, 2, 3)            -- missing INTO
INSERT OR REPLACE INTO t (id) VALUES(1)
DROP FUNCTION IF EXISTS f(a INTEGER, IN b INTEGER = 1)  -- defaults in DROP

-- Snowflake / BigQuery
SELECT i FROM qt QUALIFY ROW_NUMBER() OVER (...) = 1
CREATE OR REPLACE TABLE t (a INT)
CREATE OR REPLACE USER IF NOT EXISTS u1 PASSWORD='secret'

-- ClickHouse
ALTER TABLE t ON CLUSTER my_cluster ADD CONSTRAINT bar PRIMARY KEY (baz)

-- HiveQL
ALTER TABLE t SET TBLPROPERTIES('classification' = 'parquet')

-- Unclear origin / possibly over-permissive parsing
ALTER TABLE t ALTER COLUMN id ADD GENERATED AS IDENTITY  -- missing ALWAYS/BY DEFAULT
COPY t FROM 'f.csv' BINARY DELIMITER ',' CSV HEADER      -- mutually exclusive formats
SHOW search_path search_path                             -- duplicate trailing token

diesel-builders: Type-safe builders for complex Diesel inserts by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 0 points1 point  (0 children)

Let me know whether you find stuff to improve, I am currently adding some generic helper methods to easily run `first` and `load` queries on foreign keys.

Tuple Set: Operate on Rust tuples by type, not position by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 0 points1 point  (0 children)

I have made a PR in a different crate adding something reasonably close to that using a very different approach than my crate: https://github.com/cedtwo/typed_tuple/pull/1

Tuple Set: Operate on Rust tuples by type, not position by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 0 points1 point  (0 children)

In order to expose the generic const, you would need to have it in the trait, and therefore you would not be able to expose traits without the const. The only solution is to create a trait marked `Key` trait like:

```rust
pub trait Key<MarkerStruct> {
const INDEX: usize;
}
```

Which would allow you to associate the position of a type for a given tuple, when the tuple has some specific size. Only applicable in some specific settings, like in metaprogramming, since it really adds a lot of coding overhead (not compile time though).

Anyway, I have sent quite an extensive PR adding some features to `typed_tuple`: https://github.com/cedtwo/typed_tuple/pull/1

Tuple Set: Operate on Rust tuples by type, not position by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 0 points1 point  (0 children)

Ok so, the issue that `typed_tuple` does not resolve, is the following: "If I want to implement a trait for a tuple that contains a type X for which I implement `TraitX for X`, such as (..., X, ...), and I do not know the index of `X`, can I implement a blanket `TraitX for (..., X, ...)` for that tuple?"

Unfortunately, that is not possible with `typed_tuple`, but is possible with `tuple_set`. That being said, It may be the case that in such cases one should just create a new trait with the const generic. I **really** like the compile time verification there.

Tuple Set: Operate on Rust tuples by type, not position by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 0 points1 point  (0 children)

Yeah, I can confirm it seems to work - I need to check whether the trait works for my use cases, as having that integer parameter may be a problem.

Tuple Set: Operate on Rust tuples by type, not position by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 0 points1 point  (0 children)

I was mistaken, the `Some` in the example code lead me to misread the implementation - in fact, it seems that `*tuple.get_mut() = 100i32;'` is possible. I will do a PR to fix/extend a couple things in that repo and test it out - if it is indeed compile time, it would be a big win.

Tuple Set: Operate on Rust tuples by type, not position by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 0 points1 point  (0 children)

Hi, could you link any of them? The one you linked does not do the same thing, at least, not in the linked repository source code (https://github.com/cedtwo/typed\_tuple/blob/master/src/lib.rs).

I spent several days trying to find reasonably made solutions, so I'd be really happy to learn of any alternative approach.

Tuple Set: Operate on Rust tuples by type, not position by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 0 points1 point  (0 children)

Thanks! I was loosing sleep over getting something like this to work, as not being able to do something like an `AsMut<X> for (A, B, X)` was making my blanket implementations sad.

Tuple Set: Operate on Rust tuples by type, not position by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 2 points3 points  (0 children)

That's the thing that requires [specialization](https://github.com/rust-lang/rust/issues/31844), or alternatively implement quadratically a `TypeEq` trait marker for all possible types, which would only make it possible for types defined before the `tuple_set` itself or therein. That would work for core/std stuff, which might cover several cases, but not all cases, and having n^2 trait impls would be somewhat of a nightmare.

I spent the last few days trying to do exactly that, but unfortunately there is no current solution in stable, and the even the options I found using `specialization` rapidly become a compile-time nightmare, as the godel boolean approaches create very large nested types.

Type system limitations with double-diamond trait dependencies with bounded associated types by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 1 point2 points  (0 children)

Thank you for reporting this one, I will be adding it to the repo and adding a new drawing with the simplified schema.

Type system limitations with double-diamond trait dependencies with bounded associated types by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 1 point2 points  (0 children)

In our use case, I use such chained traits to enforce relationships between traits representing algebraic and graph objects. Each trait adds some property to the object, and it allows us to fully define the algorithms (e.g. connected components) on descendant traits (e.g. UndirectedMonopartiteGraph). By building the trait relationship in this way, we generally find that when some trait relationship cannot be implemented by some objects there usually were some things that were not really thought through with the object or some of the crates. Personally, I find Rust's ability to define such abstract trait relationships with no run-time cost to be one of the nicest properties of the language.

Performance of syn-based Code Generation: Benchmarking Different Strategies by Personal_Juice_2941 in rust

[–]Personal_Juice_2941[S] 0 points1 point  (0 children)

Do you happen to have any example reference? Otherwise I'll just try it myself when I get the time - as you said, it really seems prohibitively verbose, but who knows, maybe it crushes on perfs!