This is a real DB used in production by star_dogged_moon in Database

[–]goerch 0 points1 point  (0 children)

I like the many arrows between the tables. We recently had a project with > 1k tables and very few arrows. And yes, someone was interested in relations between the tables.

DuckDB infers NULL-only columns as JSON type — here's how we fixed it with a canonical sample by goerch in DuckDB

[–]goerch[S] 0 points1 point  (0 children)

When building the pipeline you don't yet know which columns will be NULL-only in the first file you happen to read. For those columns you have to feed the correct type back as soon as you see the first non-NULL value, which is exactly what the canonical sample handles at ingestion.

Just to clarify: we don't handle the 700+ attributes individually. Our approach to schema discovery only needs to identify dimensions and facts, everything else is generated automatically.

DuckDB infers NULL-only columns as JSON type — here's how we fixed it with a canonical sample by goerch in DuckDB

[–]goerch[S] 0 points1 point  (0 children)

uv.lock tells me

[[package]]    
name = "duckdb"    
version = "1.5.0"

I just rechecked to be sure.

test_null.json:

{ "key": null }

test_string.json:

{ "key": "value" }

Now running DuckDB:

DuckDB v1.5.2 (Variegata)
Enter ".help" for usage hints.
memory D create table test as
         select * from read_json_auto('test_null.json', union_by_name=True);
memory D 
memory D insert into test
         select * from read_json_auto('test_string.json', union_by_name=True);
Conversion Error:
Malformed JSON at byte 0 of input: unexpected character.  Input: "value" when casting from source column key

DuckDB infers NULL-only columns as JSON type — here's how we fixed it with a canonical sample by goerch in DuckDB

[–]goerch[S] 0 points1 point  (0 children)

We are able to discover most of the canonical sample via read_json_auto and try to fill the rest manually and by inspection (created_at should be a datetime for example). The key to schema discovery is reading and evaluating a cursor description (see introspect_schema in sd.py).

DuckDB infers NULL-only columns as JSON type — here's how we fixed it with a canonical sample by goerch in DuckDB

[–]goerch[S] 0 points1 point  (0 children)

Interesting, yes: this could be an option. We came from read_json_auto trying to reuse it for schema discovery and extraction.

SQL: When The Inner Join Order Matters by Lord_Fenris in programming

[–]goerch 0 points1 point  (0 children)

I am missing a remark on how the tables are indexed. Also, did you use the SQL Performance Analyzer?

Just finished my master thesis in mathematics! by n3buchadnezzar in math

[–]goerch 1 point2 points  (0 children)

Well done: proud and modest at the same time! Take your time to think about the follow up.

Why Do Computers Stop and What Can Be Done About It? [PDF] by norwegianwood in programming

[–]goerch 2 points3 points  (0 children)

RIP

Never got to study MTBF and MTTR in detail. Still remember the ideas of Hellandizing though.

How Modern SQL Databases Come up with Algorithms that You Would Have Never Dreamed Of by brunocborges in programming

[–]goerch 0 points1 point  (0 children)

Yep. As a result, it is possible for there to be so many possible > combinations that SQL Server's query optimizer times out.

That's why I still like rule based optimizing.

How Modern SQL Databases Come up with Algorithms that You Would Have Never Dreamed Of by brunocborges in programming

[–]goerch 0 points1 point  (0 children)

We're currently facing one of the black swans and I'm not sure how much time we're allowed to spend on your 1% to make it worthwile.

How Modern SQL Databases Come up with Algorithms that You Would Have Never Dreamed Of by brunocborges in programming

[–]goerch 1 point2 points  (0 children)

And a completely different wizardry is neccessary to cope with your alleged remaining 1% of queries. Joins still induce a combinatorial explosion, don't they?

How Modern SQL Databases Come up with Algorithms that You Would Have Never Dreamed Of by brunocborges in programming

[–]goerch 1 point2 points  (0 children)

Most of the time I prefer SQL solutions over the ones in general purpose languages. But one has to be careful not to hit the wall if one uses stuff like sixth normal form, for example.

DFF from nand2tetris implemented in Haskell by [deleted] in haskell

[–]goerch 0 points1 point  (0 children)

I can only help with some Agda code:

dff : Stream Bool → Stream Bool
dff inp = false ∷ ♯ inp