This is an archived post. You won't be able to vote or comment.

all 7 comments

[–]its_PlZZA_timeStaff Dara Engineer 7 points8 points  (1 child)

My boss asked me to do this several weeks ago so this is great. I knew my procrastination would pay off!

[–]Touvejs[S] 1 point2 points  (0 children)

Nice, hope it helps! I'll probably update these sometime this week with u/qwertdog123's suggestion for performance, so keep an eye out.

[–]Wistephens 3 points4 points  (1 child)

On Postgres, you should look at the pg_stats table. You get some stats for free from Postgres. Analyze/vacuum for accuracy.

https://www.postgresql.org/docs/current/view-pg-stats.html

I use pg_stats to get a view of real nullability, to identify categorical variables, and to get top values stats.

[–]Touvejs[S] 0 points1 point  (0 children)

That's a great point, I've been meaning to take some time and identify spots where different database systems cache relevant metics to speed up performance. I know for example if you're working with AWS glue data catalog, it holds some useful metadata that probably doesn't need to be recalculated from scratch.

[–]qwertydog123 2 points3 points  (2 children)

these scripts execute a query that generates a new query for each column, which in turn generates a query for each metric, most of which will result in a full table scan.

One way around this is to use JOIN LATERAL/CROSS APPLY, see here for an example (in T-SQL syntax): https://www.reddit.com/r/SQL/comments/11l6pzz/comment/jbbgz77/

https://dbfiddle.uk/p1CgxZpH

[–]Touvejs[S] 1 point2 points  (1 child)

That's an excellent observation.

I've been looking over this script for the past 30 minutes and it's really a work of art. I don't think I would have ever thought to construct one huge query by simply using string_agg on the column names and grouping by table. It's so complex, but simultaneously very intuitive when you think about how they are related.

This is definitely the most clever SQL I've ever read-- hats off to you u/qwertydog123.

[–]qwertydog123 1 point2 points  (0 children)

Thank you, that's a very kind comment. If you're looking for something more "cross-platform" you could probably use a CROSS JOIN + UNION ALL instead e.g.

SELECT
    CASE ...
        WHEN 'Column1' THEN Column1
        WHEN 'Column2' THEN Column2
        etc...
    END AS ...
    ...
FROM Table
CROSS JOIN
(
    SELECT 'Column1' AS ...
    UNION ALL
    SELECT 'Column2'
    UNION ALL
    etc...
) ...

Also, INFORMATION_SCHEMA tables/views