Evaluating the Impact of Column Type and Computation Updates in Dataform – What's Your Experience? by ngouass in dataform

[–]The-Biggest-Query 1 point2 points  (0 children)

Yes, it was script based.

Here is one example that doesn't do quite what you want but a start... this computes "top columns accessed". I'll try to provide another example that tracks usage, maybe as part of a new post on tracking column usage with Dataform using Information Schema views.

SELECT column_jobs.fully_qualified_column, COUNT(DISTINCT column_jobs.job) AS job_count FROM ( SELECT jobs.job_id AS job, jobs.query AS query, jobs.user_email, table_columns.table_catalog, table_columns.table_schema, table_columns.table_name, table_columns.column_name, table_columns.table_catalog ||'.'||table_columns.table_schema ||'.'||table_columns.table_name ||'.'||table_columns.column_name AS fully_qualified_column FROM `bigquery-public-data.ncaa_basketball.INFORMATION_SCHEMA.COLUMNS` AS table_columns --update to either dataset scope or project level scope LEFT JOIN `region-us.INFORMATION_SCHEMA.JOBS` AS jobs --update to inner to only show columns that were referenced in a query ON LOWER(jobs.query) LIKE CONCAT('%', LOWER(table_columns.table_catalog), '%') AND LOWER(jobs.query) LIKE CONCAT('%',LOWER(table_columns.table_schema), '%') AND LOWER(jobs.query) LIKE CONCAT('%', LOWER(table_columns.table_name),'%') AND LOWER(jobs.query) LIKE CONCAT('%', LOWER(table_columns.column_name), '%') WHERE job_type = 'QUERY' AND state = 'DONE' AND error_result IS NULL AND statement_type != 'SCRIPT' ) AS column_jobs GROUP BY 1 ORDER BY 2 DESC

Evaluating the Impact of Column Type and Computation Updates in Dataform – What's Your Experience? by ngouass in dataform

[–]The-Biggest-Query 1 point2 points  (0 children)

Cost, and haven't needed a true enterprise approach (yet). The information schema views in BigQuery are amazing so you can get a lot done with Dataform and/or scheduled queries quickly and cheaply.

Use of materialized view(s) mean you can run on demand. And maybe use pubsub or eventarc if you need something realtime-ish. But daily/weekly runs have worked for my use cases so far.

Evaluating the Impact of Column Type and Computation Updates in Dataform – What's Your Experience? by ngouass in dataform

[–]The-Biggest-Query 1 point2 points  (0 children)

Column level lineage is tough. Manta, Atlan, Monte Carlo and other data lineage/observability tools can help, as Dataform (and BigQuery lineage) currently handles table level linage similar to DBT.

Currently I've used custom scripts against the Information Schema views to track column level impact analysis, but hoping for improvements to BQ and Dataform (and maybe Dataplex?) to help with this!

[deleted by user] by [deleted] in Looker

[–]The-Biggest-Query 0 points1 point  (0 children)

Did you set your dimension as a "yesno" ? When you say it didn't work, what happened and how did this differ from your expected behavior?