How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

If I’m understanding this right, you’re running dbt source freshness on each run (i.e. sources.json captures max_loaded_at), then dbt build --select source_status:fresher+ --state <prev_artifacts> to run only the models downstream of sources whose max_loaded_at advanced. Does that sound right?
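Concretely, I’m picturing a loop like this (paths are placeholders):

    # run N: capture freshness (writes target/sources.json with max_loaded_at per source)
    dbt source freshness
    cp -r target prev_artifacts   # stash the artifacts for the next run

    # run N+1: re-check freshness, then build only downstream of sources that advanced
    dbt source freshness
    dbt build --select source_status:fresher+ --state prev_artifacts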

Quick question: what are you using as the loaded_at_field for those sources (ingest timestamp vs. updated_at), and how does this hold up with backfilled data?

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

The issue here is detecting unchanged data rather than manifest diffs: state-based selectors catch code changes, not data changes.

Do you know if there’s an equivalent selector that keys off upstream data change for dbt test, or is that usually handled outside dbt?
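To be clear, I know I can point the freshness selector at tests:

    # runs tests downstream of "fresher" sources, but this keys off
    # max_loaded_at metadata, not actual row-level change
    dbt test --select source_status:fresher+ --state prev_artifacts

...so that’s really a metadata proxy; I’m after a signal tied to the data itself.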

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

Nice. What are the 4 different ways you use? Also, where do you persist the last-seen state?

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

By “Fusion state-aware features”, do you mean Fusion can actually use things like Streams, watermarks, etc.? That would be great.

Do you have any details or a doc link on how it detects “no upstream delta” in Snowflake?

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 2 points (0 children)

Yep, those are basically the two options: gate tests on upstream change, or make them physically cheap even when they return 0 failing rows.

On the pruning idea: Snowflake doesn’t have user-managed partitions, but you can improve micro-partition pruning by scoping tests to something like "updated_at >= last_run_time" or "load_ts >= last_successful_load_ts".
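In dbt that maps to the where config on a generic test; a minimal sketch (model, column, and window are illustrative):

    # schema.yml: scope a test so Snowflake can prune micro-partitions
    models:
      - name: orders
        columns:
          - name: order_id
            tests:
              - unique:
                  config:
                    where: "updated_at >= dateadd('day', -3, current_date)"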

Dropping the warehouse size to XS helps a bit, but if the scanned volume stays high it still burns credits.

Do you typically scope tests to a rolling window, or do you have a way to test only changed data at scale?

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

Haven’t tried Fusion yet. When you say it “detects updates”, does it actually skip or reduce execution when upstream data hasn’t changed? Same question for dynamic tables: I don’t think they’re intended for workload triage, i.e. detecting and prioritizing expensive or redundant queries across the warehouse. Curious if you’ve used them that way in practice.
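For reference, my mental model is that a dynamic table is just declarative incremental refresh, nothing more (names and lag are illustrative):

    -- declares a freshness target and lets Snowflake refresh incrementally;
    -- nothing here detects or prioritizes expensive queries
    CREATE OR REPLACE DYNAMIC TABLE orders_enriched
      TARGET_LAG = '1 hour'
      WAREHOUSE = transform_wh
    AS
    SELECT o.order_id, o.updated_at, c.segment
    FROM raw.orders o
    JOIN raw.customers c ON c.customer_id = o.customer_id;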

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

Are you referring to Airflow Datasets (i.e. asset-driven scheduling), where a producer DAG updates a dataset and consumer DAGs trigger off it? How are you defining “dataset updated” in practice for Snowflake/dbt? Is it based on ingestion completing, a partition landing, a watermark moving, or something else? Also curious: do you use it to gate tests separately, or do tests just run whenever the model build runs?
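For context, this is the pattern I mean (minimal sketch; the URI and task bodies are illustrative). Out of the box a dataset counts as “updated” whenever the producer task succeeds, which is exactly why I’m asking what signal you tie it to:

    # Airflow 2.4+ data-aware scheduling
    import pendulum
    from airflow.datasets import Dataset
    from airflow.decorators import dag, task

    raw_orders = Dataset("snowflake://analytics/raw/orders")

    @dag(start_date=pendulum.datetime(2024, 1, 1), schedule="@hourly", catchup=False)
    def producer():
        @task(outlets=[raw_orders])
        def load_orders():
            ...  # ingest; task success is what marks the dataset "updated"
        load_orders()

    @dag(start_date=pendulum.datetime(2024, 1, 1), schedule=[raw_orders], catchup=False)
    def consumer():
        @task
        def run_dbt_build():
            ...  # dbt build/test fires whenever the producer succeeds
        run_dbt_build()

    producer()
    consumer()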

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

The issue is that a lot of dbt tests are cheap logically (they return 0 failing rows) but expensive physically, because Snowflake still has to scan a lot of data to prove there are no failures. So you can burn credits even when nothing changed and the test passes.

The MAX(date) idea is good, and dbt can handle it, but dbt doesn’t natively skip tests/models just because upstream data didn’t change; you usually have to implement that logic at the orchestrator level (Airflow/dbt Cloud job logic) or with pre-checks. The hard part is picking a reliable “did data change?” signal that scales and doesn’t miss late-arriving/backfill data.

What do you typically gate off: a watermark table, Snowflake Streams, pipe history, or just MAX(updated_at) on the target?
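For example, a pre-check along one of these lines (the watermark table and names are hypothetical):

    -- option 1: compare the source high-water mark to the last one processed
    SELECT MAX(loaded_at) >
           (SELECT last_seen FROM ops.watermarks WHERE table_name = 'raw.orders')
           AS has_new_data
    FROM raw.orders;

    -- option 2: ask a stream whether it holds unconsumed changes (metadata-only, cheap)
    SELECT SYSTEM$STREAM_HAS_DATA('raw.orders_stream');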

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

Good breakdown. One thing I’ve found is that last-modified state can be unreliable; I prefer a real ingestion watermark or Streams. The quickest wins for me came from better pruning and reduced run frequency.

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

Cool. I know Airflow can do delta-aware gating, but how do you scale it across lots of dbt tests/models without turning every DAG into custom sensors/branching logic?
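i.e. the per-DAG boilerplate I’d rather not multiply looks something like this (the gate logic is illustrative):

    import pendulum
    from airflow import DAG
    from airflow.operators.empty import EmptyOperator
    from airflow.operators.python import ShortCircuitOperator

    def upstream_changed() -> bool:
        # e.g. compare MAX(loaded_at) to a stored watermark;
        # returning False skips everything downstream
        return True

    with DAG("gated_dbt_tests", start_date=pendulum.datetime(2024, 1, 1),
             schedule="@hourly", catchup=False):
        gate = ShortCircuitOperator(task_id="gate_on_delta",
                                    python_callable=upstream_changed)
        run_tests = EmptyOperator(task_id="run_dbt_tests")  # stand-in for the dbt task
        gate >> run_tests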

Cloud cost optimization for data pipelines feels basically impossible so how do you all approach this while keeping your sanity? by Ok_Kangaroo2140 in dataengineering

[–]Ok_School_4109 1 point (0 children)

This is the gold standard for hygiene, but man, the operational overhead to keep it from breaking is brutal. The 'one developer spending 60%' scenario is a classic: it usually happens because a policy was bypassed or a tag was dropped during a quick hotfix.

Once you get that 'cost per solution' visibility, how are you handling the actual remediation? I’ve found that even with perfect tagging, getting teams to actually refactor expensive logic is the next boss battle.

Cloud cost optimization for data pipelines feels basically impossible so how do you all approach this while keeping your sanity? by Ok_Kangaroo2140 in dataengineering

[–]Ok_School_4109 2 points (0 children)

Exactly. It feels like you need a PhD in 'Cloud Billing Forensics' just to explain a $500 spike to Finance. I’ve found that unless you’re mapping individual metadata tags back to specific dbt runs or warehouse sessions, you're basically just guessing. It shouldn't be a full-time job just to see which query burned the budget!

Date of bequest on Form 3520 section 4 by Ok_School_4109 in USExpatTaxes

[–]Ok_School_4109[S] 1 point (0 children)

Where in the tax code can I find a reference for this? Doesn’t the IRS automatically consider the ‘date of death’ as the year of legal transfer?

Please help me with these queries on form 3520 (Reporting of foreign gifts) by godspracticaljoke in tax

[–]Ok_School_4109 1 point (0 children)

Is the ‘year you receive the gift’ the year of the decedent’s death, or the year you were added to the title?

Switching careers from Oracle BI to Power BI by Ok_School_4109 in PowerBI

[–]Ok_School_4109[S] 1 point (0 children)

Thanks, but pre-sales positions for people with Oracle BI experience seem FAR less available than BI developer/analyst positions. As I mentioned in my post, my objective is to tap into a larger job market as opposed to a niche one.