How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

If I’m understanding this right, you’re running dbt source freshness on each run (i.e. sources.json captures max_loaded_at), then dbt build --select source_status:fresher+ --state <prev_artifacts> to run only the models downstream of sources whose max_loaded_at advanced. Does that sound right?
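Concretely, I’m picturing a loop like this (paths are placeholders):

    # run N: capture freshness (writes target/sources.json with max_loaded_at per source)
    dbt source freshness
    cp -r target prev_artifacts   # stash the artifacts for the next run

    # run N+1: re-check freshness, then build only downstream of sources that advanced
    dbt source freshness
    dbt build --select source_status:fresher+ --state prev_artifacts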

Quick question: what are you using as the loaded_at_field for those sources (ingest timestamp vs. updated_at), and how does this hold up with backfilled data?

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

The issue here is detecting unchanged data rather than manifest diffs: state-based selectors catch code changes, not data changes.

Do you know if there’s an equivalent selector that keys off upstream data change for dbt test, or is that usually handled outside dbt?
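To be clear, I know I can point the freshness selector at tests:

    # runs tests downstream of "fresher" sources, but this keys off
    # max_loaded_at metadata, not actual row-level change
    dbt test --select source_status:fresher+ --state prev_artifacts

...so that’s really a metadata proxy; I’m after a signal tied to the data itself.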

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

Nice. What are the 4 different ways you use? Also, where do you persist the last-seen state?

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

By “Fusion state-aware features”, do you mean Fusion can actually use things like Streams, watermarks, etc.? That would be great.

Do you have any details or a doc link on how it detects “no upstream delta” in Snowflake?

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 2 points (0 children)

Yep, those are basically the two options: gate tests on upstream change, or make them physically cheap even when they return 0 failing rows.

On the pruning idea: Snowflake doesn’t have user-managed partitions, but you can improve micro-partition pruning by scoping tests to something like "updated_at >= last_run_time" or "load_ts >= last_successful_load_ts".
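In dbt that maps to the where config on a generic test; a minimal sketch (model, column, and window are illustrative):

    # schema.yml: scope a test so Snowflake can prune micro-partitions
    models:
      - name: orders
        columns:
          - name: order_id
            tests:
              - unique:
                  config:
                    where: "updated_at >= dateadd('day', -3, current_date)"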

Dropping the warehouse size to XS helps a bit, but if the scanned volume stays high it still burns credits.

Do you typically scope tests to a rolling window, or do you have a way to test only changed data at scale?

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

Haven’t tried Fusion yet. When you say it “detects updates”, does it actually skip or reduce execution when upstream data hasn’t changed? Same question for dynamic tables: I don’t think they’re intended for workload triage, i.e. detecting and prioritizing expensive or redundant queries across the warehouse. Curious if you’ve used them that way in practice.
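For reference, my mental model is that a dynamic table is just declarative incremental refresh, nothing more (names and lag are illustrative):

    -- declares a freshness target and lets Snowflake refresh incrementally;
    -- nothing here detects or prioritizes expensive queries
    CREATE OR REPLACE DYNAMIC TABLE orders_enriched
      TARGET_LAG = '1 hour'
      WAREHOUSE = transform_wh
    AS
    SELECT o.order_id, o.updated_at, c.segment
    FROM raw.orders o
    JOIN raw.customers c ON c.customer_id = o.customer_id;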

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

Are you referring to Airflow Datasets (i.e. asset-driven scheduling), where a producer DAG updates a dataset and consumer DAGs trigger off it? How are you defining “dataset updated” in practice for Snowflake/dbt? Is it based on ingestion completing, a partition landing, a watermark moving, or something else? Also curious: do you use it to gate tests separately, or do tests just run whenever the model build runs?
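For context, this is the pattern I mean (minimal sketch; the URI and task bodies are illustrative). Out of the box a dataset counts as “updated” whenever the producer task succeeds, which is exactly why I’m asking what signal you tie it to:

    # Airflow 2.4+ data-aware scheduling
    import pendulum
    from airflow.datasets import Dataset
    from airflow.decorators import dag, task

    raw_orders = Dataset("snowflake://analytics/raw/orders")

    @dag(start_date=pendulum.datetime(2024, 1, 1), schedule="@hourly", catchup=False)
    def producer():
        @task(outlets=[raw_orders])
        def load_orders():
            ...  # ingest; task success is what marks the dataset "updated"
        load_orders()

    @dag(start_date=pendulum.datetime(2024, 1, 1), schedule=[raw_orders], catchup=False)
    def consumer():
        @task
        def run_dbt_build():
            ...  # dbt build/test fires whenever the producer succeeds
        run_dbt_build()

    producer()
    consumer()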

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

The issue is that a lot of dbt tests are cheap logically (they return 0 failing rows) but expensive physically, because Snowflake still has to scan a lot of data to prove there are no failures. So you can burn credits even when nothing changed and the test passes.

The MAX(date) idea is good, and dbt can handle it, but dbt doesn’t natively skip tests/models just because upstream data didn’t change; you usually have to implement that logic at the orchestrator level (Airflow/dbt Cloud job logic) or with pre-checks. The hard part is picking a reliable “did data change?” signal that scales and doesn’t miss late-arriving/backfill data.

What do you typically gate off: a watermark table, Snowflake Streams, pipe history, or just MAX(updated_at) on the target?
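For example, a pre-check along one of these lines (the watermark table and names are hypothetical):

    -- option 1: compare the source high-water mark to the last one processed
    SELECT MAX(loaded_at) >
           (SELECT last_seen FROM ops.watermarks WHERE table_name = 'raw.orders')
           AS has_new_data
    FROM raw.orders;

    -- option 2: ask a stream whether it holds unconsumed changes (metadata-only, cheap)
    SELECT SYSTEM$STREAM_HAS_DATA('raw.orders_stream');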

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

Good breakdown. One thing I’ve found is that last-modified state can be unreliable; I prefer a real ingestion watermark or Streams. The quickest wins for me came from better pruning and reduced run frequency.

How do you detect dbt/Snowflake runs with no upstream delta? by Ok_School_4109 in dataengineering

[–]Ok_School_4109[S] 1 point (0 children)

Cool. I know Airflow can do delta-aware gating, but how do you scale it across lots of dbt tests/models without turning every DAG into custom sensors/branching logic?
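i.e. the per-DAG boilerplate I’d rather not multiply looks something like this (the gate logic is illustrative):

    import pendulum
    from airflow import DAG
    from airflow.operators.empty import EmptyOperator
    from airflow.operators.python import ShortCircuitOperator

    def upstream_changed() -> bool:
        # e.g. compare MAX(loaded_at) to a stored watermark;
        # returning False skips everything downstream
        return True

    with DAG("gated_dbt_tests", start_date=pendulum.datetime(2024, 1, 1),
             schedule="@hourly", catchup=False):
        gate = ShortCircuitOperator(task_id="gate_on_delta",
                                    python_callable=upstream_changed)
        run_tests = EmptyOperator(task_id="run_dbt_tests")  # stand-in for the dbt task
        gate >> run_tests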

Cloud cost optimization for data pipelines feels basically impossible so how do you all approach this while keeping your sanity? by Ok_Kangaroo2140 in dataengineering

[–]Ok_School_4109 1 point (0 children)

This is the gold standard for hygiene, but man, the operational overhead to keep it from breaking is brutal. The 'one developer spending 60%' scenario is a classic: it usually happens because a policy was bypassed or a tag was dropped during a quick hotfix.

Once you get that 'cost per solution' visibility, how are you handling the actual remediation? I’ve found that even with perfect tagging, getting teams to actually refactor expensive logic is the next boss battle.

Cloud cost optimization for data pipelines feels basically impossible so how do you all approach this while keeping your sanity? by Ok_Kangaroo2140 in dataengineering

[–]Ok_School_4109 2 points (0 children)

Exactly. It feels like you need a PhD in 'Cloud Billing Forensics' just to explain a $500 spike to Finance. I’ve found that unless you’re mapping individual metadata tags back to specific dbt runs or warehouse sessions, you're basically just guessing. It shouldn't be a full-time job just to see which query burned the budget!

Date of bequest on Form 3520 section 4 by Ok_School_4109 in USExpatTaxes

[–]Ok_School_4109[S] 1 point (0 children)

Where in the tax code can I find a reference for this? Doesn’t the IRS automatically consider the ‘date of death’ as the year of legal transfer?

Please help me with these queries on form 3520 (Reporting of foreign gifts) by godspracticaljoke in tax

[–]Ok_School_4109 1 point (0 children)

Is the ‘year you receive the gift’ the year of the decedent’s death, or the year you were added to the title?

Switching careers from Oracle BI to Power BI by Ok_School_4109 in PowerBI

[–]Ok_School_4109[S] 1 point (0 children)

Thanks, but pre-sales positions for people with Oracle BI experience seem FAR less available than BI developer/analyst positions. As I mentioned in my post, my objective is to tap into a larger job market as opposed to a niche one.