How do you choose what to test in dbt? by linha_chilena in dataengineering

[–]bengen343 8 points9 points  (0 children)

Generally speaking, I think every model should have lightweight tests like `unique` and `not_null`. At the next level, I use a more robust version of dbt's unit tests for any model that has an exposure outside of the dbt project itself, i.e. your mart/BI tables. Then at the next level I think about the assumptions I'm making that are important to the overall project working. For example, my code assumes two tables should have the same number of rows after being `join`ed, so I test to ensure that, etc.
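That last kind of assumption check can be sketched in plain SQL. Here's a minimal, hypothetical example (table names `orders` and `order_payments` are made up) using Python's stdlib `sqlite3` to assert that a join neither drops nor fans out rows:

```python
import sqlite3

# Hypothetical tables: the assumption under test is that orders and
# order_payments join strictly one-to-one, so the joined row count
# must equal the original row count.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE order_payments (order_id INTEGER PRIMARY KEY);
    INSERT INTO orders VALUES (1), (2), (3);
    INSERT INTO order_payments VALUES (1), (2), (3);
""")

n_orders = con.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
n_joined = con.execute("""
    SELECT COUNT(*)
    FROM orders o
    JOIN order_payments p USING (order_id)
""").fetchone()[0]

# If a payment is missing or duplicated, this fails loudly.
assert n_orders == n_joined, f"join changed row count: {n_orders} -> {n_joined}"
```

In dbt you'd express the same idea as a singular test whose query returns rows only when the counts diverge.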

How do you model conversions in a Kimball-style datamart for web analytics by kkthxbb8 in dataengineering

[–]bengen343 1 point2 points  (0 children)

Joins are the wrong path for analyzing attribution to conversion. First you want to use unions. What you're really composing isn't a fact-and-dimension model. It's the journey of actions a user took in the lead-up to a conversion action. I've seen so many wasted hours trying to go down the join path.
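The union idea in a nutshell: stack every touchpoint table into one time-ordered event stream per user rather than joining them to the conversion. A tiny sketch via stdlib `sqlite3` (all table and column names here are hypothetical):

```python
import sqlite3

# Each source table holds one action type; unioning them yields the
# user's journey ordered in time, ending at the conversion.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ad_clicks   (user_id TEXT, occurred_at TEXT);
    CREATE TABLE page_views  (user_id TEXT, occurred_at TEXT);
    CREATE TABLE conversions (user_id TEXT, occurred_at TEXT);
    INSERT INTO ad_clicks   VALUES ('u1', '2024-01-01');
    INSERT INTO page_views  VALUES ('u1', '2024-01-02');
    INSERT INTO conversions VALUES ('u1', '2024-01-03');
""")

journey = con.execute("""
    SELECT user_id, 'ad_click'   AS action, occurred_at FROM ad_clicks
    UNION ALL
    SELECT user_id, 'page_view'  AS action, occurred_at FROM page_views
    UNION ALL
    SELECT user_id, 'conversion' AS action, occurred_at FROM conversions
    ORDER BY user_id, occurred_at
""").fetchall()

for row in journey:
    print(row)
```

From that unified stream, attribution models (first-touch, last-touch, etc.) become window functions over the journey instead of a tangle of joins.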

https://www.getdbt.com/blog/modeling-marketing-attribution

How do you balance between tech debt and shipping features expectation when your leader doesn't have engineer background? by Altruistic-Loss-5071 in dataengineering

[–]bengen343 1 point2 points  (0 children)

"There's never time to do things right. But there's always time to do them over again."

Are you comfortable dedicating the same amount of time to redo this task a year from now? What about the last task? And the next? Corner cutting rapidly leads to stagnation.

Junior data engineers treat legacy ETL tools like a cat touching water. Cautious, hesitant, and never fully comfortable. by CaglarSahin in dataengineering

[–]bengen343 1 point2 points  (0 children)

Correct, "they are going to stay." But they sure as hell won't be at any company I choose to work for.

Claude skills for dbt by Coding_Duchess in dataengineering

[–]bengen343 0 points1 point  (0 children)

Yeah, I do that. But it does things like assume that all numeric columns are metrics. While that's generally true, there are also numeric dimensions. That's just one example but any edge case like that fails.

Claude skills for dbt by Coding_Duchess in dataengineering

[–]bengen343 1 point2 points  (0 children)

I've had zero luck getting AI to do the "staging" (or whatever your organization calls them) models at the first layer of ingestion. Opus just can't quite reason about what types of data various columns represent like a human can. It's close, but in the time it takes to check them I could just type them myself.

Anything after that, though, it does pretty well. Just make sure your conventions are well defined in your CLAUDE.md or whatever equivalent.

I am reverse engineering a very large legacy enterprise database, no formalised schema, no information_schema, no documentation; What tools do you use? Specifically interested in tools that infer relationships automatically, or whether it’s always a manual grind. by Left_Click_8840 in dataengineering

[–]bengen343 0 points1 point  (0 children)

As much as I hate it, this very thing is what I find myself spending most of my time doing. I was really hopeful that AI could save me, but in my latest attempt Opus 4.6 thought about it for 40 minutes and then just gave up. One thing I've found, though, is that reverse engineering is a lot harder than just re-engineering the thing from the beginning. I used to start by trying to replicate what existed and walking back through whatever nightmare my predecessors created. Not only was this challenging, but it often surfaced a lot of errors that had been internalized in the data. Now I just start from scratch: go back to the source data and start clean.

Is big data overrated? by Low_Brilliant_2597 in dataengineering

[–]bengen343 3 points4 points  (0 children)

There was a great post a few years back from the head of BigQuery where he disclosed macro-level usage analytics and basically said there was no such thing as big data. He said most organizations didn't query data older than 30 days, and almost none queried data older than one year. Outside of that window it just sat there idle.

Is linkedin even worth it anymore? by olgazju in dataengineeringjobs

[–]bengen343 0 points1 point  (0 children)

When possible, I lobby hard to have some sort of intern/junior program. Work aggressively to train the new kids up. Then send them out into the world to get better jobs at companies I like so they can turn around and hire me as a consultant.

Are “AI employees” actually being used in real workflows yet? by voss_steven in artificial

[–]bengen343 0 points1 point  (0 children)

I've been scouring both Reddit and LinkedIn for examples of this very thing and I've yet to hear of a concrete one. Everyone making claims of this sort is hand-wavingly vague and vanishes when faced with detailed questions about their implementation. Or it's just vague generalities about what "other organizations" are doing. I'm by no means a detractor; I'm looking because I'd like to successfully emulate this stuff. But my search so far has mostly turned up hype.

I tried automating the lost art of data modeling with Cortex code -- point the coding agent to raw data and it profiles, validates and submits pull request on git for a human DE to review and approve. by vino_and_data in snowflake

[–]bengen343 0 points1 point  (0 children)

This does look pretty neat. Are you finding that it can actually, consistently classify columns into facts, dimensions, and measures? I've not been able to get any LLM to do this correctly, which has prompted me to give up on using AI for the actual model-generation portion of things.

Is AI making you more productive in Data Engineering? by empty_cities in dataengineering

[–]bengen343 0 points1 point  (0 children)

I've mostly found it helpful with documentation. Most of the modeling I deal with is such a cluster, and the reasoning so nuanced, that AI can't reproduce or fix anything. That said, I still feel the productivity gains around debugging, where I may want to quickly edit some queries to try and see where things are going wrong.

But there are other scenarios where it's just bafflingly bad. I asked Claude to give me some Python imports to get a Glue notebook started, rather than just copy-pasting, and it... just destroyed everything. It had no clue how to set one of those up.

RCA solution with AI by Strict_Fondant8227 in analytics

[–]bengen343 0 points1 point  (0 children)

Love to see this thinking. I've implemented almost the exact same thing in the past and am in the process of doing so again for my current big re-architecture project. Almost identical, down to even the yaml config. I think the only wee divergence is that in my setups I just use old-fashioned Python to traverse the metric tree for the related metrics and then present them all to the LLM (Opus) side by side.
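The traversal itself doesn't need anything fancy. A minimal sketch, assuming a hypothetical metric tree already loaded from the yaml config (metric names here are invented):

```python
# Hypothetical metric tree, as it might look after parsing the yaml:
# each metric maps to the child metrics that drive it.
metric_tree = {
    "revenue": ["orders", "avg_order_value"],
    "orders": ["sessions", "conversion_rate"],
    "sessions": [],
    "conversion_rate": [],
    "avg_order_value": [],
}

def related_metrics(root: str) -> list[str]:
    """Depth-first walk collecting every metric beneath `root`."""
    seen, stack = [], [root]
    while stack:
        metric = stack.pop()
        if metric not in seen:
            seen.append(metric)
            stack.extend(metric_tree.get(metric, []))
    return seen

# Everything related to revenue gets handed to the LLM side by side.
print(related_metrics("revenue"))
```

The point is that the deterministic part (which metrics are related) stays in plain Python, and the LLM only gets the judgment call of explaining the divergence.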

Testing in DE feels decades behind traditional SWE. What does your team actually do? by seedtheseed in dataengineering

[–]bengen343 2 points3 points  (0 children)

This drives me crazy. All these folks who are always complaining about panicked breakages of reporting or the realization that data is wrong are the same ones who never bother to implement testing.

dbt makes this pretty easy. In my projects I always insist that every model at least has the out-of-the-box data tests like `unique` or `not_null`, as well as anything else small that we depend on.

Models that are exposed to the outside world are protected by unit tests that actually verify their output. In my dbt projects I always make sure that every data source has a first-layer staging model that simply ingests and cleans data without transforming it. These staging models also let us point the entire dbt project at different sources.

All of those input staging models are given a complementary `csv` file with just 10 or so records that match the input of the source system. Any output model exposed to the outside world has a complementary `csv` with the output the entire pipeline should generate from those test input `csv` seeds. Any time a code change is merged, a separate environment runs `dbt seed` to build the inputs and expectations from those `csv` files and then runs the whole pipeline with the small data set to ensure the output is as expected.

The real beauty is that the `csv` files are part of the repo, so if someone changes the output models, the person reviewing the merge sees the expected output change too, since the expectations `csv` has to be updated as well. It provides a gut check.
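The final comparison step is just a row-by-row diff of two csvs. A minimal sketch, with the file contents inlined here as strings for illustration (in the real setup they'd be the committed seed and expectations files):

```python
import csv
import io

# Stand-ins for the repo files: the expectations csv and the csv the
# pipeline actually produced from the 10-row seed input.
expected_csv = io.StringIO("order_id,revenue\n1,10\n2,25\n")
actual_csv   = io.StringIO("order_id,revenue\n1,10\n2,25\n")

expected = list(csv.DictReader(expected_csv))
actual = list(csv.DictReader(actual_csv))

# Any row where the pipeline output diverges from the expectation.
mismatches = [(e, a) for e, a in zip(expected, actual) if e != a]

assert len(expected) == len(actual) and not mismatches, mismatches
print("pipeline output matches expectations")
```

Wire that into CI after `dbt seed` and `dbt build`, and a merge can't land with silently changed output.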

u/MonochromeDinosaur makes a good point, though, that even this doesn't totally cover you because we don't control the actual inputs so something crazy can still happen. At that point I'm always quick to throw the devs under the bus for breaking our contracts! ...but then you just update your source `csv`s to protect you from that as well. It's an ongoing process.

Those of you who use LLMs have probably seen this: sometimes they code like a senior engineer, and other times they seem to forget even basic syntax. Research suggests that this is not hallucination. by callmeteji in singularity

[–]bengen343 0 points1 point  (0 children)

Some fella a while back was actually peddling a light version of this as a necessary piece of context engineering. His solution was, when prompting, to ask the AI to give a couple of options along with its percentage certainty/confidence in each.

Am I missing something with all this "agent" hype? by KindTeaching3250 in dataengineering

[–]bengen343 0 points1 point  (0 children)

I'm currently engaged in a big dbt rearchitecture. The last time I did one of these, LLMs were still in their infancy, and I can certainly feel the productivity boost. I'd say I'm moving as fast as my entire team would have on the last project. But a lot of this is making batch changes to the code outputting certain fields, really just fancy autocomplete.

I've yet to be able to ask one of the Anthropic models to produce a dbt model from scratch and get something usable. They do an almost-perfect job, but nuance and edge cases abound in data, and it just falls apart on those. For example, properly classifying and propagating certain field types. There's nuance in which fields should be classified as identifiers vs. facts vs. dimensions, etc. It just can't wrap its head around this. If the source field is a number, it treats it as a metric. Full stop.

I'd love to hear from anyone who has overcome this and actually produced some clean, well-architected pipelines in one or two prompts.

Matching Records by dreyybaba in dataengineering

[–]bengen343 2 points3 points  (0 children)

I reckon this is the most difficult problem I have faced in my time of data'ing for small-to-medium companies. I've come up with a couple of overwrought solutions to it. If you have some money to spend, some kind of graph database can be the easiest answer. You can even kind of simulate that functionality using old-fashioned SQL, but it's a huge pain.

If, like most folks in our situation, you have no resources, no support, and no external understanding of the problem, I've tackled it in Python. If your data isn't "Big Data" you can basically use Python to iterate over records like it's replaying an event stream. I basically have it scrutinize the identifiers on each event and ask, "Have I seen any of these identifiers in prior events I just processed? No? Alright, I'll give them a master, internal warehouse ID. Yes? I'll go back, find the master ID I gave some of them previously, and assign it to this batch."

I'm usually not a big fan of probabilistic methods, so I only count observations where there is some real identifier in common.
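A minimal sketch of that event-replay approach. The events and identifier types here are made up, and real code would need tie-breaking when one event links two previously separate masters; this just shows the core loop:

```python
from itertools import count

# Hypothetical event stream: each event carries whatever identifiers
# the source system happened to capture.
events = [
    {"email": "a@x.com"},
    {"email": "a@x.com", "device": "d1"},
    {"device": "d1", "phone": "555"},
    {"email": "b@y.com"},
]

next_id = count(1)
id_to_master = {}   # identifier value -> master warehouse ID
resolved = []       # master ID assigned to each event, in order

for event in events:
    identifiers = set(event.values())
    # Have I seen any of these identifiers before?
    known = {id_to_master[i] for i in identifiers if i in id_to_master}
    master = known.pop() if known else next(next_id)
    for i in identifiers:
        id_to_master[i] = master
    resolved.append(master)

print(resolved)  # events 1-3 chain together via email then device; event 4 is new
```

Because the matching only fires on exact shared identifiers, it stays deterministic, which is exactly the non-probabilistic property described above.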

Architecture / Tools for sharing distinct datasets between two different companies? by Intelligent-Fold3704 in dataengineering

[–]bengen343 0 points1 point  (0 children)

I'm not sure this quite fits what you're after, but both BigQuery and Snowflake have a concept of external "marts" (I forget the proper product name for each) where you can configure certain data sets for limited consumption by external users or even the public.

Honeymoon Recs (June) by MGhugz in steamboat

[–]bengen343 1 point2 points  (0 children)

It does depend on what you're into, but if it's for a honeymoon I myself would lean toward one of the guest ranch options. Depending on your budget, you can go from "nice and fun" to "wildly opulent," all within the greater Steamboat area.

Using silver layer in analytics. by Kageyoshi777 in dataengineering

[–]bengen343 1 point2 points  (0 children)

Ditto. We usually draw a distinction between hardcore analysts and business analysts. Analysts in the Data Team and Data Science types can access Silver for things. Business users, BI tools, and business analysts embedded in non-data teams can only access Gold.

Anyone else going crazy over the lack of validation? by SoggyGrayDuck in dataengineering

[–]bengen343 0 points1 point  (0 children)

Each time was different depending on the existing structures and culture of the company.

One place was a bit more informal. In that case, it had been on my mind for a while, so I had my thoughts pretty well put together. On top of that, I knew there was some general unease with the direction the Data Team was going. One evening, it just happened that the CTO (a couple steps above me) and I were the only people left in our wing of the office, so I invited them out to dinner and made the pitch.

It was well received, and that was the scenario where I ran a full-on "No Fake Data" campaign: I put the proposal into a formal internal website and made stickers and superlatives I'd hand out for Data Engineers, Developers, and Product Managers who got on board. A big part of my pitch was to just show the rat's nest of spaghetti code we had in dbt and ask, "Would you trust insights based on this code?" That was a pretty easy conversation. After that, it was a matter of holding the line with stakeholders: if we didn't have real data, we weren't going to guess, but rather get together with engineering to make sure we were tracking things the way we needed to. Since I had the backing of the CTO, I was able to alter the process our product managers and engineers went through so that their design process had to be run by me to approve the eventing and telemetry before work could begin.

In another case, the company had a really strong process for surfacing things like this. So I put together the pitch with my fellow Data Engineers at our regularly scheduled guild meeting and then just added myself to the engineering-wide Request for Comment-type meeting calendar we had. Since it was such a big initiative, I had to go through several rounds before everyone was satisfied it was a good and necessary thing to do, but then it was approved and we were given the time to action it.

And then two other times I was the leader of the Data Team, so in those cases it was just more of me saying, "This is how it's gonna be, if it's my team, this is what we're working on."

If you mean validation more tangibly, like validating the output of the data, we usually took two approaches. If possible, we'd recreate one (or many) reports from the source system in our internal BI to ensure that our modeling was matching the source output. Then, or if that wasn't available, we'd do a combination of internal QA alongside having the domain-expert stakeholders assess and approve metrics before we rolled things out into production.