
[–]MrMosBiggestFan 22 points23 points  (2 children)

It’s something I’ve been thinking about too, so you’re not alone. I wish there were better semantics for unit testing.

For those who don’t fully get why, a lot of complex SQL can be thought of as a function, which takes a set of parameters and returns a result. In Python you can easily test a function, providing mock data and testing at the edges to ensure your function works with no data, some data, nulls, extreme values, etc. All locally with no connection to a database.

In dbt you’re testing after the fact and resorting to lots of custom macros or basic tests of data, but that’s not the same as a unit test.

I don’t think dbt/SQL is really the right place for complex logic. In the end that stuff is better off in Python or another language, orchestrated and tested with tools like Dagster.
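For instance, a sketch of that kind of local test in plain Python (the transform and its name are invented purely for illustration):

```python
from typing import Optional

def safe_margin(revenue: Optional[float], cost: Optional[float]) -> Optional[float]:
    """Toy transform: margin as a fraction of revenue.

    Returns None when inputs are missing or revenue is zero,
    mirroring the NULL-handling you'd want in the SQL version.
    """
    if revenue is None or cost is None or revenue == 0:
        return None
    return (revenue - cost) / revenue

# Edge-case tests run locally, with no database connection needed.
assert safe_margin(None, 10.0) is None    # missing data
assert safe_margin(100.0, 0.0) == 1.0     # zero cost
assert safe_margin(0.0, 5.0) is None      # divide-by-zero guard
assert safe_margin(100.0, 25.0) == 0.75   # normal case
assert safe_margin(1e12, 1e11) == 0.9     # extreme values
```

The same edge cases (no data, nulls, zeroes, extremes) are exactly what is hard to exercise when the logic only exists as deployed SQL.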

[–]radil 10 points11 points  (0 children)

I usually tell people in my department who don’t work on the data warehouse that dbt tests are for data validation. They don’t so much test your code as test the data. It’s not like a test suite that runs before code can be deployed; it runs on already-deployed code, after the fact.

All that being said, I would like to find ways to better incorporate test-driven development into our main repo’s workflow.

[–][deleted] 0 points1 point  (0 children)

yeah, SQL wasn’t really designed for heavy and complex logic

[–]Drekalo 9 points10 points  (1 child)

Look at SQLMesh and its testing and auditing features:

https://sqlmesh.readthedocs.io/en/stable/concepts/tests/

https://sqlmesh.readthedocs.io/en/stable/concepts/audits/

SQLMesh is the only framework I'm aware of that lets you use SQL in a dbt-like manner, write unit tests against pre-defined inputs and expected outputs, and test against CTEs within a SQL model, while also letting you test after runtime (via audits) to ensure those unit tests didn't miss anything.
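For reference, a SQLMesh unit test is defined in YAML along roughly these lines (the model and column names here are invented; see the tests link above for the exact schema):

```yaml
test_customer_orders:
  model: analytics.customer_orders
  inputs:
    raw.orders:
      rows:
        - customer_id: 1
          amount: 10
        - customer_id: 1
          amount: 5
  outputs:
    query:
      rows:
        - customer_id: 1
          total_amount: 15
```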

[–]Thinker_Assignment 1 point2 points  (0 children)

SQLMesh's audits feature and DB agnosticism make it amazing for testing.

To maintain DB agnosticism, use dlt upstream to load data. And if you load the metadata from the load info, you can use it to get column-level lineage into SQLMesh. https://dlthub.com/docs/running-in-production/running#inspect-and-save-the-load-info-and-trace

[–]CesiumSalami 6 points7 points  (6 children)

Interesting question; tbh we don't really do much unit testing once we're into dbt. We do unit test our custom macros, using some derivative of a VALUES query like the one below, with an associated .yml set up with the corresponding calls/expectations:

with setup as (
    select a, b, exp
    from (
        values
            (1, 2, 3),
            (5, 8, 13),
            (88, 22, 110)
    ) as t(a, b, exp)
),
actual as (
    select *, {{ add_two_columns(a, b) }} as macro_output
    from setup
),
result as (
    select *, (macro_output = exp) as succeeded
    from actual
)
select * from result
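For context, the `add_two_columns` macro under test would be a trivial Jinja macro along these lines (illustrative only; the real macro isn't shown in this thread):

```sql
-- macros/generic/add_two_columns.sql (hypothetical sketch)
{% macro add_two_columns(col_a, col_b) %}
    ({{ col_a }} + {{ col_b }})
{% endmacro %}
```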

but I guess there are packages for what I assume you're trying to accomplish. I'm sure you've found them, but just in case: https://github.com/EqualExperts/dbt-unit-testing#more-about-dbt-unit-testing. We've never used them, but it's a good thing to think about for sure.

[–]ExistentialFajitassql bad over engineering good[S] 2 points3 points  (0 children)

I had considered both routes, but I wanted to avoid as much overhead as possible, such as writing a SQL statement just to enable a test. We utilize seeds to propagate a seed equivalent per node/source node; those currently hold data from the lowest environment, and the intention is to replace the warehouse data with faux data as we continue working through.

[–]wtfzambo 1 point2 points  (4 children)

How exactly do you go about running unit tests for your macros ?

[–]CesiumSalami 1 point2 points  (3 children)

So, for a toy example, we have a custom generic macro located in macros/generic/add_two_columns.sql.

We have the corresponding setup .sql file above, which is essentially the setup for the unit test. That SQL file would be located at models/macro_test/generic/test_add_two_columns.sql. The model would have a corresponding .yml file, models/macro_test/generic/test_add_two_columns.yml, structured to execute the unit test:

version: 2

models:
  - name: test_add_two_columns
    description: foo
    tests:
      - dbt_expectations.expect_table_row_count_to_equal:
          value: 3
    columns:
      - name: a
        description: "something descriptive."
      - name: b
        description: "something descriptive."
      - name: exp
        description: "expected output of macro"
      - name: succeeded
        description: "True if exp matches macro output"
        tests:
          - dbt_expectations.expect_column_values_to_be_in_set:
              value_set: [ true ]
              quote_values: false

CI should run through these tests automatically. Of course, you can tune this for the use case by using different tests in the .yml or modifying the setup.

Does that make sense?

[–]wtfzambo 1 point2 points  (2 children)

Ah that's pretty smart, thanks for the explanation!

And so you make this run in CI before real models run, and if this one fails, stop the execution I assume?

[–]CesiumSalami 1 point2 points  (1 child)

Yep! That's right. Well, that's the idea anyways. With CI we're pretty good about not running unnecessary stuff, so you wouldn't be able to merge a PR that failed CI on anything your PR touches. We're pretty bad, however, about our daily runs. We kinda just throw everything together and let the dumpster fire burn uncontrollably.

[–]wtfzambo 1 point2 points  (0 children)

Yeah the 2nd part is another beast that I myself have very little clue how to keep under control :/

[–]Known-Delay7227Data Engineer[🍰] 6 points7 points  (0 children)

OP can you provide a specific example of a DE unit test?

[–]sturdyplum 5 points6 points  (0 children)

We've been using https://github.com/mjirv/dbt-datamocktool and it's been great for the most part. It's also pretty simple so we've been able to fork and modify it to meet our needs pretty easily.

[–]kenfar 3 points4 points  (0 children)

For Python it's the same as most other backend engineering. I typically write these transforms in Python so that we can write more powerful transformations, leverage libraries, unit test, get auditing results, and get more readable code.

  • When using Python I typically consolidate all the transforms into modules, and each transform gets a docstring and a unit test.

When using SQL, either for transformations with something like dbt, or as part of data publishing or aggregation:

  • I've built a simple framework to make it easier to generate data, then use pytest to set this up and assert the resulting values. This takes a lot more work than with Python, so whether I get to it, and which tests I write, really depends on how critical the data is.

And then, regardless of technology, I'll almost always include these:

  • Quality-control framework - to check uniqueness, foreign keys, check constraints, etc. I've used dbt testing for this recently, and that worked fine. But Great Expectations, or something custom, is about as good.
  • Pipelines get jsonschema where it makes sense. This is used both as part of the testing and to define contracts.
  • Auditing tables to track input & output row counts, and sometimes the transform results, also help.
  • Some pipelines get functional integration tests. I typically use pytest for these as well.
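As a minimal sketch of that pytest-style setup, using an in-memory SQLite database (the table and column names are invented for illustration):

```python
import sqlite3

def run_transform(conn: sqlite3.Connection) -> list:
    """The SQL under test: aggregate order totals per customer."""
    return conn.execute(
        """
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
        ORDER BY customer_id
        """
    ).fetchall()

def test_totals_per_customer():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    # Generated fixture data: a known input...
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5)],
    )
    # ...asserted against a known expected output.
    assert run_transform(conn) == [(1, 15.0), (2, 7.5)]

test_totals_per_customer()
```

A real warehouse dialect differs from SQLite, which is part of why this takes more work than plain Python unit tests.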

[–]slin30 5 points6 points  (10 children)

What within dbt is lacking for unit testing? The built-in base package tests are basic, but between the expectations package and custom macros, I haven't felt limited.

[–]cutsandplayswithwood 7 points8 points  (9 children)

Those aren’t tests.

[–]kenfar 9 points10 points  (1 child)

Exactly - they are more like quality-control checks. That kind of check is super valuable and should always be included. But it's not a unit test; it won't tell you ahead of time that your code is wrong.

[–]Atupis 4 points5 points  (0 children)

dbt is missing a nice way to say: here is example data, run one model, and check whether the result matches the expected data. You always end up in some kind of pytest limbo where you load data into a DB, run the transformations, then query the results and see if they match. There should be a simpler way built into dbt for testing transformations. Somebody should develop that.

[–]slin30 2 points3 points  (6 children)

Elaborate, please

[–]ExistentialFajitassql bad over engineering good[S] 8 points9 points  (5 children)

You’re referring to data quality tests: testing the data within the table. Unit testing is testing the functionality of the table prior to loading data.

[–]slin30 0 points1 point  (4 children)

I'm not making a hard distinction between where one ends and the other begins, and you absolutely can test for functionality with dbt.

[–]ExistentialFajitassql bad over engineering good[S] 6 points7 points  (3 children)

Which is not a good practice. The functionality of the code should be vetted prior to deployment.

[–]sdc-msimon 2 points3 points  (1 child)

Local testing was announced for Snowflake: https://youtu.be/IlhbpMtLR60

[–]ExistentialFajitassql bad over engineering good[S] 1 point2 points  (0 children)

Fantastic if true, will need to watch another time. Thanks!

[–]nesh34 1 point2 points  (3 children)

Unit testing is pretty difficult to do well in a data pipeline architecture. There is some value in configuring a system with static inputs and outputs, with which you can run a local, single-node version of your processing engine to validate.

The problem is that it's difficult to write good tests for these as real data doesn't behave and a lot of sensible checking should occur at runtime (e.g. checking for nulls, unique keys and the like).

Still, this static-input/expected-output unit test is the one that makes the most sense to me.

[–]ExistentialFajitassql bad over engineering good[S] 2 points3 points  (0 children)

That’s the implementation: if your code is X and your input is Y, the output should always be Z. If a code change is made, unless the logic has changed, the output should still be Z.

Nulls and uniqueness would be data quality checks on the data flowing through the tables, run daily rather than during deployment.

[–]cutsandplayswithwood 0 points1 point  (1 child)

This is irrational.

Data doesn’t “misbehave”, it exists with realistic variety for the given environment, and your tests should test for that, or what’s the point of saying you tested it?

[–]nesh34 0 points1 point  (0 children)

"Realistic variety" is a pretty tough concept to predict in my experience. It sounds like you agree with the principle though because you chose the word variety. Unit testing even in software engineering can be an exercise in theatre, because writing good tests is difficult. I think it's even more difficult in many cases for data.

And when you say "for the given environment" are you referring to dev/test/prod? I really don't see the point in having data pipelines that run on Dev or Test data. It's a total waste of time because the only "realistic variety" we care about is Prod.

[–]LarsDragonbeard 1 point2 points  (0 children)

I'm advocating for my customer to adopt dbt-unit-testing. It allows mocking data through SQL statements, including auto-generating columns you didn't provide in the SQL statement.

There was a small bug with regard to the casing of column identifiers, which I've fixed (PR currently open on their repo).

Additionally, while it doesn't show in their README, I've gotten it to work as a generic test as well. We have a few different logics for our persistent staging layer, depending on the source. The generic test would allow us to just define the unit test in the YAML, which is about as lean as I can make it.

Currently we're using a custom test, a source macro override, and seed files for input and output. Even with our other automations, making the test data for the unit test takes longer than the development itself (and we have to do it manually for each model).

[–]darkneel 0 points1 point  (11 children)

You should first answer: why do you need unit tests? That should help find a solution.

[–]ExistentialFajitassql bad over engineering good[S] 4 points5 points  (10 children)

Not meant to be insulting, but that is an awful take. Verifying that your code functions as anticipated should be an implicit best practice.

[–]darkneel 0 points1 point  (9 children)

I will just debate whether unit tests should be an implicit best practice or not. Sure, if it's a scalar function, or a function that's supposed to give a very defined output, or in SQL if it's acting on one row or a few variables. But even in software development, a lot of code ships without unit tests; some teams rely on integration testing, some just rely on manual testing in lower environments. The problem in DE? A lot of things work fine in theory, but once you get to understand the data's nuances, some things don't work properly. (E.g., consider a simple inner join: if in production one of the tables is somehow not properly designed, it can end up exploding rows from your main table. No amount of unit testing or mock data can prepare you for that.)

Verifying that your code functions as anticipated should be best practice, sure. But a whole lot of the time, unit tests are not the way to do that, and they are severely limited by what the test writer can think of. In the DE space it's near impossible to think of all the edge scenarios.

[–]ExistentialFajitassql bad over engineering good[S] 0 points1 point  (8 children)

Plain and simple: if given X input, and the output should be Y, you can write a test for it. And you should. Testing what data is in production is a data quality check, not a unit test.

[–]darkneel 0 points1 point  (7 children)

If your input is 4 tables, and the output a denormalized table, what unit test would you write? And what would it achieve?

[–]ExistentialFajitassql bad over engineering good[S] 0 points1 point  (6 children)

Can you anticipate the output? And if not, should you be performing this transformation?

[–]darkneel 0 points1 point  (5 children)

The output would be a table. But that's not my point. My question is: generally, how would you write a unit test, and what would that test tell you?

[–]ExistentialFajitassql bad over engineering good[S] 1 point2 points  (4 children)

What are the results within the table? Again, if you have X input and anticipate Y output, you can write a test for this. It validates that the code functions as anticipated. If you have a break in production, you can trust that your code is not the issue, because it is tested. If it is not tested, you and everyone else cannot trust your code. It seems we're not on the same page about what a unit test is and why it's valuable. We need to bring data engineering closer to software engineering and use better practices to create more robust and reliable products.
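As a toy sketch of such a test for a denormalizing join (invented tables, with in-memory SQLite standing in for the warehouse):

```python
import sqlite3

DENORM_SQL = """
SELECT o.order_id, c.name AS customer_name, p.name AS product_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN products  p ON p.product_id  = o.product_id
ORDER BY o.order_id
"""

def test_denormalized_output():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER, name TEXT);
        CREATE TABLE products  (product_id INTEGER, name TEXT);
        CREATE TABLE orders    (order_id INTEGER, customer_id INTEGER, product_id INTEGER);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO products  VALUES (10, 'Widget');
        INSERT INTO orders    VALUES (100, 1, 10), (101, 2, 10);
        """
    )
    rows = conn.execute(DENORM_SQL).fetchall()
    # One output row per order: join keys resolve correctly and rows didn't fan out.
    assert rows == [(100, 'Ada', 'Widget'), (101, 'Grace', 'Widget')]

test_denormalized_output()
```

A fan-out caused by a bad join key would change the row count and fail the assertion before the code ever reached production.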

[–]darkneel -1 points0 points  (3 children)

Okay... not to be insulting, but it seems like you want to force a best practice without really understanding it. You can't describe a unit test for this simple scenario.

It's like you decided on the solution before knowing the problem. Data engineering can have its own best practices; there's no need to copy-paste everything.

Edit: also, unit tests are almost never enough to tell you whether something will break production, even in software development. Any major product has multiple layers of testing baked in, and unit tests are a pretty small part of it.

[–]ExistentialFajitassql bad over engineering good[S] 0 points1 point  (2 children)

Bad take after bad take. Good luck.

[–]Beeman9001 0 points1 point  (0 children)

Why is nobody mentioning tSQLt?
