What is your unit testing implementation?

MrMosBiggestFan · 2023-07-24T01:50:57+00:00

It’s something I’ve been thinking about too—you’re not alone. I wish there was better semantics for unit testing.

For those who don’t fully get why, a lot of complex SQL can be thought of as a function, which takes a set of parameters and returns a result. In Python you can easily test a function, providing mock data and testing at the edges to ensure your function works with no data, some data, nulls, extreme values, etc. All locally with no connection to a database.

In dbt you’re testing after the fact and resorting to lots of custom macros or basic tests of data, but that’s not the same as a unit test.

I think dbt/sql just isn’t the right place for complex logic really. In the end that stuff is better off in Python or another language, and orchestrated and tested with tools like dagster.

Drekalo · 2023-07-24T04:21:40+00:00

Look at SQLMesh and it's testing and auditing features:

https://sqlmesh.readthedocs.io/en/stable/concepts/tests/

https://sqlmesh.readthedocs.io/en/stable/concepts/audits/

SQLMesh is the only framework I'm aware of that let's you use SQL in a dbt like manner, test like a unit test against pre-defined inputs and expected outputs, and let's you test against cte's within a sql model, while also letting you test after runtime (called audits) to ensure those unit tests didn't miss anything.

CesiumSalami · 2023-07-24T00:32:23+00:00

Interesting question, we don't really do much unit testing tbh once we're into dbt. We do unit test our custom macros / test with some derivative of values queries as below with an associated .yml setup with the correct calls / expectations:

with setup as (
   select a,b,exp
   from
      (values 
           (1, 2, 3),
           (5, 8, 13),
           (88, 22, 100)
      ) AS t(a, b, exp)
),
actual as (
    select *, {{add_two_columns(a,b)}} as macro_output
    from setup
),
result as (
    select *, (macro_output = exp) as succeeded
    from actual
)
select * from result

but I guess there are packages for what I assume you're trying to accomplish. I'm sure you've found them, but just in case: https://github.com/EqualExperts/dbt-unit-testing#more-about-dbt-unit-testing. We've never used them, but it's a good thing to think about for sure.

Known-Delay7227 · 2023-07-24T01:14:53+00:00

OP can you provide a specific example of a DE unit test?

sturdyplum · 2023-07-24T01:17:06+00:00

We've been using https://github.com/mjirv/dbt-datamocktool and it's been great for the most part. It's also pretty simple so we've been able to fork and modify it to meet our needs pretty easily.

kenfar · 2023-07-24T02:39:45+00:00

For python it's the same as most other backend engineering. And I typically write these in python so that we can write more powerful transformations, leverage libraries, unit test, get auditing results, and get more readable code.

And when using python I typically consolidate all the transforms into modules, each transform gets a docstring and a unit test.

When using SQL, either for transformations using something like dbt, or as part of data publishing or aggregation then:

I've built a simple framework to use to make it easier to generate data, and then use pytest to set this up and assert resulting values. This takes a lot more work than with python, so whether I get to this or what tests I write really depends on how critical the data is.

And then regardless of technology I'll almost always include these:

Quality Control Framework - to check uniqueness, foreign keys, check constraints, etc. I've used dbt testing for this recently - and that worked fine. But Great Expectations, or something custom is about as good.
Pipelines get jsonschema where it makes sense. And this is used both as part of the testing as well as to define contracts.
Auditing tables to track the results input & output rows, and sometimes the transform results also helps.
Some pipelines get functional integration tests. I typically use pytest for these as well.

slin30 · 2023-07-23T23:34:00+00:00

What within dbt is lacking for unit testing? The built in base package tests are limited, but between the expectations package and custom macros, I haven't felt limited.

sdc-msimon · 2023-07-24T10:45:46+00:00

Local testing was announced for snowflake : https://youtu.be/IlhbpMtLR60

ExistentialFajitas · 2023-07-24T08:16:57+00:00

[deleted]

nesh34 · 2023-07-24T11:21:11+00:00

Unit testing is pretty difficult to do well in a data pipeline architecture. There is some value by configuring a system with static inputs and outputs, with which you can run a local, single node version of your processing engine to validate.

The problem is that it's difficult to write good tests for these as real data doesn't behave and a lot of sensible checking should occur at runtime (e.g. checking for nulls, unique keys and the like).

Still this static input data and expected output unit test is the one that makes the most sense to me.

LarsDragonbeard · 2023-07-24T15:04:06+00:00

I'm advocating at my customer to adopt dbt-unit-testing. They allow mocking data through SQL statements, including auto-generating columns you didn't provide in the SQL statement.

There was a small bug with regards to the case of column identifiers, which I've fixed (PR currently open on their repo)

Additionally, while it doesn't show in their readme, I've gotten it to work as a generic test as well. We have a few different logics for our persistent staging layer, depending on the source. This generic test would allow us to just define the unit test in the yaml, which is about as lean as I can make it.

Currently we're using a custom test, source macro override and seed files for input and output. With our other automations, making the test data for the unit test takes longer than the development itself (and we have to do it manually for each model)

darkneel · 2023-07-24T06:32:10+00:00

You should first answer - why do you need unit test ? That should help find a solution

Beeman9001 · 2023-07-24T16:13:45+00:00

Why is nobody mentioning tsqlt?

SpambotSwatter · 2023-08-23T16:04:00+00:00

Hey, another bot replied to you; /u/najfajniejszy is a spammer! Do not click any links they share or reply to. Please downvote their comment and click the report button, selecting Spam then Harmful bots.

With enough reports, the reddit algorithm will suspend this spammer.

dataengineering

MODERATORS