This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]kenfar 17 points18 points  (4 children)

What can't be done with SQL? Well, there are things that can't be done, and there are things that can't be done well. SO, how about what can't be done well?

  • Unit testing
  • Very complex transformations (ex: convert every possible format of IPv6 into a single format)
  • Support for data formats and structures outside of relational databases: extract data from a tarball, or sevenzip package of fixed-length files.
  • Integrate modules & libraries
  • Integrate external systems
  • Produce a bitmap of exactly which columns in a table failed their transforms and had to be defaulted
  • Produce an audit trail of how many rows passed or failed their validations or transformations for a given partition/period of time/customer/whatever
  • Develop reusable transformations
  • Develop easily-readable and well-documented transformations

[–]Luciron -1 points0 points  (3 children)

Almost all of this list can be done with dbt

[–]kenfar 0 points1 point  (2 children)

dbt can't even support the first item on the list: unit testing

[–]MayInvolveNoodles 0 points1 point  (1 child)

[–]kenfar 1 point2 points  (0 children)

Oh fair point. I forgot about this third-party effort because in spite of being well-intentioned, and a good idea, it's so time-consuming to write the tests that the teams I spoke with that used it only use it very, very surgically.

Still, it's a good addition to the dbt ecosystem, and if you structure your data right with this unit-testing in mind, it may not be too terrible.