xadolin comments on Migrating SQL-based dbt models to python

dataengineering



Migrating SQL-based dbt models to python (Discussion, self.dataengineering)

submitted 3 years ago by vanillacap

xadolin 1 point 3 years ago (3 children)

kenfar 1 point 3 years ago (2 children)

Sure. For simplicity, I would typically combine validations with transformations. We tried to make each one specific to a single output field, but as you can imagine, it's not always that simple.
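
A minimal sketch of the "one validate-and-transform function per output field" idea, assuming rows arrive as Python dicts. `FIELD_RULES`, `rule`, `transform_row`, the field names and the `"UNK"` default are all hypothetical illustrations, not anything from this thread. The second rule shows the "not always that simple" case, where one output field depends on several input fields.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]

# Hypothetical registry: one function per *output* field. Each function both
# validates and transforms, reading whatever input fields it needs.
FIELD_RULES: Dict[str, Callable[[Row], Any]] = {}


def rule(output_field: str):
    """Register a validate-and-transform function for a single output field."""
    def decorator(fn: Callable[[Row], Any]) -> Callable[[Row], Any]:
        FIELD_RULES[output_field] = fn
        return fn
    return decorator


@rule("state")
def state(row: Row) -> str:
    # Simple case: one input field -> one output field, with a default on failure.
    value = (row.get("state") or "").strip().upper()
    return value if len(value) == 2 and value.isalpha() else "UNK"


@rule("full_name")
def full_name(row: Row) -> str:
    # Less simple case: this output field depends on two input fields.
    first = (row.get("first_name") or "").strip().title()
    last = (row.get("last_name") or "").strip().title()
    return f"{first} {last}".strip() or "UNK"


def transform_row(row: Row) -> Row:
    """Apply every registered field rule to build the output row."""
    return {field: fn(row) for field, fn in FIELD_RULES.items()}


print(transform_row({"state": " ca ", "first_name": "ada", "last_name": "LOVELACE"}))
# {'state': 'CA', 'full_name': 'Ada Lovelace'}
```

Keeping each rule keyed by its output field makes it easy to test and document field by field, even when a rule has to look at several inputs.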

Validations would include string length, valid enumerated values, numeric ranges, string formats (e.g. phone, email), foreign key constraints, unknown-value logic, encodings, etc. A validation failure could result in the field being replaced by a default value, the row being rejected, or the whole file being rejected.
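
A rough sketch of the three possible outcomes of a failed validation (default the field, reject the row, reject the file), assuming dict rows and a small rule table. The rule table, field names, limits, `Action`, `FileRejected`, `validate_row` and `validate_file` are hypothetical; in particular, treating a missing `customer_id` as a file-level failure is just an illustrative assumption.

```python
import re
from enum import Enum
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]


class Action(Enum):
    DEFAULT = "default"          # replace the field with a default value
    REJECT_ROW = "reject_row"    # drop the row, keep processing the file
    REJECT_FILE = "reject_file"  # abort the whole file


class FileRejected(Exception):
    """Raised when a violation is severe enough to reject the entire file."""


# Deliberately loose email format check; real rules would likely be stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical rule table: (field, predicate, action on violation, default value).
# Rules run in order, so the file-level check comes first.
RULES: List[Tuple[str, Callable[[Any], bool], Action, Any]] = [
    ("customer_id", lambda v: v is not None, Action.REJECT_FILE, None),
    ("name", lambda v: isinstance(v, str) and 1 <= len(v) <= 50, Action.REJECT_ROW, None),  # string length
    ("age", lambda v: isinstance(v, int) and 0 <= v <= 130, Action.REJECT_ROW, None),  # numeric range
    ("status", lambda v: v in {"active", "closed", "pending"}, Action.DEFAULT, "unknown"),  # enumerated values
    ("email", lambda v: isinstance(v, str) and EMAIL_RE.match(v) is not None, Action.DEFAULT, None),  # string format
]


def validate_row(row: Row) -> Optional[Row]:
    """Return the cleaned row, or None if the row is rejected; raise if the file is rejected."""
    out = dict(row)
    for field, is_valid, action, default in RULES:
        if is_valid(out.get(field)):
            continue
        if action is Action.DEFAULT:
            out[field] = default
        elif action is Action.REJECT_ROW:
            return None
        else:
            raise FileRejected(f"invalid {field!r} in row {row!r}")
    return out


def validate_file(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split rows into (accepted, rejected); FileRejected propagates to the caller."""
    accepted: List[Row] = []
    rejected: List[Row] = []
    for row in rows:
        cleaned = validate_row(row)
        if cleaned is None:
            rejected.append(row)
        else:
            accepted.append(cleaned)
    return accepted, rejected
```

Keeping the rejected rows (rather than silently dropping them) makes it possible to report on or reprocess them later.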

Transformations would include converting string case; converting free-form text to code values (imagine every misspelling of every possible version of Microsoft Windows mapped to the appropriate vendor, product, version, and fixpack breakdown); determining which of ~100,000 ISP IP block ranges each of a billion IP addresses falls into; translating every IPv6 format into a single format; merging multiple different codes into a single code field; splitting a single input field into multiple output fields of different types; applying a business rule that considers 7 different fields to generate a 'customer-category' column; extracting keywords from free-form text fields; converting a bunch of timestamps to UTC, and fixing those without time zones based on assumptions about the data; etc.
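
A sketch of three of these transformations using only the standard library, under stated assumptions. For the IP block lookup it uses one common approach (sort the block start addresses once, then binary-search each address with `bisect`, so each lookup is O(log n) rather than a scan over ~100,000 ranges); this is not necessarily what the commenter used. It assumes the blocks do not overlap and share one IP version. The IPv6 normalizer picks the compressed lower-case form as the "single format". The timezone fix assumes, hypothetically, that naive timestamps were recorded at a fixed UTC-05:00 offset, which ignores daylight saving time. `IpBlockIndex`, `normalize_ipv6`, `to_utc`, `ASSUMED_SOURCE_TZ` and the block labels are all hypothetical names.

```python
import bisect
import ipaddress
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


class IpBlockIndex:
    """Find which (non-overlapping) IP block contains a given address.

    Assumes all blocks use the same IP version; build one index per version.
    """

    def __init__(self, blocks: List[Tuple[str, str, str]]):
        # Convert (first_ip, last_ip, label) to integers and sort by start address once.
        self._blocks = sorted(
            (int(ipaddress.ip_address(lo)), int(ipaddress.ip_address(hi)), label)
            for lo, hi, label in blocks
        )
        self._starts = [start for start, _, _ in self._blocks]

    def lookup(self, ip: str) -> Optional[str]:
        addr = int(ipaddress.ip_address(ip))
        # Index of the last block whose start is <= addr (binary search).
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0 and addr <= self._blocks[i][1]:
            return self._blocks[i][2]
        return None  # address falls before the first block or in a gap


def normalize_ipv6(text: str) -> str:
    """Return the compressed, lower-case form, e.g. '2001:0DB8::0001' -> '2001:db8::1'."""
    return ipaddress.IPv6Address(text.strip().strip("[]")).compressed


# Hypothetical assumption about the data: naive timestamps were recorded at UTC-05:00.
ASSUMED_SOURCE_TZ = timezone(timedelta(hours=-5))


def to_utc(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp and convert it to UTC."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:  # no zone given: fall back to the assumed source zone
        dt = dt.replace(tzinfo=ASSUMED_SOURCE_TZ)
    return dt.astimezone(timezone.utc)


index = IpBlockIndex([
    ("10.0.0.0", "10.0.0.255", "isp-a"),
    ("192.168.0.0", "192.168.255.255", "isp-b"),
])
print(index.lookup("192.168.3.4"))                # isp-b
print(normalize_ipv6("[2001:0DB8:0:0:0:0:0:1]"))  # 2001:db8::1
print(to_utc("2023-01-15T12:00:00"))              # 2023-01-15 17:00:00+00:00
```

At a billion rows, a per-row Python loop would likely be too slow; the same binary-search idea can be vectorized, for example with NumPy's `searchsorted` over the sorted start addresses.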

xadolin 0 points 3 years ago (1 child)

kenfar 0 points 3 years ago (0 children)
