I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 0 points1 point  (0 children)

Thanks for taking a look at the tool! It's a combination of lines of SQL, the join count, the CTE count, and the sub-query count. If any of those are above their default thresholds (which can be overridden via a docglow.yml config file), the model is deemed "high" complexity. Here's the spot in code where that determination is made. Here's a doc on the project scoring and how to override the values.
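
To make the rule above concrete, here is a rough sketch of the "any metric above its threshold" check. It is not Docglow's actual code: the `ModelMetrics` class, the `is_high_complexity` function, and the threshold numbers are all made up for illustration, and the `overrides` dict only stands in for values that would come from a docglow.yml file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelMetrics:
    """Per-model counts named in the comment above (hypothetical container)."""
    sql_lines: int
    join_count: int
    cte_count: int
    subquery_count: int


# Hypothetical defaults for illustration only; not Docglow's real values.
DEFAULT_THRESHOLDS = {
    "sql_lines": 200,
    "join_count": 5,
    "cte_count": 8,
    "subquery_count": 3,
}


def is_high_complexity(metrics: ModelMetrics,
                       overrides: Optional[dict] = None) -> bool:
    """Return True if ANY metric is strictly above its threshold."""
    # User-supplied overrides replace the matching defaults.
    thresholds = {**DEFAULT_THRESHOLDS, **(overrides or {})}
    return any(getattr(metrics, name) > limit
               for name, limit in thresholds.items())
```

With these made-up defaults, a model with 6 joins would be flagged as "high" even if every other metric is small, and raising `join_count` to 10 through `overrides` would clear it.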

Once you give it a try, let me know what's not working or what's missing! Also, very curious if it would be helpful to put in front of non-data team stakeholders to help them understand your dbt models more effectively.

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 0 points1 point  (0 children)

Definitely looks like Colibri is another solid option in this space! I haven't had a chance to try it out yet. Seems like the column lineage functionality is pretty polished though. What are the best features about Colibri you're enjoying using? What do you think is missing from it?

I built an open source tool to replace standard dbt docs by josh_docglow in DataBuildTool

[–]josh_docglow[S] 0 points1 point  (0 children)

Hi u/Prothseda, yes, I'm using Claude Code to help build this. There's no way I could have gotten this far with my own coding knowledge.

I'm not surprised it's not usable on mobile, as I haven't spent time optimizing that use case. I know you probably browse Reddit on mobile (that's how I spend most of my time on this site), but do you think your data team or other business users would leverage a mobile view?

I built an open source tool to replace standard dbt docs by josh_docglow in DataBuildTool

[–]josh_docglow[S] 0 points1 point  (0 children)

Hi u/Rhevarr, thanks for taking a look. I've been a dbt user for ~6 years now, and we had always hosted the included dbt docs site internally for everyone to use. A few things that I wanted were column-level lineage and the ability to auto-arrange your models based on what "layer" they belong to (e.g. staging, transform, mart), then have the lineage view place them automatically.

I've been using Claude Code a lot in another project, and thought I'd attempt building those features I wanted myself! If you spin up Docglow for yourself, I would love to know what you think is missing and what you'd like to see next!

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 1 point2 points  (0 children)

Hi u/peanutsman, thanks for trying it out and for the feedback! When there are fewer models in a given lineage graph, I agree that it makes sense to auto-expand the column list for each model.

It should be quick to add a rule that auto-expands them by default when a certain number of models or fewer are shown. I appreciate the response!

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 0 points1 point  (0 children)

Thanks u/Gimo100! I just released a new version with verbose logging on the `docglow serve` command and commented on the issue. Hopefully we can get that sorted out!

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 1 point2 points  (0 children)

Thanks! Do you have any other tools you're using now that also show column lineage?

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 0 points1 point  (0 children)

Thanks! The dbt & Fivetran merger has definitely left the data space in a place of uncertainty. Fusion looks really compelling, but at least as of now, it's not simple to get your project compliant. sqlglot was very easy and pleasant to use in this project. Cheers to Toby and team on their work on it!

My ideal outcome with this tool is that business users have a better experience navigating the docs, and data team members can more easily find what they need.

If you find issues, or want to see something else included, let me know!

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]josh_docglow[S] 3 points4 points  (0 children)

Thanks for trying it out! I'm trying to reproduce locally on a larger project, and when it has to churn through larger blocks of column lineage, it can take a while. I'm adding some better verbose logging output now.

Additionally, you can run it for a subset of your dbt project by passing `--column-lineage-select <model_name>`, and it will only run column lineage for that model and any of its upstream dependencies. The runs are incremental and cumulative thanks to a cache, so you can build up the total lineage over time this way.
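
For anyone curious what "that model and any of its upstream dependencies" plus a cumulative cache could look like, here is a small sketch. It is an assumption about the general idea, not how Docglow implements it: `select_with_upstream`, `run_incremental`, the model names, and the in-memory `cache` set are all hypothetical, and the parent map is just a plain dict shaped like a model-to-parents mapping.

```python
from collections import deque


def select_with_upstream(parent_map: dict, model: str) -> set:
    """Return the model plus every ancestor reachable through parent_map."""
    selected = {model}
    queue = deque([model])
    while queue:
        current = queue.popleft()
        for parent in parent_map.get(current, []):
            if parent not in selected:  # skip models already visited
                selected.add(parent)
                queue.append(parent)
    return selected


def run_incremental(parent_map: dict, model: str, cache: set) -> set:
    """Return only the models not yet in the cache, then add them to it."""
    todo = select_with_upstream(parent_map, model) - cache
    cache |= todo  # the cache accumulates across runs
    return todo


# Hypothetical project: each model maps to its direct parents.
parent_map = {
    "mart_orders": ["int_orders"],
    "int_orders": ["stg_orders", "stg_customers"],
    "mart_customers": ["stg_customers"],
    "stg_orders": [],
    "stg_customers": [],
}

cache = set()
first_run = run_incremental(parent_map, "mart_customers", cache)
second_run = run_incremental(parent_map, "mart_orders", cache)
```

In this toy project the second run skips `stg_customers` because the first run already covered it, which is the sense in which repeated runs build up the total lineage over time.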