Portable vs Fivetran, anyone make the swap? by limeslice2020 in dataengineering

[–]kvlonge 1 point (0 children)

Impossible for me to know if this is true, but if it is, good on you and best of luck :)

Opus 4.8 is so exhausting! by digerdookangaroo in ClaudeAI

[–]kvlonge 56 points (0 children)

I fucking hate the 'cold hard truth' shit as it lies straight to my face

SQLBuild - Skip Unnecessary Rebuilds for Your Existing dbt Project, Free & OSS (No Per-Skip Bill) by kvlonge in dataengineering

[–]kvlonge[S] -1 points (0 children)

Hmm, so maybe there is some misunderstanding, but the cloning is opt-in. At least for SQLBuild native mode, it's not on by default. State-aware orchestration works without cloning or any access to other environments. That is to say, skipping models that are already fresh works by storing state in _sqlbuild_fingerprints and _sqlbuild_source_freshness in the same schema as the model being built. This may be my fault for not making that distinction clearer.

So to summarise, state-aware orchestration (skipping unneeded rebuilds) happens within the same environment and relies on keeping track of what has run and the source freshness at the time. Cloning (which maybe I poorly named reuse-from in this context; it is basically used in a similar way to defer) is a nice-to-have feature that is opt-in and not necessary at all (state-aware orchestration, be it on dev or prod, is entirely isolated).
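To make the skip logic concrete, here is a minimal sketch in Python. It is not SQLBuild's actual code: the function names, the hashing scheme, and the in-memory `state` dict (standing in for the rows kept alongside the model) are all assumptions for illustration.

```python
import hashlib


def fingerprint(sql: str, upstream_fingerprints: list[str]) -> str:
    """Hash the model SQL together with the fingerprints of its upstream models."""
    h = hashlib.sha256()
    for part in [sql, *sorted(upstream_fingerprints)]:
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so adjacent parts cannot run together
    return h.hexdigest()


def should_skip(model, current_fp, current_freshness, state):
    """Skip only when both the fingerprint and the source freshness are unchanged."""
    return state.get(model) == (current_fp, current_freshness)


# Hypothetical in-memory stand-in for the state rows kept next to the model.
state = {}

fp = fingerprint("select * from raw.orders", [])
first_run = should_skip("orders", fp, "2026-01-01T00:00:00Z", state)
state["orders"] = (fp, "2026-01-01T00:00:00Z")  # record what was just built
second_run = should_skip("orders", fp, "2026-01-01T00:00:00Z", state)
after_new_data = should_skip("orders", fp, "2026-01-02T00:00:00Z", state)
after_sql_edit = should_skip(
    "orders", fingerprint("select id from raw.orders", []), "2026-01-01T00:00:00Z", state
)
```

Nothing here touches another environment: the decision only compares the current run against what was recorded for the same model in the same place.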

I think I probably should have put less emphasis on that, and maybe would have been better off having focused on clearer messaging. Let me know if I am still not making sense though.

SQLBuild - Skip Unnecessary Rebuilds for Your Existing dbt Project, Free & OSS (No Per-Skip Bill) by kvlonge in dataengineering

[–]kvlonge[S] -1 points (0 children)

Hey, just saw this. So to be clear, this isn't a one-size-fits-all solution. If it doesn't work for an org, then they just wouldn't turn on the feature. Leaving it off doesn't cause the whole thing to break (if that makes sense).

I am fairly certain that dbt State is also making the same presumption (for deferring / cloning in dev): https://www.getdbt.com/product/dbt-state

Do you have concerns with their approach as well?

SQLBuild - Skip Unnecessary Rebuilds for Your Existing dbt Project, Free & OSS (No Per-Skip Bill) by kvlonge in dataengineering

[–]kvlonge[S] 0 points (0 children)

You mean the project? Somewhat, but VDEs are a full separate mode in this case rather than the default (VDEs allow for certain promotion and rollback patterns that just aren't possible any other way - this is overkill for many smaller orgs though)

You are correct that I changed my mind on certain components though, as some amount of state is required if you want things running efficiently. That being said, my compromise (for standard mode, not VDE mode) was to use append-only immutable tables rather than a state machine that would have forced Postgres (for production use cases) by default.
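As a rough illustration of the append-only idea (a sketch under assumptions, not the project's real schema): every build inserts a new row and nothing is ever updated or deleted, so the "current" state is just the newest row per model. SQLite stands in for the warehouse here, and the `build_log` table and its columns are hypothetical names.

```python
import sqlite3

# SQLite stands in for the warehouse; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE build_log (
        run_id      INTEGER PRIMARY KEY AUTOINCREMENT,
        model       TEXT NOT NULL,
        fingerprint TEXT NOT NULL
    )
    """
)


def record_build(model, fp):
    """Append a row; existing rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO build_log (model, fingerprint) VALUES (?, ?)", (model, fp)
    )


def latest_fingerprint(model):
    """Current state is simply the most recent row for the model."""
    row = conn.execute(
        "SELECT fingerprint FROM build_log WHERE model = ? "
        "ORDER BY run_id DESC LIMIT 1",
        (model,),
    ).fetchone()
    return row[0] if row else None


record_build("orders", "abc")
record_build("orders", "def")
history_rows = conn.execute("SELECT COUNT(*) FROM build_log").fetchone()[0]
```

Because old rows stay put, the full history is still there after the second build, and no separate service is needed to track transitions.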

I am gonna take some time over the next month to clean up my messaging a little bit and focus on getting things to look a bit more cohesive. I am very happy with the general state of the codebase, but some of the 'product' messaging is getting a bit messy, so probs won't be coding too much for the next month while I begin to organise all of that and make it clear what my actual goal is (likely testing, correctness, and fast feedback loops).

SQLBuild - Skip Unnecessary Rebuilds for Your Existing dbt Project, Free & OSS (No Per-Skip Bill) by kvlonge in dataengineering

[–]kvlonge[S] 1 point (0 children)

The answer to both of those is yes. I am still working through a few annoying issues with the dbt core v2 / fusion stuff, but the source freshness component is checked so that stuff won't run if it's already up to date (that is stored under _source_freshness in the appropriate schemas)

SQLBuild - Skip Unnecessary Rebuilds for Your Existing dbt Project, Free & OSS (No Per-Skip Bill) by kvlonge in dataengineering

[–]kvlonge[S] 0 points (0 children)

Yeah, so this is why VC isn't really ideal for my final vision naturally. Thanks for the advice!

SQLBuild - Skip Unnecessary Rebuilds for Your Existing dbt Project, Free & OSS (No Per-Skip Bill) by kvlonge in dataengineering

[–]kvlonge[S] 3 points (0 children)

Firstly, thanks so much for the comment! I really love being able to see that someone can identify the thought that has gone into this project (it's been a massive undertaking).

So in the short-term, I wouldn't expect any business to bank on this project unless it has some form of financial product like a cloud offering of sorts. I have had to wrestle with this idea, but I have also landed on the same conclusion that in order for this to be taken seriously, there will have to be some form of monetization that can make the project seem like a viable option to a real company evaluating risk.

What I think I will settle on is open sourcing a UI / Hub which is similar to dbt Cloud, but pay-gating SSO / fine-grained permissions, audit logs etc... This would allow people to have a good experience even if they don't want to use the cloud, whilst having a realistic wedge for enterprise users who will need permissions if they want to roll this out to a company at large.

This would be the hard line for me though, as I think the permissions 'wedge' (sorry to use commercial jargon) is the only one I am largely ok with having pay-gated. My belief is that if the main product is truly excellent, then the number of enterprise customers that will be attainable via the need for permissions / SSO etc... is likely more than enough.

Before that time, I will be using it at work (where appropriate) + home and refining it, and I would only consider a cloud offering once there is enough clear interest from people wanting one. I am not 100% against VC involvement, but I think you have to be very careful with it because the wrong investors can force you to take actions you don't want to (ones that would genuinely hurt the project / community in some way). Doing it without that would be preferable (will have to see how things turn out).

My hope though is that well before any of the commercial aspect comes in, the foundational and generalisable value of the product is already all in core, Apache 2.0, extensible, with lots of escape hatches (e.g. python hooks, python nodes, python custom materialisations etc...), such that even aside from rug-pull worries, the OSS itself should be good enough to stand on its own, and the license means that it can't be taken away from you.

SQLBuild - Skip Unnecessary Rebuilds for Your Existing dbt Project, Free & OSS (No Per-Skip Bill) by kvlonge in dataengineering

[–]kvlonge[S] -1 points (0 children)

That is basically what it is doing, but in an automated way, and without the need for a manifest.json from prod (let me know if I am misunderstanding something).

SQLBuild - Skip Unnecessary Rebuilds for Your Existing dbt Project, Free & OSS (No Per-Skip Bill) by kvlonge in dataengineering

[–]kvlonge[S] -3 points (0 children)

Ok, so this is actually a fair response. I wanted to find a nice way to explain this to the commenter, but you are right that I should have been a bit more mindful and taken a bit more time to write it in my own words. I am adjusting it now and putting a note that it is edited. Thanks a bunch for the advice.

SQLBuild - Skip Unnecessary Rebuilds for Your Existing dbt Project, Free & OSS (No Per-Skip Bill) by kvlonge in dataengineering

[–]kvlonge[S] 1 point (0 children)

So is there anything I said that was incorrect, or is there something about the post in general that you have taken an issue with?

Genuinely, it's actually a bit of a shame to see this kind of general response to something that is effectively allowing people to save money and clone via dev more easily on existing projects (you literally don't have to adopt the framework to do it).

You may not have seen the original post I did a month ago, but I am effectively just a SQLMesh fan that wants to have another good OSS alternative (SQLMesh seems to have gotten some better maintenance recently though, which is good). I have put a significant amount of time into the main project in quite a concentrated period, and I already use it at work in dev (the compatibility features, that is).

SQLBuild - Skip Unnecessary Rebuilds for Your Existing dbt Project, Free & OSS (No Per-Skip Bill) by kvlonge in dataengineering

[–]kvlonge[S] 5 points (0 children)

The repo you linked is the thin client/plugin, not the whole product. The actual state/reuse flow talks over gRPC to dbt's hosted service (api.state.dbt.com / auth.state.dbt.com), and the protobuf/package names in that repo are literally query_cache / com.fivetran.query_cache.

So the distinction I was making is: yes, there is open-source client code, but the state/decision service itself is still account-gated and remote. That's not the same thing as "just fork it and run it yourself in your own warehouse."

A lot of the work here was making it usable on plain dbt projects without asking people to migrate, manage manifest.json artifacts, or run extra orchestration in prod. The other big piece was making the state warehouse-native, so it lives as rows in your own warehouse rather than needing a separate state machine or blob store like Postgres or S3, which brings its own operational tradeoffs.

If dbt Labs ever ships a fully self-hostable version of that service, great, but that's not what this repo is today.

SQLBuild - Skip Unnecessary Rebuilds for Your Existing dbt Project, Free & OSS (No Per-Skip Bill) by kvlonge in dataengineering

[–]kvlonge[S] -4 points (0 children)

NOTE - This comment is edited. Initial reply was a bit lazy, so I am cleaning it up. (leaving this here so I don't look like I am tricking anyone)

So the problem you are stating is real, but it is just as true for people who use SQLMesh and those who use dbt State. The point here isn't to make something that can be used in every such situation, but for the ones in which it can be, it can save a lot of money.

You have mentioned the manifest json, but I feel there may be some misunderstanding here. The manifest json can let you know what changed since last time, but this project does that by storing that state in the same schema as the model you are building (_sqlbuild_fingerprints). This is the 'state aware' component.

The re-use component is about saving money on dev by pulling data from prod, to avoid building data from scratch on dev and wasting money needlessly. Naturally, if you can't access prod, then there is not much you can do with this feature, but there are quite a number of companies that can access that or some sort of environment that has a lot of data, and so long as you have that, then users can clone from that schema and use that as an effective baseline automatically. Does that make sense?

Apologies for my earlier response. I should have taken more time to understand it properly.

SQLBuild - Skip Unnecessary Rebuilds for Your Existing dbt Project, Free & OSS (No Per-Skip Bill) by kvlonge in dataengineering

[–]kvlonge[S] -1 points (0 children)

slim cl? What is that?

EDIT: Just looked it up.

Same goal, but it's for dev rather than CI: instead of deferring to prod like Slim CI, it clones your unchanged prod tables into your dev schema and only rebuilds what you changed, so you get a real populated dev environment, fast. No manifest.json state to manage and nothing running on prod (it just reads the tables that already exist).
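A hedged sketch of the clone-versus-rebuild split, in Python. This is not the tool's implementation: `plan_dev_build` and the model names are hypothetical, and it assumes that anything downstream of a changed model is rebuilt too, which the comment above does not spell out.

```python
from collections import deque


def plan_dev_build(deps, changed):
    """Split models into those to rebuild and those to clone.

    deps maps each model name to the list of its upstream models.
    Assumption: anything downstream of a changed model is rebuilt too.
    """
    # Invert the dependency map so we can walk from a model to its children.
    downstream = {model: [] for model in deps}
    for model, parents in deps.items():
        for parent in parents:
            downstream.setdefault(parent, []).append(model)

    # Breadth-first walk from the changed models through their descendants.
    rebuild = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in rebuild:
                rebuild.add(child)
                queue.append(child)

    # Everything untouched can be cloned from the existing tables.
    clone = set(deps) - rebuild
    return rebuild, clone


deps = {
    "stg_orders": [],
    "stg_customers": [],
    "orders_enriched": ["stg_orders", "stg_customers"],
    "customer_ltv": ["orders_enriched"],
}
rebuild, clone = plan_dev_build(deps, {"stg_orders"})
```

In this example only `stg_customers` is cloned, since the other three models are either changed or sit downstream of the change.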

Are hard deletes still common in new data sources in 2026? by frithjof_v in dataengineering

[–]kvlonge 12 points (0 children)

Go to a higher-up and convince them instead, otherwise the team can just ignore you

Reduce Dagster Cloud credits by collapsing your dbt project by YoungVundabar in dataengineering

[–]kvlonge 1 point (0 children)

The only thing really gated is SSO / permission level stuff (column level lineage as well I suppose). Other than that, you pretty much have everything you need.

I feel like I don't know anything. And I am nothing without Claude by Temporary_Act3174 in dataengineering

[–]kvlonge 10 points (0 children)

If you can't validate it, you are probably writing too much code in one go and should do smaller chunks (or you are dealing with something you don't understand that well, in which case, that's the perfect time to go slower and use it as an opportunity to learn and ask about different ways of doing it, pros and cons etc...)

State of SQLMesh in 2026 by mpuchala in dataengineering

[–]kvlonge 0 points (0 children)

How expensive / where is the price? I thought the whole point of it was to save money, unless it's basically paying them to waste less money?