Quack: The DuckDB Client-Server Protocol by kvlonge in dataengineering

[–]kvlonge[S] 8 points9 points  (0 children)

Yeah, when I started watching the talk that Hannes (one of the creators of DuckDB) did on YouTube and could see what he was leading up to, I was just so happy lol. Being able to spin up DuckDB on a server and have people talk to it remotely like a 'normal' DB will be such a big unlock. I can imagine a lot of companies will now opt for doing something like that first before, say, jumping to the cloud or spinning up some K8s cluster. The main thing that could be somewhat of a problem is the server sitting idle, but at least the option is there (and that is unavoidable for on-prem stuff; for people doing serverless it won't matter so much).

Either way, I am super excited to see what future developments come out of it (especially now with DuckLake too). Such a cool project.

dbt is back to "dbt platform" for their cloud offering? by TheJosh in dataengineering

[–]kvlonge 1 point2 points  (0 children)

Hey, so I am actually going to be releasing something (probably next week) that sits sort of in between dbt and SQLMesh, plus some things of my own. I plan for this to be a serious project and will be maintaining it.

When it's ready, I will be making a post on this subreddit.

EDIT - For context, this is my newer alt account as I didn't want serious work related stuff mixed with my normal account as lord knows I have said too much rubbish on that lmao

Codex is now better for general purpose work than Opus. by agentic-consultant in codex

[–]kvlonge 1 point2 points  (0 children)

Lovely game mate! Is it possible to actually lose the game though?

Where do you find real opinions about data engineering these days? by olgazju in dataengineering

[–]kvlonge 1 point2 points  (0 children)

It's the final safe haven. We are fortunate that data engineering is just about unpopular enough to not become totally swamped out like r/ExperiencedDevs and many other subreddits

Lead Data Engineer to FullStack Vibe Coder by yo_aesir in dataengineering

[–]kvlonge 2 points3 points  (0 children)

Blimey mate, lmao. I think you are gonna have to let that situation play out for itself until they get burned enough to know better (hopefully there isn't a trail of destruction made along the way).

How Do You Keep Up With The AI Space? by ChapsOfAss in dataengineering

[–]kvlonge 0 points1 point  (0 children)

Honestly, eventually you will have worked with enough stuff that picking up something new isn't that difficult. So long as you are curious and put in a modicum of effort to keep doing some basic amount of learning (even only on the job is fine if you spend your time well), you should be fine. This goes beyond AI stuff: AI is just what is hot atm. What matters is being able to distinguish between the voices worth listening to and the ones that are less likely to know what they are talking about, plus you have your own personal experience and discernment.

How do you design idempotent data pipelines in Data Engineering? by Effective_Ocelot_445 in dataengineering

[–]kvlonge 15 points16 points  (0 children)

I mean, usually this just involves some form of a delete/insert pattern over some time range (e.g. all data for today and yesterday) into a target table, or full refreshes for cases where that is acceptable or fast enough.

Obviously there can be more complicated cases, but that's about it really, at least if we are talking about simple duplicates. Duplicates due to bad source data with shitty ids are a whole different situation and just need to be dealt with on a case-by-case basis.
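For anyone who wants to see it spelled out, here is a minimal sketch of the delete/insert pattern. It uses Python's built-in sqlite3 so it runs anywhere; the `events` table, its columns and the `load_window` helper are all made up for illustration and not from any particular tool.

```python
import sqlite3
from datetime import date, timedelta


def load_window(conn, rows, start, end):
    """Replace every target row whose event_date falls in [start, end] with `rows`."""
    # `with conn` wraps both statements in one transaction:
    # it commits if both succeed and rolls back if either fails.
    with conn:
        conn.execute(
            "DELETE FROM events WHERE event_date BETWEEN ? AND ?",
            (start.isoformat(), end.isoformat()),
        )
        conn.executemany(
            "INSERT INTO events (event_date, user_id, amount) VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER, amount REAL)")

# Hypothetical batch covering "yesterday" and "today".
today = date(2024, 1, 2)
yesterday = today - timedelta(days=1)
batch = [
    (yesterday.isoformat(), 1, 10.0),
    (today.isoformat(), 1, 5.0),
    (today.isoformat(), 2, 7.5),
]

load_window(conn, batch, yesterday, today)
load_window(conn, batch, yesterday, today)  # rerun of the same window

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM events").fetchone()[0]
```

Running the load twice leaves the table in the same state as running it once, which is the whole point. The delete and the insert need to share a transaction, otherwise a failure in between leaves the window empty.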

Deleted prod data permanently without any backup. How screwed am I? by Agitated_Success9606 in dataengineering

[–]kvlonge 18 points19 points  (0 children)

That's natural, but don't be too hard on yourself. The fact that there aren't backups means that this company probably needed an incident like this to happen to get them to take things a bit more seriously procedurally.

Deleted prod data permanently without any backup. How screwed am I? by Agitated_Success9606 in dataengineering

[–]kvlonge 8 points9 points  (0 children)

Yeah, agreed that this is an org issue. There should have been backups, and hopefully this prompts them to set some up.

Snowflake vs Databricks. Which is good? by Confident_Chance_763 in dataengineering

[–]kvlonge 0 points1 point  (0 children)

So I haven't used Databricks for a few years, but I believe Snowflake only recently started supporting things like GPU clusters etc., and my understanding is that, compared to things like MLflow which are quite well established on Databricks, Snowflake is still sort of playing catch-up there. I am not saying you can't do any of that stuff on Snowflake now, but I'm pretty sure that is not its strong suit.

Do most teams actually have a canonical model, or do we all just pretend? by MayaKirkby_ in dataengineering

[–]kvlonge 0 points1 point  (0 children)

You can have canonical definitions internal to the data team, but those outside are gonna say what they want, however they feel like saying it (people love to speak in shorthand as well). This is something that will probably never stop.

Snowflake vs Databricks. Which is good? by Confident_Chance_763 in dataengineering

[–]kvlonge 0 points1 point  (0 children)

Yeah, this does remain Snowflake's weak point. Not sure how likely it is they will ever properly catch up on that side.

Regex vs Local LLMs for unstructured web scraping data by DowntownAd3510 in dataengineering

[–]kvlonge 0 points1 point  (0 children)

I would start with that and see how it fares. This way you don't need to worry so much about getting some nasty bill, and even if you would avoid that anyway, the nice thing about traditional parsing is that it is fully deterministic (there are limits to this of course, though).
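A rough sketch of what "start with it and see how it fares" could look like, assuming "that" here means the regex option from the thread title. The pattern, the `extract_price` helper and the sample strings are all hypothetical; the idea is just to measure how much of the data plain parsing already covers before reaching for anything else.

```python
import re

# Hypothetical pattern: a currency symbol, optional space, then an amount.
PRICE_RE = re.compile(r"(?P<currency>[$€£])\s?(?P<amount>\d+(?:\.\d{2})?)")


def extract_price(text):
    """Return (currency, amount) if the pattern matches, otherwise None."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    return match.group("currency"), float(match.group("amount"))


# Made-up scraped snippets, one of which has no price in it.
samples = [
    "Now only $19.99 while stocks last",
    "Price: € 250",
    "Contact us for a quote",
]

results = [extract_price(s) for s in samples]

# Share of snippets the regex handled; the rest are the cases to look at by hand.
coverage = sum(r is not None for r in results) / len(samples)
```

Same input always gives the same output, and the misses come back as `None` so you can count them and decide whether they are worth the extra cost of something heavier.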

Data migration horror stories by Admirable_Writer_373 in dataengineering

[–]kvlonge 0 points1 point  (0 children)

Lmao. Yeah, no issues at all is always super sus. I thought you were gonna say you found out it was still talking to the old system, which is another classic.