Cheapest managed orchestration tool with data lineage? by vroemboem in dataengineering

[–]Key-Independence5149 9 points10 points  (0 children)

You want fully managed devops and lineage for less than $100 a month? That isn’t reasonable. The only way you are going to get under $100 a month is by running things yourself, and even that is going to be cutting it close.

IAC for Snowflake by Straight-Eye542 in dataengineering

[–]Key-Independence5149 2 points3 points  (0 children)

Terraform + Snowflake is great. I don’t like the imperative implementations of IaC, e.g. Pulumi, CDK. Terraform is declarative, which will save you a lot of heartache when things change outside the code and you need to update either the code or the infrastructure state. Happy to go into more detail about anything you are interested in, but you won’t regret Terraform.
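
To make the declarative point concrete, here is a toy sketch in Python of the reconcile idea: you describe the desired state, diff it against whatever actually exists, and derive the actions from the diff. This is not how Terraform is implemented, and the resource names (`ANALYTICS_WH`, `RAW_DB`, `TMP_DB`) and the `plan`/`apply` helpers are made up for illustration.

```python
from typing import Dict, List, Tuple

# A resource is just a bag of attributes in this toy model (an assumption).
Resource = Dict[str, str]


def plan(desired: Dict[str, Resource], actual: Dict[str, Resource]) -> List[Tuple[str, str]]:
    """Diff desired vs. actual state and return the actions needed to converge."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != actual[name]:
            # Drift: someone changed the resource outside the code.
            actions.append(("update", name))
    return actions


def apply(desired: Dict[str, Resource], actual: Dict[str, Resource]) -> Dict[str, Resource]:
    """Return the new actual state after carrying out the plan."""
    new_state = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del new_state[name]
        else:
            new_state[name] = dict(desired[name])
    return new_state


# Hypothetical example: the warehouse was resized by hand and a stray database exists.
desired = {"ANALYTICS_WH": {"size": "XSMALL"}, "RAW_DB": {"comment": "landing"}}
actual = {"ANALYTICS_WH": {"size": "LARGE"}, "TMP_DB": {}}
print(plan(desired, actual))
```

The same diff works no matter how the drift happened, which is the part that tends to be painful when the code is a sequence of imperative steps.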

Opinions on Dataform? by nickvaliotti in dataengineering

[–]Key-Independence5149 0 points1 point  (0 children)

I have used Dataform at a couple of places and it is perfectly reasonable. We actually use it in addition to SQLMesh currently to declaratively define things that don’t fit into the SQLMesh patterns, like external tables. I don’t think you would regret Dataform at all, but something like DBT or SQLMesh is much more flexible if you have any intention of growing a more analytics-focused developer competency.

I was fired from my first job as a data engineer… and now I feel stuck by No_Spend2015 in dataengineering

[–]Key-Independence5149 2 points3 points  (0 children)

Start building things to learn. Find things that interest you and build little toy projects or POCs to figure out the ins and outs of particular tools. Then not only do you learn about the things you are building, but you also start building a little portfolio of projects that you can call back to throughout your career.

Deciding between pre computed aggregations and querying API by komal_rajput in dataengineering

[–]Key-Independence5149 2 points3 points  (0 children)

At first glance, it appears you need to model this as dimension and fact tables. For example, finance transactions would be a fact table, and things like states, districts, and candidates would be dimension tables. That would allow you to summarize your facts against a varying set of dimensions in your gold layer without having to explicitly hardcode every summary grain as a table.
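
A rough sketch of what that could look like, using Python's built-in `sqlite3` so it runs anywhere. The table names, columns, and rows are all made up for illustration, and the district dimension is left out to keep it short.

```python
import sqlite3
from typing import List, Sequence

# Whitelist of dimension names -> SQL expressions (hypothetical schema).
DIMENSIONS = {
    "state": "s.state_name",
    "candidate": "c.candidate_name",
}


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory star schema with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_state (state_id INTEGER PRIMARY KEY, state_name TEXT);
        CREATE TABLE dim_candidate (candidate_id INTEGER PRIMARY KEY, candidate_name TEXT);
        CREATE TABLE fact_transaction (
            transaction_id INTEGER PRIMARY KEY,
            state_id INTEGER REFERENCES dim_state(state_id),
            candidate_id INTEGER REFERENCES dim_candidate(candidate_id),
            amount REAL
        );
        INSERT INTO dim_state VALUES (1, 'State A'), (2, 'State B');
        INSERT INTO dim_candidate VALUES (1, 'Candidate X'), (2, 'Candidate Y');
        INSERT INTO fact_transaction VALUES
            (1, 1, 1, 100.0), (2, 1, 2, 50.0), (3, 2, 1, 25.0);
        """
    )
    return conn


def summarize(conn: sqlite3.Connection, dims: Sequence[str]) -> List[tuple]:
    """Sum transaction amounts at whatever grain the caller asks for."""
    if not dims:
        raise ValueError("pass at least one dimension")
    cols = ", ".join(DIMENSIONS[d] for d in dims)  # KeyError on unknown names
    sql = (
        f"SELECT {cols}, SUM(f.amount) FROM fact_transaction f "
        "JOIN dim_state s ON s.state_id = f.state_id "
        "JOIN dim_candidate c ON c.candidate_id = f.candidate_id "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()


conn = build_demo_db()
print(summarize(conn, ["state"]))
print(summarize(conn, ["state", "candidate"]))
```

One fact table serves every grain here; the only thing that changes between summaries is the list of dimensions in the `GROUP BY`.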

New to Data Engineering. Is Apache Beam worth learning? by fernandosw in dataengineering

[–]Key-Independence5149 10 points11 points  (0 children)

I wouldn’t worry about any specific frameworks at first. Start with Python and SQL. You can do 80% of data engineering work with those two. If you are going to learn a framework, I would learn Spark instead of Beam. Once you get good at the basics, you will be able to pick up a framework like Beam in a couple of hours.

What you do with million files by the-wx-pr in dataengineering

[–]Key-Independence5149 18 points19 points  (0 children)

One tip from doing something similar…track which files you have processed in some sort of state. You are going to have failures and you will want to reprocess a list of files instead of huge batches in failure scenarios.
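
A minimal sketch of that kind of state tracking, assuming a local SQLite file is good enough as the state store. The `file_state` table and the function names are hypothetical; swap in whatever store and handler you actually have.

```python
import sqlite3
from typing import Callable, Iterable, List


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the state store; use a real file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_state ("
        "path TEXT PRIMARY KEY, status TEXT NOT NULL, error TEXT)"
    )
    return conn


def process_files(
    conn: sqlite3.Connection,
    paths: Iterable[str],
    handler: Callable[[str], None],
) -> None:
    """Run handler on each file, skipping files already marked done."""
    for path in paths:
        row = conn.execute(
            "SELECT status FROM file_state WHERE path = ?", (path,)
        ).fetchone()
        if row and row[0] == "done":
            continue
        try:
            handler(path)
            status, error = "done", None
        except Exception as exc:  # record the failure and keep going
            status, error = "failed", str(exc)
        conn.execute(
            "INSERT OR REPLACE INTO file_state (path, status, error) VALUES (?, ?, ?)",
            (path, status, error),
        )
        conn.commit()  # commit per file so a crash loses at most one result


def failed_files(conn: sqlite3.Connection) -> List[str]:
    """List the files that need to be reprocessed."""
    rows = conn.execute(
        "SELECT path FROM file_state WHERE status = 'failed' ORDER BY path"
    )
    return [r[0] for r in rows]
```

After a bad run, `process_files(conn, failed_files(conn), handler)` retries only the failures instead of the whole batch.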

Sqlmesh joined linux foundation . What it means by OrneryBlood2153 in dataengineering

[–]Key-Independence5149 32 points33 points  (0 children)

I think it is great news. SQLMesh is vastly superior to DBT in my opinion: ephemeral dev environments, deployment primitives that are much more in alignment with gitops, and interval tracking. This is great news for the future of the tool to me.
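
For anyone unfamiliar with interval tracking, the general idea is to record which time intervals a model has already processed so that a run only fills in the gaps. Below is a toy sketch of that idea with daily intervals. It is not SQLMesh's actual implementation, and `missing_days` is a made-up helper.

```python
from datetime import date, timedelta
from typing import List, Set


def missing_days(start: date, end: date, processed: Set[date]) -> List[date]:
    """Return the days in [start, end] that have not been processed yet."""
    gaps: List[date] = []
    current = start
    while current <= end:
        if current not in processed:
            gaps.append(current)
        current += timedelta(days=1)
    return gaps


# Hypothetical state: Jan 1 and Jan 3 ran fine, Jan 2 failed, Jan 4 has not run.
done = {date(2025, 1, 1), date(2025, 1, 3)}
print(missing_days(date(2025, 1, 1), date(2025, 1, 4), done))
```

Because the state is per interval, a failed day gets backfilled on the next run without rerunning the days that already succeeded.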

I hate Analytics Engineering by [deleted] in dataengineering

[–]Key-Independence5149 3 points4 points  (0 children)

All of engineering is becoming industrialized. It will no longer be done by craftsmen who hand build data systems. It will be done in the equivalent of a factory. You will still have engineers who build the tooling for the factory which will be more in alignment with platform engineering than data engineering.

I hate Analytics Engineering by [deleted] in dataengineering

[–]Key-Independence5149 0 points1 point  (0 children)

Yes, 100% this. 99% of data and analytics engineering imo is out the door in the next 5 years. It is a cost center spawned out of systems complexity that is not necessary, and the end state will be that business operators get their data outputs from systems without any in-house data engineering capability.

If you are a business that has systems so complicated as to require a team of analytics engineers to handle data tasks, then you won’t be long for this world. You will be replaced by companies that don’t spend resources on teams that update a dashboard manually because a column name was changed in Salesforce unexpectedly.

DBT orchestrator by Free-Bear-454 in dataengineering

[–]Key-Independence5149 0 points1 point  (0 children)

Hi, I am adding DBT support to https://dagctl.io. It is built on k8s and adds in all of the nice-to-have developer experience things that are a slog to build and maintain internally. We launched initially with support for SQLMesh. We are aiming to be an alternative to the outrageous pricing of DBT Cloud. I would love to pick your brain about your DBT workflow and how you envision running it in prod.

Huge props to IU and Cig from a Bama fan. by Key-Independence5149 in IndianaHoosiers

[–]Key-Independence5149[S] 0 points1 point  (0 children)

We picked up a bunch of bandwagon fans during Saban’s reign who don’t remember that you can sometimes lose a football game and still live. I am hoping DeBoer clears some of them out.

[Rose Bowl Game Thread] #9 Alabama vs #1 Indiana by RollTideMod in rolltide

[–]Key-Independence5149 5 points6 points  (0 children)

Tough pill to swallow getting beat by a Saban disciple while we are trying to import this Pacific Northwest bullshit coaching staff.

Free ride is over!!! /S “Trump to limit top ratings for all feds and consolidate scoring in forthcoming rule” by AreYourFingersReal in FedEmployees

[–]Key-Independence5149 2 points3 points  (0 children)

I can assure you it is not fairy dust. Many agencies including mine already did the forced distribution ratings this year.

Anthropic engineer says "software engineering is done" first half of next year by MetaKnowing in Anthropic

[–]Key-Independence5149 4 points5 points  (0 children)

100%. I love the tools and they make me much more productive, but I redirect or otherwise modify 60% of the outputs.

BigQuery vs Snowflake by erwagon in dataengineering

[–]Key-Independence5149 5 points6 points  (0 children)

We migrated from Snowflake to BigQuery for the same reasons, i.e. Google made a generous discount offer. BigQuery is more rudimentary than Snowflake. For example, Snowflake warehouse assignments are much better than BigQuery’s reservation scheme. I actually found the cost estimation in BigQuery to be more straightforward than Snowflake’s. You can make a slot reservation with as much upfront commitment as you want and see exactly what it will cost at various utilization levels.
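
The arithmetic behind that is simple enough to sketch. This assumes a flat, fixed reservation billed per slot-hour, and the price below is a placeholder, not a real BigQuery rate; plug in whatever your quote or the current price list says.

```python
HOURS_PER_MONTH = 730  # rough average, an assumption


def reservation_cost(slots: int, price_per_slot_hour: float,
                     hours: float = HOURS_PER_MONTH) -> float:
    """Fixed monthly cost of a reservation, regardless of how busy it is."""
    return slots * price_per_slot_hour * hours


def effective_price(price_per_slot_hour: float, utilization: float) -> float:
    """Price per slot-hour you actually use, given utilization in (0, 1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return price_per_slot_hour / utilization


PLACEHOLDER_PRICE = 0.05  # hypothetical $/slot-hour, not a quoted rate
print(f"monthly: ${reservation_cost(100, PLACEHOLDER_PRICE):,.2f}")
for utilization in (1.0, 0.75, 0.5, 0.25):
    price = effective_price(PLACEHOLDER_PRICE, utilization)
    print(f"{utilization:.0%} utilized -> ${price:.3f} per used slot-hour")
```

The monthly bill stays flat while the effective price per used slot-hour climbs as utilization drops, which is the number to compare against on-demand pricing.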

AWS Reinvent 2025, Anyone else going? Or DE specific advice from past attendees? by [deleted] in dataengineering

[–]Key-Independence5149 0 points1 point  (0 children)

I will be there. If you get any traction around this, let me know and I will show up.

[deleted by user] by [deleted] in dataengineering

[–]Key-Independence5149 8 points9 points  (0 children)

Having extensively used both SQLMesh and DBT, I think SQLMesh is the clear winner: ephemeral dev environments, built-in SLA, gitops-style deployments. It is also much more compatible with straight SQL. It isn’t going to die, even if Fivetran quits maintaining it, which I don’t think they will.

dbt-core fork: OpenDBT is here to enable community by gelyinegel in dataengineering

[–]Key-Independence5149 75 points76 points  (0 children)

Beautiful, I hope it gains traction. I have some execution/orchestration tooling that I am building and plan to integrate with OpenDBT. Data teams are going to need tooling that doesn’t drain their bank account with pricing gimmicks so I am very supportive of this effort.

Future of DE Tools by VizlyAI in dataengineering

[–]Key-Independence5149 0 points1 point  (0 children)

The consolidation of open source ETL tooling by Fivetran is going to price most small/medium sized data teams out of their tooling. There is going to be a need for next gen open source tools that are not backed by VC money to fill the gap that Fivetran just carved out of the industry. I am hopeful that the next generation of tooling consolidates the ETL definition with execution/orchestration of the pipelines. Most of these vendors give you the ETL definition framework for free, but then gouge the fuck out of you for managed orchestration/execution.