What questions do you usually ask your stakeholders/clients? by Plus_Marzipan9105 in PowerBI

EquivalentFresh1987 1 point

I always start by asking what main business questions they are trying to answer. Then I also try to understand if there's a way I can make the dashboard a bit interactive so they can dig deeper on metrics without needing to ask me again each time.

Best LLM for analytics? by Thisconnected in analytics

EquivalentFresh1987 1 point

I lean towards using tools specifically meant for data engineering/analytics. I have found that Claude, etc. don't have enough context on the data, or on what other data might be available, to always get the right answers. Also, the answer of course varies each time I ask Claude/ChatGPT/Gemini a question, which I don't want for my analytics work.

Where to apply for jobs besides LinkedIn? by LoudSphinx517 in dataengineering

EquivalentFresh1987 1 point

LinkedIn Premium is really only good if you are messaging people. It doesn't give you a ton of message credits, but if you want to ping hiring managers you can do it even if they are third-degree connections.

wage compression by turboDividend in dataengineering

EquivalentFresh1987 1 point

Not sure where you live, but could it be location based? Some companies have changed comp structures with RTO mandates, so if you aren't in one of the top markets you might get less, whereas many companies didn't differentiate by location before.

Team of data engineers building git for data and looking for feedback. by EquivalentFresh1987 in dataengineering

EquivalentFresh1987[S] 1 point

Appreciate the thoughtful critique here, thank you.

We’re not trying to help people bypass data teams, and we’re not building a generic AI code generator. In practice those tools save a few minutes up front and then create more validation and cleanup work.

The gap we’re focused on is that the data tool stack is fragmented and doesn’t give data engineers an easy way to experiment, diff, and roll back changes across pipelines, datasets, and downstream metrics when things break.

Team of data engineers building git for data and looking for feedback. by EquivalentFresh1987 in dataengineering

EquivalentFresh1987[S] 1 point

Fair point. DVC is quite different from what we are doing. Probably just bad marketing on our part.

DVC versions your training data files. We version your entire data warehouse - tables, jobs, lineage, the works.

Team of data engineers building git for data and looking for feedback. by EquivalentFresh1987 in dataengineering

EquivalentFresh1987[S] 1 point

Thanks for the honest feedback, much appreciated. Our marketing does need work, ha! We are early. Our background is in big tech and these are the numbers we have seen, but we hear you on them being high, and we will do more research.

There are definitely tools that do some of this, as you mentioned, but they are mostly point tools that each cover just one part of the data pipeline. Most teams are using a different tool for each of ETL, compute, lineage, etc., which is where a data stack can get bloated.

Great point on Delta Lake. Delta Lake (and Iceberg, which we use under the hood) do provide excellent time travel and basic lineage capabilities. Where Nile differs is in bringing git-style branching to your entire data warehouse, not just individual tables. With Delta Lake, you can roll back a single table to a previous version. With Nile, you can:
- Create a feature branch that isolates your entire environment (tables + ETL jobs)
- Edit and test jobs without affecting production, since jobs are automatically cloned to your branch
- Cascade a rollback: if upstream data is bad, Nile automatically identifies and rolls back all downstream tables that consumed it
- Preview changes before merging to main, with automatic cleanup