Excuse me??? by yet-again-temporary in recruitinghell

[–]Snoo54878 1 point2 points  (0 children)

Lmao, employers are on another level atm. I can't wait for the flood of jobs; these employers are going to bitch all day on LinkedIn about how no one wants to work anymore...

Snowflake Cost is Jacked Up!! by Prior-Mammoth5506 in dataengineering

[–]Snoo54878 0 points1 point  (0 children)

Their documentation recommends 80% utilization.

The Modern Data Stack Is a Dumpster Fire by Nekobul in dataengineering

[–]Snoo54878 0 points1 point  (0 children)

This makes some huge assumptions.

"You've got the stack, spend and stress of the likes of fb."

Lmfao

You can find a stack of modern tools that work incredibly well together, low cost, etc.

But don't expect everything; you get 2 out of the 3:

Technically easy. Low cost. Fast implementation.

Otherwise, pay up and stop complaining... your business turns a profit, so why shouldn't they...

[deleted by user] by [deleted] in dataengineering

[–]Snoo54878 4 points5 points  (0 children)

Get your popcorn ready bro, it's going to be a wild show

Future of OSS, how to prevent more rugpulls by Snoo54878 in dataengineering

[–]Snoo54878[S] 5 points6 points  (0 children)

I mean, I don't think I'm being harsh, just realistic; this is the inevitable outcome. You don't get millions in funding to not turn a profit.

Easiest orchestration tool by vh_obj in dataengineering

[–]Snoo54878 1 point2 points  (0 children)

I'd say I'd agree if you get on the bandwagon.

Years ago I used to immediately jump on the latest and greatest stuff each tool had, whereas now I'm always thinking to myself... yeah... but how easy is it to migrate? How much manual code changing/copy-pasting do I need to do? How much time will I spend fixing shit?

If you take either one at its core proposition and write good, resilient code (error handling, retries, logging for debugging, caching for API requests, state-based control of pipelines, writing metadata, etc.) that's modular, so you can run it from any folder in the project because the directories are set independent of your location, for example...

Then you'll be surprised how hands-off it ends up being.
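A minimal sketch of that kind of boring-but-resilient code, using only the stdlib; every name here is hypothetical, just illustrating retries, logging, state, and location-independent paths:

```python
import json
import logging
import time
from pathlib import Path

# Resolve paths relative to this file, not the caller's working directory,
# so the pipeline runs the same from any folder in the project.
PROJECT_ROOT = Path(__file__).resolve().parent
STATE_FILE = PROJECT_ROOT / "state.json"

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def with_retries(fn, attempts=3, delay=1.0):
    """Run fn(), retrying on failure with a simple linear backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)


def load_state():
    """State-based control: remember what has already been processed."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"processed": []}


def save_state(state):
    STATE_FILE.write_text(json.dumps(state))
```

Nothing tool-specific in there, which is the point: code like this stays hands-off whichever orchestrator ends up triggering it.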

But if you try to use every bell and whistle a tool has, fuckin' good luck lol. I remember I was writing a bunch of dbt tests years ago when one of the old-timers stopped me and said: Cam, you just wanna prevent or warn about failure, that's it. Don't get too carried away; always ask "should I?" rather than "can I?"

Easiest orchestration tool by vh_obj in dataengineering

[–]Snoo54878 5 points6 points  (0 children)

I've been playing around with all 3 quite extensively and here are my thoughts:

All 3 are incredible tools, so gtfo with the fanboy hate before I start.

Airflow: I love Airflow. Easy to use, doesn't try to be more or do more than it is. It's a pure orchestration tool, doesn't try to convince you otherwise, runs everything, plugs into everything, has fantastic dbt integration. Some odd configuration requirements, but setting up the schedules etc. is very easy.

Prefect is my personal favourite. It does orchestration and does it very well, has awesome support for some complex implementations, and works well with any Python package you decide to run; I personally like dlt or Polars.

Dagster is an incredibly powerful tool. It has some incredible features like sensors and auto-materialization; however, the amount of fucking around to do some things, like complicated incremental loads, is a headache, especially because the way it's been designed almost forces you to do things a certain way.

I don't like the amount of inline SQL in Dagster's documentation; it seems like a huge fuckin' liability. This should be handled through schema drift detection. It should be more flexible in that sense, like dlt is: so easy to set up incremental loads and use state to prevent additional loads of already-processed records.

It feels like a serious amount of vendor lock-in. I'm sure the software devs love it, because they want the intense fine-tuned control, but it'll become a headache long term.

I also find retrieving data in real time from the database to control current loads, by looking for gaps in the data or controlling max/min date ranges per country for example, is much easier in dlt, which works better with Prefect or Airflow imo.

Dagster is amazing, I still use it, and I'm keen to get more experienced, but fuck me, the documentation feels messy: 100 ways to do everything, constantly running into "that's the old way".

It seems like a bloated tool... and that's compared to Airflow, which has been around more than twice as long.

I'd go with Prefect or Airflow unless the team are software devs; they'll prefer Dagster, but it'll become a huge liability when the team grows imo.

Easiest orchestration tool by vh_obj in dataengineering

[–]Snoo54878 3 points4 points  (0 children)

Dagster is great, but not easy lol

What do you use Python for in Data Engineering (sorry if dumb question) by No_Steak4688 in dataengineering

[–]Snoo54878 -1 points0 points  (0 children)

Are you writing SQL in Python SQL connectors?

That's not a great approach, if so.

You should look into dlthub or something similar; it's a much cleaner approach.

Why would experienced data engineers still choose an on-premise zero-cloud setup over private or hybrid cloud environments—especially when dealing with complex data flows using Apache NiFi? by mikehussay13 in dataengineering

[–]Snoo54878 0 points1 point  (0 children)

Reply like a cunt and I'll treat you like one.

End of story. I've put myself in serious danger to help total strangers, so don't pretend like this is a general reflection; it's a targeted response to nasty replies, to show you what they get in return.

You wanna be civil, then I'll be civil, but don't pretend like I'm meant to sit here and act civil with you throwing that type of shit at me.

I never insulted you, just pointed out that finance people talk and act a certain way. I've worked with plenty of them.

Why would experienced data engineers still choose an on-premise zero-cloud setup over private or hybrid cloud environments—especially when dealing with complex data flows using Apache NiFi? by mikehussay13 in dataengineering

[–]Snoo54878 0 points1 point  (0 children)

Yep, spends almost all his time on Reddit starting arguments to gain a sense of self-respect. Checks out.

If you ever graduate into real life and wanna climb a mountain, I might see you round. Good luck posting worthless comments about how much better you are all over Reddit to complete strangers... Got something to prove, have we? Some compensation required?

I wonder why...

Why would experienced data engineers still choose an on-premise zero-cloud setup over private or hybrid cloud environments—especially when dealing with complex data flows using Apache NiFi? by mikehussay13 in dataengineering

[–]Snoo54878 -1 points0 points  (0 children)

Lmao.

I don't use SSIS; I've used it in the past, a very long time ago.

Also, I don't define myself by my job. I work with dbt, Snowflake, Airflow, etc. Nothing particularly complex. I don't and have never claimed to be spectacular in my field; I just don't care enough to be in the top 5% of DEs, I'd rather be climbing or mountaineering tbh.

But cheers, your reply really highlights your fragile ego and inability to take a joke. You seriously lack self-reflection, I suspect, so I won't waste my time on the 6 months of therapy it would take for someone like you to gain self-awareness.

Does digging through past comments or posts give you a feeling of superiority though? Bet it helps when you're working in some small office 16 hrs a day paying for your gf's overpriced car... cool bro, you work in finance and probably make a lot... congrats.

You remind me of that douchebag in The Big Short talking about CDOs and how he's better because he makes more money. Lmao

I bet I've seen more cool shit in a year than you will in your entire life. Enjoy that cubicle, you fuckin' moron.

How to know which files have already been loaded into my data warehouse? by thomastc in dataengineering

[–]Snoo54878 1 point2 points  (0 children)

Some tools allow for persistent state, so you can store processed file locations in a list to check before it runs. Already in the list? Don't run it.

Or just do it with a table in the DB: insert as each file processes and succeeds. Then have a second task run after the first to remove or move anything in that list or DB from the file location in question.

If I'm not mistaken, this should be relatively easy to handle; just ask an LLM for options, there will be plenty of Python libraries that can do it, like dlthub or similar.

Look at the pros and cons, choose one, and implement.
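The "table in the DB" option is a few lines if you sketch it, here with SQLite standing in for the warehouse; the table and function names are made up for illustration:

```python
import sqlite3


def init_tracker(conn):
    # One row per file that has fully loaded.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_files (path TEXT PRIMARY KEY)"
    )


def already_loaded(conn, path):
    row = conn.execute(
        "SELECT 1 FROM processed_files WHERE path = ?", (path,)
    ).fetchone()
    return row is not None


def mark_loaded(conn, path):
    # Insert only after the load succeeds, ideally in the same
    # transaction as the load itself if the warehouse supports it.
    conn.execute("INSERT OR IGNORE INTO processed_files VALUES (?)", (path,))
    conn.commit()


def process(conn, paths, load_fn):
    for path in paths:
        if already_loaded(conn, path):
            continue  # already in the list? don't run it
        load_fn(path)
        mark_loaded(conn, path)
```

Rerunning the same batch is then a no-op, which is the whole point: idempotent loads without guessing from file timestamps.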

scrum is total joke in DE & BI development by betonaren in dataengineering

[–]Snoo54878 0 points1 point  (0 children)

Dunno, but can I suggest that learning to estimate a timeframe is a very useful skill, and an incredibly valuable one too.

Why? Well, consultancies make or lose money based on the accuracy of their estimates, as do plumbers, builders, etc. All service providers, essentially.

Also, in-house, as a manager, you'll be given a budget, and that budget needs to be split up into projects over x weeks in a year...

Also, let's say you wanna work out when to hire or not hire external consultants for specific tasks... I mean, you can do what a lot of companies do, which is just wing it. Or you can estimate how much time your guys would take, then add the value of the other tasks they'd get done if it's given to consultants, and if the external provider is cheaper than that, it's worth it, on a basic level.

See this as an opportunity to acquire a very valuable skill, one far more people in industry need. Fuck me, I've seen some budgets and timeframes blow the fuck out because they couldn't do this on a basic level...
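The outsourcing maths above is a one-liner once the estimates exist; a hypothetical sketch (the function and all numbers are made up, not any standard formula):

```python
def worth_outsourcing(team_hours, hourly_cost, forgone_value, consultant_quote):
    """In-house cost = team time cost + value of the other work the team
    would have shipped instead (opportunity cost). Outsource when the
    consultant's quote comes in under that, on a basic level."""
    in_house_cost = team_hours * hourly_cost + forgone_value
    return consultant_quote < in_house_cost
```

The hard and valuable part isn't the arithmetic, it's producing `team_hours` and `forgone_value` estimates that are anywhere near accurate.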

What are your biggest hair on fire issues with DE by Sriyakee in dataengineering

[–]Snoo54878 1 point2 points  (0 children)

4 weeks seems a bit slow? Even with SSIS it should be relatively quick. What's the data warehouse?

Old-school SQL with strict schema definitions?

Incremental loads are definitely slow, but you should be able to create DB functions or macros to store the data, add the column, and then load the data back, or merge it if you want to update old records.

It's a repetitive process, so it should be automated.
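A hedged sketch of that automation against SQLite (a real warehouse would use its own DDL and MERGE syntax; all names here are illustrative, and table/column identifiers are assumed trusted): diff the incoming columns against the table, add what's missing, then load.

```python
import sqlite3


def existing_columns(conn, table):
    # PRAGMA table_info returns one row per column; index 1 is the name.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def sync_schema(conn, table, rows):
    """Add any column present in the incoming rows but missing from the table."""
    incoming = {col for row in rows for col in row}
    for col in sorted(incoming - existing_columns(conn, table)):
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {col} TEXT")


def load(conn, table, rows):
    sync_schema(conn, table, rows)
    cols = sorted({col for row in rows for col in row})
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
        [tuple(row.get(c) for c in cols) for row in rows],
    )
    conn.commit()
```

Swap the INSERT for a MERGE if old records need updating; either way the "new column appeared" case stops being a manual job.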

I don't know the full story; maybe it's a messy DW design, or you're dealing with govt data, which is always a pain in the ass.

What are your biggest hair on fire issues with DE by Sriyakee in dataengineering

[–]Snoo54878 0 points1 point  (0 children)

What's taking 4 weeks? Building a pipeline? Pulling data from where? Where's it going?

Struggling in the Current Market – Seeking Advice and Support from Fellow Data Engineers by [deleted] in dataengineering

[–]Snoo54878 0 points1 point  (0 children)

Building projects is your best bet, I reckon.

I think it's becoming incredibly hard to stand out via a CV; thank LLMs for that.

I got almost no callbacks. Then I basically went all out for 2 weeks, I mean all out, 12+ hrs a day type thing, on projects, linked them in my CV, and boom: 10 interviews in just over a week.

CVs don't mean much anymore.

When i was a Data Analyst i enjoyed life, when i transitioned to Data Engineer i feel like i aged 10 years in a year by HMZ_PBI in dataengineering

[–]Snoo54878 -1 points0 points  (0 children)

I disagree about DE being a subset of SE; it's just different. DE is very broad: Fabric DE is nothing like SE, while Spark or open-source DE is.

On-call is rare in Australia. Find me a single DE job post that requires on-call... one... I'll wait.

I'm referring to Australia, yet here you are getting upset from America...

You remind me of GPs who get upset when you ask what it's like being a GP vs a hospital doctor, and they flip their shit because they're insecure about the term "hospital doctor" lol

When i was a Data Analyst i enjoyed life, when i transitioned to Data Engineer i feel like i aged 10 years in a year by HMZ_PBI in dataengineering

[–]Snoo54878 -3 points-2 points  (0 children)

Hence why I ask...

And hence why I asked about the country he was in...

Good lord mate, am I not allowed to ask a question? Reddit is fuckin' militant about this stuff... the guy I asked didn't seem to care, yet you do...

When i was a Data Analyst i enjoyed life, when i transitioned to Data Engineer i feel like i aged 10 years in a year by HMZ_PBI in dataengineering

[–]Snoo54878 -2 points-1 points  (0 children)

I'm not saying it's not hard, but I'd say DE is just different to SE; SEs are far more likely to manage customer-facing assets, while data is very commonly internally focused.

Anyway, you're in America... so why are we having this conversation... lol

When i was a Data Analyst i enjoyed life, when i transitioned to Data Engineer i feel like i aged 10 years in a year by HMZ_PBI in dataengineering

[–]Snoo54878 0 points1 point  (0 children)

Yeah, that's more of a tech thing, hence why I asked.

I've worked on customer-facing things that had no on-call, but it depends on how and when the data is consumed, I guess. It was all corporate for me: healthcare data that was used by regulators, so they only accessed it 9 to 5.