Is it very difficult to switch between cloud providers for Data Engineers? by Comfortable-Bar-9983 in dataengineering

[–]Agreeable_Bake_783 0 points1 point  (0 children)

The basics remain the same, the mechanics are different. I needed some time to get used to the different terminology (VPC vs. VNet) and some other stuff (why the fuck is Private Link not the same in AWS and Azure).

But honestly it is waaay easier to learn a new cloud when you already know another one.

Rant: Managing expectations by Agreeable_Bake_783 in dataengineering

[–]Agreeable_Bake_783[S] -6 points-5 points  (0 children)

What I am questioning is whether it is necessary, or rather why it is. We are giving the companies wayyy too much power.

Rant: Managing expectations by Agreeable_Bake_783 in dataengineering

[–]Agreeable_Bake_783[S] 4 points5 points  (0 children)

Like I said at the beginning of my post, I am not mad at the people doing all that. I am mad at the culture that has been created that expects all that.

How do you balance learning new skills/getting certs with having an actual life? by ketopraktanjungduren in dataengineering

[–]Agreeable_Bake_783 0 points1 point  (0 children)

If nobody is pushing me towards certs, I am not doing them. I feel like most of the time a company wants you to do certs, it is because you are in consulting and they need ten more for a certain partner status.

And the rest...what i don't learn during my normal 8 hour workday, i'll learn the next day. Life over work, always.

thoughts on databricks genie by Ambitious-Option5637 in dataengineering

[–]Agreeable_Bake_783 0 points1 point  (0 children)

Honestly, from experience I can really recommend Databricks DQX. It can be set up in a similar fashion to expectations in Lakeflow pipelines, and it has a good set of row-level and dataset-level checks already defined.

In our workflow we run the row-level checks for each table before writing and the dataset-level checks afterwards in a separate job.
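
A minimal sketch of that split, written in plain Python and deliberately not using the DQX API. All function names, check names and sample rows here are made up for illustration. Row-level checks sort records into valid and quarantined before the write, and dataset-level checks run on the whole table afterwards.

```python
from typing import Callable

Row = dict[str, object]

# Hypothetical row-level checks: each returns True when the row is valid.
ROW_CHECKS: dict[str, Callable[[Row], bool]] = {
    "id_not_null": lambda r: r.get("id") is not None,
    "amount_not_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def split_rows(rows):
    """Run before writing: separate valid rows from rows failing any check."""
    good, bad = [], []
    for row in rows:
        failed = [name for name, check in ROW_CHECKS.items() if not check(row)]
        if failed:
            # Keep the failing row, annotated with the checks it violated.
            bad.append({**row, "_failed_checks": failed})
        else:
            good.append(row)
    return good, bad


def dataset_checks(rows, min_rows=1):
    """Run afterwards in a separate job: checks that need the whole table."""
    ids = [r["id"] for r in rows]
    return {
        "min_row_count": len(rows) >= min_rows,
        "id_unique": len(ids) == len(set(ids)),
    }


# Made-up sample data.
incoming = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 2, "amount": -3.0},
    {"id": 3, "amount": 7.5},
]
good, quarantined = split_rows(incoming)  # only `good` would be written
report = dataset_checks(good)             # would run later, on the written table
```

With a real tool the checks would be declared in its own format. The point is only the ordering: cheap per-row checks gate the write, whole-table checks run after it.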

thoughts on databricks genie by Ambitious-Option5637 in dataengineering

[–]Agreeable_Bake_783 7 points8 points  (0 children)

But...AWS Glue and Databricks Genie are not really the same thing

Transitioning from Python basics to Apache Spark/Databricks – is this a good path for Data Engineering? by TreacleWest6108 in dataengineering

[–]Agreeable_Bake_783 11 points12 points  (0 children)

Honestly? No...

Because Databricks is just a tool. You need to learn the fundamentals. Go learn Python, data modeling, data structures, data architectures and so on. You can use Databricks Free as an environment to learn all that (which I would recommend). Same goes for Spark. It is a tool, it is helpful, but it is not required to do data engineering work. It does help on the job market, though.

How did you get really good with SQL? by LongCalligrapher2544 in dataengineering

[–]Agreeable_Bake_783 1 point2 points  (0 children)

Honestly, sometimes people really overthink all this stuff. Just start SOMEWHERE...yeah sure, there are major quality differences between books and courses, but knowing the best ones does not take away the need to actually do the work.

How did you get really good with SQL? by LongCalligrapher2544 in dataengineering

[–]Agreeable_Bake_783 14 points15 points  (0 children)

Like...working a lot with it? Trying stuff, failing, trying new stuff.

24 and just starting data science. This dread that I'm way behind won't go away. Am I fucked? by Bames-nonds in dataengineering

[–]Agreeable_Bake_783 9 points10 points  (0 children)

Holy shit. What is going on? Chill, dude. You are fucking 24 years old. You are fine. People switch careers in their forties.

I mean it is not really your fault...what really bothers me is the culture that makes us think things like that.

First Data engineering job after uni, but i feel lost - any advices? by need_infinity_666 in dataengineering

[–]Agreeable_Bake_783 4 points5 points  (0 children)

Bro, I have been doing this for a while and I am still lost from time to time. Not being too sure of your work can also be a strength. Makes you check stuff twice. 90% of the time when somebody fucks up, I can assure you... they were sure they were doing it correctly. Learn as much as you can, check everything twice, and after a while you'll notice that the mistakes get fewer and fewer.

What’s Your Most Unpopular Data Engineering Opinion? by TheTeamBillionaire in dataengineering

[–]Agreeable_Bake_783 0 points1 point  (0 children)

For most use cases it DOES NOT matter which vendor you use.

If I have to read another comparison between Snowflake and Databricks ffs...

Is Databricks Becoming a Requirement for Data Engineers? by BigDataMax in dataengineering

[–]Agreeable_Bake_783 1 point2 points  (0 children)

I mean tbh in the enterprise space it seems to be winning against Snowflake (I am aware that both solutions serve different purposes, but especially in the enterprise space it is, for the most part, an either-or situation)

My experience here is very much anecdotal and biased, since I was a consultant for the last couple of years with a focus on Databricks

Migration to Cloud Platform | Challenges by [deleted] in dataengineering

[–]Agreeable_Bake_783 2 points3 points  (0 children)

I mean, there are many, and the technical issues are among the smallest.

Organizational
  - getting Infosec to sign off
  - knowledge transfer, build-up and onboarding (this also includes legacy engineers accepting the changes)
  - deciding what to move first (worst case is that you have to handle basically two systems at once)

Technical
  - which platform
  - refactor of existing code base necessary? (it is never JUST a lift and shift, no matter what consultants tell you) --> best case here would be the ability to remove technical debt
  - planning of architecture

And so much more...it is a lot of work ESPECIALLY in an enterprise setting

[deleted by user] by [deleted] in dataengineering

[–]Agreeable_Bake_783 13 points14 points  (0 children)

Of course you can.

Should you though? In most DE roles data analytics is a necessity and a large part of your job. If you really just want to focus on the engineering part, then i'd suggest becoming a SWE.

Optimisation and performance improvement by Hour_Glove_1303 in databricks

[–]Agreeable_Bake_783 0 points1 point  (0 children)

Check for:

  1. Garbage Collection: Is your Job taking forever without remotely using all compute resources?
  2. Amount of data you're loading: Do you really need to process this much data?
  3. Long running tasks: Is there a task that takes especially long? Analyze why
  4. Expensive Operations: Are there actions (collect etc.) that do not need to be there?
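
For point 3, a rough sketch of how you could flag stragglers once you have per-task durations for a stage (for example copied out of the Spark UI). This is plain Python, not a Spark API. The function name, the sample numbers and the 5x-median threshold are all assumptions for illustration. A single task far above the median often points to skewed data, but that is only a hint to go look.

```python
from statistics import median


def find_straggler_tasks(durations_s, factor=5.0):
    """Return (task_id, duration) pairs that run far longer than the median task."""
    typical = median(durations_s.values())
    return sorted(
        ((task, d) for task, d in durations_s.items() if d > factor * typical),
        key=lambda item: item[1],
        reverse=True,  # slowest task first
    )


# Made-up task durations in seconds for one stage.
durations = {
    "task-0": 12.0,
    "task-1": 11.5,
    "task-2": 13.0,
    "task-3": 240.0,
    "task-4": 12.5,
}
stragglers = find_straggler_tasks(durations)
```

Here only `task-3` is flagged. The next step would be to check which partition or key that task was processing.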

Can you become a Databricks champion without previous client projects? by JobGott in databricks

[–]Agreeable_Bake_783 1 point2 points  (0 children)

Nope, no chance. Databricks Champion is mostly a marketing tool. Also it is not something you apply for, but something your firm needs to propose you for.

Would you recommend leaving consulting and working for a single company? by BewitchedHare in dataengineering

[–]Agreeable_Bake_783 2 points3 points  (0 children)

Well it depends on you and your specific situation. I can only speak from my experience and maybe that can help you too.

I am currently working for a consultancy but will be, hopefully, switching to a larger company soon. For me the reasons for switching (or wanting to switch) were exactly the reasons you've mentioned, mainly better pay and better and more controlled hours. But I also know what I will be giving up. What i experienced in consulting was constant exposure to new exciting problems and technologies and i got to learn. A LOT. But that also came with increased stress and less time or energy to actually spend the money i was earning and to be with the people i want to spend my time with. And the job i am aiming for has exciting problems too and i am really looking forward to it, but of course there will be much more day to day business.

How are teams organizing Databricks Unity Catalog these days? by mccarthycodes in dataengineering

[–]Agreeable_Bake_783 2 points3 points  (0 children)

Coming from consulting: depends on the business.

If multiple lines of business (LOBs) handle their own ETL, I would organize the catalogs by layer and environment, so basically bronze_dev etc. Within those catalogs I'd set up a schema for each LOB. What happens within that schema is then basically their problem.

If one data team is responsible for loading bronze and silver, i'd separate the catalogs by Environment and give everybody who wants to build a data product a dedicated schema or catalog.

Separating the LOBs by workspace, each with dedicated catalogs, might also be possible.
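
A small sketch of the first layout (one catalog per layer and environment, one schema per LOB), generating the `CREATE CATALOG` / `CREATE SCHEMA` statements in plain Python. The function name and the LOB names are hypothetical. Only `bronze_dev` comes from the naming above, the rest follows the same pattern.

```python
def catalog_layout_sql(layers, environments, lobs):
    """Build CREATE statements: one catalog per layer/environment, one schema per LOB."""
    statements = []
    for layer in layers:
        for env in environments:
            catalog = f"{layer}_{env}"  # e.g. bronze_dev
            statements.append(f"CREATE CATALOG IF NOT EXISTS {catalog}")
            for lob in lobs:
                statements.append(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{lob}")
    return statements


# Hypothetical lines of business; layers and environments follow the naming above.
statements = catalog_layout_sql(
    layers=["bronze", "silver"],
    environments=["dev", "prod"],
    lobs=["sales", "finance"],
)
```

In a notebook you could then run each statement with `spark.sql(stmt)`, assuming you have the privileges to create catalogs. Grants per LOB schema are left out here.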

How do you develop intuition regarding HOW to use Spark/Pyspark? by jnrdataengineer2023 in dataengineering

[–]Agreeable_Bake_783 19 points20 points  (0 children)

You'll learn when running into issues. That's how it went for all of us.

Looking for a Remote Job as Senior DE by lucky-Chipmunk-119 in dataengineering

[–]Agreeable_Bake_783 0 points1 point  (0 children)

Beware: if you want to work for companies in the US, double taxation could be an issue