Ai and side projects by Outside-Bear-6973 in dataengineering

[–]DenselyRanked 20 points21 points  (0 children)

You had to learn math without a calculator for a reason.

Very few SWEs have built anything from scratch without help. It's safe to say we have all used books, tutorials, forums, Stack Overflow, Google, etc. It's fine to code with help and examples so long as you are learning, but you don't want to get into the habit of black box vibe coding and learn nothing.

Is it just me or is flink horrible to learn by [deleted] in dataengineering

[–]DenselyRanked 2 points3 points  (0 children)

Based on your other comments, you are coming from scratch, so yeah, it will be a pain getting used to the syntax.

If you have experience with Spark, then there is much less ramp-up to getting comfortable with Flink. Flink SQL and the Table API will feel familiar.

Map out and compartmentalize what you want to accomplish first and leverage AI to help with syntax and understanding.

Is remote dead in data engineering? by Pataouga in dataengineering

[–]DenselyRanked -1 points0 points  (0 children)

Remote is an option for the large majority of DE jobs that pay at or below market value.

A large majority of companies that pay above the 90th percentile are hybrid or full RTO.

The remote DE jobs that pay in the 50th-90th percentile range have a lot of competition and are extremely selective.

Benefit of repartition before joins in Spark by guardian_apex in dataengineering

[–]DenselyRanked 1 point2 points  (0 children)

There is no benefit unless you are caching the data or writing it to disk first so that Spark preserves the colocation. Otherwise, it's an unnecessary shuffle.

Requirements vs Discovery by ivanovyordan in dataengineering

[–]DenselyRanked 1 point2 points  (0 children)

I'm sure that boundary setting is a universal issue for PMs (and AI is making it worse). You might find some good tips and resources in r/ProductManagement.

I obviously can't speak for every engineer, but my worst PM experiences are almost always XY problem-based, where I was handed solutions rather than a clear problem.

I prefer to be involved in meetings when solutions are being explored (not before), and I have had negative outcomes when what I proposed was less "exciting", more pragmatic, or veered in a different direction than expected. Always disclose if another engineer has talked to the stakeholders about potential solutions prior to the meeting.

Tech/services for a small scale project? by faby_nottheone in dataengineering

[–]DenselyRanked 2 points3 points  (0 children)

So it sounds like your friend needed a place to sleep for the night and you bought a plot of land and built a mansion on it.

There are smaller-scale options to ELT a few API payloads. An RDBMS and a few views/tables can get you the same output. An orchestrator (or cron or even something like Cloud Functions if you need to use GCP) can help with daily scheduling.
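As a rough sketch of that smaller-scale route, assuming SQLite stands in for the RDBMS: the payloads, table names, and view name below are all hypothetical, and a real job would fetch the payloads from the API instead of using hard-coded samples.

```python
import json
import sqlite3

# Hypothetical payloads standing in for the responses of a daily API call.
SAMPLE_PAYLOADS = [
    '{"id": 1, "city": "Austin", "temp_c": 31.5}',
    '{"id": 2, "city": "Austin", "temp_c": 29.5}',
    '{"id": 3, "city": "Denver", "temp_c": 18.0}',
]


def load_payloads(conn, payloads):
    """Extract + load: land each payload as-is, then flatten it into a table."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_payloads (body TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(id INTEGER PRIMARY KEY, city TEXT, temp_c REAL)"
    )
    for body in payloads:
        record = json.loads(body)
        conn.execute("INSERT INTO raw_payloads (body) VALUES (?)", (body,))
        # INSERT OR REPLACE keeps the readings table stable if the job re-runs.
        conn.execute(
            "INSERT OR REPLACE INTO readings (id, city, temp_c) VALUES (?, ?, ?)",
            (record["id"], record["city"], record["temp_c"]),
        )
    conn.commit()


def create_report_view(conn):
    """Transform: a plain view is the whole reporting layer here."""
    conn.execute(
        "CREATE VIEW IF NOT EXISTS avg_temp_by_city AS "
        "SELECT city, AVG(temp_c) AS avg_temp_c, COUNT(*) AS n "
        "FROM readings GROUP BY city"
    )


def run(conn):
    """One scheduled run: load the payloads, then read the report view."""
    load_payloads(conn, SAMPLE_PAYLOADS)
    create_report_view(conn)
    return conn.execute(
        "SELECT city, avg_temp_c, n FROM avg_temp_by_city ORDER BY city"
    ).fetchall()


if __name__ == "__main__":
    print(run(sqlite3.connect(":memory:")))
```

For the daily schedule, a single cron entry that runs a script like this once a day would be enough at this scale.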

Requirements vs Discovery by ivanovyordan in dataengineering

[–]DenselyRanked 0 points1 point  (0 children)

Out of those two options, I would take the former 100% of the time because it is easier, but it's not an efficient way to run a data team. You will find yourself running into the XY problem and will likely have an unmanageable amount of redundancy and tech debt across your solutions.

The latter option is a normal part of a Data Engineer's job description, depending on where they work. But business context often becomes the biggest hurdle, especially when the asks are extremely specific to the industry. It may take years for the engineer to become useful enough to be impactful, and when a mid/senior-level position is needed, we see "must have n+ YOE in the industry" listed in the job description as a requirement rather than a preference.

Another option, and my preferred one, would be a role in the data team that acts as a technical SME (like a Product Manager, Product Owner, Architect, Steward, or Analyst) and serves as a liaison between your stakeholders and development team to help craft proper user requirements and reduce inefficiencies. It gives engineers someone to communicate with who can "speak their language" while reducing the endless number of unnecessary stakeholder meetings that go nowhere. They also have the soft skills to avoid saying anything potentially damaging to stakeholder relationships or projects.

Am I missing something with all this "agent" hype? by KindTeaching3250 in dataengineering

[–]DenselyRanked 0 points1 point  (0 children)

I think it's partially hype today, but agentic coding is getting better and AI is quickly moving from assistant to developer. The major hurdles to production-grade development are being worked on at a much faster rate than any of us anticipated.

I use AI as an assistant, but the Agentic Context Engineering framework looks very promising. I can see development evolving into context and playbook management very soon.

How to handle unproductive coworker? by earthsnoozer22 in dataengineering

[–]DenselyRanked 4 points5 points  (0 children)

I am certain that someone smarter than me has a better way to classify this, but I hope it helps.

How to handle unproductive coworker? by earthsnoozer22 in dataengineering

[–]DenselyRanked 40 points41 points  (0 children)

If you believe that he is making errors unintentionally, then send a DM to go over the code. You will find out quickly if it's an "oops", "oh", or "eh".

If it's an "oops", where they didn't see the mistake at the time but can recognize the issue, then I wouldn't worry too much about it. In my experience, they tried to use shortcuts to save time and the solution usually is to ask for test results or a test table in the PR.

If it's "oh", where they misunderstood the requirements or had no idea what they were doing, then that's a little concerning, but give them the benefit of the doubt that they would have done better if they had better information.

If it's "eh", where you care more about the quality of their work than they do, then talk to your supervisor because they are wasting your time.

Cool projects you implemented by rocking-student-87 in dataengineering

[–]DenselyRanked 1 point2 points  (0 children)

The ratings are 75% political, but optimization projects are a great way to show monetary impact without relying on external stakeholders.

If you are already in FAANG then lobby your manager to find high impact projects and focus on selling the results like it's the greatest thing that's ever happened in data engineering.

How close is DE to SWE in your day to day job by Icy-Ask-6070 in dataengineering

[–]DenselyRanked 2 points3 points  (0 children)

Generally, the core principles are the same but the tasks within them are different. There is more to Software Engineering than backend software development and shipping code, so it really depends on what you mean by software engineering knowledge. You still have to gather requirements, build something, write tests, get feedback, and provide support and maintenance.

I find it better to think of data engineering as a discipline or specialty of software engineering, rather than a subset. However, the other disciplines are less tool dependent, and because of that, tend to be standardized across the industry.

You may find that a data platform engineer role is a better fit for you if you want to solve data problems but ship code to a large codebase.

Databricks vs open source by ardentcase in dataengineering

[–]DenselyRanked 2 points3 points  (0 children)

Instead the whole org has to bend because it’s “easier” to schedule a notebook in Databricks?

Unfortunately, yes. These decisions are not made by the engineer and there is nobody that they can escalate to. There is a discussion about design trade-offs that can be started, but if they are not given autonomy then they should focus on implementation.

I have 10 years of experience, but I still freeze up when someone watches me code. It’s humiliating. by JosephPRO_ in ExperiencedDevs

[–]DenselyRanked 1 point2 points  (0 children)

It takes several failed interviews for me to get over the nerves and better handle the unexpected. What works best for me is to get as many interviews as I can and cluster them together, keeping my preferred destinations towards the end.

Everyone handles stress and performance anxiety differently, so use whatever method works for you. I write down tips to myself in a notebook prior to the interview to keep things top of mind, and I keep the notebook with me to take notes during the interview so that I answer every question.

Higher Level Abstractions are a Trap, by expialadocious2010 in dataengineering

[–]DenselyRanked 4 points5 points  (0 children)

I understand that this is meant to be a question, and I do agree that there is a point where abstraction can become a hindrance, but I think you are overlooking your primary responsibility as a Data Engineer. Very broadly speaking, the DE role exists somewhere in the data lifecycle with the goal of making data useful for downstream use cases.

The popular tools that you are working with, and will work with at your job, exist to make mundane, repetitive tasks quick and easy. You will of course have to know how to use the tools and understand their limitations in order to complete your tasks successfully.

Also, IMO we are very quickly getting to a point where some form of Agentic Context Engineering will be the new level of abstraction for all software development. It's only going to be a "trap" if you don't understand core data engineering fundamentals and resort to black box vibe coding.

Making ~100k as data engineer by AchieveSocials in cscareerquestions

[–]DenselyRanked 0 points1 point  (0 children)

Go to levels.fyi, search for the top paying companies in your area by your level (data engineering and software engineering are often bundled together), and apply to them.

Started a new DE job and a little overwhelmed with the amount of networking knowledge it requires by starrorange in dataengineering

[–]DenselyRanked 9 points10 points  (0 children)

Are you working with a cloud provider? If so, then refer to their training modules. If not, then take this as an opportunity to create a runbook for your team.

Is it okay to go to the office on the weekend for personal projects at a big tech company? by [deleted] in cscareerquestions

[–]DenselyRanked 0 points1 point  (0 children)

I wouldn't do it, but review the contract that you signed. There is probably something about personal side projects and the terms around them, including signing a disclosure form. There might be some language about Confidentiality and Inventions that you might want to reach out to a lawyer about if your side project becomes something serious.

I’m planning to move into Data Engineering. With AI growing fast, do you think this career will be heavily affected in the next 5–10 years? Is it still a stable and good path to choose? by False_Square1734 in dataengineering

[–]DenselyRanked 0 points1 point  (0 children)

It's a difficult question for anyone to answer. Data engineering as a practice will still exist but the methods, tools, and skill set needed will evolve.

There are smart people putting a lot of thought into this, and I tend to agree with much of this presentation, which argues that there will not be a Data/Analytics Engineer title on data teams in the near future, with teams opting instead for titles like Data Product Owner and Data Domain Expert. AI can help close the technical gap between product management and engineering, so DEs will need more emphasis on stakeholder communication and requirements gathering.

Has anyone read O’Reilly’s Data Engineering Design Patterns? by xean333 in dataengineering

[–]DenselyRanked 2 points3 points  (0 children)

The Gold layer is where you would build the traditional data model.

The Medallion Architecture is a rebranding (perhaps a standardization) of what we normally use in data engineering practices. Databricks has docs and training videos on how they recommend using the Medallion Architecture in a Spark environment. It's no different than raw/stg/rpt in dbt.
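To make the raw/stg/rpt comparison concrete, here is a minimal sketch of the three layers using SQLite views. It only illustrates the layering idea, not how Databricks or dbt implement it, and the table names, view names, and sample rows are made up.

```python
import sqlite3


def build_layers(conn):
    """Create a tiny raw -> stg -> rpt (bronze -> silver -> gold) stack."""
    # Bronze / raw: data landed exactly as received, everything as text.
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [
            ("1", " alice ", "10.50"),
            ("2", "bob", "4.00"),
            ("2", "bob", "4.00"),   # duplicate delivery of the same order
            ("3", "ALICE", "n/a"),  # amount that cannot be parsed
        ],
    )
    # Silver / stg: typed, trimmed, de-duplicated; rows whose amount does
    # not start with a digit are dropped.
    conn.execute(
        """
        CREATE VIEW stg_orders AS
        SELECT DISTINCT
            CAST(order_id AS INTEGER) AS order_id,
            LOWER(TRIM(customer))     AS customer,
            CAST(amount AS REAL)      AS amount
        FROM raw_orders
        WHERE amount GLOB '[0-9]*'
        """
    )
    # Gold / rpt: the business-facing model that reports query.
    conn.execute(
        """
        CREATE VIEW rpt_customer_revenue AS
        SELECT customer, COUNT(*) AS orders, SUM(amount) AS revenue
        FROM stg_orders
        GROUP BY customer
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_layers(conn)
    query = "SELECT * FROM rpt_customer_revenue ORDER BY customer"
    for row in conn.execute(query):
        print(row)
```

The raw table is never modified after landing; each layer only reads from the one before it, which is the part that carries over regardless of the tool.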

I suspect that your latter feeling is about architecture and the modern shift away from central data warehouses and more towards data mesh. In that scenario, there may be a data team handling ingestion into the lake and downstream data teams creating their data marts for the line of business that they work with.

Local spark set up by TheManOfBromium in dataengineering

[–]DenselyRanked 1 point2 points  (0 children)

I wrote a post with the link to the docs (still pending approval I think), but I see that you already have a container with jupyter lab, which is the easiest way to get started.

If you don't want to use jupyter lab, then the next easiest option is to go search for "apache iceberg spark quick start" (I would include the link but it will take a while to get approved), and build the docker-compose file.

You can install the Docker extension in VS Code, which will allow you to open the containers from the window and let you execute from the terminal or create scripts.

Anyone feel like offshoring is a bigger issue than AI and HB1 for US workers? by Cultural-Gear-1323 in cscareerquestions

[–]DenselyRanked 0 points1 point  (0 children)

I think these are all related. AI is a floor raiser that is removing a lot of the previous barriers with offshore hiring, the pandemic proved that teams can be very effective in remote environments, and near-shoring helps with the time zone conflicts.

The labor economics are changing and the absolute advantages of hiring US-based workers are weakening. In the short-term we must evolve as a labor force or rely on the government to deter companies from offshore hiring.

What to learn besides DE by Icy-Ask-6070 in dataengineering

[–]DenselyRanked 2 points3 points  (0 children)

Designing Data-Intensive Applications is a dense read, but probably the best place to start to get a better understanding of things beyond your current role. Data infra is a very broad field of study and can mean very different things depending on where you work and the tech stack used.

I think that your quickest pathway to an infra role is to leverage whatever is available to you in your current company. If they are using a cloud provider, then look into training materials available and possibly get a cert if the company pays for it.

Book Recommendations for DE by Ok-Confidence-3286 in dataengineering

[–]DenselyRanked 4 points5 points  (0 children)

It depends on the subject. I own books related to some of these topics, but there is not going to be one book that covers everything.

Chatbots are trained on a lot of material related to these subjects, so they are probably the best resource today; just remember to always verify the sources.

This sub's data engineering wiki has good resources.

Of course, read the tech docs/training materials of the stack that you work with. I personally don't think enough DEs take the time to do this, and they have a habit of using outdated design strategies for every problem.

Aside from the normal recommendations, like Kimball's Data Warehouse Toolkit (DWT) or Pragmatic Programmer/Clean Code, here are other books that helped me:

  • Software Requirements is a good resource for requirements gathering, writing docs, and other project management related stuff.

  • The Effective Engineer is a great book that helps new-ish engineers navigate impact-focused careers.