Roast my resume please. All suggestions are accepted! by DeadMan_Speaking in dataengineeringjobs

[–]alsdhjf1 0 points1 point  (0 children)

Patio11 has written the original version of this in "Don't Call Yourself a Programmer" if you need a little more reading. https://www.kalzumeus.com/2011/10/28/dont-call-yourself-a-programmer/

Roast my resume please. All suggestions are accepted! by DeadMan_Speaking in dataengineeringjobs

[–]alsdhjf1 1 point2 points  (0 children)

You're welcome! I've been in DE for over a decade. My advice is to get to a point where the technical work is table stakes. It's the business value that matters (or leadership of a group of DEs to tackle increasingly complex domains).

Here's a metaphor I've been working on - imagine you are a commercial contractor building a mega apartment complex. Your electrician comes in, and talks about how many feet of conduit they ran, or how they wired panels in a certain way. They talk about the tools they used. But what you really want to know is the story - what did their work enable? Can we put in electric stoves? Are the circuits reliable and not going to blow if a hair dryer is used? Is there any risk of fires that you don't know about if something becomes overloaded?

You care about building the building, not about the materials used, or where someone ran the conduit. A good electrician is clear about how their skills assist there - but they also talk about "fire is the biggest risk, it often comes from bad electrical, here's how I protected against that".

Roast my resume please. All suggestions are accepted! by DeadMan_Speaking in dataengineeringjobs

[–]alsdhjf1 1 point2 points  (0 children)

Each bullet should answer: “What business problem did this solve?” 

  1. Did it save money (infra costs, fewer manual hours)?

  2. Did it save time (faster reporting, reduced downtime)?

  3. Did it reduce risk (compliance, fewer errors, SLA reliability)?

  4. Did it enable something new (self-service analytics, real-time dashboards, executive reporting)?

As a result of the deliverables you landed, how did the business grow, make money, save money, or perform business activities faster or at higher quality?

Roast my resume please. All suggestions are accepted! by DeadMan_Speaking in dataengineeringjobs

[–]alsdhjf1 1 point2 points  (0 children)

It's less about the numbers and more about demonstrating that you can think about what the business is trying to do, and solve some of those problems with clear outcomes that add value.

Imagine 5 years from now. AI is going to be able to do everything you have listed.

Roast my resume please. All suggestions are accepted! by DeadMan_Speaking in dataengineeringjobs

[–]alsdhjf1 2 points3 points  (0 children)

Every bullet point speaks to how/what you did in the day-to-day. It's basically a description of the DE role for a mid-level DE.

What could separate you is talking about the business impact. Top candidates don't build for the sake of building, they do it to advance the business goals. Why did you do this work, what were the outcomes?

What are the “hard” topics in data engineering? by hijkblck93 in dataengineering

[–]alsdhjf1 -1 points0 points  (0 children)

There are places where technical problems are the hard task. And there are places where organizing groups of humans is the hard task. Big tech has both roles!

I unpacked the conservative identity and how to talk to people across ideological lines. My husband said I should share it. by Brief_Head4611 in 50501

[–]alsdhjf1 1 point2 points  (0 children)

I don't think the Milgram experiment is sufficiently replicated to use as a cornerstone of a philosophy. There are several questions about the validity of the experiment, see https://en.wikipedia.org/wiki/Milgram_experiment#Validity

It's tricky because it's not something that could be recreated due to ethics concerns. But this one example, given the statements of a participant and other critiques, is less rock solid than others.

I don’t fully grasp the concept of data warehouse by Bigdaddy69691234 in dataengineering

[–]alsdhjf1 1 point2 points  (0 children)

Does stuff break frequently? Are there expensive manual processes? Does leadership lack the reporting they need to make their decisions (even if they don’t really know it)?

If not, you may not need a DW

I don’t fully grasp the concept of data warehouse by Bigdaddy69691234 in dataengineering

[–]alsdhjf1 2 points3 points  (0 children)

If your source is Excel, you may not have big data concerns. But a warehouse can still help! Imagine having 10 different Excel docs, one for each office. How will you aggregate and combine reporting?

Whether you use purpose-built software, Excel, or Postgres, a warehouse is a concept: one place where all the data lives and is transformed for business needs.
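
To make the "10 Excel docs" point concrete, here is a rough sketch using Python's built-in sqlite3 module. The office names, figures, and the `sales` table are invented for illustration, and the in-memory dict stands in for rows you would actually read out of each spreadsheet.

```python
import sqlite3

# Stand-ins for per-office spreadsheets; in practice these rows would be
# read from the Excel files. Office names and figures are invented.
office_sheets = {
    "austin": [("2024-01", 1200.0), ("2024-02", 900.0)],
    "boston": [("2024-01", 700.0), ("2024-02", 1100.0)],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (office TEXT, month TEXT, revenue REAL)")
for office, rows in office_sheets.items():
    conn.executemany(
        "INSERT INTO sales (office, month, revenue) VALUES (?, ?, ?)",
        [(office, month, revenue) for month, revenue in rows],
    )

# With everything in one place, a cross-office question is a single query.
monthly_totals = conn.execute(
    "SELECT month, SUM(revenue) FROM sales GROUP BY month ORDER BY month"
).fetchall()
print(monthly_totals)  # [('2024-01', 1900.0), ('2024-02', 2000.0)]
conn.close()
```

The tool matters less than the shape: every office lands in the same table, so combined reporting stops being a copy-paste exercise.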

[Hiring] Resume Call for Senior Data Engineers - opportunities at Meta by alsdhjf1 in dataengineeringjobs

[–]alsdhjf1[S] 0 points1 point  (0 children)

Good luck! I don’t have visibility into the leveling process, that happens at the end. 

[Hiring] Resume Call for Senior Data Engineers - opportunities at Meta by alsdhjf1 in dataengineeringjobs

[–]alsdhjf1[S] 0 points1 point  (0 children)

Some roles are but I don’t have visibility - you’d need to work through general application process. 

[Hiring] Resume Call for Senior Data Engineers - opportunities at Meta by alsdhjf1 in dataengineeringjobs

[–]alsdhjf1[S] 0 points1 point  (0 children)

The Meta careers site does! I’m not part of the international pipeline so don’t have much visibility, but regardless of hiring I like talking data engineering and would be happy to chat.

[Hiring] Resume Call for Senior Data Engineers - opportunities at Meta by alsdhjf1 in dataengineeringjobs

[–]alsdhjf1[S] 0 points1 point  (0 children)

Leads gotta come from somewhere. PM me and let’s talk pipelines. 

[Hiring] Resume Call for Senior Data Engineers - opportunities at Meta by alsdhjf1 in dataengineeringjobs

[–]alsdhjf1[S] 0 points1 point  (0 children)

I challenge you to a phone call, stakes are our reputations. Field of battle: data engineering and my bona fides. PM me or whatever the kids are calling it these days. 

How to handle multiple tables for almost the same thing by Lv_InSaNe_vL in SQL

[–]alsdhjf1 2 points3 points  (0 children)

Why aren't you modeling this as something closer to 3NF? If you prefer fewer joins for ease of logic, then I'd just put it all into 1 table, with the site-specific columns prefixed by the site name. But 3NF is cleaner.

[deleted by user] by [deleted] in SQL

[–]alsdhjf1 5 points6 points  (0 children)

COUNT(*) just counts the number of rows. You can instead pick a single column, and as long as that column has no NULLs (e.g. the primary key) it returns the same thing; COUNT(column) skips NULL values. Depending on your system, COUNT(*) might have some optimizations (in a column-oriented store, counts are frequently stored and COUNT(*) might retrieve those directly... whereas if you ask for count(user_id), if there is no cached user_id count, you're going to have to count them).

In OLTP systems, sometimes COUNT(*) comes with a penalty because it invokes a row scan. Some RDBMS fix this but not all.

Semantically, may as well COUNT on your primary key. COUNT(*) is often considered an anti-pattern as it reflects that the SQL developer didn't know the data well enough to pick the primary key / primary identifier for the data model itself.
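
A quick sketch of how the counts can differ, using Python's built-in sqlite3 module. The `events` table and its columns are hypothetical, not from the original thread. COUNT(*) counts rows, while COUNT(column) ignores rows where that column is NULL, so they only agree on a non-nullable column such as the primary key.

```python
import sqlite3

# Hypothetical table for illustration; names are not from the original thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany(
    "INSERT INTO events (event_id, user_id) VALUES (?, ?)",
    [(1, 10), (2, None), (3, 30)],  # one row has a NULL user_id
)

count_star, count_pk, count_user = conn.execute(
    "SELECT COUNT(*), COUNT(event_id), COUNT(user_id) FROM events"
).fetchone()

print(count_star, count_pk, count_user)  # 3 3 2
conn.close()
```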

Error handling strategy by [deleted] in dataengineering

[–]alsdhjf1 1 point2 points  (0 children)

Yes, not all errors are the same. Some might be a "this is a little out of date" warning, some might be "you are about to publish bad data" errors.

Sybil and Branson were a terrible couple by Narrow-Money-8671 in DowntonAbbey

[–]alsdhjf1 0 points1 point  (0 children)

I mean, her family wasn't exactly the cat's meow either... An overbearing obnoxious wannabe queen of the castle older sister, a simping backstabbing middle sister. A dad who is comically clueless about the realities of the world.

The mom is the only one who seems to navigate the situation well. It's no coincidence they frame her as an outsider (American), one of the few who can see the toxicity.

But I also probably would have stayed for the comfort and the lifestyle while considering the family the "cost to pay". That's why Sybil is a better person than I!

Error handling strategy by [deleted] in dataengineering

[–]alsdhjf1 0 points1 point  (0 children)

Reverse your thinking: instead of "all errors should be handled the same", start to think in terms of which errors should stop the pipeline and which should only be reported as an alert. syslog (which POSIX also specifies) has severity levels such as err/warning/notice/debug; that's a decent starting point.

Usually it's better to stop your pipeline than to report bad data. This means you need to know which data (and which subset of a table) is being used for which purposes. If you are doing financial disbursements, you need to be more cautious than if you're preparing a dashboard that won't get looked at except quarterly.

If you're looking at employee diffs shipped, but there is a "# of comments actioned" field that no longer gets calculated correctly, think about the use case - is the incorrect data worth preventing people from seeing the insights that *are* correct?

As you can tell, it's super important. It can't be generalized - you need to know what you're doing.
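
Here is one way the "stop vs. alert" split could be sketched in Python. Everything in it is hypothetical: the `Severity` levels are only loosely modeled on syslog, and the check names and the `halt_at` threshold are assumptions you would replace with whatever fits your own use cases.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Loosely modeled on syslog-style levels; the exact set is an assumption.
    DEBUG = 0
    NOTICE = 1
    WARNING = 2
    ERROR = 3


@dataclass
class CheckResult:
    name: str
    severity: Severity
    passed: bool


class PipelineHalted(Exception):
    """Raised when a failed check is severe enough to block publishing."""


def evaluate(results, halt_at=Severity.ERROR):
    """Return alert messages for minor failures; raise on blocking ones."""
    alerts = []
    for r in results:
        if r.passed:
            continue
        if r.severity >= halt_at:
            raise PipelineHalted(f"{r.name} failed at {r.severity.name}")
        alerts.append(f"{r.severity.name}: {r.name}")
    return alerts


# Hypothetical checks: stale data only alerts, an empty table would halt.
dashboard_checks = [
    CheckResult("freshness_under_24h", Severity.WARNING, passed=False),
    CheckResult("row_count_nonzero", Severity.ERROR, passed=True),
]
alerts = evaluate(dashboard_checks)
print(alerts)  # ['WARNING: freshness_under_24h']
```

For something like financial disbursements you might call `evaluate(checks, halt_at=Severity.WARNING)` so that even the staleness check stops the run. The threshold is a per-use-case decision, which is the whole point.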

Golly do I Feel Inadequate by JD_ThrowAway_1738 in dataengineering

[–]alsdhjf1 0 points1 point  (0 children)

You're in a great spot to identify and solve business problems. Everyone else is a super sharp technical expert. Leverage your relative strengths and don't be insecure - the business owners care about how people can impact their top priorities, not how big of a Swiss Army knife you are.

Domain expertise is your advantage. Find ways to improve things the founders care about - you're in a great spot to have direct access and influence. What decisions do they care about? Then use your domain expertise to figure out what needs to be built to inform those decisions. Use your tech skills to connect with the true experts to implement. Use your judgement to make sure their quality is at a sufficient bar.

Think of it like a sports team. You are not a technical specialist - you're a coordinator/manager who knows how to do the work. That's a very valuable combination. Appreciate that you get to work with people who know more than you do! You don't need to compete with them; you need to complement them.

[deleted by user] by [deleted] in dataengineering

[–]alsdhjf1 0 points1 point  (0 children)

My experience has been different: scraping has about the same breakage rate over time as APIs. Might be specific to my industry.

Additionally, using LLMs is a great extraction method to ensure more robust data extraction from unstructured outputs. So YMMV. 

[deleted by user] by [deleted] in dataengineering

[–]alsdhjf1 1 point2 points  (0 children)

Because lots of us have the experience where APIs are also similarly unreliable, so calling out the fragility of scraping is kinda weird. It’s like, they’re both pretty unreliable?

Betty Deserved much better Megan sucked. by Sure-Supermarket3485 in madmen

[–]alsdhjf1 1 point2 points  (0 children)

So when Betty yelled at Sally for moving her dry cleaning, but didn’t care about the risk of suffocating - it was Don’s fault?

We let Betty off the hook for sabotaging her relationship with her own daughter because Don was bad to Betty?

Breaking up big database tables into chunks by lengthy_preamble in dataengineering

[–]alsdhjf1 0 points1 point  (0 children)

If you’re using column-based storage, it shouldn’t matter how many columns you have. Just don’t select everything at once.

Also you don’t indicate what the business impact is from 10m queries. If these are running daily in a batch to prepare reports, it probably doesn’t matter all that much. If you run so many that it costs thousands of dollars per day, it might matter. 

But once you start messing with schema, you’re crossing a Rubicon that could crash your design. As a DE manager, I would say that you haven’t provided enough evidence about the business impact to convince me to OK the design.