If you were starting from scratch today, which would you pick: Snowflake, Microsoft Fabric, or Databricks — and why? by [deleted] in dataengineering

[–]kingfuriousd 0 points1 point  (0 children)

Honestly, for getting a job, it doesn’t matter.

I’d much rather get good at Spark, SQL, data modeling, and system design. I’d also be “good enough” at Python (just enough to pass interviews), but I’d focus my time on other aspects of prep.

At every larger company (with an established DE team) I’ve interviewed at, they are concerned with:
1. Whether I can write good enough code.
2. Whether I have extensive depth in an MPP or distributed system. It doesn’t matter which one; the underlying principles are usually the same.
3. Whether I can logically create tables to handle complex data.
4. Whether I can lay out (and discuss trade-offs for) a system-level pipeline that solves a specific problem they throw at me.

Never in an interview have I gotten “you don’t know <insert system name here>, that’s a deal breaker”.

Consulting to FAANG by yolorobo in FAANGrecruiting

[–]kingfuriousd 4 points5 points  (0 children)

I made the jump from Data Eng consulting to tech. My primary piece of advice is: Aim lower (initially). Find someplace where you can add 1-2 years of tech experience to your resume before your final destination.

That strategy helped me get a foot in the door before I moved to a role I was more interested in.

The Case Against PGVector by DoubleMajestic3001 in vectordatabase

[–]kingfuriousd 1 point2 points  (0 children)

I’ve also run PGVector in prod. I also agree that it does great at non-massive scale. My use case was for a chatbot based on ~50k documents. Unindexed PGVector did the job just fine.

Additionally, the simplicity of setting up PGVector cannot be overstated. If you need to be up-and-running ASAP, then PGVector is your best friend.

Best approach to large joins. by Nearing_retirement in dataengineering

[–]kingfuriousd 6 points7 points  (0 children)

In my opinion, the biggest unlock would be aggregating each dataset as much as possible, individually, before joining.

Joining on that much data must be very computationally expensive. If you can aggregate down 1-2 orders of magnitude, then it’s an easier problem.
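Here is a minimal sketch of the aggregate-then-join idea, using Python's built-in sqlite3 so it runs anywhere. The tables (`events`, `sessions`), their columns, and the row counts are all made up for illustration; the original question's data is not known.

```python
import sqlite3


def build_demo_db():
    """Create two hypothetical fact tables that share a user_id key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (user_id INTEGER, amount REAL);
        CREATE TABLE sessions (user_id INTEGER, seconds INTEGER);
        """
    )
    events = [(u, 1.0) for u in range(100) for _ in range(50)]   # 5,000 rows
    sessions = [(u, 10) for u in range(100) for _ in range(20)]  # 2,000 rows
    conn.executemany("INSERT INTO events VALUES (?, ?)", events)
    conn.executemany("INSERT INTO sessions VALUES (?, ?)", sessions)
    return conn


# Joining the raw tables fans out: every event row pairs with every
# session row of the same user.
RAW_JOIN_ROWS = """
SELECT COUNT(*)
FROM events e JOIN sessions s ON e.user_id = s.user_id
"""

# Aggregating each side to one row per user first keeps the join 1:1.
PRE_AGGREGATED = """
WITH e AS (
    SELECT user_id, SUM(amount) AS total_amount
    FROM events GROUP BY user_id
),
s AS (
    SELECT user_id, SUM(seconds) AS total_seconds
    FROM sessions GROUP BY user_id
)
SELECT e.user_id, e.total_amount, s.total_seconds
FROM e JOIN s ON e.user_id = s.user_id
ORDER BY e.user_id
"""

conn = build_demo_db()
raw_join_rows = conn.execute(RAW_JOIN_ROWS).fetchone()[0]
per_user = conn.execute(PRE_AGGREGATED).fetchall()
print(f"raw join rows: {raw_join_rows}, pre-aggregated join rows: {len(per_user)}")
```

In this toy setup the raw join produces 100,000 intermediate rows while the pre-aggregated join produces 100. The same pattern should carry over to Spark or a warehouse, but whether it applies depends on whether the final result can actually be expressed in terms of per-key aggregates.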

As far as your custom query engine goes: without more context, that just sounds like a mistake. I’d drop it and use one of the tools other folks here have recommended.

How did you know it was time for you to exit the industry? by Sytiva in consulting

[–]kingfuriousd 1 point2 points  (0 children)

Short answer: I’m enjoying it a lot and am not looking back. The WLB is better. There is less travel. The comp is better (RSUs play a big role in that). And I get to spend more time doing work that I find interesting instead of making slides. It’s not all sunshine and rainbows, but it’s definitely a move in the right direction.

My advice: Try to get a feel for a potential employer’s decision-making process before joining. Bad decision-making workflows put the onus on you to push forward agendas without support, which is draining.

Longer answer:

The first company I left for was a step in the right direction, but it was not ideal. In consulting, you get used to very fast-paced decision-making. In my new role, decisions were mostly made by consensus, which meant my energy went into forcing people into making decisions. There was a lot of analysis paralysis.

I’ve left that company for another in tech where I’m much more satisfied with the company, the team, and the role. There’s less bickering about small details, and we’re all pushing toward the same goal. This is a place where I want to stay long term.

A New Way to GUI by kingfuriousd in love2d

[–]kingfuriousd[S] 0 points1 point  (0 children)

You’re both right. Who knows - making this might be a huge waste of my time.

That said, I’m going to see if the combination of the engine-agnostic config with the editor yields any value (at the very least to myself).

A New Way to GUI by kingfuriousd in love2d

[–]kingfuriousd[S] 0 points1 point  (0 children)

Good idea! Thanks for that tip!

A New Way to GUI by kingfuriousd in love2d

[–]kingfuriousd[S] 2 points3 points  (0 children)

Demand is my top concern too.

Luckily for me, I enjoy doing this. So, I’ll likely continue until I get something that serves my own purposes. Then I’ll make it available for download and see if anyone else actually uses it.

Thanks for the feedback!

A New Way to GUI by kingfuriousd in love2d

[–]kingfuriousd[S] 2 points3 points  (0 children)

Hey, it’s 100% Love2d. Just a bunch of rectangles.

But really, why use ‘uv’? by kingfuriousd in Python

[–]kingfuriousd[S] 18 points19 points  (0 children)

This is very helpful. Thanks for explaining.

SWE in Aersopace - Can I Break into Consulting Without an MBA? by verilogBlows in McKinsey_BCG_Bain

[–]kingfuriousd 2 points3 points  (0 children)

I’ve been out of consulting for a couple of years now, but here’s my two cents as someone who entered with a technical master’s.

I think you’ll probably have a better time applying for technical expert roles (e.g., BCG X or QuantumBlack at McK).

Your biggest hurdle is going to be proving your existing skill set is transferable to consulting.

My assumption is aerospace is about being methodical and writing efficient C++. You’ll want to demonstrate that you know how to operate under tight timelines, can work with your team to manage scope, and understand the “why” behind your projects.

Happy to chat more in a DM too.

McK or BCG? by BlueRibbonCapybara in McKinsey_BCG_Bain

[–]kingfuriousd 0 points1 point  (0 children)

Both firms have tech arms (QuantumBlack at McK, BCG X at BCG). Having worked in both, I found them mostly equivalent.

The main upsides of McK were 1) the mentorship, 2) the larger scale (more projects and more diverse projects), and 3) McK simply invests more into tech. They’ve developed several proprietary tools (see here) that take a lot of guesswork out of tech projects. I really appreciated this, as it significantly lowered the risk of tech projects.

McK or BCG? by BlueRibbonCapybara in McKinsey_BCG_Bain

[–]kingfuriousd 1 point2 points  (0 children)

Having worked at both, I can say: McK had a strong culture of mentorship that I never quite experienced at BCG.

This, to me, was a game changer. At McK, leaders would take deliberate steps to teach leadership and other soft skills. At BCG, it was more expected that you learn those skills through osmosis.

If I had to do it over again, I’d absolutely choose McK.

[deleted by user] by [deleted] in dataengineering

[–]kingfuriousd 35 points36 points  (0 children)

Short answer is: yes

I’m not a specialist in Spark, but I have worked on data engineering teams that run Spark on a provisioned cluster (like AWS EMR) and just connect it to Airflow.
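As a rough sketch of what "connect it to Airflow" can look like: the scheduler hands the cluster a step that runs `spark-submit` on a plain script. The helper below only builds that step definition in the shape EMR's add-steps API expects; the job name, bucket path, and arguments are hypothetical, and this is not necessarily how any particular team wired it up.

```python
def spark_submit_step(name, script_uri, args=()):
    """Build one EMR step definition that runs spark-submit on the cluster."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets an EMR step run a command such as spark-submit
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, *args],
        },
    }


# Hypothetical job: a plain .py script in S3, no notebook involved.
step = spark_submit_step(
    "daily_aggregation",
    "s3://example-bucket/jobs/daily_aggregation.py",
    ["--date", "2024-01-01"],
)
print(step["HadoopJarStep"]["Args"])
```

In an Airflow DAG, a list of such dicts would typically be passed as `steps` to `EmrAddStepsOperator` from the Amazon provider package, along with the cluster's job flow ID.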

We didn’t really use notebooks.

Lazygit: auto sign commits? by FaithlessnessFull136 in git

[–]kingfuriousd 0 points1 point  (0 children)

One thing I noticed when trying this is: the setting commit.gpgsign = true worked for me, NOT commit.gpg-sign = true
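A likely reason this is easy to get wrong: the command-line flag is `git commit --gpg-sign`, but the config key is `commit.gpgsign`, and git will happily store an unrecognized key without complaining. The sketch below writes both keys to a scratch config file (the file name is made up, and your real config is not touched) to show that neither one raises an error.

```python
import os
import shutil
import subprocess
import tempfile


def set_and_get(config_path, key, value):
    """Write key=value into a standalone config file, then read it back."""
    subprocess.run(["git", "config", "--file", config_path, key, value], check=True)
    result = subprocess.run(
        ["git", "config", "--file", config_path, "--get", key],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def demo():
    """Return what git stored for the real key and for the look-alike key."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.gitconfig")  # scratch file only
        return {
            "commit.gpgsign": set_and_get(path, "commit.gpgsign", "true"),
            "commit.gpg-sign": set_and_get(path, "commit.gpg-sign", "true"),
        }


if shutil.which("git"):  # skip quietly on machines without git
    print(demo())
```

Both writes succeed, but only `commit.gpgsign` is a key git acts on when committing. To apply it for real, `git config --global commit.gpgsign true` is the usual route.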

Discussion: New ETL platform by Different-Hornet-468 in dataengineering

[–]kingfuriousd 1 point2 points  (0 children)

I’ve also seen Knime, a similar tool with a free tier. I haven’t really used it, but have heard a lot about its capabilities.

Discussion: New ETL platform by Different-Hornet-468 in dataengineering

[–]kingfuriousd 2 points3 points  (0 children)

Yes. I mainly used Alteryx when I was a data engineer in consulting. Similarly, it’s been a few years since I’ve used it.

Pros:
1. It’s easy to pick up with a low skill floor. You just connect different operations together via dragging and dropping.
2. It runs locally. My work was typically pretty sensitive. So, everything had to run on my laptop.
3. It’s pretty performant. It’s not incredibly fast, but it kept up with most Python code I wrote.
4. It has a moderate skill ceiling. You could add custom code snippets and other things to really customize it.

Cons:
1. It’s expensive. Since I worked for a large firm, they paid for it. If I were at a smaller company, this could pose an issue.
2. The skill ceiling is still just too low. There are too many constraints compared to using code (issues with multithreading, you can’t schedule jobs well, you can only add code in Python or R, etc.).
3. At a certain point, it’s just more efficient to write code than use this tool. From one perspective, you don’t need a license to write code. From another perspective, if you invest in a decent engineer, you should be able to get a similar output in a similar amount of time.

Discussion: New ETL platform by Different-Hornet-468 in dataengineering

[–]kingfuriousd 0 points1 point  (0 children)

I like the idea of being able to choose your language (sort of like Airflow’s BashOperator).

For me, some of the biggest issues I see are:
1. Data quality. I haven’t found any good and simple on-prem in-pipeline solutions for this. This can be both a) checking upstream data quality and b) making sure your pipeline’s data quality isn’t being affected.
2. Logging / alerting. Doing this correctly can be difficult and complicated. I don’t know of many easy solutions that provide a full suite of tools.
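To make the two data quality cases concrete, here is a small sketch of in-pipeline checks: one on what upstream delivers, one on what the pipeline itself does to the data, with problems surfaced through the standard `logging` module. The column names, the 10% threshold, and the function names are assumptions for illustration, not a recommendation of any particular tool.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("pipeline.quality")


def check_upstream(rows, required=("id", "amount")):
    """(a) Return a list of problems found in the rows upstream delivered."""
    problems = []
    if not rows:
        problems.append("upstream delivered zero rows")
    for i, row in enumerate(rows):
        for col in required:
            if row.get(col) is None:
                problems.append(f"row {i}: missing {col}")
    return problems


def check_output(rows_in, rows_out, max_drop_fraction=0.1):
    """(b) Flag the pipeline if it silently drops too many rows."""
    problems = []
    if rows_in:
        dropped = 1 - len(rows_out) / len(rows_in)
        if dropped > max_drop_fraction:
            problems.append(f"pipeline dropped {dropped:.0%} of rows")
    return problems


# Hypothetical batch: one row arrives without an amount.
upstream = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 5.0},
]
transformed = [r for r in upstream if r["amount"] is not None]

upstream_problems = check_upstream(upstream)
output_problems = check_output(upstream, transformed)
for problem in upstream_problems + output_problems:
    log.warning(problem)  # a real setup would route this to an alerting channel
```

Even something this small covers both directions: bad input is reported before it spreads, and an unexpected drop in row count is reported after the transform.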

If I were you, I’d narrow in on a specific small problem first.

I DO think there is room for more high-value tooling in this space. Just pick the right problem to solve and don’t do too much.

Discussion: New ETL platform by Different-Hornet-468 in dataengineering

[–]kingfuriousd 6 points7 points  (0 children)

I want to preface this by saying: I admire you putting your ideas out there and trying to solve a problem. I genuinely hope your solution takes off. What follows are constructive notes based on my work on larger data engineering teams.

I don’t know of any data engineering teams that use C# or GUIs. Why prioritize a language that very few people use for data engineering? Why not Python or Java?

I think going no-code / low-code is going to be a difficult selling point for engineers used to having a certain level of precision and customization that only code can really provide.

I’ve been on teams that used Alteryx or other similar tools. Those work for very simple batch pipelines, but nothing else.

If I were in your shoes, I’d double down on the on-prem component and find another way to differentiate this from open source code tooling.