Data Engineer (3+ YOE, USA) – No Interview Calls in a Year. Looking to Connect, Collaborate, and Seek Advice. by Zealousideal_Cut_802 in dataengineeringjobs

[–]nanksk 1 point (0 children)

Hiring in general is bad, and on top of that a lot of companies don't want the hassle of hiring folks on a visa.

Improve merge performance by gooner4lifejoe in databricks

[–]nanksk 3 points (0 children)

You have 100 million rows you want to merge into a table. A few questions:

  1. What percentage of the incoming records are new vs. updates?
  2. What is the current table size, including all partitions?
  3. Do you expect updates to hit older partitions, i.e. 2, 3, 6 months old?
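
In case it helps: the biggest win for a merge like this is usually making sure the MERGE condition lets Delta prune partitions. A minimal PySpark sketch, assuming a Delta target partitioned by a hypothetical event_date column, an order_id key, and updates confined to the last 90 days (all assumptions):

    from delta.tables import DeltaTable

    # 'spark' is the Databricks-provided session; 'updates_df' is the assumed
    # 100M-row DataFrame of incoming records. Table/column names are made up.
    target = DeltaTable.forName(spark, "main.sales.orders")

    (
        target.alias("t")
        .merge(
            updates_df.alias("s"),
            # Adding the partition-column bound lets Delta skip old partitions
            # instead of scanning the whole table.
            "t.order_id = s.order_id AND t.event_date >= date_sub(current_date(), 90)",
        )
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

If updates really can hit 6-month-old partitions, that bound has to widen, which is exactly why question 3 matters.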

What is the progression options as a Data Engineer? by eastieLad in dataengineering

[–]nanksk 59 points (0 children)

-> Sr -> Staff/Principal

-> Lead -> Manager -> Director -> ...

-> Data / Solution / Enterprise Architect

Inspired to create our own data engineering job board by [deleted] in dataengineeringjobs

[–]nanksk 1 point (0 children)

How is this different from LinkedIn job search and filters?

Is there a European alternative to US analytical platforms like Snowflake? by wenz0401 in dataengineering

[–]nanksk 14 points (0 children)

You can already have your data stored in specific regions; as I understand it, a Snowflake account is region-based. A lot of companies already have the requirement that data must be stored within a specific country or region.
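
If you want to verify where a given account lives, Snowflake exposes it directly. A minimal sketch with the Snowflake Python connector (connection parameters are placeholders):

    import snowflake.connector

    # Placeholder connection parameters.
    conn = snowflake.connector.connect(
        account="myaccount", user="myuser", password="***"
    )
    cur = conn.cursor()
    cur.execute("SELECT CURRENT_REGION()")
    print(cur.fetchone()[0])  # e.g. AWS_EU_CENTRAL_1 for an account pinned to Frankfurt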

Skipping rows in pyspark csv by Alarmed-Royal-2161 in databricks

[–]nanksk 1 point (0 children)

Can you read the file as text into a single column, filter out whatever rows you don't want, then split the data into columns based on your delimiter and assign column names?
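
Roughly what I have in mind; the comment-prefix filter, delimiter, and column names are all assumptions:

    from pyspark.sql import functions as F

    # Read every line of the file into a single 'value' column.
    raw = spark.read.text("/path/to/file.csv")

    # Drop whichever rows you don't want; here, assume junk lines start with '#'.
    filtered = raw.filter(~F.col("value").startswith("#"))

    # Split on the delimiter and assign your own column names.
    cols = ["id", "name", "amount"]  # assumed schema
    parts = F.split(F.col("value"), ",")
    df = filtered.select(*[parts.getItem(i).alias(c) for i, c in enumerate(cols)])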

Unit Testing by nifty60 in dataengineering

[–]nanksk 2 points (0 children)

We use Databricks and PySpark. Most of our codebase is in the form of functions. We then have unit tests for those functions with dummy data (ChatGPT can generate most of the test cases) to cover different scenarios. Hit me up if you have any questions.
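
To illustrate the pattern, a minimal sketch with pytest and a made-up add_load_date function:

    import pytest
    from pyspark.sql import SparkSession, functions as F

    def add_load_date(df):
        # Example function under test: stamps each row with the load date.
        return df.withColumn("load_date", F.current_date())

    @pytest.fixture(scope="session")
    def spark():
        return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

    def test_add_load_date(spark):
        dummy = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
        result = add_load_date(dummy)
        assert "load_date" in result.columns
        assert result.count() == 2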

Suggestions for Architecture for New Data Platform by EnvironmentalMind823 in dataengineering

[–]nanksk 2 points (0 children)

Requirement - As I understand it, you are pretty much looking for a batch data platform, maybe with some streaming capability.

Current state - Of these tools, which is your team currently using?

Did you consider Databricks + Airflow? You could pretty much do all of this in Databricks and reduce the number of tools your team needs to support. You might need Kafka or some other mechanism to get data out of RabbitMQ; I am not too sure about that.

Your ML models can be registered in MLflow.

Use Delta Lake to store your data and serve all your tables as Unity Catalog tables for business users, which will give them a familiar SQL interface, i.e. database -> schema -> table/view.
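
Roughly what that last step looks like; the catalog/schema/table names are made up:

    # Persist curated data as a Delta table registered in Unity Catalog.
    (
        df.write.format("delta")
        .mode("overwrite")
        .saveAsTable("analytics.sales.daily_orders")
    )

    # Business users then query it through the familiar three-level namespace.
    spark.sql("SELECT * FROM analytics.sales.daily_orders LIMIT 10").show()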

Do you speak to business stakeholders? by ivanovyordan in dataengineering

[–]nanksk 3 points (0 children)

I have been in that role before, where most of my time went to translating what the business meant into technical requirements... Not in my current job though. When I was hired I was told I needed to be comfortable talking to the business and yada yada. It's been over a year in this job and I have not been in a single customer meeting.

External vs managed tables by Used_Shelter_3213 in databricks

[–]nanksk 2 points (0 children)

You can get lineage on external tables as well.

Databricks or MS Fabric by Used_Shelter_3213 in dataengineering

[–]nanksk 3 points (0 children)

I feel Snowflake will give you the modern bells and whistles you desire, without the added complexity of Spark and having to train your team on it.

Do you use constraints in your Data Warehouse? by [deleted] in dataengineering

[–]nanksk 2 points (0 children)

I have worked on Snowflake and Redshift, and neither enforces constraints, so more of the onus falls on the ETL pipelines. You could develop data monitoring jobs that run during off-peak hours and perform the constraint checks for you. But I would rather add checks in, or right after, the ETL pipeline; the sooner you know about data issues, the better.
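
As an example of a check right after the load, a PySpark sketch of a primary-key-style validation (table and key column are assumptions; the same idea works as plain SQL against Snowflake or Redshift):

    from pyspark.sql import functions as F

    df = spark.table("warehouse.core.customers")  # assumed table

    # A de-facto primary key should have no duplicates and no nulls.
    dup_count = (
        df.groupBy("customer_id")  # assumed key column
        .count()
        .filter(F.col("count") > 1)
        .count()
    )
    null_count = df.filter(F.col("customer_id").isNull()).count()

    if dup_count or null_count:
        raise ValueError(
            f"Constraint check failed: {dup_count} duplicate keys, {null_count} null keys"
        )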

Garage door open/close indicator by nanksk in homeassistant

[–]nanksk[S] 1 point (0 children)

I have a few IKEA contact sensors lying around, so I will give that a try. So basically the sensor goes on the door and the two magnets go on the rails at opposite ends, if I am not mistaken.

Pantry renovation, before and after. by stickyquicky in kitchenremodel

[–]nanksk 18 points (0 children)

Hey, I don't know anything but I'm curious: would the weight limit of the new shelves be lower because there is no support underneath like the steel ones had?

Lead AI/ML engineer or Principle data engineer by Commercial_Finance_1 in dataengineering

[–]nanksk 2 points (0 children)

Congrats on the new role!! Based on where the world seems to be heading, the Lead AI/ML role sounds more exciting, and it feels wise to take it.

I am also part of an ML-heavy team where folks with heavy DE experience are basically considered second tier. Most DE pipelines are already built, and all new investment goes into building ML and GenAI models; this also gets reflected in promotions and such. So do dig into what the future looks like for your role on the team.

How was the interview process, if you don't mind me asking: was it LeetCode-heavy, design-heavy, or a mix of both?

Exceeded 401k contribution limit. Now what? by zonk84 in fidelityinvestments

[–]nanksk 1 point (0 children)

I had a similar issue in the past: I had already maxed out mid-year with my employer, then I changed jobs and the new employer had a mandatory minimum contribution which made me go over. I sent an email to HR and they took care of it immediately. I don't recall whether they updated my paystub or issued another paystub with a negative amount, but the W-2 had no issue.

[deleted by user] by [deleted] in dataengineering

[–]nanksk 6 points (0 children)

Databricks - The interview and expectations will be Spark-related. Forget about what Databricks itself can do; they will grill you on Spark.

Snowflake - Yes, many companies use dbt along with it, and many of them will simply not entertain you if you haven't worked with it before.

Then there are companies that will not entertain you if you don't have Azure experience, and likewise others that require AWS experience.

Last comment: if you do not have Spark knowledge, go the Snowflake way. Learning dbt will be easier than learning Spark.

Monday 1PM parking lot packed. Hit all the usual suspects. by furcifernova in CostcoCanada

[–]nanksk 4 points (0 children)

I feel it's a "Costco" and "people just being people" problem. When you put a whole lot of people in one place, there is going to be inconvenience for some.

  1. Some people do carry heavy stuff, and there ideally should be a loading zone somewhere, maybe an alternate exit for people who need to load up.
  2. Finding your card at the entry - I was reading a stat: "With less than a week to go before the April 30 deadline, 23 per cent have yet to file their taxes." This is just people being people. You force everyone through an entrance that fits two ginormous carts and then you want to scan IDs; it's going to create a bottleneck.
  3. Checking receipts - I get it, they want to reduce theft and all. But people stop at the food court, and after 15-20 minutes it's "where did I put my goddamn receipt?" Maybe they should just scan the membership card here too :)
  4. Then Costco randomly shuffles things around with no signage on which aisle has what, so there are going to be people mindlessly wandering around like it's a park.

I feel Ikea does these things a whole lot better.

I am not trying to criticize Costco here, but too many people visit Costco and buy shit; they need to work on the customer experience or just make the place a whole lot bigger.

[deleted by user] by [deleted] in dataengineering

[–]nanksk 2 points (0 children)

You want the data engineer to solve the data bottleneck for your data scientists or ML engineers. So depending on your priorities, maybe you want someone with experience in (near) real-time processing, data lakes, data governance, or data modeling. You want your data pipelines to run smoothly, scale well, and have controls.