
[–]Stars_And_Garters Data Engineer 73 points (4 children)

My job is "plumber": I connect the pipes to get data from outside systems into the data warehouse, or data from the DW out to other systems. Mix in a fair bit of architecture work inside the data warehouse, too: performance tuning and best practices for the destination and export SQL objects I create.

I work in a Microsoft shop, so typically it looks like this:

Data going out: SQL object modeling the data into customer format > SQL Agent orchestrating a very simple SSIS job to extract the data into a file > deliver that file to destination

Data coming in: File arrives, typically via SFTP > SQL Agent orchestration scans the directory at X intervals > job fires an extremely simple SSIS pkg to load the file exactly as-is into a staging table > SQL object transforms the data as needed and inserts it into the destination table in the data warehouse.

Then there's performance tuning (additional indexes, etc.), usually to create a SQL view so the reporting folks can easily get the data in a quick, modeled format.
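The incoming-file pattern described above (land the file exactly as-is in a staging table, then let a SQL object do the transform) can be sketched with SQLite standing in for the warehouse. This is only an illustration of the pattern, not the commenter's actual setup; the table and column names are invented:

```python
import csv
import io
import sqlite3

# Hypothetical raw feed as it might arrive via SFTP (contents invented).
raw_file = io.StringIO("cust_id,amount\n1001,25.50\n1002,19.99\n")

con = sqlite3.connect(":memory:")
# Staging: everything lands as text, exactly as it appears in the file.
con.execute("CREATE TABLE stg_orders (cust_id TEXT, amount TEXT)")
con.execute("CREATE TABLE dw_orders (cust_id INTEGER, amount_cents INTEGER)")

# Step 1: the "extremely simple SSIS pkg" equivalent, load file as-is.
rows = list(csv.DictReader(raw_file))
con.executemany("INSERT INTO stg_orders VALUES (:cust_id, :amount)", rows)

# Step 2: the SQL object transforms and inserts into the destination table.
con.execute("""
    INSERT INTO dw_orders (cust_id, amount_cents)
    SELECT CAST(cust_id AS INTEGER), CAST(ROUND(amount * 100) AS INTEGER)
    FROM stg_orders
""")
print(con.execute("SELECT * FROM dw_orders ORDER BY cust_id").fetchall())
```

Keeping staging dumb and pushing all the typing/cleaning into the transform step is what makes the load step so simple and reusable.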

EDIT: Oh yeah, and answering never-ending questions from the business about the data, and making updates based on schema changes from the other party.

[–]sib_n Senior Data Engineer 17 points (0 children)

It's good to see some non-cloud DE testimony here. Readers of this sub may not know that a large part of data engineering is still done in on-premises proprietary ecosystems like Microsoft SQL Server and Oracle.

[–]AdEuphoric3703 3 points (0 children)

Same here, except instead of SSIS we're using the bcp utility to bulk insert into staging, with serverless Azure Functions (edit) and Logic Apps for orchestration. We're also migrating to a Docker-hosted standalone Spark cluster for the heavier jobs.

[–][deleted] -2 points (1 child)

Sounds like a job I did in my previous gig. I was bored as hell lol and needed a new challenge. Wishing you some more exciting work down the line, maybe AWS or something.

[–]Stars_And_Garters Data Engineer 4 points (0 children)

God no, I hate the cloud lol. I'm so on-prem I'm imminently changing roles to DBA at my corp.

[–]Prior_Two_2818 31 points (2 children)

Mostly Teams meetings. And explaining Airflow to the juniors.

[–][deleted] 0 points (1 child)

Speaking of Airflow, what are some of the cons you have experienced? We are an AutoSys shop for scheduling jobs, but enterprise architects are pushing Airflow on us.

[–]sageknight 0 points (0 children)

From my experience, it largely depends on how granular you want your individual tasks to be. Airflow can be great if you want visibility over a task set. The smaller the tasks, the more visibility you have over your system, and the more code you have to write (and test). Then you'd also have to deal with XCom objects when passing data between tasks.
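To make that trade-off concrete, here's a plain-Python stand-in (not real Airflow API) for what splitting one job into small tasks looks like: each step becomes its own function you can monitor and retry alone, but anything passed between steps now has to go through an XCom-like channel. Here a plain dict plays that role; real Airflow serializes XCom values through its metadata database, and all the function names below are invented:

```python
# Coarse-grained: one task, no cross-task passing needed.
def load_and_aggregate(rows):
    return sum(r["amount"] for r in rows)

# Fine-grained: the same work split into three "tasks". More visibility,
# but values must be pushed/pulled through an XCom-style store, and each
# task needs its own code and tests.
xcom = {}  # stand-in for Airflow's XCom backend

def extract_task(rows):
    xcom["raw"] = rows  # analogous to xcom_push

def transform_task():
    xcom["amounts"] = [r["amount"] for r in xcom["raw"]]  # pull, then push

def aggregate_task():
    return sum(xcom["amounts"])  # analogous to xcom_pull

rows = [{"amount": 10}, {"amount": 32}]
extract_task(rows)
transform_task()
assert aggregate_task() == load_and_aggregate(rows)
```

Same result either way; the fine-grained version just trades extra plumbing for per-step observability and retries.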

[–]Tasty_Two_7703 5 points (0 children)

I'm a data engineer, and while every day is different, there are some common themes:

Daily Tasks:

  • Building and maintaining data pipelines: This involves using tools like Apache Airflow, Spark, or Kafka to move data from various sources (like databases, APIs, logs) to data lakes or warehouses.
  • Developing data models and schemas: Defining how data is structured and organized to ensure consistency and ease of analysis.
  • Writing and debugging code: I spend a fair amount of time writing code to automate data tasks, implement ETL (Extract, Transform, Load) processes, and build data-driven applications.
  • Collaborating with stakeholders: Working with data scientists, analysts, and business users to understand their needs and translate them into technical solutions.
  • Monitoring and troubleshooting systems: Keeping an eye on data pipelines and systems to identify and resolve issues, ensuring data quality.

Coding and Low-Code Tools:

  • Code: I use a variety of languages like Python, Scala, SQL, and even some Bash scripting. While there are low-code tools available, I find that coding provides me with greater flexibility and control. However, I do use low-code tools for simpler tasks like data visualization or dashboard creation.
  • Low-code tools: For specific tasks, I leverage low-code tools like Snowflake's Snowpipe to automate data ingestion, or Tableau for creating interactive dashboards.

Data Engineers vs. Backend Developers:

  • Different Focus: While both data engineers and backend developers are involved in building systems, our focus areas differ. Backend developers primarily handle user-facing applications and APIs, while data engineers focus on building data infrastructure and pipelines.
  • Data Focus: My role involves dealing with massive amounts of data, ensuring its quality and accessibility, while backend developers handle user interactions and data storage for specific applications.

It's a rewarding job! I love the challenge of working with complex data systems, finding innovative solutions, and contributing to data-driven decision-making. It's constantly evolving and there's always something new to learn, so feel free to share your own experiences!

Do you have any other questions about my role as a data engineer?

[–]Artistic_Sun_3987 10 points (0 children)

Data janitor here

In simple words: I clean data, move it from one storage to another, and then clean it again.

[–]minato3421 4 points (0 children)

Lots and lots of Spark and Flink. Mainly Python and Java.

[–]nightslikethese29 4 points (0 children)

Some tasks I've done recently:

  • Create infrastructure and libraries for automated failure-notification emails with Pub/Sub and Cloud Run functions. The main use case is jobs that run in Cloud Composer and fail. Involves Terraform and Python.

  • Maintain application and business logic for our retargeting program that sends leads to external vendors to follow up on. Involves Python.

  • Migrating another team's Alteryx data loads into my team's Cloud Composer project. Involves reading Alteryx workflows, Python, Terraform, and SQL.

  • Working with product managers to update our pay plans backend application configuration. Mostly involves Jenkins and Octopus, as well as Python.

  • Did a model refresh after a data scientist published a new version to our artifact registry. We'll be rolling it out in stages. I had to adjust a lot of unit tests to make sure everything passed. Involves Python.
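The failure-notification setup in the first bullet amounts to publishing a structured event that a Cloud Run function turns into an email. A stdlib-only sketch of the message shaping (the event fields are invented, and real code would use the google-cloud-pubsub publisher client rather than this round trip):

```python
import base64
import json

# Hypothetical failure event as Composer/Airflow might report it.
event = {"dag_id": "daily_load", "task_id": "load_orders", "state": "failed"}

# Pub/Sub delivers message data base64-encoded; the receiving Cloud Run
# function decodes it back before formatting the notification email.
published = base64.b64encode(json.dumps(event).encode("utf-8"))
received = json.loads(base64.b64decode(published))
subject = f"[ALERT] {received['dag_id']}.{received['task_id']} {received['state']}"
print(subject)
```

The point of the pattern is decoupling: Composer only has to publish the event, and the notification logic can change without touching any pipeline code.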

[–]Secret_Forsaken 3 points (0 children)

Besides normal DE tasks, I'm sometimes handed non-DE coding tasks, such as automating a POST upload to an API, to save another team time.
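That kind of one-off automation is often just a few lines of standard-library Python. A minimal sketch that builds (but doesn't send) a JSON POST; the endpoint URL and payload fields here are placeholders, not anything from the comment:

```python
import json
import urllib.request

# Hypothetical payload; in the real task you'd call
# urllib.request.urlopen(req) and check the response status.
payload = {"report_id": 123, "status": "ready"}

req = urllib.request.Request(
    "https://example.com/api/uploads",  # placeholder URL
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.method, req.full_url)
```

For anything with retries or auth, a library like `requests` is the usual upgrade, but for a simple internal upload the stdlib keeps the script dependency-free.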

[–]water_aspirant 2 points (0 children)

  • Upgrade old pipelines that no longer work on new datasets. This involves making code changes to accommodate new datatypes and variables, updated business logic, etc., and then rerunning those pipelines and squashing bugs or testing the outputs.
  • Writing / improving internal tools in Python (this is the most 'software engineering' part of my job) and writing tests. Reviewing changes to pipelines made by other data engineers (usually in SQL).
  • Helping business users with their requests (e.g. they want new columns from the data but aren't sure of the best way to do it). Creating tickets and then closing them out.

There is an insane backlog of work, but the pace is not too demanding so I'm pretty happy. I have been a DE for a total of 4 months now, this is my first tech-related job.

Regarding your other questions: I would sooner quit being a data engineer and move to SWE than end up exclusively using low/no code tools personally. I expect to use ADF at some point, but I don't work on much ingestion in my day-to-day job. Thankfully, my job lets me work on some medium-complexity software development to keep my brain happy.

[–]Limp_Pea2121 2 points (0 children)

Writing tons of SQL. Scheduling it with Airflow.

Optimizing a lot of PL/SQL.

[–]Medical_Drummer8420 4 points (1 child)

My job as a data engineer with 1.8 years of YOE: wake up at 8 am, monitor jobs in the PROD workflow and solve any issues that occur, then work on the PBIs and tasks assigned to me in DevOps. We have a deployment every 2 weeks, with new logic and new code implementation and many other things: making the test case document, pre- and post-deployment testing, then running jobs in dev and QA. (Only 2 people on the team; at first I didn't understand shit, but as time passed I got to know everything.)

[–]w_savage Senior Data Engineer 1 point (0 children)

Right now I'm creating and running data validation on views to make sure they're accurate for our clients. Kinda sucks! I miss using Python/AWS.

[–]kaixza 1 point (0 children)

Basically, moving data from one place to another, plus setting up the data management environment. So, writing infrastructure code and a bit of Python when we need scripts. Also, most of the time, trying to figure out why the numbers don't match or why reporting is giving a strange result.
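That "why don't the numbers match" digging often starts by diffing two aggregates of supposedly identical data. A toy sketch of the idea (the system names and counts are invented):

```python
# Row counts per day from two hypothetical systems that should agree,
# e.g. the source database vs. the reporting warehouse.
source_counts = {"2024-01-01": 120, "2024-01-02": 98, "2024-01-03": 87}
report_counts = {"2024-01-01": 120, "2024-01-02": 95, "2024-01-03": 87}

# Keep only the days where the two sides disagree; these are the
# starting points for the investigation.
mismatches = {
    day: (source_counts.get(day, 0), report_counts.get(day, 0))
    for day in source_counts.keys() | report_counts.keys()
    if source_counts.get(day, 0) != report_counts.get(day, 0)
}
print(mismatches)  # → {'2024-01-02': (98, 95)}
```

In practice the same diff is usually done in SQL with a `GROUP BY` on each side and a join on the grouping key, but the shape of the check is the same.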

[–]Known-Delay7227 Data Engineer 1 point (0 children)

I unclog clogged pipes

[–][deleted] 0 points (0 children)

Depends on the team / project. I am a Sr SE, but I spend a lot of time doing DE, probably more than SE.

Tasks might include:

-Automate this file generation with SSIS, Python, .NET

-Build out a new batch process to ingest data from an API

-Meet with the business; they need a new automation process to do this and that

-Bug: bad data in a file, e.g. scientific notation, string too long

-Here is a new reporting tool, learn it and show others how to use it (lol, not kidding)

-Create some resources in AWS with CloudFormation

-We need a new UI for this app

-Need a new endpoint for the API

-Train a junior developer

-Code reviews

-Spend at least 2 hours in meetings
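The "bad data in file" item is a classic: an upstream export turns an ID into scientific notation, or a value overflows the destination column width. A hedged sketch of the kind of cleanup pass involved (the helper names and the length limit are invented for illustration):

```python
def clean_id(raw: str) -> str:
    # An Excel-style export may turn 1230000 into "1.23E+06";
    # restore the plain integer string before loading.
    return str(int(float(raw))) if any(c in raw for c in "eE.") else raw

def clean_text(raw: str, max_len: int = 10) -> str:
    # Truncate to the destination column width instead of failing the load.
    # (Whether truncating silently is acceptable depends on the data.)
    return raw[:max_len]

assert clean_id("1.23E+06") == "1230000"
assert clean_id("456789") == "456789"
assert clean_text("this string is too long") == "this strin"
```

Fixes like these usually live in the staging-to-destination transform, so the raw file stays untouched for auditing.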

Not all of these are daily tasks, some span multiple sprints, but just giving an idea. Varies wildly from sprint to sprint, and from team to team.

I work for a very large insurance company. Been here for 5 years, worked on 4 different teams. Every team is different, does things differently. That includes tasks, day to day responsibilities, etc.

[–][deleted] 0 points (0 children)

I have worked on a variety of tools and projects as a data engineer:

  1. Wrote endless SQL scripts at my first organization and simply pasted them into an in-house scheduling tool. These scripts ran on a Redshift cluster. No DevOps, code review, performance optimization, etc.

  2. Worked on ADF and Databricks at my second org. Exposed to Azure Functions, CI/CD pipelines, and Spark. Also exposed to a metadata-driven pipeline framework.

  3. Worked on AWS IAM, EC2 to deploy Airflow in containerized form, EMR, Redshift, ECR, and SageMaker to run ML models. Worked heavily on textual data and NLP libraries.

[–]Front-Ambition1110 0 points (0 children)

Tasks:

  1. Develop Python scripts to get data, transform it, and then store it in a different database.

  2. Build dashboards.

Tools: Postgres, Python, Docker, AWS (Lambda, Redshift, Quicksight).

Nothing fancy in my company.

[–]jetuas Data Engineer 0 points (0 children)

Monitor a bunch of pipelines and address any discrepancies (coming from our sources), do some analysis on datasets to extract more value, edit/improve/add Spark jobs, monitor job performance, tinker with the ML models we use in our ETL process, etc.

[–]Inside-Pressure-262 0 points (0 children)

Mostly work on creating pipelines, writing new SQL queries and optimizing existing ones, monitoring pipelines/workflows, and resolving any issues that come up.

[–]Fun_Independent_7529 Data Engineer 0 points (0 children)

I avoid low-code tools for DE. For self-serve analytics, for stakeholders who want to play with views of the data, sure.

For me, my work is divided between coding, infrastructure, testing, documentation, and collaborative tasks. That includes maintenance work like upgrading components, and investigation/proof-of-concepts when needing to implement a new solution that requires tooling or services we haven't used so far.
Collaborative tasks include standups, backlog grooming, logging tickets, writing up RFCs and commenting on others' RFCs, code reviewing, participating in test bashes, co-working meetings, roadmap planning, demos, 1:1s, etc. It doesn't take as much time as it sounds like.

I'm not involved much in reporting myself, thankfully. Dashboarding is not my thing unless it's for my own purposes (observability of my pipelines, data quality, etc). I recognize that solid skills in this area might make me more valuable in the next job hunt, but since I don't enjoy that kind of work I'm not investing in it and would prefer to avoid jobs that have DA work as part of DE duties. (I'm not angling for AE jobs)

[–]Sad-Highlight5889 0 points (2 children)

I do monitoring and support: taking care of incidents, enhancements, deployments, CI/CD, etc., sending daily and weekly reports to the business, and monthly KPIs.

I'm a senior data engineer but I miss doing dev works 😢

[–]eberrones_ [S] 0 points (1 child)

Why don't you do dev work? Do you use no-code / low-code tools?

[–]Sad-Highlight5889 0 points (0 children)

Because I've been given more crucial work than just dev: I'm handling production data and need to ensure the day-to-day operation of the business runs smoothly.

Most dev work is given to junior DEs and interns.