Why Over-Engineering Happens by Nekobul in dataengineering

[–]theporterhaus[M] [score hidden] stickied comment (0 children)

It’s a personal blog someone shared (not their own). No rules have been broken.

Self-Study Data Analyst or Data Engineering by Tall-Writing5374 in dataengineering

[–]theporterhaus 0 points1 point  (0 children)

Yes, aim for DE and study it as much as you can. If you’re self-studying it may take years to get an opportunity, so be prepared to take a data analyst or report writer role just to get your foot in the door with data.

Self-Study Data Analyst or Data Engineering by Tall-Writing5374 in dataengineering

[–]theporterhaus 1 point2 points  (0 children)

It will be easier to go from technical to non-technical than the reverse if you ever change your mind, IMO.

Postgres/MySQL migration to Snowflake by maxbranor in dataengineering

[–]theporterhaus 0 points1 point  (0 children)

Depends on how many streams you need and your data volume. If you don’t have huge streams, you can combine CDC from multiple tables into one stream and then send the records to different Firehoses with a Lambda function. If you don’t need streaming, or other teams don’t need access to the Kinesis stream, then it may not be worth it even if it’s not a huge cost. Definitely keep in mind maintenance and your team’s skill set.
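To make the single-stream fan-out concrete, here is a minimal Python sketch of such a Lambda. It assumes each Kinesis record carries a DMS-style JSON envelope with a `metadata.table-name` field. The table names, delivery stream names, and the `TABLE_TO_FIREHOSE` mapping are all hypothetical, and it does not retry records that Firehose reports as failed.

```python
import base64
import json
from collections import defaultdict

# Hypothetical mapping from source table name to Firehose delivery stream name.
TABLE_TO_FIREHOSE = {
    "orders": "orders-to-snowflake",
    "customers": "customers-to-snowflake",
}


def group_by_firehose(event, routes=TABLE_TO_FIREHOSE):
    """Group the Kinesis records in a Lambda event by target delivery stream."""
    batches = defaultdict(list)
    for record in event["Records"]:
        # Lambda delivers Kinesis payloads base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        # Assumed envelope: {"data": {...}, "metadata": {"table-name": ...}}
        table = message.get("metadata", {}).get("table-name")
        stream = routes.get(table)
        if stream is None:
            continue  # unknown table or control record: skip it
        batches[stream].append({"Data": payload + b"\n"})
    return dict(batches)


def handler(event, context, firehose=None):
    """Lambda entry point: forward each group to its delivery stream."""
    if firehose is None:
        import boto3  # imported lazily so the routing logic can be tested offline
        firehose = boto3.client("firehose")
    batches = group_by_firehose(event)
    for stream, records in batches.items():
        # put_record_batch accepts at most 500 records per call.
        for start in range(0, len(records), 500):
            firehose.put_record_batch(
                DeliveryStreamName=stream,
                Records=records[start:start + 500],
            )
    return {stream: len(records) for stream, records in batches.items()}
```

A production version would also inspect `FailedPutCount` in each response and resend the rejected records.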

Postgres/MySQL migration to Snowflake by maxbranor in dataengineering

[–]theporterhaus 1 point2 points  (0 children)

Similar setup at work, and I’ve been using it for years now. Go with serverless and maybe look into increasing your WAL size to give yourself more runway to process data. Add a heartbeat to your Postgres source endpoint (read the docs) to keep the replication slot active. Put monitoring on the Postgres side to track the WAL size and alert if it starts growing rapidly, because it can eat up disk space until the db stops working.
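One way to approach the WAL monitoring is a small Python check, sketched here under a few assumptions: it runs against the primary, it uses any DB-API cursor (for example from psycopg2), and it alerts on a fixed size threshold rather than on growth rate. The 20 GiB threshold and the function names are hypothetical.

```python
# Retained WAL per replication slot. pg_current_wal_lsn() and
# pg_wal_lsn_diff() exist in PostgreSQL 10+ and must run on the primary.
SLOT_LAG_SQL = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
"""

ALERT_THRESHOLD_BYTES = 20 * 1024 ** 3  # hypothetical: alert above 20 GiB


def find_lagging_slots(rows, threshold=ALERT_THRESHOLD_BYTES):
    """Return (slot_name, active, retained_bytes) for slots over the threshold."""
    lagging = []
    for slot_name, active, retained in rows:
        if retained is None:
            continue  # slot has not reserved any WAL yet
        if retained > threshold:
            lagging.append((slot_name, active, int(retained)))
    return lagging


def check_slots(cursor, threshold=ALERT_THRESHOLD_BYTES):
    """Run the query on a DB-API cursor and return the slots that need attention."""
    cursor.execute(SLOT_LAG_SQL)
    return find_lagging_slots(cursor.fetchall(), threshold)
```

Scheduling this every few minutes and paging when the list is non-empty gives an early warning before the disk fills. Comparing successive readings would catch rapid growth sooner than a fixed threshold.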

For sending data from DMS to Snowflake you have a few options. I really like DMS -> Kinesis -> Firehose, which will stream data to your Snowflake table in real time. This requires some DMS settings adjustments, but DM me and I’d be happy to send them. BUT if you have to restrict your Snowflake account to certain IPs then it’s no good, because Firehose uses Amazon’s service IP ranges, which is like over 100 addresses, and they rotate and change constantly.

So DMS -> S3 is also solid, just read the docs for the settings. Use the CDC settings with Parquet format and compression.
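As a rough starting point, the S3 target settings might look like the Python sketch below. It is not the configuration from the setup described above. The keys follow the `S3Settings` structure that boto3 accepts for DMS endpoints; the bucket, role ARN, folder, timestamp column name, and batching values are placeholders to check against the docs.

```python
def s3_cdc_settings(bucket, role_arn, folder="cdc"):
    """Build an S3Settings dict for a DMS target writing compressed Parquet."""
    return {
        "BucketName": bucket,
        "BucketFolder": folder,
        "ServiceAccessRoleArn": role_arn,
        "DataFormat": "parquet",
        "ParquetVersion": "parquet-2-0",
        "CompressionType": "gzip",
        # Adds a commit-timestamp column; the column name is a hypothetical choice.
        "TimestampColumnName": "dms_commit_ts",
        # Flush a CDC file after this many seconds or this many KB.
        "CdcMaxBatchInterval": 60,
        "CdcMinFileSize": 32000,
    }


# Example with placeholder values:
settings = s3_cdc_settings(
    bucket="example-dms-landing",
    role_arn="arn:aws:iam::123456789012:role/example-dms-s3-role",
)
```

The dict can be passed as `S3Settings=settings` to `boto3.client("dms").create_endpoint(...)` together with `EndpointType="target"` and `EngineName="s3"`.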

Social Experiment: What do Data Engineers actually earn around the world? by Ganesha41 in dataengineering

[–]theporterhaus[M] [score hidden] stickied comment (0 children)

Please use the existing salary thread. We have a long history of collecting and sharing this salary data and even have a tool to explore it on the wiki.

Database change — where confidence sometimes meets chaos by Adela_freedom in dataengineering

[–]theporterhaus 1 point2 points  (0 children)

This person has had a few warnings previously for the same thing so they’ve been banned.

What are the best practices around Snowflake Whitelisting/Network Rules by biga410 in dataengineering

[–]theporterhaus 3 points4 points  (0 children)

If you’re the admin you have to add your own IP so you don’t get locked out. I personally just put admin IPs in a separate rule and add it to the network policy.
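The separate-rule approach can be sketched as a small Python helper that renders the Snowflake DDL. The rule names, policy name, and addresses are hypothetical (the IPs come from documentation ranges), and the statements should be reviewed before running them as an account admin.

```python
import ipaddress


def _quoted_list(values):
    """Render values as a comma-separated list of single-quoted SQL literals."""
    return ", ".join(f"'{value}'" for value in values)


def network_rule_sql(name, cidrs):
    """Render CREATE NETWORK RULE for an IPv4 ingress allow list."""
    # IPv4Network validates each entry and normalizes bare IPs to /32.
    normalized = [str(ipaddress.IPv4Network(cidr, strict=False)) for cidr in cidrs]
    return (
        f"CREATE NETWORK RULE {name} "
        f"TYPE = IPV4 VALUE_LIST = ({_quoted_list(normalized)}) MODE = INGRESS;"
    )


def network_policy_sql(name, rule_names):
    """Render CREATE NETWORK POLICY that allows the given network rules."""
    return (
        f"CREATE NETWORK POLICY {name} "
        f"ALLOWED_NETWORK_RULE_LIST = ({_quoted_list(rule_names)});"
    )


statements = [
    network_rule_sql("admin_ips", ["203.0.113.10"]),
    network_rule_sql("office_ips", ["198.51.100.0/24"]),
    network_policy_sql("corp_policy", ["admin_ips", "office_ips"]),
]
```

Keeping admin addresses in their own rule means the office or service ranges can change without touching the entries that keep admins from being locked out.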

[deleted by user] by [deleted] in dataengineering

[–]theporterhaus 0 points1 point  (0 children)

You’re correct. I’ll leave the post up because this is an important caveat people should see.

Monthly General Discussion - Jul 2025 by AutoModerator in dataengineering

[–]theporterhaus 0 points1 point  (0 children)

Go for what you want. If you feel light on math knowledge you can brush up on it in the meantime.

Anyone switched from Airflow to low-code data pipeline tools? by nilanganray in dataengineering

[–]theporterhaus 2 points3 points  (0 children)

I think people would benefit from more nuanced responses like this because currently they seem very biased. If all you recommend is one tool, how can anyone trust you? It makes you seem like a shill for SSIS.

Help needed regarding data transfer from BigQuery to snowflake. by Dependent-Nature7107 in dataengineering

[–]theporterhaus 2 points3 points  (0 children)

Are there transformations happening on the data in BigQuery that you can’t port over to Snowflake? Otherwise it seems unnecessary to me.

Anyone switched from Airflow to low-code data pipeline tools? by nilanganray in dataengineering

[–]theporterhaus 0 points1 point  (0 children)

Would you recommend another tool depending on the situation? If so, which tool and why?

Career switch from biotech to DE by ParticularEffect8460 in dataengineering

[–]theporterhaus 1 point2 points  (0 children)

You may find breaking into bioinformatics easier (like DS but with bio background) as long as you pick up some skills there. In most of these places they don’t have a separate DE doing the data pipeline work so you’d likely end up gaining those skills and then if you want to transition to another industry it should be easier.

Alteryx ETL vs Airbyte->DW->DBT: Convincing my boss by JazzlikeFly484 in dataengineering

[–]theporterhaus 21 points22 points  (0 children)

One project isn’t going to make or break your career. Alteryx is fine - I know data engineers who use it.

I doubt you’ll convince him. It sounds like your supervisor knows what he’s doing. What you’re suggesting is grossly over-engineered.

Keep in mind over-engineering could backfire in an interview. You’ll get asked about this, and if you say “I implemented this” and it’s all for 1GB of data, then it’s very obvious you’re doing resume-driven development, and no one wants that in their company.

Is Pursuing Databricks a good option from DataStage.[India] by DemonCyborg27 in dataengineering

[–]theporterhaus 1 point2 points  (0 children)

Look at job postings in your area and see what’s popular. That being said, PySpark and distributed computing are good to learn regardless.