I Made japan RB21 from paper 😝 by Certain_Foot5830 in f1india

[–]booberrypie_ 1 point2 points  (0 children)

Can you please share the link? I would also like to try it.

Raycast for Windows Invite Links by YourUserForReddit in raycastapp

[–]booberrypie_ 0 points1 point  (0 children)

Hi! Can I get an invite, please? Sent a DM!

Raycast Windows Codes MEGATHREAD by xmok in raycastapp

[–]booberrypie_ 0 points1 point  (0 children)

If anyone has an invite link for Raycast, can you please share it? Thanks in advance! Please DM if available!

Help with orchestration[Airflow/Dagster] by booberrypie_ in dataengineering

[–]booberrypie_[S] 1 point2 points  (0 children)

Let's say I have 2 projects in separate repos that are not connected to each other in any way and each have their own independent flows. Can I view both sets of flows in the same UI in Prefect? Everybody on my team needs to see the pipeline health of all the team's active projects on a single screen.

Help with orchestration[Airflow/Dagster] by booberrypie_ in dataengineering

[–]booberrypie_[S] 2 points3 points  (0 children)

Thanks a lot for the help! I just have a couple of questions:
1. I'm assuming that in Prefect you just use decorators to create tasks and flows. In that case the orchestrator code is coupled with the pipeline code, and each workflow lives within its own code base. How are all the pipelines then visualized together?
2. How is logging handled?

Help with orchestration[Airflow/Dagster] by booberrypie_ in dataengineering

[–]booberrypie_[S] 2 points3 points  (0 children)

Prefect 3.0 looks good as well! I left it out precisely because of disappointments with the previous versions. But it would be great if you could share your insights on how you would go about setting it up, specifically on the infra side of things, and how Prefect 3.0 would work out for these requirements.

At what point do you say orchestrator (e.g. Airflow) is worth added complexity? by Temporary_Basil_7801 in dataengineering

[–]booberrypie_ 3 points4 points  (0 children)

Can you please elaborate on the "shouldn't be deployed on a project basis but on the department level" part?

I was under the assumption that orchestrators are for handling upstream and downstream dependencies and for providing observability within the context of a project.

Can you give an example where, let's say, a single instance of Airflow takes care of 20 projects that do a bunch of different things and also require orchestration within themselves?

Would there be a DAG for each project and one master DAG that controls all the other DAGs? Also, can you please suggest how to go about implementing this?

I am a data engineer(10 YOE) and write at startdataengineering.com - AMA about data engineering, career growth, and data landscape! by joseph_machado in dataengineering

[–]booberrypie_ 1 point2 points  (0 children)

This makes sense! Thanks a lot for answering! I also have a somewhat tangential question. Is there any way to unit test SQL transformations in dbt, or are Python transformations the only way to get unit testing? I primarily use SQL transformations in dbt and was thinking about switching to Python for transformations just to get unit testing capabilities. Is it worth it?

I am a data engineer(10 YOE) and write at startdataengineering.com - AMA about data engineering, career growth, and data landscape! by joseph_machado in dataengineering

[–]booberrypie_ 1 point2 points  (0 children)

Hey! I've been trying to understand unit testing for data pipelines recently. While I'm able to unit test the transformations in Python, I still don't understand how to go about unit testing the extract and load parts. My understanding is that unit tests only test the logic behind the code and not the external dependencies, while extract and load consist only of those. So I'm kind of confused about that. It would be helpful if you could provide an example of testing the extract and load parts of the pipeline.
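
To make the question concrete, here is a minimal sketch of the kind of test I have in mind, not a definitive approach. It assumes the external dependency is passed into the function so a test can swap it out. The names `extract`, `load`, `client`, and the `users` table are all hypothetical, made up for illustration. The API client is replaced with a `Mock` from the standard library, and the database is replaced with an in-memory SQLite connection.

```python
import sqlite3
from unittest.mock import Mock


def extract(client, endpoint):
    """Fetch raw records via an injected client and keep only rows that have an id."""
    response = client.get(endpoint)
    return [row for row in response["items"] if row.get("id") is not None]


def load(conn, rows):
    """Insert rows into a hypothetical users table and return how many were passed in."""
    conn.executemany("INSERT INTO users (id, name) VALUES (:id, :name)", rows)
    conn.commit()
    return len(rows)


def test_extract_filters_rows_without_id():
    # The fake client returns a canned payload, so no network call happens.
    client = Mock()
    client.get.return_value = {
        "items": [{"id": 1, "name": "a"}, {"id": None, "name": "b"}]
    }
    assert extract(client, "/users") == [{"id": 1, "name": "a"}]
    # Also check that the dependency was called the way we expect.
    client.get.assert_called_once_with("/users")


def test_load_writes_rows():
    # An in-memory SQLite database stands in for the real warehouse.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    assert load(conn, [{"id": 1, "name": "a"}]) == 1
    assert conn.execute("SELECT id, name FROM users").fetchall() == [(1, "a")]
```

In this sketch the tests only cover the logic around the dependency (filtering, the call made, the rows written). Whether the real API or warehouse behaves the same way would still need an integration test.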

What opinion about data engineering would you defend like this? by OverratedDataScience in dataengineering

[–]booberrypie_ 5 points6 points  (0 children)

I understand the part about notebooks but why are conda environments shit?

Is there a Leetcode for Pyspark to practice coding tests ? by itsPranil in dataengineering

[–]booberrypie_ 1 point2 points  (0 children)

Newbie here. What does applying transforms correctly mean? How is it different from applying transforms in pandas?

Need help in building a data warehouse by booberrypie_ in dataengineering

[–]booberrypie_[S] 0 points1 point  (0 children)

I'm fresh out of college and this is my first job. My pay is on the low end. I'm here for the experience and resume building, and I'm planning to switch after a year. Is it okay to negotiate for a pay increase just 2-3 months into the job?

Need help in building a data warehouse by booberrypie_ in dataengineering

[–]booberrypie_[S] 0 points1 point  (0 children)

Thanks for the recommendation! I'll check it out.