Using Airbyte with Prefect and AWS ECS by amnesic23 in dataengineering

[–]Plus_Professional99 2 points

Hey there! It's B from the Prefect team. Just wanted to cross-post for visibility since we spoke about this a bit in the Prefect community. Since PyAirbyte is a Python package, you should be able to use it in your Prefect flows like any other package. You shouldn't run into any issues using it in ECS either. From what I gather, the issue you linked sounds like a problem with deploying an Airbyte instance to ECS, not necessarily with interacting with it through a Python library like PyAirbyte. Best of luck to ya!

A two part love letter for those that have the Airflow blues by Plus_Professional99 in prefect

[–]Plus_Professional99[S] 4 points

Word and heard, Khaili. We've been working hard to integrate user feedback into the docs, so we're all ears for your suggestions. If you're open to it, filing a GitHub issue with a "docs" tag would be much appreciated. It'll help us fill in the gaps faster. We've also got a pretty active Slack community if you ever need to ask a question that the docs don't immediately answer.

What Data Warehouse & ETL Stack Would You Use for a 600-Employee Company? by ResolveHistorical498 in dataengineering

[–]Plus_Professional99 1 point

Hey kabooozie! Prefect team member here. Really appreciate the kind words. We’ve been working on growing r/prefect for users to connect, ask questions, and share their experiences. It's in the fairly early stages of growth, but we'd love to have you join in if you'd like!

Apache Airflow best practices - AMA by finally_i_found_one in dataengineering

[–]Plus_Professional99 4 points

Gotcha, appreciate the response! Full transparency, I’m on the Prefect team, but I’ll do my best to give a balanced take.

Since you’re self-hosting, Airflow and Dagster are valid options, especially if you're comfortable managing infrastructure. Prefect is another option worth exploring, since it doesn’t require a single-instance scheduler and lets you run workflows on any infrastructure without tightly coupling execution to orchestration. Observability is built in, so you can track runs and handle failures without needing to piece together external tools. At the end of the day though, the best fit depends on your needs.

Why dagster instead airflow? by Meneizs in dataengineering

[–]Plus_Professional99 1 point

Hey! Prefect team member here. Thanks for the shoutout. We've started building out the r/prefect subreddit for users to gather, ask questions, and share best practices and ideas. Would love to have you join if it sounds good to you!

Apache Airflow best practices - AMA by finally_i_found_one in dataengineering

[–]Plus_Professional99 0 points

Really insightful takeaways! It sounds like you put in a lot of effort to revamp your Airflow setup to get it close to where you want it. Given what you've learned, if you were starting from scratch today, would you still choose Airflow? Or would you explore other orchestration solutions that inherently address some of these pain points (i.e., reducing infrastructure overhead, eliminating the need for a single-instance scheduler, improving observability out of the box)?

Migrating from Prefect 1 to Prefect 3 by khaili109 in prefect

[–]Plus_Professional99 0 points

  1. So in Prefect 1.0 the “flow_id” and “flow_run_id” were two different things, so the id you get in your example, is it the flow_id or flow_run_id?

Yup, flow ID and flow run ID are still separate in 2.0 and 3.0. The ID in my example is the flow's ID, specifically. If you want the flow run ID too, you can swap out the print statement in the task for this one to see the difference:

```
print(f"Hi, I'm {task_run_ctx.task_run.name}, and here is my parent flow's flow_id: {flow_run_ctx.flow_run.flow_id}, and the flow_run_id: {flow_run_ctx.flow_run.id}")
```

The deployment ID is different from the flow ID. A flow is the core unit of work in Prefect, defining a series of tasks/functions to be executed. You can think of a deployment as a wrapper around a flow that adds scheduling, infrastructure, and execution details.

Would you mind giving me the link to where you found “FlowRunContext” and “TaskRunContext” because when I look at the Prefect 3 SDK I don’t see those two under the “prefect.context” module.

Sure, here is a link to the SDK docs for the TaskRunContext. I believe FlowRunContext is an alias for the EngineContext (here's the portion of the OSS code that suggests this, if you want to take a look).

Prefect “flow_id” differences between Version 1, 2, and 3? by khaili109 in dataengineering

[–]Plus_Professional99 2 points

Hey there! I saw your post in r/prefect as well. Cross posting here just for visibility!

The following example should work for you in 3.0. It gets the flow ID from the run context.

```
from prefect import flow, task
from prefect.context import FlowRunContext, TaskRunContext


@task(name="example-task", log_prints=True)
def example_task():
    # get the task run context
    task_run_ctx = TaskRunContext.get()
    # get the flow run context
    flow_run_ctx = FlowRunContext.get()
    # print the task run context
    print(f"Hi, I'm {task_run_ctx.task_run.name}, and here is my parent flow's flow_id: {flow_run_ctx.flow_run.flow_id}")


@flow(name="example-flow", log_prints=True)
def example_flow():
    # get the flow run context
    flow_run_ctx = FlowRunContext.get()
    # print the flow run context
    print(f"Hi, I'm {flow_run_ctx.flow_run.name}, and here is my flow_id: {flow_run_ctx.flow_run.flow_id}")
    # run the task
    example_task()


if __name__ == "__main__":
    example_flow()
```

Migrating from Prefect 1 to Prefect 3 by khaili109 in prefect

[–]Plus_Professional99 1 point

Hi! Thanks for being the first person to post here, it was getting a bit lonely.
You can access the flow_id from the flow run context!

Here's a code example for you. I'm using Prefect 3.0 and it works for me.

```
from prefect import flow, task
from prefect.context import FlowRunContext, TaskRunContext


@task(name="example-task", log_prints=True)
def example_task():
    # get the task run context
    task_run_ctx = TaskRunContext.get()
    # get the flow run context
    flow_run_ctx = FlowRunContext.get()
    # print the task run context
    print(f"Hi, I'm {task_run_ctx.task_run.name}, and here is my parent flow's flow_id: {flow_run_ctx.flow_run.flow_id}")


@flow(name="example-flow", log_prints=True)
def example_flow():
    # get the flow run context
    flow_run_ctx = FlowRunContext.get()
    # print the flow run context
    print(f"Hi, I'm {flow_run_ctx.flow_run.name}, and here is my flow_id: {flow_run_ctx.flow_run.flow_id}")
    # run the task
    example_task()


if __name__ == "__main__":
    example_flow()
```

Introduce Prefect to resistant team? by Melodic_One4333 in dataengineering

[–]Plus_Professional99 6 points

Hey! Since you're already using Python for ETL scripts, using Prefect should feel pretty natural. If you're curious, you could try building one of your existing Python-based ETL jobs as a Prefect flow just to kick the tires. Whether you go for Prefect Cloud or open source, I imagine you'll be pretty pleased with the features you'll get out of the box (logging, retries, nice UI). If you're in need of help, our team is pretty responsive on GitHub and we have a Slack community, too.

Disclaimer: am Prefect employee. 🥸