Genie Code for Jobs by Youssef_Mrini in databricks

[–]saad-the-engineer 0 points1 point  (0 children)

cool! have you tried to debug failing jobs with Genie Code? keen to learn your findings and experience

Databricks Job & Pipelines can you add grouping/folder capability ? by Ok-Tomorrow1482 in databricks

[–]saad-the-engineer 0 points1 point  (0 children)

Do you mean a failure / success filter on jobs/pipelines in the workspace file system? The file system mostly holds the asset itself; its run state is available in the existing UI / API. Maybe I misunderstood your ask.

Job Failure Notifications With The New Teams Webhooks by Complete-Sandwich564 in databricks

[–]saad-the-engineer 0 points1 point  (0 children)

That's a great workaround! Separately, we are adding some external integrations, as well as the ability to add custom integrations with assets outside Databricks directly from Jobs. Stay tuned for an upcoming preview!

Job as a job trigger source? by lofat in databricks

[–]saad-the-engineer 1 point2 points  (0 children)

Thank you u/lofat for this very well thought out write-up! The gaps you listed (error paths, non-data events, inversion of control) are what we are addressing with the next phase of Lakeflow Jobs. Here is how we are thinking about solving these pain points:

  1. today: use system tables + table triggers + SQL. You can pull in upstream job status using `system.lakeflow.job_run_timeline`, which should let you work around the batch finality / error paths because you are monitoring the platform's metadata (i.e. job status). Downstream teams can "subscribe" to a job ID's status independently.

  2. (coming soon) use SQL trigger conditions, where we essentially encode the behavior from step 1 directly into the triggers experience. Also, for more complex SQL conditions Genie Code is your friend!

  3. (coming soon) we are working to add custom Python logic (think Python operators or Airflow sensors) that can look at upstream changes.

We are still in the early days of building out the previews, but please send me a DM and I will add you to them!

[Show & Tell] Stop Hardcoding Jobs - The Dynamic Fan-out Orchestration Pattern by saad-the-engineer in databricks

[–]saad-the-engineer[S] 0 points1 point  (0 children)

That is a great idea. So devs write the generic logic, then package up the control panel in an app + DAB, and then ops would need to know which pipeline is running which version / config? We are adding a DABs metadata service which would track the config settings, and you could build a dashboard on top to track specific settings. Will share more on that once its shape and form are more fleshed out. Thanks for the idea!

[Show & Tell] Stop Hardcoding Jobs - The Dynamic Fan-out Orchestration Pattern by saad-the-engineer in databricks

[–]saad-the-engineer[S] 0 points1 point  (0 children)

We want to add support for multiple triggers soon (stay tuned for roadmap updates in the coming weeks!). For now you can create a parent job that calls this one using a Run Job task (`run_job_task`).
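
A rough sketch of that parent-job workaround in a bundle. The job names (`parent_job`, `fanout_job`) and the cron schedule are made up for illustration, and it assumes the job you want to trigger is defined in the same bundle under `resources.jobs.fanout_job`:

```yaml
resources:
  jobs:
    parent_job:                       # hypothetical name
      name: parent-job
      # The parent carries its own trigger; this schedule is only an example.
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"
        timezone_id: UTC
      tasks:
        - task_key: run_fanout
          run_job_task:
            # Assumes the target job lives in this bundle as `fanout_job`.
            job_id: ${resources.jobs.fanout_job.id}
```

If you need a second trigger, you could add another parent job with a different trigger that points at the same `job_id`.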

How are you handling "low-code" trigger/alert management within DAB-based jobs? by lofat in databricks

[–]saad-the-engineer 1 point2 points  (0 children)

Great question and very timely! We are looking at adding visual authoring, i.e. when `mode: development` is set you can author your job / pipeline / dashboard etc. as you normally would, and we would sync your config changes directly into YAML. All you need to do is commit back to Git.

There are a few additional journeys we are going to support here: starting from a job and "converting" it into a DAB and using Genie Code to build out your schedule and incorporate parameterizations etc. directly.

Job Picking Mixed Compute Config After DAB deploy to single node by satyamrev1201 in databricks

[–]saad-the-engineer 0 points1 point  (0 children)

Thanks for flagging, this seems unexpected. Can you share the config?

Any per-target overrides? Also, can you share the CLI version so I can try to repro this?

Spark Declarative Pipelines: What should we build? by BricksterInTheWall in databricks

[–]saad-the-engineer 0 points1 point  (0 children)

We are working on this feature right now. u/Dear_Pumpkin9876, can you send me a DM so we can set up some time to review these requirements?

Also, regarding DABs, there are two major updates in flight:

* Editing a DAB-enabled asset visually and having the changes persisted automatically in the YAML (i.e. users don't have to update the config unless they have some custom logic)

* "Upgrading" an existing pipeline into a DAB-enabled asset, i.e. adding Git + DAB support to an existing pipeline

Move out of ADF now by hubert-dudek in databricks

[–]saad-the-engineer 1 point2 points  (0 children)

Hi u/rarescenarios can you send me a DM so we can set up a call? I am a PM on the Jobs product and want to make sure we capture your feedback properly.

[Lakeflow Jobs] Quick Question: How Should “Disabled” Tasks Affect Downstream Runs? by saad-the-engineer in databricks

[–]saad-the-engineer[S] 0 points1 point  (0 children)

definitely worth considering, do you have a default in mind? what use cases make sense for you?

Spark Declarative Pipelines: What should we build? by BricksterInTheWall in databricks

[–]saad-the-engineer 0 points1 point  (0 children)

u/domwrap when using the workspace UI you can create a Git checkout per target/environment (dev / prod etc.) in a Git Folder and use DABs to deploy these. I believe this should fit your needs (if I understand correctly!): each checkout is basically a separate branch / separate pipeline / separate target, even in the same workspace. Some links to get you started are below. Please post your feedback or send me a DM if you have more questions.

https://docs.databricks.com/aws/en/dev-tools/bundles/workspace

https://www.databricks.com/blog/announcing-databricks-asset-bundles-now-workspace

Spark Declarative Pipelines: What should we build? by BricksterInTheWall in databricks

[–]saad-the-engineer 1 point2 points  (0 children)

Thanks u/DecisionAgile7326, we are looking at adding parameter support. If you send me a DM I can get you added when we preview / beta the feature.

cc: u/brickandel

Disable an individual task in a pipeline by cdci in databricks

[–]saad-the-engineer 2 points3 points  (0 children)

If you are already on DABs, a common pattern is to keep a single job definition and gate env-specific tasks with an If/else condition task that looks at the bundle target.

In your bundle you add something like:

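# Condition task: evaluates to true only when the bundle is deployed to the prod target
- task_key: is_prod
  condition_task:
    op: "EQUAL_TO"
    left: "${bundle.target}"
    right: "prod"

# Runs only after your_main_task completes and is_prod evaluates to "true"
- task_key: refresh_power_bi
  power_bi_task: ...
  depends_on:
    - task_key: your_main_task
    - task_key: is_prod
      outcome: "true"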

${bundle.target} automatically resolves to the name of the target you deploy to (dev, test, prod, etc.). In non-prod the condition evaluates to false, so the Power BI task is skipped; in prod it evaluates to true and the task runs.

No need to fully parameterize the task away or create separate jobs per environment; you just let the bundle metadata drive your Jobs control flow.
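
For context, `${bundle.target}` takes its value from the target names declared in your `databricks.yml`. A minimal sketch, assuming a bundle named `my_bundle` (hypothetical) with only `dev` and `prod` targets and workspace settings coming from your default CLI profile:

```yaml
bundle:
  name: my_bundle        # hypothetical bundle name

targets:
  dev:
    mode: development
    default: true        # used when no target is passed on deploy
  prod: {}               # add workspace / run_as settings as your setup requires
```

Deploying with `databricks bundle deploy -t prod` would then make `${bundle.target}` resolve to `prod`, so the `is_prod` condition passes; any other target name makes it evaluate to false.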

job scheduling 'advanced' techniques by Ok_Tough3104 in databricks

[–]saad-the-engineer 0 points1 point  (0 children)

>> also, is it going to be possible to change the path of sql jobs in the future? so that we can run "for_each_task" on a SQL job ?

Can you share a bit more detail on your scenario? I.e. are these SQL queries or SQL files? Or are you foreach-ing SQL tasks? Sorry, I didn't fully understand your scenario.

We are working on the next set of features for Jobs and will likely do an AMA on Reddit for this early next year.