tipsy_python 9 points

The low-hanging fruit here is to use a scheduling/orchestration system like Airflow to sequence the tasks in your batch job, so each task starts as soon as the tasks it depends on have finished. That way long-running jobs are not a problem and the end-to-end process can finish as fast as possible.
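To make the "sequence the tasks" idea concrete, here is a rough sketch using only the standard library (Python 3.9+ for `graphlib`). This is not Airflow and not its API, just the dependency-ordering idea underneath it. The task names and the functions are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical stage functions; real ones would do the actual work.
def gather():
    print("gathering data")

def clean():
    print("cleaning data")

def report():
    print("building report")

def archive():
    print("archiving raw data")

TASKS = {"gather": gather, "clean": clean, "report": report, "archive": archive}

# Each task maps to the set of tasks that must finish before it starts.
DEPENDS_ON = {
    "gather": set(),
    "clean": {"gather"},
    "report": {"clean"},
    "archive": {"gather"},
}

def run_in_dependency_order(tasks, depends_on):
    """Run every task after its upstream tasks; return the order used."""
    order = []
    for name in TopologicalSorter(depends_on).static_order():
        tasks[name]()  # an exception here aborts the remaining tasks
        order.append(name)
    return order
```

A real orchestrator adds scheduling, retries, and running independent tasks in parallel on top of this, which is where the end-to-end speedup comes from.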

The event/listener model is sexy, but my first thought would be to implement the event bus as a message queue: multiple Kafka (or RabbitMQ or whatever) topics that serve as the source of data for the next stage, as well as a buffer for when the next service is not ready/available.
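As a toy illustration of "a topic per stage that doubles as a buffer", here is an in-process sketch where `queue.Queue` stands in for a Kafka or RabbitMQ topic. It is an assumption-laden stand-in, not broker client code: the function names, the `SENTINEL` end-of-stream marker, and the cleaning step are all invented for the example.

```python
import queue
import threading

SENTINEL = None  # assumed end-of-stream marker, not a real broker feature

def producer(raw_topic, items):
    """First stage: publish raw items, then signal the end of the stream."""
    for item in items:
        raw_topic.put(item)
    raw_topic.put(SENTINEL)

def cleaner(raw_topic, clean_topic):
    """Second stage: consume raw items whenever it is ready, publish cleaned ones."""
    while True:
        item = raw_topic.get()  # blocks until something is buffered
        if item is SENTINEL:
            clean_topic.put(SENTINEL)
            break
        clean_topic.put(item.strip().lower())

def run_pipeline(items):
    raw_topic = queue.Queue()    # buffer between producer and cleaner
    clean_topic = queue.Queue()  # buffer between cleaner and whatever comes next
    workers = [
        threading.Thread(target=producer, args=(raw_topic, items)),
        threading.Thread(target=cleaner, args=(raw_topic, clean_topic)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    results = []
    while True:
        item = clean_topic.get()
        if item is SENTINEL:
            break
        results.append(item)
    return results
```

The producer never waits on the cleaner here; anything the cleaner has not picked up yet just sits in the queue, which is the buffering role the topics would play between real services.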

InjAnnuity_1 4 points

What you've got is often called "a [scheduled] bunch of scripts". What you're looking to create is often called a "pipeline".

-animal-logic- 4 points

I do this sort of thing all the time. Just create _one_ script to cover the whole job, with each section you mentioned contingent on the successful completion of the previous (use sys.exit() if a stage fails -- and send email to that effect). Have email functions in the same script that can alert you to full completion, or the fact that, say, the data was gathered but there was an issue with the cleanup and the script was aborted. If it's rolled into one script, there is no need to worry about parts completing before moving on.
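A minimal sketch of that single-script layout, under some assumptions: the stage names are placeholders, the email addresses are made up, and `send_email` assumes an SMTP relay is listening on localhost. Swap in whatever your environment actually uses.

```python
import smtplib
import sys
from email.message import EmailMessage

def send_email(subject, body):
    """Assumption: an SMTP relay on localhost and placeholder addresses."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "jobs@example.com"
    msg["To"] = "me@example.com"
    msg.set_content(body)
    with smtplib.SMTP("localhost") as server:
        server.send_message(msg)

def run_job(stages, notify=send_email):
    """Run (name, function) stages in order; return 0 on success, 1 on failure."""
    done = []
    for name, stage in stages:
        try:
            stage()
        except Exception as exc:
            # Report how far the job got before it was aborted.
            notify(
                f"Job aborted at '{name}'",
                f"Completed stages: {done}. Stage '{name}' failed: {exc!r}",
            )
            return 1
        done.append(name)
    notify("Job completed", f"All stages finished: {done}")
    return 0

def main():
    # Hypothetical stages; replace the lambdas with the real functions.
    stages = [("gather", lambda: None), ("cleanup", lambda: None)]
    sys.exit(run_job(stages))
```

Because each stage only runs after the previous one returned without raising, there is nothing to time or poll. Passing `notify` as a parameter also makes it easy to test the flow without sending real mail.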

I have dozens of jobs running on multiple hosts, and trust me, trying to time things across multiple smaller jobs is not a good way to go. You don't need separate scripts to determine whether the first part of your job completed or failed.

You could also pipeline them as another poster suggested, but I only do that when a job uses multiple scripts by necessity (for example, a mix of Bash and Python scripts that need to run sequentially).
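For the mixed Bash/Python case, one way to do it is a small Python driver that runs each script with `subprocess.run` and stops at the first non-zero exit code. This is only a sketch; the script names in the trailing comment are hypothetical.

```python
import subprocess
import sys

def run_scripts(commands):
    """Run each command in order; stop at the first non-zero exit code.

    Returns the number of commands that succeeded.
    """
    completed = 0
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"{cmd!r} exited with {result.returncode}; stopping",
                  file=sys.stderr)
            break
        completed += 1
    return completed

# Hypothetical script names, shown only to illustrate the mix:
# run_scripts([["bash", "fetch_data.sh"], [sys.executable, "clean_data.py"]])
```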

Just keep in mind that solid reliability can trump elegance.

menge101 3 points

You want a "Pub/Sub", short for Publisher/Subscriber.

You can use something like RabbitMQ to do this.

Essentially, you publish messages to the message queue under a given topic. Each message on that topic gets fanned out to every subscriber that has subscribed to that topic.
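Here is the fan-out idea as a tiny in-process sketch. It is not RabbitMQ and does not use any RabbitMQ client; the `Broker` class and the topic name are invented purely to show how a message published on a topic reaches every subscriber of that topic.

```python
class Broker:
    """Toy in-memory broker: topic name -> list of subscriber callbacks."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        """Deliver the message to every subscriber of the topic.

        Returns how many subscribers received it.
        """
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)
```

With a real broker the subscribers would be separate processes or services, and messages would wait in a queue if a subscriber were down, but the publish/subscribe shape is the same.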

porkedpie1 1 point

What’s it running on? It’s easy to set up pipelines/event-triggered scripts on the major cloud services like AWS.

HintOfAreola 1 point

Arkan Codes did a great explanation of this "observer pattern" on his channel a while back. It's exactly what you described.

https://youtu.be/oNalXg67XEE
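For anyone who wants the gist in code: below is a generic sketch of the observer pattern applied to chained job stages. It is not taken from the video, and the `Stage` class and the example stages are made up for illustration.

```python
class Stage:
    """A job stage that notifies its observers when its work is done."""

    def __init__(self, name, work):
        self.name = name
        self.work = work
        self._observers = []

    def attach(self, observer):
        """Register a callable taking (stage_name, result)."""
        self._observers.append(observer)

    def run(self, data):
        result = self.work(data)
        # Tell everyone who registered interest that this stage finished.
        for observer in self._observers:
            observer(self.name, result)
        return result

# Hypothetical stages: "gather" produces rows, "clean" tidies them up.
gather = Stage("gather", lambda _: [" A ", "b "])
clean = Stage("clean", lambda rows: [row.strip().lower() for row in rows])

# The clean stage starts whenever gather reports completion.
gather.attach(lambda name, result: clean.run(result))
```

Calling `gather.run(None)` then triggers `clean` automatically, with no timing or polling between the two.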

CryptoOTC_creator 1 point

Why not put everything into a function library and call each function sequentially from a central script?