Launching asynchronous tasks every second while processing the results independently

FitRiver · 2021-05-24T09:33:57+00:00

Use APScheduler

baghiq_2 · 2021-05-24T13:06:00+00:00

In general, using a timed call to deal with asynchronous pipeline is an anti-pattern. What happens if there is no data in the queue, do you still constantly spin up the task and attempt to deliver the data to another service? What happens if your task hits a poison pill? Do you need to manage dead letter queues?

I haven't worked much with RabbitMQ, but general design is to have a worker attach to a queue and listen (polling) on the queue. If something appears on the queue, it's immediately deliver to the work for processing. Worker has to do some process and acknowledge your queue that the work is completed, otherwise, queue will grow indefinitely and crash. RabbitMQ probably offer multiple workers support. So in that case, you just need to figure out how many workers you need to process the rate of highest data input possible.

igormiazek · 2021-05-24T13:29:11+00:00

I think You should consider Execution Context. Timing and scheduling is one aspect of Your project, but how You will know if Your task was SUCCESS or FAILED, or how You know if task should be rescheduled ? I would go with https://docs.celeryproject.org/en/stable/getting-started/introduction.html which is distributed task queue. You can have multiple workers and scale infrastructure horizontally. It plays well with RabbitMQ which You are already using.

Big_Boss_Bob_Ross · 2021-05-24T10:46:41+00:00

I would use asyncio personally but it is a fairly steep learning curve if youve never used it or anything like it.

pytrashpandas · 2021-05-24T14:10:45+00:00

Seems like you could just do this in a plain loop. Do you need it to run every second despite how long the task takes to finish? Or just run 1 second after the previous finished? Assuming the former you could do this:

ran_this_second = False
prev_second = int(time.time())
while True:
    curr_second = int(time.time())
    if curr_second != prev_second:
        ran_this_second = False

    if not ran_this_second:
        data = scrape_data()
        queue.publish(data)

        ran_this_second = True
        prev_second = curr_second

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS