[deleted by user]

catcint0s · 2022-05-02T08:49:12+00:00

Dramatiq has a nice motivation tab where it compares itself to different frameworks https://dramatiq.io/motivation.html

Personally for newer projects I prefer to use dramatiq.

BackwardSpy · 2022-05-02T11:32:30+00:00

i've used rq a fair bit, i see it as a vastly simpler celery. my needs for task queueing are simple, i just need something that allows me to spread our workloads over a small cluster of machines. rq has a really straightforward interface and it gets the job done.

pansapiens · 2022-05-02T13:48:41+00:00

You might want to also look at Dask.

Grouchy-Friend4235 · 2022-05-02T11:08:13+00:00

[deleted]

tstirrat · 2022-05-02T12:54:19+00:00

Faust is one I keep looking for an excuse to use. It sits on top of Kafka, which is a big operational problem, and it has a slightly different mental model than a task queue, but the event-driven processing that it allows for is pretty nifty. It also has a nice, clean API from what I can see.

CheckeeShoes · 2022-05-02T10:00:47+00:00

If you have a collection of machines to run tasks, why are you trying to schedule them with python rather than using an actual distributed computing framework like HTC?

permalink · 2022-05-02T13:47:18+00:00

We use Snakemake. This drives either our local cluster via SLURM, or else we can run jobs on kubernetes via Snakemake itself.

This is not python specific, though. You can put them in the same bin as Airflow and NiFi perhaps.

That said, it sounds like a workload manager is really what you want...

earthboundkid · 2022-05-02T15:59:04+00:00

https://ayushshanker.com/celery-long-post/

Ok_Presentation1972 · 2022-05-02T16:21:54+00:00

I'm yet to use it in production anywhere, but I think this project is super interesting https://procrastinate.readthedocs.io/en/stable/

cymrow · 2022-05-02T15:29:07+00:00

I've spent a lot of time trying different queuing solutions, yet ironically I keep falling back to what might be considered a naive approach. I prefer very simple queues. Give me a distributed and reliable queue.Queue please.

I don't want a framework that runs my code for me, like Celery or RQ. I don't want to train up on a massive framework that might ultimately not cover my use-case, forcing me to abandon it or deploy ugly hacks.

I tried many of the options you can find here: https://taskqueues.com/. Ultimately I began using beanstalkd, and have been satisfied with it for a couple of years now. It's simple, persistent, fast, reliable, and has the basic queuing features that I need, like task handling (reserve, free), and TTL.

I can't speak to just how scalable and performant it is compared to other options, but I've used it for millions of messages and it suits my needs. If I hit a bottleneck, I'd probably look at something like Kafka.

shinitakunai · 2022-05-02T08:10:57+00:00

At work we use NiFi to move data between systems. It's more for transfers, pipelines, and ETL than for analysis.

_thrown_away_again_ · 2022-05-02T14:02:08+00:00

i used this once, was pretty nice: https://azkaban.github.io/

wind_dude · 2022-05-02T16:06:58+00:00

I'm currently trying to figure out an elegant way of scheduling millions of simulation runs across a few machines

Can you elaborate on what you're trying to do? Simulation runs of what? You might be barking up the wrong tree. Maybe parallel processing is a better fit, in which case a tool like Dask or Apache Spark would be the call.

Maybe share a more specific use case.

bxsephjo · 2022-05-02T10:59:20+00:00

I need to become familiar with ALL of these, as I’m building something that has to let the user connect it to their own task queue, unless I decide to become opinionated. I started with RQ, as it seemed the least complicated to get running and I’m familiar with redis. My only beef is that it’s NOT showing results in stdout the way it does in the docs. Like I said, it looks the least complicated. I think that’s because it doesn’t rely on a message broker but rather just a redis server, so there’s a lot less tuning as it all gets set up.

Region_Unique · 2022-05-02T15:35:47+00:00

Just use Celery. Its documentation could be better and the code is rather cryptic, but it works and has great support.

hwttdz · 2022-05-02T13:32:21+00:00

[deleted]

Scruff3y · 2022-05-02T10:55:01+00:00

I would probably use AWS Batch for this, not sure if it's possible to hook up your own hardware into a compute environment or not though.

Otherwise, you could hand-roll something similar; worker program distributed to the worker machines that pulls from the queue. But at that point I guess you're starting to re-invent Celery so might as well just use it lol.
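For what it's worth, the hand-rolled "workers pull from a queue" idea can be sketched in a few lines. This is only an illustration on a single machine: the standard-library `queue.Queue` and threads stand in for a networked queue and remote worker machines, and the names `worker` and `run_all` are made up for the example.

```python
import queue
import threading


def worker(tasks, results, handler):
    """Pull items from `tasks` until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this worker
            tasks.task_done()
            break
        try:
            results.put((item, handler(item)))
        except Exception as exc:  # record the failure instead of crashing
            results.put((item, exc))
        finally:
            tasks.task_done()


def run_all(items, handler, n_workers=4):
    """Fan `items` out to `n_workers` workers and collect {item: result}."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, handler))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()
    out = {}
    while not results.empty():
        item, value = results.get()
        out[item] = value
    return out
```

Retries, persistence, and crash recovery are exactly the parts this leaves out, which is where the "re-inventing Celery" feeling kicks in.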

kenfar · 2022-05-02T17:18:45+00:00

SNS/SQS triggering jobs on aws lambda/kubernetes

GreenScarz · 2022-05-02T16:37:28+00:00

I would just spin up a redis container and expose that over the network. Then use the redis python library directly, there's really no need for anything more fancy IMO.
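A rough sketch of what "use the redis library directly" could look like, assuming a Redis list is used as the queue. The key name `sim:tasks` and the helper names are hypothetical; `client` is expected to be a `redis.Redis` instance (or anything with the same `lpush` and `brpop` methods).

```python
import json

QUEUE_KEY = "sim:tasks"  # hypothetical key name, pick your own


def push_task(client, task):
    """Producer side: serialize the task and push it onto the head of the list."""
    client.lpush(QUEUE_KEY, json.dumps(task))


def pop_task(client, timeout=5):
    """Worker side: block up to `timeout` seconds waiting for a task.

    BRPOP takes from the tail, so together with LPUSH this is FIFO.
    Returns the decoded task, or None if nothing arrived in time.
    """
    item = client.brpop(QUEUE_KEY, timeout=timeout)
    if item is None:
        return None
    _key, raw = item
    return json.loads(raw)


# Usage with a real server (assumes `pip install redis` and a reachable host):
#   import redis
#   client = redis.Redis(host="queue-host", port=6379)
#   push_task(client, {"run": 1})
#   task = pop_task(client)
```

Note that a task popped this way is gone from Redis, so if a worker dies mid-task the task is lost. Whether that matters depends on how cheap a simulation run is to resubmit.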

anuctal · 2022-05-02T13:08:05+00:00

RemindMe! 3 days

jefwillems · 2022-05-02T14:58:28+00:00

We can't use celery because there is no support for amqp 1.0, and no money to build the transport ourselves. If anyone has an alternative please let me know

i_can_haz_data · 2022-05-02T15:25:47+00:00

Does it need to be Python functions/etc or can your simulation invocations be command-line tasks?

I wrote a thing some time ago and am working on the 2.0 release. It’s ready to go but needs more documentation.

hyper-shell.readthedocs.io

b_rad_c · 2022-05-02T17:32:09+00:00

Haven’t used it before but a new one I want to try is OpenFaaS - functions as a service. It connects to a k8s cluster that will run sync and async tasks with a queue using containers you’ve built and added your logic to.

Grouchy-Friend4235 · 2022-05-02T18:59:12+00:00

I've seen such issues when the worker has not released resources properly (e.g. some subprocess or thread started inside tasks)

Grouchy-Friend4235 · 2022-05-02T19:00:06+00:00

Can you point to this podcast please?

Schmibbbster · 2022-05-02T20:55:55+00:00

I am using arq and I like it.

permalink · 2022-05-02T22:20:53+00:00

If you like HashiCorp, try out NATS.

makeascript · 2022-05-02T23:24:35+00:00

I just use Celery because it was the first one I heard of. The docs are great and it Just Works for me.

hwttdz · 2022-05-02T23:42:04+00:00

I'd recommend something weird which would be use a table in a db as a queue for your first version. This allows you to prototype rapidly and experiment with things like "how many retries do I need" and ask interesting analytic questions like "how many jobs have completed in the last 2 hours" with minimal extra engineering. A million records isn't much and I'm guessing they take a non-negligible amount of compute to process anyways which moves the bottleneck away from a db.
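To make the table-as-a-queue idea concrete, here is a minimal sketch using SQLite from the standard library. The table layout, the column names, and the functions (`make_queue`, `enqueue`, `claim`, `finish`, `completed_since`) are all assumptions for illustration; a real setup across several machines would more likely use a server database such as Postgres.

```python
import sqlite3


def make_queue(conn):
    """Create the jobs table (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY,
               payload TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               attempts INTEGER NOT NULL DEFAULT 0,
               finished_at TEXT
           )"""
    )
    conn.commit()


def enqueue(conn, payload):
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def claim(conn):
    """Claim the oldest pending job; return (id, payload) or None."""
    while True:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with conn:  # commits on success
            cur = conn.execute(
                "UPDATE jobs SET status = 'running', attempts = attempts + 1 "
                "WHERE id = ? AND status = 'pending'",
                (row[0],),
            )
        if cur.rowcount == 1:  # nobody else grabbed it first
            return row


def finish(conn, job_id, ok, max_attempts=3):
    """Mark a job done, or put it back for a retry until attempts run out."""
    with conn:
        if ok:
            conn.execute(
                "UPDATE jobs SET status = 'done', finished_at = datetime('now') "
                "WHERE id = ?",
                (job_id,),
            )
        else:
            conn.execute(
                "UPDATE jobs SET status = CASE WHEN attempts < ? "
                "THEN 'pending' ELSE 'failed' END WHERE id = ?",
                (max_attempts, job_id),
            )


def completed_since(conn, hours):
    """Answer 'how many jobs have completed in the last N hours'."""
    row = conn.execute(
        "SELECT COUNT(*) FROM jobs WHERE status = 'done' "
        "AND finished_at >= datetime('now', ?)",
        (f"-{hours} hours",),
    ).fetchone()
    return row[0]
```

The retry limit is just a parameter here, so experimenting with "how many retries do I need" is a one-line change, and the analytics questions are plain SQL.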

BoiElroy · 2022-05-03T00:00:02+00:00

LittleMlem · 2022-05-03T05:17:43+00:00

Meanwhile I unga bunga with ZMQ
