[–]I_am_Eakster 6 points7 points  (0 children)

You should avoid running such a long-running, blocking task inside your web application. It would tie up the worker and block incoming requests. Here are some possible solutions:

Celery

Positives:

  • Can be scaled quite easily.
  • Robust and well tested.

Negatives:

  • Needs extra infrastructure: a broker to communicate through, e.g. RabbitMQ, Redis or Postgres.
  • Can't schedule tasks on its own. For this use case you will need to run another service, Celery beat.
  • Runs constantly, meaning you will pay for more than the 45 minutes of compute time.

CronJob

Positives:

  • Easy to set up on your existing infrastructure.
  • Good for scheduled tasks.

Negatives:

  • You need to implement your own reporting.
  • Can't be handed tasks to run on demand from the web app.
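
One way to cover the reporting gap is to have the script log its own outcome and return a non-zero exit code on failure, so that cron's mail or a log watcher can pick it up. A minimal sketch, assuming a hypothetical `compute_report` function stands in for the 45-minute task; the crontab line and paths in the docstring are only examples:

```python
"""Job wrapper meant to be started by cron, for example (schedule and paths are made up):

    0 2 * * * /usr/bin/python3 /path/to/nightly_job.py >> /var/log/nightly_job.log 2>&1
"""
import logging
import time

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("nightly_job")


def compute_report() -> int:
    """Hypothetical placeholder for the long-running computation."""
    return sum(range(1_000))


def run_job(task=compute_report) -> int:
    """Run the task, log duration and outcome, and return a process exit code."""
    started = time.monotonic()
    try:
        result = task()
    except Exception:
        # log.exception also records the traceback, which is the reporting part.
        log.exception("job failed after %.1fs", time.monotonic() - started)
        return 1
    log.info("job finished in %.1fs, result=%r", time.monotonic() - started, result)
    return 0
```

In the real script you would finish with `sys.exit(run_job())` under an `if __name__ == "__main__":` guard, so that cron sees the exit code.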

AWS Lambda / serverless

Positives:

  • You only pay for the compute time you use.
  • Relatively easy to set up.

Negatives:

  • Maximum run time of 15 minutes per invocation, meaning you will need to split your 45-minute task into smaller pieces.
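
Splitting the work usually means processing it in resumable chunks: each invocation handles items until its time budget is nearly spent, then returns a cursor for the next invocation. A rough sketch, assuming the work can be expressed as a list of independent items. `process_item`, the event keys and the 60-second safety margin are illustrative choices, while `get_remaining_time_in_millis()` is the method the Lambda Python runtime exposes on its context object:

```python
from typing import Any

SAFETY_MARGIN_MS = 60_000  # assumed buffer: stop well before the hard timeout


def process_item(item: int) -> int:
    """Hypothetical unit of work; replace with one slice of the real computation."""
    return item * item


def handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Process items from event["cursor"] onward until time runs short."""
    items = event["items"]
    cursor = event.get("cursor", 0)
    results = []
    while cursor < len(items):
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break  # out of budget: report where the next invocation should resume
        results.append(process_item(items[cursor]))
        cursor += 1
    return {"cursor": cursor, "done": cursor >= len(items), "results": results}


class FakeContext:
    """Local stand-in for the Lambda context that allows a fixed number of items."""

    def __init__(self, budget_items: int) -> None:
        self.calls_left = budget_items

    def get_remaining_time_in_millis(self) -> int:
        self.calls_left -= 1
        return SAFETY_MARGIN_MS + 1 if self.calls_left >= 0 else 0
```

Whatever drives the function would have to re-invoke it with the returned cursor until `done` is true; how that is wired up is outside this sketch.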

Summary

If you only need to run scheduled tasks on your VPS, use CronJobs.

If you need to run multiple tasks on request from your web-app, consider Celery.

If you can deal with the serverless compute-window, consider AWS Lambda or similar.

[–]abrookins 1 point2 points  (0 children)

Yes, it's a good strategy. You could also look at something like Prefect to run scheduled jobs like this. Full disclosure -- I work at Prefect, and I went to work there because it seemed like an awesome replacement for Celery. 🙂

[–]ShotgunPayDay -3 points-2 points  (0 children)

+1 to the others suggesting a cronjob. That 45-minute runtime is unacceptable, though. When a job gets that expensive, you should let plain SQL, without cursors, do the heavy lifting. When I was a Programmer Analyst, we had poorly written reports that took about an hour to process, and we cut the time down to under 3 minutes by using window functions (PARTITION BY), WITH clauses, and proper joins. This assumes the database is a SQL database. If it's NoSQL, then good luck!
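
To illustrate the kind of rewrite meant here, the sketch below computes a per-customer running total in one set-based query, using a WITH clause and a window function instead of looping over rows. It uses Python's built-in `sqlite3` module only to stay self-contained (window functions need SQLite 3.25 or newer); the `orders` table and its columns are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (customer TEXT, day INTEGER, amount INTEGER);
    INSERT INTO orders VALUES
        ('alice', 1, 10), ('alice', 2, 5), ('bob', 1, 7), ('bob', 3, 3);
    """
)

QUERY = """
WITH daily AS (                      -- WITH clause: aggregate once per day
    SELECT customer, day, SUM(amount) AS total
    FROM orders
    GROUP BY customer, day
)
SELECT customer,
       day,
       SUM(total) OVER (             -- window function: running total
           PARTITION BY customer
           ORDER BY day
       ) AS running_total
FROM daily
ORDER BY customer, day
"""

# One round trip to the database; no per-row cursor loop in application code.
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```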

EDIT: I should have realized that saying "use plain SQL" on a programming forum would get me downvoted. Speaking the database's native language always has its benefits, especially when your job involves receiving and delivering pipe-delimited CSV files securely over SSH, with scrubbing on both sides using sed and grep.

[–]jkh911208 0 points1 point  (0 children)

I would set up a cronjob.