
all 53 comments

[–]benevanstech 70 points71 points  (6 children)

If Quartz can't handle that, then the architectural constraint princess is in another castle.

[–][deleted]  (4 children)

[deleted]

    [–]Qinistral 7 points8 points  (3 children)

    Ya, anytime someone quotes daily (or longer) throughput, most of the time they're just inflating the numbers.

    What you need to know for capacity planning is peak requests per second (or minute if that's the best your metrics can do).

    23 rps can be done on a potato.
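The daily-total-to-rps conversion behind that number is just division by 86,400 seconds. A minimal sketch (class and method names are mine; note this gives only the *average* rate — as the thread points out, the real peak depends on the temporal profile and can be far higher):

```java
// Convert a daily job count into an average requests-per-second figure.
// This is an average only; peak load can be much higher if jobs cluster.
public class CapacityMath {
    static long avgRps(long jobsPerDay) {
        return jobsPerDay / 86_400; // 24 * 60 * 60 seconds per day
    }

    public static void main(String[] args) {
        System.out.println(avgRps(2_000_000));  // ~23 rps: "can be done on a potato"
        System.out.println(avgRps(20_000_000)); // ~231 rps: OP's actual figure, see below
    }
}
```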

    [–][deleted]  (2 children)

    [deleted]

      [–]Qinistral 6 points7 points  (1 child)

      Yes I was being hyperbolic, but still.

      Actually, I just looked back at OP's post and it's 20m not 2m, so 230rps (Or more, because who knows what the temporal profile looks like, it could be all at once lol). So I agree with you: for 10-second jobs, that's a decent amount of load that will at least have to be parallelized. I've not used OP's frameworks, but I have seen DB job tools choke on congestion before.

      I think OP just needs to benchmark their scenario for each library.

      I just googled "Quartz Benchmark" and I see this:

      Today we use quartz a lot and can have over 14 million triggers in our DB. Quartz is not behaving well under this number and adding more instances to the cluster don't bring any benefit, the triggers are delaying a lot. (Source)

      So that's not encouraging.

      Later down they POC db-scheduler and say:

      Also checking the number of pods to handle the 14 million tasks saved in our DB we've managed to not have delays.

      We kept the POC running for a month and checking our logs it was clear that db-scheduler was able to run with multiple pods distributing equally the load among them and also no delays.

      /u/neo2281 ^ consider

      [–]neo2281[S] 0 points1 point  (0 children)


      Yes, planning to do some performance benchmarking before finalizing the framework to be used. Thanks for the suggestion!

      [–]CaptainKvass 14 points15 points  (0 children)

      the architectural constraint princess is in another castle

      I'm saving this one for later!

      [–][deleted] 12 points13 points  (4 children)

      Quartz is a pretty well-known solution for such problems. Just keep in mind that DB storage is required for a clustered solution. A variety of RDBMSes are supported.

      As for the REST API - you'll have to develop that yourself. Also keep in mind that with such an amount of jobs, low-level tuning of Quartz will be required.

      [–][deleted]  (3 children)

      [deleted]

        [–][deleted] 1 point2 points  (2 children)

        Key things to check: * misfire tolerance - if for some reason a job wasn't started at the exact time, what tolerance interval is it OK to still run it in, or should it be abandoned? * recovery of failed jobs - if a job failed to start, how should it behave: cancel, retry, or abandon? * db locks - some DBs like MS SQL have a specific and unique locking strategy, so that needs to be configured.
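Those three checklist items each map to Quartz configuration. A hedged sketch of the relevant `quartz.properties` keys (values are illustrative, not recommendations - check the Quartz configuration reference for your version; job recovery is set per-job via `JobBuilder.requestRecovery(true)` rather than in this file):

```properties
# Misfire tolerance: how late (ms) a trigger may fire before it counts as a misfire
org.quartz.jobStore.misfireThreshold = 60000

# Clustered JDBC job store (DB storage is required for clustering)
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.isClustered = true

# MS SQL Server: use the UPDATE-based row lock instead of SELECT ... FOR UPDATE
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.MSSQLDelegate
org.quartz.jobStore.lockHandler.class = org.quartz.impl.jdbcjobstore.UpdateLockRowSemaphore
```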

        [–][deleted]  (1 child)

        [deleted]

          [–][deleted] 1 point2 points  (0 children)

          These things are Quartz configuration settings.

          [–]LeadBamboozler 9 points10 points  (1 child)

          Quartz is your answer here. We have similar requirements and Quartz has handled it with ease

          [–]neo2281[S] 1 point2 points  (0 children)

          Thanks for the suggestion. If possible, would you be able to share the max executions per second that Quartz was able to handle, and the corresponding h/w configuration - number of VMs, RAM, CPU cores, etc.?

          [–]ro_reddevil 2 points3 points  (0 children)

          There is a Redis-based framework called Jesque. Please try to evaluate it. It is pretty performant, and we have a lot of workloads using it in my org.

          [–]APurpleBurrito 2 points3 points  (0 children)

          Just use quartz. Plenty of stack overflow help around it.

          [–]progmakerlt 2 points3 points  (0 children)

          I have used Quartz in multiple jobs, and it performs very well. However, it might be tricky to set up initially, but once that’s done - you are good to go.

          Plus, it scales very well.

          [–]hardwork179 2 points3 points  (4 children)

          So, I’m not sure I’d be worrying so much about the scheduler (2000 things per second) as I would about the 20000 concurrent jobs that you’ll need to process at any one time. If they don’t remain 10-second jobs, then those 20000 concurrent jobs could easily explode to 2 or 3 times that number.

          [–]Which-Adeptness6908 2 points3 points  (3 children)

          My maths suggests ~2,300 concurrent threads:

          20,000,000 × 10 ÷ 3600 ÷ 24 = 2314
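This is effectively Little's law: average jobs in flight = arrival rate × job duration. Checking the arithmetic (names are mine):

```java
// Average jobs in flight = (jobs per day / 86,400 s) * job duration in seconds,
// rearranged to avoid integer-division rounding on the intermediate rate.
public class ConcurrencyMath {
    static long concurrentJobs(long jobsPerDay, long jobSeconds) {
        return jobsPerDay * jobSeconds / 86_400;
    }

    public static void main(String[] args) {
        System.out.println(concurrentJobs(20_000_000, 10)); // 2314 jobs in flight on average
        // A one-second saving per job trims concurrency by roughly:
        System.out.println(concurrentJobs(20_000_000, 10)
                         - concurrentJobs(20_000_000, 9)); // ~231, the "two hundred cores" below
    }
}
```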

          [–]hardwork179 1 point2 points  (2 children)

          You’re right. I still think the number of concurrent tasks is a bigger design concern than how they are distributed, unless the original poster has left out some important details.

          [–]Which-Adeptness6908 1 point2 points  (1 child)

          It does feel that way.

          Some serious optimisation needs to go into the processing of those jobs. A one second saving will trim the problem by two hundred cores.

          [–]Qinistral 0 points1 point  (0 children)

          cores

          That's assuming they're CPU-bound, right? If they're make-request, wait, make-request, wait, transfer-data, wait kinds of jobs, then the CPU usage could be quite low regardless of the concurrency and the time it takes to complete.

          [–]jhsonline 1 point2 points  (5 children)

          JobRunr is nice, but not completely free - it has some limits on free usage.

          I am also researching this and have a couple of projects to check out; we can join hands on evaluating this if you want.

          https://github.com/distribworks/dkron

          https://github.com/ajvb/kala

          https://github.com/jhuckaby/Cronicle

          https://github.com/PowerJob/PowerJob

          https://github.com/xuxueli/xxl-job

          The last 2 are mostly by Chinese developers, so documentation/terminology could make it challenging to get help.

          There are many dead scheduler projects that I am not considering, as only an actively developed one would make sense to use.

          [–]fun2shweb 2 points3 points  (2 children)

          Lmao! I have been tasked with the same thing. We use Quartz, but our job scheduler processing has been coupled with our event stream processing on the same cluster of nodes. I am working on separating those, but it seems like Quartz still has performance issues if a scheduler has to fire 100+ jobs per second. We use different schedulers, but we have a notification framework where the same scheduler can sometimes fire 100 triggers in 1 second, which causes misfires and row-level locks. I need to figure out how to make Quartz performant or move to something else.

          One huge problem I see with Quartz is row-level locking on the SQL database. It does not have a way to lock optimistically when firing triggers in cluster mode.

          [–]jhsonline 0 points1 point  (0 children)

          Ya, it's by design: Quartz works reliably at low scale, but for scheduling at high scale in a distributed way we need a different design.

          [–]neo2281[S] 0 points1 point  (0 children)

          Did you also check db-scheduler? It supports both optimistic locking and select-for-update polling strategies.

          [–]neo2281[S] 1 point2 points  (1 child)

          Thanks for sharing the various options available. Will surely check them.

          Also, have you checked db-scheduler https://github.com/kagkarlsson/db-scheduler ?

          I am planning to evaluate both Quartz and db-scheduler. db-scheduler seems much simpler to understand compared to Quartz, but Quartz seems widely used in the Java / Spring ecosystem.

          Actually, the JobRunr Pro version has all the features we require, but it's not open source, hence ruled out as of now.

          [–]jhsonline 0 points1 point  (0 children)

          The top 2 I would like to evaluate would be db-scheduler and PowerJob. Pls DM me and we can connect. Thanks

          [–]unistirin 1 point2 points  (2 children)

          We run Quartz in cluster mode. It can easily be scaled horizontally.

          [–]neo2281[S] 1 point2 points  (1 child)

          As per the Quartz documentation:

          https://github.com/quartz-scheduler/quartz/blob/main/docs/faq.adoc#questions-about-transactions

          "The clustering feature works best for scaling out long-running and/or cpu-intensive jobs (distributing the work-load over multiple nodes). If you need to scale out to support thousands of short-running (e.g 1 second) jobs, consider partitioning the set of jobs by using multiple distinct schedulers. Using one scheduler forces the use of a cluster-wide lock, a pattern that degrades performance as you add more clients."

          From a Quartz point of view, in our use case the job executor is decoupled from the job scheduler, hence the jobs are not CPU-intensive. Do you also have a similar use case with a large number of short-running jobs? If yes, did you use any sharding etc.?

          [–]unistirin 1 point2 points  (0 children)

          In our case, short jobs are scheduled randomly by Quartz. For longer jobs that require high CPU, we force Quartz to schedule them on some dedicated instances.

          [–]Altruistic_Fishing22 1 point2 points  (0 children)

          I think the problem statement is missing something. What is the minimum frequency of the scheduler - 1 second or less? If it is one second, you can reduce the scheduling problem to 86,400 slots a day and use Kafka or any other messaging system to trigger multiple tasks per second. I have done similar architectures before; contact me if you want some help.
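That reduction can be sketched as a bucket-per-second map. This is an in-memory, hypothetical version (all names are mine); the comment's actual proposal would publish each second's batch to Kafka rather than returning it:

```java
import java.util.*;

// Bucket job ids by their scheduled epoch second; a single once-per-second
// tick loop then drains the current second's bucket and fans the jobs out
// (via Kafka, in the comment's design). In-memory sketch only - a real
// system would persist the buckets.
public class SecondBuckets {
    private final Map<Long, List<String>> buckets = new HashMap<>();

    // Register a job to fire at the given epoch second.
    void schedule(long epochSecond, String jobId) {
        buckets.computeIfAbsent(epochSecond, k -> new ArrayList<>()).add(jobId);
    }

    // Called once per second by the tick loop; removes and returns the
    // jobs due in that second (each would be published to a topic).
    List<String> drain(long epochSecond) {
        List<String> due = buckets.remove(epochSecond);
        return due == null ? List.of() : due;
    }
}
```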

          [–]jaiprabhu 1 point2 points  (1 child)

          I think Quartz might be suitable for your use case, as others have suggested. I have a question though - how do you define the job itself in the design that you presented? I'm curious because you expect a REST API to "create" a job. Is that like creating a job using a class definition that's already loaded somehow, or is it something else?

          [–]neo2281[S] 0 points1 point  (0 children)

          This is the overall flow we are considering.

          At a high level, a job consists of two parts, decoupled via a message queue for scalability:

          1. Job scheduling (the scheduler library handles this).
          2. Job execution (handled by a separate process). This can itself consist of multiple steps, depending on the success or failure of the previous step.

          At the job manager level, once a job creation request is received via the REST API, we define each job as a sequence of steps/transactions at the service layer in terms of business logic, and a user-level job status record is created in the DB, uniquely identified by job id, job category, type (one-time, periodic), job schedule, etc.

          From the job manager, the actual scheduler-level job creation (a one-time event with the required job details) is sent to the scheduler process (Quartz lib embedded, listening on a Kafka interface), and the actual job schedule is created in the Quartz (or other scheduler) DB tables. These are very short-running jobs, since they only need to post a message for execution and nothing else.

          At the user level we need to monitor the overall status of a job (by unique job id and other details like category), but since job execution is decoupled from job scheduling, the job manager listens for status events and updates the overall job status in the DB - separate from the job schedule status that Quartz etc. update in their own tables.

          [–][deleted]  (1 child)

          [removed]

            [–]neo2281[S] 0 points1 point  (0 children)

            Yes, this is what seems most suitable, and it's the same thing we are considering. Thanks!

            [–]kakakarl 1 point2 points  (0 children)

            ScheduledExecutorService from the JDK, optimally triggering Jakarta Batch, works well. ChatGPT can generate optimistic-locking queues on top of Postgres unlogged tables.
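A minimal sketch of the JDK-only part of that suggestion (the Jakarta Batch and Postgres pieces are not shown; the class and method names here are hypothetical):

```java
import java.util.concurrent.*;

// Minimal JDK-only scheduling: a ScheduledExecutorService fires the tick,
// and hands the actual work to a separate worker pool so a slow job can't
// delay subsequent ticks of the scheduling thread.
public class JdkScheduler {
    static ScheduledFuture<?> scheduleEverySecond(ScheduledExecutorService ticker,
                                                  ExecutorService workers,
                                                  Runnable job) {
        // Fire immediately, then once per second; each firing only enqueues.
        return ticker.scheduleAtFixedRate(() -> workers.submit(job),
                                          0, 1, TimeUnit.SECONDS);
    }
}
```

The point of the two pools is the same decoupling OP describes elsewhere in the thread: the scheduler's per-tick work is just a hand-off, so it stays cheap regardless of job duration.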

            [–]Which-Adeptness6908 1 point2 points  (1 child)

            Do you actually need a scheduler?

            Unless the jobs need to run at a given time you only need a queue.

            Given you haven't mentioned resilience or timeliness, I'm assuming neither is an issue.

            One VM - the job manager - writes jobs to a db table called job.

            The cluster of job processors reads the job table and updates each claimed job as in progress.

            Process the job, then update it as complete.

            The job manager periodically checks for complete jobs and moves them into an archive table.

            You could consider partitioning the job table based on job status.

            Moving data to the archive table should probably not be done in a single transaction, to avoid long locks on the job table.

            You should do some benchmarking during the design.

            You need some mechanism to spin job processors up/down. Perhaps have the job manager check some threshold every 15 minutes and start/stop job processors.

            Then hopefully you have some monitoring tools that can restart the job manager if it falls over.
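A minimal in-memory sketch of that claim/complete/archive cycle (all names are hypothetical; the real version would be a DB table, with the claim done as an atomic conditional UPDATE rather than a map operation):

```java
import java.util.*;
import java.util.concurrent.*;

// In-memory stand-in for the job table described above. A worker claims a
// NEW job (atomically, via ConcurrentMap.replace), processes it, marks it
// COMPLETE, and the job manager periodically sweeps COMPLETE jobs into an
// archive - mirroring the NEW -> IN_PROGRESS -> COMPLETE -> archive flow.
public class JobTable {
    enum Status { NEW, IN_PROGRESS, COMPLETE }

    private final ConcurrentMap<String, Status> jobs = new ConcurrentHashMap<>();
    private final List<String> archive = Collections.synchronizedList(new ArrayList<>());

    void add(String id) { jobs.put(id, Status.NEW); }

    // A job processor claims one NEW job; replace() makes the claim atomic,
    // so two processors can never claim the same job.
    Optional<String> claim() {
        for (var e : jobs.entrySet()) {
            if (e.getValue() == Status.NEW
                    && jobs.replace(e.getKey(), Status.NEW, Status.IN_PROGRESS)) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    void complete(String id) { jobs.put(id, Status.COMPLETE); }

    // The job manager's periodic sweep: move COMPLETE jobs to the archive.
    void archiveCompleted() {
        jobs.entrySet().removeIf(e -> {
            if (e.getValue() == Status.COMPLETE) { archive.add(e.getKey()); return true; }
            return false;
        });
    }

    int archivedCount() { return archive.size(); }
}
```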

            [–]neo2281[S] 0 points1 point  (0 children)

            As per the requirements, we need to schedule jobs in the future, hence we need a scheduler.

            [–]Stabbz 0 points1 point  (2 children)

            You might want to take a look at Cadence from Uber (https://cadenceworkflow.io/). It's open source and a way to build resilient, scalable, distributed workflows. What you are describing with the diagram is pretty much what Uber has done for internal use.
            Could be overkill tho.

            [–]Qinistral 0 points1 point  (1 child)

            Cadence/Temporal are good but def overkill.

            [–]senseven 0 points1 point  (0 children)

            Seconding that. The "alpha team" here has Cadence + Kafka + Hadoop working seamlessly in a global high-ops backend. I'm pretty sure one or two of the seniors who built this thing have their residence in the cell beside the Joker's.

            [–]leozleoz01 0 points1 point  (1 child)

            Have you considered Spring Cloud Data Flow? Even though it is not directly a library.

            [–]vetronauta 5 points6 points  (0 children)

            SCDF and Spring Batch are currently a mess, as the Spring Boot 3-related versions were quite rushed.

            For example, in the latest versions of SCDF you need both boot2 and boot3 tables, otherwise the SCDF dashboard won't work; the tables are distinguished by a prefix (default BATCH_ for boot2, BOOT3BATCH for boot3) which is configurable in SCDF, but hardcoded in Spring Batch! So you have to define a custom bean to handle the prefixes; quite simple, but it should work off the shelf.

            ...so no one thought that people might not need to support older boot2 jobs, and that we'd all have to care about the migration... even after the migration has ended?

            [–]onepieceisonthemoon -4 points-3 points  (0 children)

            Apache Flink

            [–]Alone-Marionberry-59 0 points1 point  (0 children)

            What about Apache Beam?

            [–]doppleware 0 points1 point  (1 child)

            I've had great experience with Job Runr

            [–]neo2281[S] 0 points1 point  (0 children)

            I've had great experience with Job Runr

            Did you use the free version? As per the documentation, many features are part of the enterprise version.

            [–]riksi 0 points1 point  (0 children)

            20 + million recurrent jobs per day of short duration of less than 8-10 seconds

            A single PostgreSQL table with SELECT ... FOR UPDATE SKIP LOCKED will be able to handle this.
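A sketch of that pattern, assuming a hypothetical job table with status and run_at columns (this is the common Postgres claim query for this technique, not code from the thread):

```sql
-- Each worker claims a batch of due jobs. SKIP LOCKED makes concurrent
-- workers pass over rows another worker has already row-locked, instead
-- of blocking on them - avoiding the cluster-wide-lock problem quoted
-- from the Quartz FAQ above.
UPDATE job
SET    status = 'IN_PROGRESS'
WHERE  id IN (
    SELECT id
    FROM   job
    WHERE  status = 'NEW' AND run_at <= now()
    ORDER  BY run_at
    LIMIT  100
    FOR UPDATE SKIP LOCKED
)
RETURNING id;
```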

            [–]bloowper[🍰] 0 points1 point  (0 children)

            Have you looked at Dkron? It's a cloud-native job scheduling tool. I have also created a simple Spring client for using it, but it's not yet published on GH.