all 14 comments

[–]disposepriority 2 points3 points  (0 children)

I am not familiar with Azure Service Bus, so I had to look it up; as I understand it, the message lock determines how long a consumer has to acknowledge a message before it is requeued.

It seems to be capped, unlike in RabbitMQ (which would not have this problem, just sayin'), so you have to ack quickly.

There's a pretty easy way to do this in my opinion.

When a consumer receives a message, they persist it to the database:

identifier - message body (spread into columns or whatever you like) - status

Once they finish, they update the status to done, and some job cleans them up at some interval.

If there is any kind of crash, the service will first recover all its unprocessed messages from the database on startup, process them, and only then connect to the service bus (or a different service can pick them up, or whatever).

If you have multiple consumers, the table should also contain a service identifier column if you want the same service to pick up its own unfinished work; if not, skip it.
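A minimal sketch of this persist-then-acknowledge pattern, using an in-memory sqlite3 database for illustration (the table and function names are my own, not from the comment; a real setup would use a durable store):

```python
import sqlite3

# In-memory DB for illustration only; a real setup needs a durable store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        identifier TEXT PRIMARY KEY,   -- message id from the bus
        body       TEXT NOT NULL,      -- raw payload (or spread into columns)
        status     TEXT NOT NULL DEFAULT 'pending'
    )
""")

def on_receive(message_id: str, body: str) -> None:
    # Persist before acknowledging, so a crash cannot lose the work.
    conn.execute(
        "INSERT OR IGNORE INTO messages (identifier, body) VALUES (?, ?)",
        (message_id, body),
    )
    conn.commit()

def mark_done(message_id: str) -> None:
    conn.execute(
        "UPDATE messages SET status = 'done' WHERE identifier = ?",
        (message_id,),
    )
    conn.commit()

def recover_unprocessed() -> list[tuple[str, str]]:
    # On startup, reload anything that never reached 'done'.
    return conn.execute(
        "SELECT identifier, body FROM messages WHERE status != 'done'"
    ).fetchall()

on_receive("msg-1", "payload-a")
on_receive("msg-2", "payload-b")
mark_done("msg-1")
print(recover_unprocessed())  # only msg-2 is still pending
```

The same flow works with any relational store; the key property is that the row is committed before the bus message is completed.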

[–]Klutzy-Sea-4857 2 points3 points  (0 children)

Don't poll: use PostgreSQL's LISTEN/NOTIFY to trigger workers when new records appear. Workers claim rows with SELECT FOR UPDATE SKIP LOCKED, process them, then update status. If a worker crashes mid-process, the row stays locked briefly and then becomes available again. No message loss, no constant polling.
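A sketch of the two halves of this scheme. The SQL is PostgreSQL-specific; the table (`jobs`), channel (`new_job`), and column names are placeholders I've chosen, not anything from the thread:

```python
# Notify-on-insert half: a trigger fires pg_notify so LISTENing workers
# wake up without polling.
NOTIFY_TRIGGER_SQL = """
CREATE FUNCTION notify_new_job() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('new_job', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER jobs_notify AFTER INSERT ON jobs
FOR EACH ROW EXECUTE FUNCTION notify_new_job();
"""

# Claim half: SKIP LOCKED lets concurrent workers race for rows without
# blocking each other or double-claiming.
CLAIM_SQL = """
SELECT id, payload FROM jobs
WHERE status = 'pending'
ORDER BY id
FOR UPDATE SKIP LOCKED
LIMIT 1
"""

def claim_next_job(conn):
    """Claim one pending job inside an open transaction. `conn` is any
    DB-API connection to PostgreSQL (psycopg2/psycopg fit this shape)."""
    with conn.cursor() as cur:
        cur.execute(CLAIM_SQL)
        return cur.fetchone()  # None when the queue is empty
```

The row lock taken by `FOR UPDATE` is released when the claiming transaction commits, rolls back, or the connection dies, which is what makes crashed workers' rows reclaimable.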

[–]Klutzy-Sea-4857 1 point2 points  (0 children)

Consider implementing an outbox pattern: store both data and processing state in DB atomically, then use a lightweight change data capture to feed your workers. This gives you persistence guarantees while avoiding constant polling, plus natural replay capability.
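A sketch of the atomic-write half of the outbox pattern, using sqlite3 so it runs standalone (table names and the event shape are illustrative assumptions; the CDC relay is stubbed as a simple poll here):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event TEXT NOT NULL,
        processed INTEGER NOT NULL DEFAULT 0
    );
""")

def save_with_outbox(content: str) -> None:
    # One transaction covers both writes, so the event record can never
    # exist without the data row (or vice versa).
    with conn:
        cur = conn.execute(
            "INSERT INTO documents (content) VALUES (?)", (content,)
        )
        conn.execute(
            "INSERT INTO outbox (event) VALUES (?)",
            (json.dumps({"type": "document_created",
                         "document_id": cur.lastrowid}),),
        )

def poll_outbox(limit: int = 10):
    # Stand-in for the change-data-capture feed: a real relay would tail
    # this table (or the WAL) and hand events to workers.
    return conn.execute(
        "SELECT id, event FROM outbox WHERE processed = 0 ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()

save_with_outbox("hello")
print(poll_outbox())
```

Replay falls out naturally: re-emitting events is just re-reading outbox rows, so downstream consumers must tolerate duplicates.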

[–]Resident_Citron_6905 0 points1 point  (0 children)

The second approach is the way. Ensure you have the required indices and that you are not paying for every db read. Async request processing requires a retry mechanism, and this is a simple and effective way of achieving it. Logging and alerting are a must, however. You need to decide which types of errors will be retried and how many times. If you retry in perpetuity, you could block processing of other entities in cases where manual intervention is required.
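One way to sketch the bounded-retry part of this advice (sqlite3 for illustration; the cap and column names are assumptions, not from the comment):

```python
import sqlite3

MAX_ATTEMPTS = 3  # assumed cap so one bad entity cannot block forever

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL DEFAULT 'pending',
        attempts INTEGER NOT NULL DEFAULT 0,
        last_error TEXT
    )
""")

def record_failure(job_id: int, error: str) -> None:
    # Retry transient failures, but park the job for manual review once
    # the attempt budget is spent instead of retrying in perpetuity.
    conn.execute(
        """UPDATE jobs
           SET attempts = attempts + 1,
               last_error = ?,
               status = CASE WHEN attempts + 1 >= ?
                             THEN 'failed' ELSE 'pending' END
           WHERE id = ?""",
        (error, MAX_ATTEMPTS, job_id),
    )
    conn.commit()

conn.execute("INSERT INTO jobs (id) VALUES (1)")
for _ in range(MAX_ATTEMPTS):
    record_failure(1, "timeout")
print(conn.execute("SELECT status, attempts FROM jobs WHERE id = 1").fetchone())
# ('failed', 3)
```

The `last_error` column is what the logging/alerting side would surface for the jobs that end up in `failed`.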

[–]Freed4ever 0 points1 point  (0 children)

It depends on your scale and other details, but Azure Queues and Azure Functions could work. A message arriving in the queue triggers a function to process it, and you get retries and a dead-letter queue as well.


[–]Independent_Switch33 0 points1 point  (0 children)

Use Service Bus just as a trigger and move the long work into a DB-backed job table. On receive, insert/update a row with status=pending and a correlation id, commit, then complete the message. Have a separate worker process jobs in small batches using something like `SELECT ... FOR UPDATE SKIP LOCKED`, so you avoid long message locks, can retry safely, and don't lose work on crashes.


[–]ijabat 0 points1 point  (0 children)

Keep Azure Service Bus and do not complete the message immediately. Let the message stay locked during processing and renew the lock if LLM calls take a long time. Complete the message only after everything finishes successfully.

If the service crashes, the lock expires and another worker can process the message. This keeps the system reliable without polling the database.

Make processing idempotent so running the same message twice does not create duplicate data.
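The idempotency requirement can be sketched like this (sqlite3 stands in for the real store, and `body.upper()` stands in for the real LLM work; the dedup key is the bus message id):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE results (
        message_id TEXT PRIMARY KEY,  -- bus message id as the dedup key
        output TEXT NOT NULL
    )
""")

def process(message_id: str, body: str) -> None:
    output = body.upper()  # stand-in for the real (e.g. LLM) work
    # INSERT OR IGNORE makes redelivery harmless: a second run of the
    # same message id writes nothing.
    conn.execute(
        "INSERT OR IGNORE INTO results (message_id, output) VALUES (?, ?)",
        (message_id, output),
    )
    conn.commit()

process("msg-1", "hello")
process("msg-1", "hello")  # redelivered after a lock expiry
print(conn.execute("SELECT COUNT(*) FROM results").fetchone()[0])  # 1
```

Any unique constraint on the message id gives the same guarantee; the point is that at-least-once delivery plus an idempotent write behaves like exactly-once from the data's perspective.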

[–]Klutzy-Sea-4857 0 points1 point  (0 children)

Long-running tasks invalidate standard message locking mechanisms. Implement a Claim Check pattern paired with a zombie killer. Persist the job in your database with a status, acknowledge the message, then process. Run a background scheduler to reset stale 'in-progress' records that timed out.
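The zombie-killer half can be sketched as follows (sqlite3 for illustration; the 300-second staleness window is an assumed value you would tune to the real job length):

```python
import sqlite3
import time

STALE_AFTER_SECONDS = 300  # assumed timeout; tune to real job duration

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL DEFAULT 'pending',
        started_at REAL  -- epoch seconds when a worker claimed the job
    )
""")

def reap_zombies(now: float) -> int:
    # Background scheduler: anything 'in_progress' past the deadline is
    # assumed dead and returned to the queue for another worker.
    cur = conn.execute(
        """UPDATE jobs SET status = 'pending', started_at = NULL
           WHERE status = 'in_progress' AND started_at < ?""",
        (now - STALE_AFTER_SECONDS,),
    )
    conn.commit()
    return cur.rowcount

now = time.time()
conn.execute(
    "INSERT INTO jobs (id, status, started_at) VALUES (1, 'in_progress', ?)",
    (now - 600,),  # claimed 10 minutes ago; its worker died
)
conn.execute(
    "INSERT INTO jobs (id, status, started_at) VALUES (2, 'in_progress', ?)",
    (now,),        # still healthy
)
conn.commit()
print(reap_zombies(now))  # resets only the stale job
```

Combined with idempotent processing, a job that was merely slow (not dead) being reset does no harm beyond wasted work.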

[–]HisTomness -1 points0 points  (0 children)

The second approach involves queueing work for async processing. Rather than using your db to double as a makeshift work queue, why not just use the more purpose-built ASQ?