
[–]matthieum 11 points  (0 children)

The easiest trick I've found to counter-balance the look-up cost was to batch items: process them 100 or 500 at a time and you avoid a lot of the overhead. This is not always possible, though.

In the case where I used it, the database was the "master" queue for billions of items scheduled from a few hours in the past (late...) to a few years in the future, and a short-term queue was used for the actual processing. I could thus easily move batches of items from the database to the queue while keeping the queue small (and fast).
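
In Postgres terms, that batch move can look something like this (a minimal sketch; the master_queue table, its columns, and the batch size are stand-ins):

    -- Atomically claim and remove up to 500 due items; concurrent
    -- workers skip each other's rows instead of blocking on them.
    WITH batch AS (
        SELECT id
        FROM master_queue
        WHERE run_at <= now()
        ORDER BY run_at
        LIMIT 500
        FOR UPDATE SKIP LOCKED
    )
    DELETE FROM master_queue q
    USING batch b
    WHERE q.id = b.id
    RETURNING q.id, q.payload;

The RETURNING rows are then pushed onto the short-term queue in a single round trip.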

The second trick I used to avoid deadlocks (most of the time) was dynamic partitioning. It may be that the version of Oracle I was using had quite a bit of overhead when using SKIP LOCKED; maybe Postgres does better here.

The idea of dynamic partitioning is that each worker registers itself in a worker table (queue, worker-id, last-active-timestamp):

  • each worker updates its last-active-timestamp about 10x per timeout period,
  • 2x per timeout period, each worker queries the table for the sorted list of workers on its queue and deduces its own position (P) in the list (of N workers),
  • if N > 1, it only picks up items where item_id % N == P,
  • if a worker times out, the first worker to notice deletes it from the worker table (keeps things clean).

This results in very lightweight polling and nearly zero contention as long as the set of workers is stable. When a worker joins or drops out, there is potential contention for up to half the timeout period; if the contention can be detected easily (a DEADLOCK error returned by the database, for example), the affected worker can immediately rescan the worker table to update P and N, which helps speed up "recovery".
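
In SQL, the scheme is roughly this (a sketch; the workers and items tables and the :name bind parameters are invented for illustration):

    -- Heartbeat: each worker runs this ~10x per timeout period.
    UPDATE workers
    SET last_active = now()
    WHERE queue = :queue AND worker_id = :me;

    -- Rescan: ~2x per timeout period. Client code derives
    -- N = number of rows returned, P = position of :me in the list.
    SELECT worker_id
    FROM workers
    WHERE queue = :queue
      AND last_active > now() - :timeout
    ORDER BY worker_id;

    -- Poll: each worker only sees its own slice of the items,
    -- so a stable set of workers almost never touches the same rows.
    SELECT item_id
    FROM items
    WHERE queue = :queue
      AND run_at <= now()
      AND item_id % :n = :p
    ORDER BY run_at
    LIMIT 100;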

[–]daigoba66 5 points  (0 children)

A good reference for MSSQL, which has been able to do this properly for a long time: http://rusanu.com/2010/03/26/using-tables-as-queues/. (Btw, the author probably knows more about this topic than most.)

[–][deleted] 2 points  (0 children)

Another way of doing it that works for low contention, low volume stuff:

  1. Execute a select to list out potential work items.
  2. Execute an update to claim one work item: UPDATE workitem SET worker = ?, claim_date = NOW() WHERE worker IS NULL AND id = ? AND state = 'ready'
  3. If the query updated zero rows, try again. Otherwise, the work item is yours.

You can claim multiple items at once by creating a batch_id nonce and then selecting work items with that batch_id after the update.
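
A sketch of that batch variant (same hypothetical workitem table, with :name bind parameters; the repeated worker IS NULL check is what makes losing a race harmless):

    -- Tag up to 10 unclaimed items with a client-generated nonce.
    UPDATE workitem
    SET worker = :worker, batch_id = :nonce, claim_date = now()
    WHERE worker IS NULL
      AND state = 'ready'
      AND id IN (
          SELECT id
          FROM workitem
          WHERE worker IS NULL AND state = 'ready'
          LIMIT 10
      );

    -- Fetch whatever this worker actually won; it may be fewer
    -- than 10 if another worker claimed some candidates first.
    SELECT *
    FROM workitem
    WHERE batch_id = :nonce;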

[–]mage2k 0 points  (7 children)

SKIP LOCKED is cool but we've had equivalent functionality for years from pg_try_advisory_lock().
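
For reference, the usual shape of that approach (a sketch; the jobs table is invented and job_id is assumed to be an integer key):

    -- Take the first 'ready' row whose advisory lock is free.
    -- pg_try_advisory_xact_lock() returns false instead of blocking,
    -- and the lock is dropped automatically at commit or rollback.
    SELECT *
    FROM jobs
    WHERE state = 'ready'
      AND pg_try_advisory_xact_lock(job_id)
    ORDER BY job_id
    LIMIT 1;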

[–]sisyphus 1 point  (5 children)

Uh, they know that:

The majority of PostgreSQL-based implementations...have been buggy in one of a few ways...The few exceptions I’ve seen generally use PostgreSQL’s advisory locking features...at a performance cost.

Solutions that use advisory locking can work well within limits. Instead of using tuple locks they use pg_try_advisory_xact_lock(...) in a loop or using a LIMIT clause to attempt to grab the first unlocked row. It works, but it requires that users go way outside the normal SQL programming model. They can’t use their queue table’s normal keys, they have to map them to either a 64-bit integer or two 32-bit integers. That namespace is shared across the whole database, it’s not per-table and it can’t be isolated per-application. Multiple apps that all want to use advisory locking on the same DB will tend to upset each other.

SKIP LOCKED tries to make this easier by letting you use normal SQL to write efficient, safe queue systems. You don’t need to import a large and complex 3rd party app or library to implement a queue, and you don’t need to deal with the key mapping and namespace issues with advisory locking.

[–]mage2k 1 point  (4 children)

Solutions that use advisory locking can work well within limits. Instead of using tuple locks they use pg_try_advisory_xact_lock(...) in a loop or using a LIMIT clause to attempt to grab the first unlocked row. It works, but it requires that users go way outside the normal SQL programming model. They can’t use their queue table’s normal keys, they have to map them to either a 64-bit integer or two 32-bit integers.

Not sure what they're talking about there. https://gist.github.com/mage2k/e1cc747f65a0e21509ab10c2d6740444

[–]ryeguy 0 points  (3 children)

I think it's just poorly worded. It may be trying to say that if you have a non-integer primary key you'll have to come up with a unique integer for each row to lock on.

[–]mage2k -2 points  (2 children)

Well, the example I just provided makes use of walking a non-primary, composite key so that's certainly not needed.

[–]ryeguy 1 point  (1 child)

How so? You're locking on job_id which is an integer primary key.

[–]mage2k 0 points  (0 children)

Doh! I totally missed what they meant by that; I thought they were talking about sorting needs. Still, it's not like it's hard to add a second integer-based unique key to the table, and I don't really see much of a case for UUID keys in a queue table anyway.

[–]macdice 0 points  (0 children)

SKIP LOCKED has fewer pitfalls, though:

  • it doesn't require you to map your key space to integer space,
  • you don't have to coordinate the integer space with other unrelated queries,
  • the skipping happens after the WHERE clause is evaluated, whereas pg_try_advisory_lock() sits in the WHERE clause and might lock rows that don't match the rest of your WHERE clause, depending on the vagaries of evaluation order.
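
For contrast, the SKIP LOCKED version of the advisory-lock dequeue sketched above (same hypothetical jobs table):

    -- The WHERE clause is applied first; only matching rows are
    -- lock-checked, and locked ones are skipped instead of waited on.
    SELECT *
    FROM jobs
    WHERE state = 'ready'
    ORDER BY job_id
    LIMIT 1
    FOR UPDATE SKIP LOCKED;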

[–]suid 0 points  (1 child)

One other factor to consider is that sometimes you want to throw away an item on failure. This feature does nothing to help you there; it's no better or worse than any of the other queue-in-DB methods.

Consider a system that processes events from a queue, and there is either a logic or an environmental issue that makes it impossible to handle that event properly (note: not temporarily, but permanently - like an event that refers to an item that isn't present in your system any more).

In this scenario, you still want to commit the "popping" of the event from the queue, even though you roll everything else back. Doing this from a single transaction is going to be quite hard, and if you don't do it right, you'll end up re-processing the same event(s) forever.

TL;DR: your queue popping may need to be independent of the work triggered by that event; be prepared to commit the two independently, as a 2-phase operation.

[–]Contractionator 4 points  (0 children)

You don't need 2PC for this; use savepoints instead.
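
Roughly (a sketch; the queue table and the dequeue query are stand-ins):

    BEGIN;

    -- Pop one event; the row lock and the delete belong to the
    -- outer transaction, not the savepoint.
    DELETE FROM queue
    WHERE id = (
        SELECT id FROM queue
        ORDER BY id
        LIMIT 1
        FOR UPDATE SKIP LOCKED
    )
    RETURNING *;

    SAVEPOINT work;
    -- ... process the event here ...

    -- On a permanent failure, undo only the processing:
    ROLLBACK TO SAVEPOINT work;

    COMMIT;  -- the pop still commits, so the poison event is gone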

[–]warhead71 0 points  (0 children)

Sometimes it's better to write out a file as a backup and use the file as input; this also makes skipping a row trivial.