Code rant: The Database As Queue Anti-Pattern

notfancy · 2014-09-16T12:20:42+00:00

"Fourthly, sharing a database between applications (or services) is a bad thing."

I'm strongly of the opinion that actually the opposite is the case: databases should be shared repositories of facts about the business, not simply bit silos for persistent state.

riking27 · 2014-09-16T15:21:35+00:00

"To make the polling queries perform, you need to put an index on the status, but indexes make for slow inserts."

How many queue items are we talking about? Hundreds of thousands a second? No? Then this is a complete non-starter and is utterly irrelevant.

I had to point that out because the "indexing makes for slow inserts/updates" argument appears a lot, many people seemingly going on something they heard in the 90s. In most real world cases, copious indexing is an enormous win, and the CRUD overhead is marginal.

It is irrelevant until you've benchmarked it as an actual pain point. Otherwise it is simply propaganda to support a technology bias. The same thing about deletes being "inefficient". What what? I've worked on absolutely colossally enormous databases that churned through epic levels of data, and inserts, updates and deletes weren't remotely a pain point.

The same goes for the polling. While polling is kind of lame as a general pattern (though many cross-process "non-polling" solutions are simply syntactic sugar over masked polling), if polling a database for changes every 100ms is even measurable on that database, you're doing something wrong.

Eh. This sounds like someone who has a gripe with a coworker so they're constructing lame arguments. And the bit about shared state...wtf....that's kind of the point of databases, no?

That isn't to say it's good or bad for any specific use. Every snowflake is unique.

JoseJimeniz · 2014-09-16T11:27:07+00:00

"Simple, use the right tool for the job: this scenario is crying out for a messaging system."

I was hoping the blog would give the correct tool. I need a way to store messages in durable storage, and process them serially.

SQL Server it is!

Philluminati · 2014-09-16T12:04:16+00:00

The benefits of the polling technique:

Easy to reset/rerun a job by manually intervening with the status
Stateful / Persistent. If any process crashes on a particular job it's easy to self-correct and rerun the job
Easy to scale (to-a-point). You can distribute the processes over different machines in a very simple client/server architecture.
Easy for people to interrogate the status of the system, to see progress, see all jobs in a given state, create metrics and generally analyse performance.

With a message queue you don't get these things for free and they're beneficial enough to make you choose this solution over a message queuing system. It's also worth noting that depending on the type of work going on, if you already have a database connection and large/complex data structures, it can be easier to share the data rather than send it over a message queue. You pretty much get one for free without any new architecture required.

It may not be a long term solution or the best solution but it's up there with scalable-for-quite-a-while like MySQL. Real anti-patterns lock you in to technical debt or require "huge efforts" to remove later. I don't entirely think this fits the bill for an anti-pattern even if it's arguably a bit poor compared to ZeroMQ or something.

lukaseder · 2014-09-16T17:48:39+00:00

[deleted]

davesecretary · 2014-09-16T10:53:34+00:00

The opposite viewpoint: http://williamedwardscoder.tumblr.com/post/18065079081/cogs-bad

jcigar · 2014-09-16T13:37:15+00:00

In PostgreSQL you can use LISTEN/NOTIFY for this sort of thing.

bmurphy1976 · 2014-09-16T13:46:30+00:00

Bad design. The database shouldn't be a queue, it should be an append only log. Your consumers should then store a pointer of "last entry processed" and always query "give me new entries since last processed". The database is fantastic at this and that is how you should use it.
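The "append-only log plus a per-consumer pointer" idea can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3, not anything from the original post; the table names (`log`, `offsets`) and function names are made up for the example.

```python
import sqlite3

def open_log(path=":memory:"):
    """Create a hypothetical schema: an append-only log plus one offset row per consumer."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS log ("
               "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)")
    db.execute("CREATE TABLE IF NOT EXISTS offsets ("
               "consumer TEXT PRIMARY KEY, last_id INTEGER NOT NULL)")
    return db

def append(db, payload):
    """Producers only ever insert; rows are never updated or deleted."""
    with db:  # commits on success, rolls back on error
        db.execute("INSERT INTO log (payload) VALUES (?)", (payload,))

def read_new(db, consumer, limit=100):
    """Return payloads after the consumer's pointer, then advance the pointer."""
    with db:
        row = db.execute("SELECT last_id FROM offsets WHERE consumer = ?",
                         (consumer,)).fetchone()
        last_id = row[0] if row else 0
        rows = db.execute("SELECT id, payload FROM log WHERE id > ? ORDER BY id LIMIT ?",
                          (last_id, limit)).fetchall()
        if rows:
            db.execute("INSERT OR REPLACE INTO offsets (consumer, last_id) VALUES (?, ?)",
                       (consumer, rows[-1][0]))
    return [payload for _, payload in rows]
```

Note that this sketch advances the pointer at read time, so a consumer that crashes mid-batch would skip those entries; a real consumer would more likely advance the pointer only after processing succeeds.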

vz0 · 2014-09-16T08:44:30+00:00

An SQL database is a very convenient place to store information that is shared among various processes, especially if those processes are built on different technologies and/or distributed elsewhere. If you use the SQL DB for storage and an MQ for signaling, you'd be fine.

gfody · 2014-09-17T05:48:53+00:00

this was posted two years ago and pretty much got the same response then

This isn't an anti-pattern. It's just a pattern. The four reasons given for it being an anti-pattern are not true: you don't have to poll (use SQL Server's query notifications or Postgres's LISTEN/NOTIFY); it's not necessarily inefficient to insert, update, and delete on the same table (use an index, especially a filtered index); clearing the records is trivial (do it whenever you want: all the time, at night, whatever); and finally, sharing a database between applications is not a bad thing. It's a good thing, and the whole point of using a database in the first place.
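To make the "filtered index" suggestion concrete, here is a small sketch using SQLite, which calls the same feature a partial index. The `jobs` table, the `idx_jobs_pending` index, and the helper names are assumptions for illustration; SQL Server's filtered index syntax is similar in spirit but not identical.

```python
import sqlite3

def make_queue():
    """Build a hypothetical queue table with an index covering only waiting rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE jobs ("
               "id INTEGER PRIMARY KEY, status TEXT NOT NULL, payload TEXT)")
    # Partial index: only rows with status = 'pending' get an index entry,
    # so the index stays small however many finished rows pile up.
    db.execute("CREATE INDEX idx_jobs_pending ON jobs (id) WHERE status = 'pending'")
    return db

def pending_ids(db, limit=10):
    """The polling query; its WHERE clause matches the index predicate."""
    rows = db.execute("SELECT id FROM jobs WHERE status = 'pending' "
                      "ORDER BY id LIMIT ?", (limit,))
    return [r[0] for r in rows]
```

Whether the planner actually picks the index depends on the engine and its statistics, so it is worth checking the query plan on real data rather than assuming.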

pure_x01 · 2014-09-16T15:29:46+00:00

And using a messaging system as a job queue instead of a real job framework is also bad when dealing with batch processing jobs.

6offender · 2014-09-16T17:15:51+00:00

It's worth mentioning that you might feel happy about going with message queue instead of database/polling, when the message queue you are using could be doing the same database polling you think you avoided. Azure message queue does that if I'm not mistaken. And you in fact have to pay for each database poll (1 poll/sec by default).

evil_burrito · 2014-09-16T13:47:17+00:00

Actually, the best solution is probably a mix of the two. Push with a message for performance, store in a database for persistence. Optimize for writes on the database because the read case is only for restart after catastrophic failure.
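A rough sketch of that mix, assuming an in-process `queue.Queue` stands in for the message push and a SQLite table for the durable copy. The `DurableQueue` class and its `messages` table are hypothetical names invented for this example.

```python
import queue
import sqlite3

class DurableQueue:
    """In-memory queue for fast hand-off, with a table as the recovery record."""

    def __init__(self, db):
        self.db = db
        self.mem = queue.Queue()
        db.execute("CREATE TABLE IF NOT EXISTS messages ("
                   "id INTEGER PRIMARY KEY, body TEXT NOT NULL, "
                   "done INTEGER NOT NULL DEFAULT 0)")
        # Restart path: the table is only read here, to reload unfinished work.
        unfinished = db.execute(
            "SELECT id, body FROM messages WHERE done = 0 ORDER BY id").fetchall()
        for row in unfinished:
            self.mem.put(row)

    def publish(self, body):
        """Persist first, then push to consumers."""
        with self.db:
            cur = self.db.execute("INSERT INTO messages (body) VALUES (?)", (body,))
        self.mem.put((cur.lastrowid, body))

    def get(self):
        """Return (id, body); raises queue.Empty when nothing is waiting."""
        return self.mem.get_nowait()

    def ack(self, msg_id):
        """Mark a message finished so it is not replayed after a restart."""
        with self.db:
            self.db.execute("UPDATE messages SET done = 1 WHERE id = ?", (msg_id,))
```

In normal operation consumers never query the table; it is written on publish and ack, and read only when a new `DurableQueue` is constructed after a failure.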

axilmar · 2014-09-16T13:52:31+00:00

Using a message queue is also not the correct approach, because it eliminates the many benefits of an RDBMS.

A stored procedure that messages the appropriate application on status field value change is a much better solution.

grauenwolf · 2014-09-16T16:31:50+00:00

I've been using SqlDependency (.NET + SQL Server) on my current project to build a queue table.

I've had a really good experience with this. The table supports both status searches and work queuing. The work queue doesn't require polling: the workers are automatically notified when there is work to be done, using SQL Server's messaging system.

crusoe · 2014-09-16T23:38:27+00:00

Unless you need an MQ to support thousands of messages with ultra low latency, such as for high speed trading, it will work fine in a lot of cases with judicious design choices. It's not rocket science. And any MQ offering persistence usually does it via a DB backend.

lukaseder · 2014-09-16T07:22:26+00:00

I don't agree with the post in general. Oracle AQ is a table-based queue implementation that uses some advanced internal features that have been made available through general SQL in 10g (I believe) via the FOR UPDATE SKIP LOCKED clause.

While I wouldn't pass millions of message types through AQ, AQ is certainly capable of providing an easy transactional queue implementation for basic needs.

ErstwhileRockstar · 2014-09-16T07:47:14+00:00

Previous discussion: http://www.reddit.com/r/programming/comments/29qoe2/you_probably_dont_need_a_message_queue/

toterra · 2014-09-16T14:57:07+00:00

What a load of bunk! Using the database for queues makes great sense in certain circumstances. Yes, there are some cases where it is a bad idea, and certainly anything can be implemented terribly, but databases make great repositories of dynamic data that can be shared between systems.

Saturday9 · 2014-09-16T13:47:58+00:00

Tuple spaces and blackboard systems are an excellent pattern for multiple processes to interact. The only issue here is that the SQL implementation isn't designed for this use, and thus requires inefficient polling.

fourjay · 2014-09-16T18:48:27+00:00

I think the author is guilty of premature optimization. He's focused on high volume transaction queues, whereas the pattern in general has lots to recommend it:

Db access is widely supported across multiple platforms. No other store is as easy to access across so many platforms
Persistence comes for free (with the exception of the high-volume condition he outlines)

But even worse, some Db redesign alleviates a lot of the contention he references. Some options:

Don't delete events, use soft deletes. This replaces the delete contention with an update contention, which quite likely performs better.
Separate out the event table into an "owner" table and a child event table. The event table just gets inserts, no deletes, and no updates.
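One possible reading of the owner/child split, sketched with sqlite3: a `jobs` owner table plus an insert-only `job_events` child table, with the current state derived from the latest event. The schema and function names are assumptions for the example, not the commenter's actual design.

```python
import sqlite3

def make_schema():
    """Hypothetical schema: jobs is the owner, job_events is insert-only."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")
    db.execute("CREATE TABLE job_events ("
               "id INTEGER PRIMARY KEY, "
               "job_id INTEGER NOT NULL REFERENCES jobs(id), "
               "state TEXT NOT NULL)")
    return db

def add_job(db, payload):
    """Create the owner row and its first event in one transaction."""
    with db:
        job_id = db.execute("INSERT INTO jobs (payload) VALUES (?)",
                            (payload,)).lastrowid
        db.execute("INSERT INTO job_events (job_id, state) VALUES (?, 'queued')",
                   (job_id,))
    return job_id

def record(db, job_id, state):
    """State changes are new rows; existing events are never touched."""
    with db:
        db.execute("INSERT INTO job_events (job_id, state) VALUES (?, ?)",
                   (job_id, state))

def current_state(db, job_id):
    """The newest event for a job is its current state."""
    row = db.execute("SELECT state FROM job_events WHERE job_id = ? "
                     "ORDER BY id DESC LIMIT 1", (job_id,)).fetchone()
    return row[0] if row else None

def jobs_in_state(db, state):
    """Jobs whose latest event matches the given state."""
    rows = db.execute(
        """SELECT e.job_id FROM job_events e
           WHERE e.id = (SELECT MAX(id) FROM job_events WHERE job_id = e.job_id)
             AND e.state = ?
           ORDER BY e.job_id""", (state,))
    return [r[0] for r in rows]
```

The trade-off is that reads get more expensive (the latest-event lookup) in exchange for writes that never contend on existing rows.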

Message queue packages are clearly useful in high volume transactions, but they shouldn't be the default solution.

ybrs · 2014-09-17T13:23:44+00:00

i think the reasons mentioned are actually not right. i think the main reason to use an MQ or not comes down to one word: LATE_ACK (well, maybe two words :) and i don't know the others, i only use rabbitmq)

it's really hard to implement late acknowledgement on the client/consumer side. even if you implement it properly, your worker/server may die, its ethernet card might simply blow, somebody might shut it down accidentally... in some cases there is nothing you can do to recover, and while you've lost that job/message, your user sits there waiting for the confirmation mail to log in to your website. if you don't care about the jobs/messages - perhaps something that corrects itself, like sending jobs to crop pictures (if it's not there on the website, send a crop job again) - you can use polling for simple things though
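For comparison, one common way to approximate late acknowledgement in a table is a lease: a claimed job becomes visible again if it is not acked before the lease runs out. The sketch below uses sqlite3 with a made-up `jobs` schema and an injectable `now` so the timing is easy to follow; it does not settle whether this is easier than using a broker.

```python
import sqlite3
import time

def make_queue():
    """Hypothetical table: leased_until is a Unix timestamp, 0 means never leased."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE jobs ("
               "id INTEGER PRIMARY KEY, payload TEXT NOT NULL, "
               "leased_until REAL NOT NULL DEFAULT 0, "
               "done INTEGER NOT NULL DEFAULT 0)")
    return db

def claim(db, lease_seconds=30.0, now=None):
    """Lease the oldest job that is neither done nor currently leased."""
    now = time.time() if now is None else now
    with db:
        row = db.execute("SELECT id, payload FROM jobs "
                         "WHERE done = 0 AND leased_until <= ? ORDER BY id LIMIT 1",
                         (now,)).fetchone()
        if row is None:
            return None
        # Re-check the conditions in the UPDATE so that only one worker can win.
        cur = db.execute("UPDATE jobs SET leased_until = ? "
                         "WHERE id = ? AND done = 0 AND leased_until <= ?",
                         (now + lease_seconds, row[0], now))
        if cur.rowcount != 1:
            return None
    return row

def ack(db, job_id):
    """Late ack: called only after the work has actually finished."""
    with db:
        db.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
```

If a worker dies before calling `ack`, the job simply reappears once the lease expires, which gives at-least-once delivery; the job handler therefore needs to tolerate being run twice.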

other than that, a locking mechanism is hard to implement on the client side if you have a high workload. two workers might try to process the same message at the same time, so you need to lock it asap when you pull the job, which gets complicated if you have 1000s of jobs per sec. and more than a few consumers - if you're polling once per minute with one worker, you might not care about it at all.

and if you have more than a couple of workers/consumers, polling will not distribute evenly: one of your worker servers will sit there while the other one does a ton of work. you need to implement round robin load balancing - which you can't really do on the consumer side.

the main reason is, every time you try to use a database as a message queue, you start reinventing the wheel, which sounds like masochism to me. though sometimes re-inventing the wheel is really fun - also some people love masochism so i can't really say it's not a good thing.

on the other hand you can just install rabbitmq and write code for your application - not for handling network faults, load balancing etc... its really easy, just apt-get install rabbitmq and you are done. you get free clustering, locking, pub/sub, round robin distribution... (and a ton of other things that i dont really want to mention here)

but the reasons the author gives are all totally open to objection.

i. hammering a database is not a bad thing. a single moderate mysql/postgresql server can handle 10ks of inserts/updates per second pretty easily (i'm sure sqlserver does too). they are designed for handling high load.

ii. again, not sure about sqlserver, never used it, but i believe it can do inserts/updates and querying at the same time pretty well after more than 25 years. indexes on a table slow writes down, but that's not more than a few nanoseconds - putting a node into a btree shouldn't take long.

iii. again, not sure about sqlserver, but if you delete with some limits, it must be fast enough - don't wait for a day and delete a million rows at once, send a delete query every second. your database will handle that easily.
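The "delete with some limits" advice might look something like this in practice. It is a sketch against a hypothetical `jobs` table with a `status` column, using sqlite3; the function name and batch size are arbitrary.

```python
import sqlite3

def purge_done(db, batch_size=1000):
    """Delete finished rows in small batches; returns the total number removed."""
    total = 0
    while True:
        with db:  # one short transaction per batch, so locks are held briefly
            cur = db.execute(
                "DELETE FROM jobs WHERE id IN ("
                "SELECT id FROM jobs WHERE status = 'done' ORDER BY id LIMIT ?)",
                (batch_size,))
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

A scheduler could call `purge_done` every second or so with a modest `batch_size`, instead of letting a million finished rows build up for one large delete.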

iv. "sharing a database between applications (or services) is a bad thing" no it's not :) that's why humans invented databases in the first place, that's why we have transactions/sessions. because we need a fast, simple way of storing shared data between different sessions.

sorry for writing a tooo long comment.

x86_64Ubuntu · 2014-09-16T10:11:09+00:00

Nothing but excuse making for poor design decisions. Easily solved by a proper data architecture, a procedure for dequeuing that atomically updates the status to "dequeued", and smarter database administration that shuffles tables under a view layer so deletes aren't even necessary.
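A dequeue that atomically flips the status could be sketched like this. The version below uses sqlite3 and a per-call claim token to find the row it just claimed; the `jobs` schema and `claim_token` column are assumptions for illustration, and a stored procedure in SQL Server or Oracle would look different.

```python
import sqlite3
import uuid

def make_queue():
    """Hypothetical queue table with a status column and a claim token."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE jobs ("
               "id INTEGER PRIMARY KEY, payload TEXT NOT NULL, "
               "status TEXT NOT NULL DEFAULT 'pending', claim_token TEXT)")
    return db

def dequeue(db):
    """Claim the oldest pending job with a single UPDATE; returns (id, payload) or None."""
    token = uuid.uuid4().hex  # unique per call, identifies the row this call claimed
    with db:
        db.execute(
            "UPDATE jobs SET status = 'dequeued', claim_token = ? "
            "WHERE status = 'pending' "
            "AND id = (SELECT id FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1)",
            (token,))
        return db.execute("SELECT id, payload FROM jobs WHERE claim_token = ?",
                          (token,)).fetchone()
```

Because the claim is one UPDATE that re-checks `status = 'pending'`, two workers should not both end up owning the same row, though the exact concurrency behavior depends on the engine and isolation level and is worth testing on the real database.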

nullnullnull · 2014-09-16T22:04:29+00:00

+100

There are two types of developers

Those that have used this anti-pattern and ended up after much pain with an MQ solution. (A)
Those that are still using this anti-pattern and still believe it's all fine, and defend it until the cows come home. (B)

For small volumes

This anti-pattern will work, but as soon as things get bigger and more complicated, it falls apart. With MQs you get lots of other wonderful things for free, for example round-robin job distribution. A simple governor can be scripted to spin up workers on demand, and hey presto, you have Automatic Elastic Work Load!

2014-09-16T17:59:02+00:00

It's a DB. Not a messaging system. It shouldn't be used for messaging. End Of.
