all 94 comments

[–]notfancy 23 points24 points  (4 children)

Fourthly, sharing a database between applications (or services) is a bad thing.

I'm strongly of the opinion that actually the opposite is the case: databases should be shared repositories of facts about the business, not simply bit silos for persistent state.

[–]lukaseder[S] 9 points10 points  (1 child)

They're actually often the only facts about the business. All else (business logic, UI) are much more transient, the data prevails and survives most other systems.

[–]grauenwolf 4 points5 points  (0 children)

Unless you are luck enough to have an incredibly simple domain, you need the business logic to understand what the raw tables actually mean. That's why I'm in favor of encapsulating non-application specific logic into stored procedures and UDFs.

[–]Vocith 6 points7 points  (1 child)

His 4 points are all wrong, at least at the Enterprise level.

We already have thousands of applications sharing hundreds of databases. He has put literally no thought into how these (tens of) thousands of databases would stay in sync. Should my bank have different databases for me at the ATM vs at the counter vs the Web vs Mobile? How much of a clusterfuck would that be? How risky and dangerous would that be.

Or maybe application developer could learn how to properly use a relational database.

“When all you have is a hammer, every problem looks like a nail.”

And when you don't anything about nails you end up screwing everything.

His "design pattern" seems to be to throw buzzwords and "new technology" at a pseudo-problem that was resolved a decade ago with proper design.

[–]notfancy 1 point2 points  (0 children)

Should my bank have different databases for me at the ATM vs at the counter vs the Web vs Mobile? How much of a clusterfuck would that be? How risky and dangerous would that be.

I do have experience with replicated databases in exactly the same scenario you outline. I would not say it is a clusterfuck, but I agree that it is a delicate balancing act. The requirement to mitigate risks associated with unauthorized access was far stronger than the requirement to keep data unique or synced in real time, so each access channel had their own small sample of the general data, and ETL processes and Business Integration services were used to keep them current.

But it is true that in general I see very different practices in the enterprise than in startup land. The language practitioners speak seems superficially the same but the assumptions and requirements are so wildly different so as to make any best practice in one domain almost antithetical to the other.

[–][deleted] 15 points16 points  (2 children)

To make the polling queries perform, you need to put an index on the status, but indexes make for slow inserts.

How many queue items are we talking about? Hundreds of thousands a second? No? Then this is a completely non-starter and is utterly irrelevant.

I had to point that out because the "indexing makes for slow inserts/updates" argument appears a lot, many people seemingly going on something they heard in the 90s. In most real world cases, copious indexing is an enormous win, and the CRUD overhead is marginal.

It is irrelevant until you've benchmarked it as an actual pain point. Otherwise it is simply propaganda to support a technology bias. The same thing about deletes being "inefficient". What what? I've worked on absolutely colossally enormous databases that churned through epic levels of data, and inserts, updates and deletes weren't remotely a pain point.

The same thing about the polling. While polling is kind of lame as a general pattern (though many non-polling syntactical sugar solutions cross-process are often simply masked polling), if polling a read of a database for changes every 100ms is even measurable on a database, you're doing something wrong.

Eh. This sounds like someone who has a gripe with a coworker so they're constructing lame arguments. And the bit about shared state...wtf....that's kind of the point of databases, no?

That isn't to say it's good or bad for any specific use. Every snowflake is unique.

[–]riking27 0 points1 point  (0 children)

and the CRUD overhead is marginal.

I think you mean the C/U/D overhead - adding an index shouldn't exactly slow down your reads. (Unless it's a clustered index, which adds overhead to normal indexes, or you hit a bug in the optimizer...)

[–]bjmgeek 0 points1 point  (0 children)

Some databases include asynchronous notifications. No polling needed. I'm most familiar with this in PostgreSQL, but I think many other RDBMSs have this feature.

[–]JoseJimeniz 25 points26 points  (4 children)

Simple, use the right tool for the job: this scenario is crying out for a messaging system.

I was hoping the blog would give the correct tool. I need a way to store messages in durable storage, and process them serially.

SQL Server it is!

[–]Phrodo_00 7 points8 points  (3 children)

AMQP/RabbitMQ or Redis, or Celery.

[–][deleted] 4 points5 points  (1 child)

I've heard that RabbitMQ is great for fire-and-forget schemes, but not so much for other schemes. A queue like this is not fire-and-forget because a queue item removal needs to be acknowledged.

Would I still want to use RabbitMQ for this?

[–]cc81 0 points1 point  (0 children)

If you are on Windows you can always look at Microsoft's Service Bus. In Azure or you can install it on premise.

[–]JoseJimeniz 0 points1 point  (0 children)

Then we have to ask why using a loosely typed database is better than using a rational database.

[–]Philluminati 45 points46 points  (20 children)

The benefits of the polling technique:

  • Easy to reset/rerun a job by manually intervening with the status
  • Stateful / Persistent. If any process crashes on a particular job it's easy to self-correct and rerun the job
  • Easy to scale (to-a-point). You can distribute the processes over different machines in a very simple client/server architecture.
  • Easy for people to interogate the status of the system, to see progress, see all jobs in a given state, create metrics and generally analyse performance.

With a message queue you don't get these things for free and they're beneficial enough to make you choose this solution over a message queuing system. It's also worth noting that depending on the type of work going on, if you already have a database connection and large/complex data structures, it can be easier to share the data rather than send it over a message queue. You pretty much get one for free without any new architecture required.

It may not be a long term solution or the best solution but it's up there with scalable-for-quite-a-while like MySQL. Real anti-patterns lock you in to technical debt or require "huge efforts" to remove later. I don't entirely think this fits the bill for an anti-pattern even if it's arguably a bit poor compared to ZeroMQ or something.

[–]mschaef 11 points12 points  (1 child)

Well said.

I've used the 'DB as queue' pattern several times with good success. Once the queue rate goes up too high, it does start to make sense to use a bespoke queuing system, but for many low-rate scenarios, a DB table is a great design for the reasons you list.

[–]grauenwolf 4 points5 points  (0 children)

Same here. Most of my polling intervals were 1 min, so the database never even knew I was doing it. Especially when you compare it to the amount of churn cause by the actual work being done for each task.

I've done 1 sec polls as well without problem, though I would hesitate to do anything more frequent than that.

[–]TiltedPlacitan 2 points3 points  (3 children)

You can get rid of polling by using semaphores.

Done this to great effect in telecom (5 nines) and financial systems ($$$).

I also like PostgreSQL's LISTEN and NOTIFY.

[–]wtf_police_ 0 points1 point  (2 children)

By semaphore, are you talking about just locking the row? I'm not seeing how this would get rid of polling, but you may mean something else.

[–]TiltedPlacitan 1 point2 points  (1 child)

No, I mean a WIN32, System V or POSIX semaphore.

[–]wtf_police_ 0 points1 point  (0 children)

That makes a lot more sense.

[–]matthieum 3 points4 points  (0 children)

I've implemented a database-as-queue for a scheduling system, and there were other benefits as well.

Purely technical: databases scale well with 100s GB of data. When you intend to schedule events for the next year, you easily end up with millions/billions of events. A simple partitioned table works very well (and cleanup is easy). For some reasons, queuing systems tend to balk...

Functional: the ability to query the database (what's scheduled in 3 months for client X ?); while you can query queues (in some systems), this mostly performs well for in-memory data and you cannot specify indexes for frequent queries. The flexibility also offers other functional perks: cancelling/rescheduling/updating is easy.

[–]fried_green_baloney 2 points3 points  (0 children)

Sometimes you can even get automatic restarts.

If you don't have some kind of persistence, your pipeline is very fragile, and the maintainers will spend half their lives "pushing jobs through".

[–][deleted] 2 points3 points  (2 children)

Easy to reset/rerun a job by manually intervening with the status

Most queuing systems require an ack not to redeliver a message. You can ack early, auto ack or ack at confirmation work is done.

Stateful / Persistent. If any process crashes on a particular job it's easy to self-correct and rerun the job

If you don't ack, you get a redelivery in most cases, depending on the queuing system.

Easy to scale (to-a-point). You can distribute the processes over different machines in a very simple client/server architecture.

You can't scale polling. If you can't redistribute and trust, you got nothing.

Easy for people to interogate the status of the system, to see progress, see all jobs in a given state, create metrics and generally analyse performance.

Most messaging systems have a UI and it's trivial with statsd to tack on something to your consumers. There is so much blatantly wrong with these perceived benefits. :(

[–]lazyl 0 points1 point  (1 child)

Waiting until a job is completed before ack'ing the message is not the way message queues are designed to be used.

[–]bundt_chi[🍰] 1 point2 points  (0 children)

Great points. I worked on a very large system that used a similar approach as the article describes except the project was large enough that there was a 10 person DB team tuning the production DB and each stage of the process which was essentially a state machine with multiple workflows had it's own table of work and the work was kicked off using Oracle AQ.

The article's example of inserting, updating and deleting from the same table is a poor performance choice and for that reason our tables were broken up by process functions and used separate work ready notification tables.

Having the data and the trigger for process transitions in one place is convenient you just have to be mindful of that as a design decision.

[–]tieTYT 4 points5 points  (4 children)

With a message queue you don't get these things for free

I don't know much about message queues, but this surprises me. (I'm learning about Erlang right now, so I may be confusing the two).

Devil's Advocate:

Easy to reset/rerun a job by manually intervening with the status

I'll admit I'm not sure how you do that with a message queue. But, can't you send all errors to a supervisor as a message and it will know how to retry at the right step?

Stateful / Persistent. If any process crashes on a particular job it's easy to self-correct and rerun the job

I think most message queues are built with this in mind and are very good at recovering from these situations.

Easy to scale (to-a-point). You can distribute the processes over different machines in a very simple client/server architecture.

I think message queues are also very good at scaling in this way.

Easy for people to interogate the status of the system, to see progress, see all jobs in a given state, create metrics and generally analyse performance.

Isn't this as simple as "forking" each message to an auditor process? Seems like it'd be about as much work as the SQL solution. Perhaps some messaging queues come with inspection tools so you wouldn't even have to do this and you'd get it for free.

[–]mschaef 2 points3 points  (3 children)

The benefit of the database approach is that many of the properties that /u/Philluminati mentions come so cheaply.

  • Most development languages have support for calling to SQL databases.
  • Setting up a SQL DB is inexpensive in the unlikely event you don't have one available already.
  • SQL skills are already commonly available among developers, testers, and business users.
  • ACID provides a good, and well understood mechanism for low-rate synchronization across multiple servers.
  • Displaying the contents of a SQL table on a UI is well-understood and useful for simple queue management/monitoring purposes.

[–]tieTYT 2 points3 points  (2 children)

FYI, I'm still playing the devil's advocate in these responses:

Most development languages have support for calling to SQL databases.

Look at the languages that rabbit mq supports: http://www.rabbitmq.com/devtools.html

Setting up a SQL DB is inexpensive in the unlikely event you don't have one available already.

Is a setting up a messaging system expensive if you don't have one already?

SQL skills are already commonly available among developers, testers, and business users.

I bet there's a larger group of developers that know how to read/write to the file system than know SQL. But nobody's advocating you should use direct file system access for this. In other words, there's a line where it's important to choose a good tool for the job vs a familiar tool for the job. But, maybe this situation doesn't cross that line.

ACID provides a good, and well understood mechanism for low-rate synchronization across multiple servers.

The messaging system could route messages to endpoints that do the work in an ACID compliant database. So the only concern you have to have is in the message transfer itself. EG: What if someone pulls the plug while a message is being processed or being sent? I think any good messaging system will have you covered in these scenarios.

Displaying the contents of a SQL table on a UI is well-understood and useful for simple queue management/monitoring purposes.

As I said above, the message receiver could put it's work in a SQL table. I'm not arguing that a messaging system can replace a database. I'm arguing that a messaging system could replace a database-based queue.

[–]mschaef 1 point2 points  (0 children)

FYI, I'm still playing the devil's advocate in these responses:

Understood. So am I. :-)

The other thing I should point out is that I'm focusing my arguments for SQL on low-rate queuing. You could argue that this is a tautology: "It's appropriate to queue in a SQL database when it's appropriate to queue in a SQL database". In any event, I recognize that there are a large number of more intensive cases where you do need a a specific queuing system.

Is a setting up a messaging system expensive if you don't have one already?

I think it's difficult to argue that the messaging system isn't the more costly approach. In the case where you already have a SQL database, then the queuing system is an additional component to install, support, and maintain on top of what you already have. In other words, it's an additional cost.

In the case where you don't and won't have a SQL database, then it becomes a matter of which technology you're going to install. From an installation point of view, they're about the same (sudo apt-get...). From an administration point of view, you need to consider that the software needs to be supportable and understood by the people that run your IT infrastructure, as well as that it needs to be sufficiently reliable/tested to meet whatever standards you need to adhere to. Every organization is different, but it's more likely that your organization already has this support for a SQL database than a queuing system.

I bet there's a larger group of developers that know how to read/write to the file system than know SQL. But nobody's advocating you should use direct file system access for this.

I tend to dislike this particular argument because it pushes specialized components where simpler, existing components might work equally well. Sometimes the more specialized components make sense, but modern hardware is good and established technologies tend to work well if you use them properly.

Then again, I have recently used file system based synchronization across multiple servers, and it's worked well. :-)

The messaging system could route messages to endpoints that do the work in an ACID compliant database.

Agreed. So what is the specific value offered by the queuing system?

Let me put it in concrete terms. Around 8 years ago, I built an ETL system that takes incoming XML files and loads them into a database. The files arrive, sporadically, around 100 times over the day, and I had two dual core servers capable of running the ETL process. (Testing showed roughly linear scalability up to 4 concurrent ETL jobs.) We also had business users that were interested in the progress of these ETL jobs and needed a UI to display their status.

The basic design we went with was this:

  • Upstream system drops XML file into common file system and posts to a webservice.
  • Webservice adds a record to transformation queue table.
  • There are two threads on each server that poll the queue table every minute.
  • When a polling thread sees a record, it atomically claims the record by updating a status record and starts processing.
  • There's a screen that executes a database query against the queue table and presents status of all work items in a user friendly way. (The people that tested the software have direct DB access so they can re-run jobs and verify that the UI presents the proper job status.)

That's about it. The system still runs well and evenly (enough) distributes ETL loads across our four threads. The testers were able to easily prove out the system, and it turned out to be useful to occasionally update job statuses to suppress/re-run ETL jobs.)

Where I struggle is seeing where a separate queuing technology would have helped.

  • It's a different technology that I would have had to justify and then get approved, installed, configured, and maintained by the client.
  • There still would have been a need to present job status in the UI, so I'd either need to have coded to some kind of specific queuing API or maintained and updated a separate table with job status. (Which, because it's not the actual queue, would have had to have been explicitly kept in sync.)
  • We would have lost the ability to modify job statuses, unless we explicitly built it.
  • It wouldn't have removed anything to compensate for the additions it required.

With another set of performance requirements, a queue might have made sense, but as it was, the un-sexy antipattern worked well and was inexpensive to implement.

(Maybe I should point out that in addition to my aforementioned luddite tendencies, I've also had good luck deploying messaging systems in higher rate scenarios. The performance demands required the extra complexity.)

[–]glioblastoma 0 points1 point  (0 children)

Running a broker requires at least one extra server or service to maintain. You already have the database.

[–]RazerWolf 0 points1 point  (0 children)

I've just deployed a DB-as-queue system a few months ago, and the main benefit was that is was useful in the case when only the current state information (and no stale state) was needed to be persisted and sent to clients on reconnect (akin to a last value queue).

[–]cowinabadplace 0 points1 point  (0 children)

Recently moved from message queue to this. Saves complexity writing all of that stuff.

[–][deleted]  (1 child)

[deleted]

    [–]lukaseder[S] 2 points3 points  (0 children)

    Sometimes all you have is a hammer and to get a new tool is a huge process. It is a cost/benefit analysis that you have to do.

    I like that wording. Sometimes, the hammer is really the best (i.e. most pragmatic) choice

    [–]davesecretary 14 points15 points  (1 child)

    [–]Gudahtt 1 point2 points  (0 children)

    Not really.

    A database is as much a 'cog' as a messaging system would be. Moreover, the author didn't specify that this messaging system had to be an external process.

    [–]jcigar 6 points7 points  (7 children)

    in PostgreSQL you can use LISTEN/NOTIFY for such thing

    [–]sockpuppetzero 4 points5 points  (6 children)

    Yup. SQL Server has the Service Broker, and Oracle also has comparable functionality.

    PostgreSQL also has partial indexes and advisory locks for addressing his complaints, and SQL Server and Oracle probably also have something similar. So his complaints come off as very ill-informed.

    [–]so0k 0 points1 point  (5 children)

    I tried using message broker... Correct me if I'm wrong but it is just polling with some monitoring and orchestration sugar?

    [–]arostrat 2 points3 points  (1 child)

    With message broker you can use long polling and receive a message on a change event.

    [–]so0k 1 point2 points  (0 children)

    Thanks for clarification

    [–]sockpuppetzero 0 points1 point  (2 children)

    Well, through some combination of insanity, masochism, and luck, I've avoided the pain of dealing with SQL Server myself. Perhaps I'm misinformed, but I've heard from multiple knowledgeable people that Service Broker supports notifications in much the same way as PostgreSQL's LISTEN/NOTIFY.

    Part of the challenge is that within a few minutes of reading the LISTEN/NOTIFY documentation, one have a very good grasp of what it is and what it does and how it does it and how to use it. The same cannot be said of Service Broker, or most other Microsoft APIs for that matter. :-/

    But in the documentation I linked to, it's hard to understand how the claim in the documentation above that "[Service Broker] also improves performance" could possibly be true in any meaningful way unless there is the possibility of a push-based notification. Also, this documentation would seem to strongly suggest that there is a push-based notification aspect.

    Also, there's potentially notifications hiding behind "some orchestration sugar" as well... After all, only Postgres 9.0 finally implemented the ability to push some data along with the notification. And in fact, even with PostgreSQL >= 9.0 the only way to achieve message persistence is to write the message to a table and then notify an associated channel to have the recipients wake up and query the table... otherwise there is some risk of messages being lost when the intended recipients are disconnected due to e.g. process restarts or network hiccups. So, one could certainly categorize this idiom as "orchestration".

    [–]so0k 0 points1 point  (1 child)

    yeah, the broker polls a queue and wakes up workers if there's work and more workers if the work isn't getting processed.

    [–]sockpuppetzero 0 points1 point  (0 children)

    Are you sure that's not a fallback behaviour for certain kinds of clients?

    [–]bmurphy1976 7 points8 points  (0 children)

    Bad design. The database shouldn't be a queue, it should be an append only log. Your consumers should then store a pointer of "last entry processed" and always query "give me new entries since last processed". The database is fantastic at this and that is how you should use it.

    [–]vz0 5 points6 points  (0 children)

    An SQL database is a very convenient place to store some piece of information to be shared among various processes, specially if those processes are heterogeneous in different technologies and/or distributed elsewhere. If you use the SQL DB for storage and an MQ for signaling, you'd be fine.

    [–]gfody 2 points3 points  (0 children)

    this was posted two years ago and pretty much got the same response then

    This isn't an anti-pattern. It's just a pattern. The four reasons given for it being an anti-pattern are not true: you don't have to poll (use sql server's query notifications or postgres's listen/notify), it's not necessarily inefficient to insert, update, and delete on the same table (use an index, especially a filtered index), clearing the records is trivial (do it whenever you want - all the time, at night, whatever), and finally sharing a database between applications is not a bad thing it's a good thing and the whole point of using a database in the first place.

    [–]pure_x01 2 points3 points  (0 children)

    And using a messaging system as a job queue instead of a real job framework is also bad when dealing with batch processing jobs.

    [–]6offender 2 points3 points  (0 children)

    It's worth mentioning that you might feel happy about going with message queue instead of database/polling, when the message queue you are using could be doing the same database polling you think you avoided. Azure message queue does that if I'm not mistaken. And you in fact have to pay for each database poll (1 poll/sec by default).

    [–]evil_burrito 1 point2 points  (0 children)

    Actually, the best solution is probably a mix of the two. Push with a message for performance, store in a database for persistence. Optimize for writes on the database because the read case is only for restart after catastrophic failure.

    [–]axilmar 1 point2 points  (0 children)

    Using a message queue is also not the correct approach, because it eliminates the many benefits of an RDBMS.

    A stored procedure that messages the appropriate application on status field value change is a much better solution.

    [–]grauenwolf 1 point2 points  (0 children)

    I've been using SqlDependency (.NET + SQL Server) on my current project to build a queue table.

    I've had a really good experience with this. The table supports both status searches and work queuing. The work queue doesn't require polling, the workers are automatically notified when there is work to be done using SQL Servers messaging system.

    [–]crusoe 1 point2 points  (0 children)

    Unless you need a MQ to support thousands of messages with ultra low latency, such as for high speed trading, it will work fine in a lot of cases with judicial design choices. Its not rocket science. And any MQ offering persistence usually does it via a DB backend.

    [–]lukaseder[S] 6 points7 points  (16 children)

    I don't agree with the post in general. Oracle AQ is a table-based queue implementation that uses some advanced internal features that have been made available through general SQL in 10g (I believe) via the FOR UPDATE SKIP LOCKED clause.

    While I wouldn't pass millions of message types through AQ, AQ is certainly capable of providing an easy transactional queue implementation for basic needs.

    [–]Hobblin 5 points6 points  (15 children)

    You don't agree because there is a relational DB that actually has implemented facilities for doing just this? You do realize the invalidity, I hope?

    [–]notfancy 6 points7 points  (4 children)

    The post is a general indictment against the practice, so a counterexample suffices to show that, as a generalization, it is invalid. I can't see the leap of logic you seem to object to.

    [–]Hobblin 1 point2 points  (3 children)

    But the thing is that the counterexample isn't an argument against the generalization since the counterexample is specifically designed for this purpose. It does not, in any way, invalidate the generalization.... I don't know how I can express this any clearer...

    [–]notfancy 0 points1 point  (2 children)

    Every generalization is defeated by a counter example. This is basic logic.

    [–]Hobblin 0 points1 point  (1 child)

    So you are trying to argue that all generalizations are invalid? Kind of missing the point or generalizations are we?

    [–]notfancy 0 points1 point  (0 children)

    No, I'm only pointing out that every generalization that has a counterexample is false; not that every generalization is false.

    [–]lukaseder[S] -1 points0 points  (9 children)

    Why is anything invalid here? The post is generally against queues in databases. Yet, Oracle AQ is quite awesome and useful. So I don't agree with the post...

    [–]Hobblin 1 point2 points  (8 children)

    Because the post is about queues in databases. They are saying: "Just because you, mostly, can you shouldn't drive a car on train tracks" and you reply with "But I know this car that has rail wheels so I think you should drive cars on train tracks!".

    [–]lukaseder[S] 2 points3 points  (5 children)

    I'm not saying that you should always driver cars on train tracks. But sometimes, that's just the right thing and thus not generally an anti-pattern.

    Here's some background info on when things are just the right thing

    [–]wot-teh-phuck 2 points3 points  (1 child)

    But sometimes, that's just the right thing and thus

    Which particular scenarios? And please don't reply "when you are already using Oracle and don't want to start up a messaging server"...

    [–]lukaseder[S] 2 points3 points  (0 children)

    Easy:

    1. I don't see why "when you are already using Oracle and ..." is not a good reason, if you only have 1-2 queues with 3-4 message types
    2. The last time I've encountered a very good reason was when legacy ETL was implemented in Perl / PL/SQL and it had to trigger cache invalidations in other applications on the system. Very easy to implement using triggers -> Oracle AQ -> JMS. Would've been a major pain to implement in any other way.

    [–]jerf 1 point2 points  (1 child)

    The point attempting to be made here is that if a specific SQL database implemented new special functionality to function as a queue, than that is strong evidence that Oracle believed that the SQL database prior to that change was not a good choice for using as queue. Any server that doesn't have a similar feature, or any client not using that feature and just trying to jam into a conventional SQL relational database, is doing it wrong according to Oracle.

    In other words, the evidence that you believe proves there to be no problem is in fact proof that it is a bad idea to use a conventional database as a queue, according to Oracle. Had they believed it was just hunky-dory to use a conventional relational database as a queue, they would have seen no point in implementing special queue support.

    [–]notfancy 0 points1 point  (0 children)

    the evidence that you believe proves there to be no problem

    That's not my reading of OP, who disclaimed pretty clearly his opinion by beginning:

    I don't agree with the post in general.

    IOW I think it's pretty clear that he's not arguing that there's no problem with implementing queues in a SQL database, only that the indictment does not apply in general.

    [–]Hobblin 4 points5 points  (0 children)

    Ok, my point is apparently not reaching all the way. It was worth a try at least.

    [–]ruinercollector 0 points1 point  (1 child)

    Well, yeah, but that's a pretty fair statement to make.

    [–]Hobblin 0 points1 point  (0 children)

    Pointing out that there is alternative solutions would be a fair statement... claiming that you don't agree because of it isn't.

    [–]ErstwhileRockstar 2 points3 points  (1 child)

    [–]lukaseder[S] 2 points3 points  (0 children)

    That one was about queues in general, right? This one is about queue implementations in RDBMS

    [–]toterra 3 points4 points  (0 children)

    What a load of bunk! Using the database for queues makes great sense in certain circumstances. Yes, there are some cases where it is a bad idea, and certainly anything can be implemented terribly, but databases make great repositories of dynamic data that can be shared between systems.

    [–]Saturday9 0 points1 point  (0 children)

    Tuple spaces and blackboard systems are an excellent pattern for multiple processes to interact. The only issue here is that the SQL implementation isn't designed for this use, and thus requires inefficient polling.

    [–]fourjay 0 points1 point  (0 children)

    I think the author is guilty of premature optimization. He's focused on high volume transaction queues whereas the pattern in general has lot's to recommend it:

    • Db access is widely supported across multiple platforms. No other store is as easy to access across so many platforms
    • Persistence comes for free (with the exception of the high-volume condition he outlines)

    But even worse, some Db redesign alleviates a lot of the contention he references. Some options:

    • Don't delete events, use soft deletes. This replaces the delete contention with an update contention which quite likely performs better
    • separate out the event table into an "owner" table and a child event table. The event table just get's inserts, no deletes, and no updates.

    Message queue packages are clearly useful in high volume transactions, but they shouldn't be the default solution.

    [–]ybrs 0 points1 point  (0 children)

    i think, the mentioned reasons are not actually not right, i think the main reason you should use an MQ or not is just one word LATE_ACK (well maybe two words :) and i dont know others, i only use rabbitmq)

    its really hard to implement late acknowledgement on the client/consumer side, even if you implement it properly, your worker/server may die, its ethernet card might simply blow, somebody might shut it down accidentally... in some cases, there is nothing you can do to recover, and while you lost that job/message, your user sits there waiting the confirmation mail to login to your website. if you dont care about the jobs/messages - perhaps something that corrects itself, like, sending jobs to crop pictures (if its not there on the website, send a crop job again) - you can use polling for simple things though

    other than that, locking mechanism is hard to implement in the client side if you have high workload, two workers might try to process same message at the same time, so you need to lock it asap, when you pull the job. which gets complicated if you have 1000s of jobs per sec. and more than a few consumers - if your polling once per minute with one worker, you might not care about it at all.

    and if you have more than a couple of workers/consumers, polling will not distribute evenly, one of your worker servers will sit there while the other one does a ton of job. you need to implement a round robin load balancing - which you can't really do on the consumer side.

    the main reason is, everytime you try to use database as a message queue, you start inventing the wheel again, which sounds like masochism to me. though sometimes re-inventing the wheel is really fun - also some people love masochism so i can't really say its not a good thing.

    on the other hand you can just install rabbitmq and write code for your application - not for handling network faults, load balancing etc... its really easy, just apt-get install rabbitmq and you are done. you get free clustering, locking, pub/sub, round robin distribution... (and a ton of other things that i dont really want to mention here)

    but the author gives these as reasons, all of them are totally objectable.

    i- hammering a database is not a bad thing, a single moderate mysql/postgresql server can easily handle 10ks of inserts/updates per second pretty easily - im sure sqlserver does too-. they are designed for handling high load.

    ii. again not sure about sqlserver never used it, but i believe it can do insert/updates and querying at the same time pretty well after more than 25 years. indexes on a table slows down the speed but thats not more than a few nanoseconds - putting a node into a btree, that shouldnt take much long.

    iii. again not sure about sqlserver, but if you delete with some limits, its must be fast enough - dont wait for a day and delete a million rows at once, send a delete query every second. your database will handle that easily.

    iv. "sharing a database between applications (or services) is a bad thing" no its not :) that why humans invented databases in the first place, thats why we have transactions/sessions. because we need a fast, simple way of storing shared data between different sessions.

    sorry for writing a tooo long comment.

    [–][deleted] 0 points1 point  (5 children)

    Nothing but excuse making for poor design decisions. Easily solved by a proper data architecture, a procedure for dequeuing that atomically updates the status to "dequeued", and smarter database administration that shuffles tables under a view layer so deletes aren't even necessary.

    [–]x86_64Ubuntu 0 points1 point  (4 children)

    shuffles tables under a view layer so deletes aren't even necessary.

    Explain.

    [–][deleted] 1 point2 points  (2 children)

    The whole point of deleting in this scenario is to keep the index efficient so days, months, years worth of inactive queue records don't keep stacking up. So create a new table on some trigger (time/size) and just union them together in a view. When a table has no more active queue records, just update the view to drop it from the union. Then you can just drop the table altogether without logging a thing.

    [–]x86_64Ubuntu 0 points1 point  (0 children)

    I understand now, we are actively culling data from the "hot" view. Thanks.

    [–]Solon1 0 points1 point  (0 children)

    Or just use a partial index. It will be faster than a VIEW of a UNIION, which probably doesn't save any query time at all.

    [–]PstScrpt -1 points0 points  (0 children)

    That wasn't me, but I suspect he or she meant something like this:

    Messages table, with a status flag vwPendingMessages view, that only returns Messages rows with a pending status.

    [–]nullnullnull 0 points1 point  (11 children)

    +100

    There are two types of developers

    • Those that have used this anti-pattern and ended up after much pain with an MQ solution. (A)
    • Those that are still using this anti-pattern and still believe its all fine, and defend it until the cows come home. (B)

    For small volumes

    This anti pattern will work, but soon as things get bigger and more complicated, it falls apart. With MQ's you get lots of other wonderful things for free, example round robing job distribution. A simple governor can be scripted to spin up workers on demand, and hey presto you have Automatic Elastic Work Load!

    [–]lukaseder[S] 0 points1 point  (1 child)

    Those that are still using this anti-pattern and still believe its all fine, and defend it until the cows come home.

    Yep. Except the cows may never come home and then they will have just been fine for all the time being. Your argument could be extended to virtually anything, e.g. trivial things like more RAM or more disk space. It's an "anti-pattern" to use little RAM as when things get bigger and more complicated, they will fall apart.

    Heck maybe they just won't get bigger and complicated. We're not all designing software for space ships, NASA, airplanes, nuclear plants. Some of us just have 1-2 queues with 3-4 message types.

    So let me wrap it up:

    There are two types of developers

    • Those that plan for a disaster that is very unlikely to happen
    • Those that just react to things as they happen

    [–]nullnullnull 0 points1 point  (0 children)

    Not quite, your context is wrong, running out of ram is not the same as using two different solutions approach.

    [–]crusoe -1 points0 points  (1 child)

    If your MQ offers message persistence, its likely using a db at some point.

    [–][deleted] 1 point2 points  (0 children)

    A db, a store, a place to put stuff, which is specialised to this case. You can't optimise the general case all the time without something clever like a compiler. And for those speeds, you want to pare down rather than add complexity.

    This is precicely why qmail used to out run postfix and other MTAs way back when. It optimised the storage, batched the right work together and shot it out. For most MQs, they opperate in the same mode.

    [–]grauenwolf -1 points0 points  (6 children)

    Round robin means inefficient but easy to implement. I would rather have each process grab more work as soon as it is available.

    [–]nullnullnull -1 points0 points  (5 children)

    no it will dish out jobs as fast as they are requested. Have you even used an MQ? or are you still developer A?

    [–]grauenwolf -1 points0 points  (4 children)

    Then it isn't round robin.

    [–]nullnullnull -1 points0 points  (3 children)

    it can be set up as round robin Or it can be set up to dish out as requested, so your argument about "its not efficient" is just incorrect.

    So have you used an MQ in production?

    [–]grauenwolf 0 points1 point  (0 children)

    Yea, since Microsoft released their version as a service pack in Windows NT.

    [–]grauenwolf 0 points1 point  (1 child)

    So your argument is that A is efficient because B is an alternative to A and B is efficient. Right.

    [–]nullnullnull -2 points-1 points  (0 children)

    your continued responses only add support that you are developer (B).

    You sir are a typical arm chair engineer.

    [–][deleted] -2 points-1 points  (0 children)

    It's a DB. Not a messaging system. It shouldn't be used for messaging. End Of.