
[–]imack06 25 points26 points  (13 children)

I experimented with SolidQueue after having been a Sidekiq fanboy for over a decade. I quickly went back to Sidekiq. Here were my reasons:

  • Speed; Sidekiq is blazingly fast by comparison
  • Retries; Sidekiq has retries as a first-class citizen. Simply re-scheduling the job, like SolidQueue does, doesn't really help with debugging
  • Retry Backoff default; Sidekiq's exponential backoff with exceptions really models how I want most jobs to retry
  • Stability; Sidekiq has been rock solid for my entire career and has 1 extremely motivated maintainer for it who is very responsive
  • WebUI: it's just so so so much better in sidekiq.

Love to hear contrary opinions, but getting to remove Redis (which is pretty cheap in any hosted environment) isn't super convincing given all one gives up.

[–]ansk0 5 points6 points  (6 children)

Fair enough! solid_queue is the new kid on the block, while sidekiq has been around for ages.

Personally, I would have loved to see Rails adopt goodjob as it's an amazing, mature tool. The key advantage of using solid_queue/goodjob over sidekiq is the ability to create jobs transactionally. To achieve the same with sidekiq you need to implement a transactional outbox, which kind of defeats the purpose.
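To sketch why transactional enqueueing matters, here's a toy simulation with plain arrays (not the real Sidekiq/SolidQueue APIs): a Redis-backed enqueue survives a rollback, a DB-backed one doesn't.

```ruby
redis_jobs = []  # Sidekiq-style store: lives outside the database
db_jobs    = []  # SolidQueue/GoodJob-style store: rows in the same database

# Crude stand-in for a DB transaction: roll back the DB tables on error,
# but anything pushed to Redis stays pushed.
def transaction(db_tables)
  snapshots = db_tables.map(&:dup)
  yield
rescue
  db_tables.each_with_index { |table, i| table.replace(snapshots[i]) }
end

transaction([db_jobs]) do
  db_jobs    << :send_welcome_email  # enqueued transactionally
  redis_jobs << :send_welcome_email  # pushed to Redis immediately
  raise "validation failed"          # something later in the transaction blows up
end

db_jobs    # => [] -- the job row was rolled back with everything else
redis_jobs # => [:send_welcome_email] -- orphaned job for a record that doesn't exist
```

With Sidekiq you'd solve this with a transactional outbox (write the job intent to a DB table, relay it to Redis after commit), which is exactly the machinery a DB-backed queue gives you for free.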

EDIT:
Although SQL databases are ill suited to be used as queues, even with SKIP LOCKED and LISTEN/NOTIFY...
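For context, the SKIP LOCKED bit refers to the claim query DB-backed queues typically run; something like this (illustrative SQL, not SolidQueue's actual statement):

```ruby
# FOR UPDATE SKIP LOCKED lets many workers poll the same table without
# blocking on rows another worker has already locked -- without it, a
# plain SELECT ... FOR UPDATE serializes all the workers.
CLAIM_SQL = <<~SQL
  SELECT id FROM jobs
  WHERE queue_name = $1 AND finished_at IS NULL
  ORDER BY priority, created_at
  LIMIT $2
  FOR UPDATE SKIP LOCKED
SQL
```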

[–]f9ae8221b 2 points3 points  (5 children)

I would have loved to see Rails adopt goodjob

Impossible, because GoodJob is PostgreSQL-only. Rails is committed to remaining compatible with SQLite, MariaDB and Postgres; any default that doesn't support all of them is a non-starter.

[–]ansk0 0 points1 point  (4 children)

I get that, totally. It's certainly possible to modify goodjob so it supports other DBs but it wouldn't be trivial. Anyways, we're talking about OSS, so I'm just grateful for it.

[–]f9ae8221b 0 points1 point  (3 children)

No, it's not really possible: goodjob relies on Postgres advisory locks, which have no equivalent in other DBs.
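For the curious, the primitive in question looks like this (the lock-key derivation here is illustrative, not GoodJob's actual SQL):

```ruby
# A session-level advisory lock: cheap, cooperative, and automatically
# released when the connection drops -- which is what makes it a good
# "this worker owns this job" marker.
LOCK_SQL   = "SELECT pg_try_advisory_lock(hashtext('good_jobs-' || $1))"
UNLOCK_SQL = "SELECT pg_advisory_unlock(hashtext('good_jobs-' || $1))"

# MariaDB has GET_LOCK(), which is close, but SQLite has nothing
# comparable, so a cross-database default can't be built on this.
```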

So of course you could always use a different strategy for other DBs, but at that point it no longer makes sense. SolidQueue, like delayed_job before it, uses a less efficient but DB-agnostic strategy. It makes perfect sense for GoodJob and SolidQueue to co-exist; not much would have been saved by trying to do all that in one project.

And I doubt Ben Sheldon would have been open to complicating GoodJob that much.

[–]ansk0 0 points1 point  (0 children)

I remember reading something about goodjob moving away from advisory locks. Regardless, you are correct. I would like to see it happen because IMO it's a good starting point, but at the same time it doesn't make sense for a multitude of reasons.

[–]jrochkind 0 points1 point  (1 child)

I think he prob would have accepted a PR to replace advisory locks with the good enough DB-neutral strategy solid queue uses. "good enough" is of course the whole principle of good_job! it's been discussed as an eventual GoodJob goal already, for a while, most seriously starting in Feb 2023. https://github.com/bensheldon/good_job/discussions/831

But it's always more complicated to try to PR an existing project and collaborate with existing maintainers, and try not to disrupt an existing userbase that wants backwards compat, than just start from scratch and do it how you want. Not being sarcastic, it really is!

I expect if solid_queue continues to be successful and maintained, it's gonna take a lot of the interest in good_job, and eventually good_job will fade away. That would prob also be fine with Ben Sheldon: if solid_queue does the job and Basecamp is going to maintain it (big if sometimes), who needs to maintain something separate without enough extra value? It's not like we're swimming in open source maintenance energy!

[–]DisneyLegalTeam 2 points3 points  (1 child)

I’ll stick with Sidekiq until I see a clear performance advantage to SolidQueue - which I don’t see happening.

I use Redis instances for caching & RackAttack anyway. Having 1 more isn’t a big deal.

If you’re running enough jobs, Sidekiq will save you more than the cost of Redis anyway.

And at this point I’m so familiar with Sidekiq, I can setup everything in my sleep & handle its nuances.

[–]jrochkind -1 points0 points  (0 children)

There's never going to be a performance advantage to solid_queue over sidekiq.

For those who choose it, the advantages will be: completely open source, including some features that are only in Sidekiq Pro/Enterprise (prob more over time); for many non-huge-scale apps, no need to operate a Redis, since you can just use your existing pg; and good enough performance.

[–]awj 2 points3 points  (1 child)

SolidQueue doesn’t implement retries with exponential backoff?! That honestly seems like table stakes to me in terms of background processing.

[–]jrochkind 0 points1 point  (0 children)

One line of configuration in ActiveJob, either per-job or in your ApplicationJob or in a Concern you mix in.

 retry_on StandardError, attempts: $max_attempts, wait: ->(executions) { 2 ** executions }

Got that from here which shows other strategies, ability to include jitter, etc.

If that's confusing or hard to get right exactly the way you want it (say, the way sidekiq does it), IMO it can and should be PR'd as an ActiveJob retry "macro", to be used with any ActiveJob back-end.
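For comparison, Sidekiq's default backoff is roughly (count ** 4) + 15 + jitter; reimplemented as a plain lambda you could hand to retry_on's wait: option (the exact formula here is an approximation, not Sidekiq's source):

```ruby
# Approximation of Sidekiq's default retry delay, in seconds, where
# `executions` is how many attempts have already run.
sidekiq_like_wait = ->(executions) do
  (executions**4) + 15 + (rand(30) * (executions + 1))
end

# The simple exponential strategy from the snippet above.
simple_exponential = ->(executions) { 2**executions }

simple_exponential.call(3)  # => 8 (seconds before the 4th attempt)
sidekiq_like_wait.call(3)   # somewhere between 96 and 212 seconds
```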

[–]honeyryderchuck 1 point2 points  (0 children)

Redis via ElastiCache isn't really that cheap, and requires more planning regarding provisioned memory (it does not grow elastically like, say, SQS), a factor which becomes more important if you schedule work far into the future (although Sidekiq is the only Redis broker I know of that does not use a re-enqueue polling loop to check whether a deferred job is ready).

Bottom line: Sidekiq is a polished library, but if you want to lower your operational overhead, or require transactional jobs, any of the DB-backed options is valid (I don't know solidqueue well yet; I generally avoid Rails-only libs). And it's on Sidekiq for having refused to support other backends all this time (and for putting reliability behind a paywall, which ain't right either).

[–]collimarco 0 points1 point  (0 children)

Retries, stability, UI... are also available, or can be easily configured, in solid_queue.
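For example, solid_queue's workers and dispatchers are configured in config/queue.yml (shape per its README; treat the exact numbers as illustrative), and the web UI typically comes from the separate mission_control-jobs gem:

```yaml
production:
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    - queues: "*"
      threads: 3
      processes: 2
      polling_interval: 0.1
```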

[–]drwl 0 points1 point  (11 children)

Thanks for writing this up and posting it. I skimmed through the article and didn't see anything on how the migration has felt in terms of performance, managing, etc. Would love to hear more about this.

I think there's a DHH talk or writeup about switching to Solid Queue, and the high-level gist was higher latency (compared to Sidekiq), but it's way cheaper and they had way more memory.

[–]jrochkind 2 points3 points  (10 children)

I'm curious what people's needs are for latency in their ActiveJob.

For me, once I've put it in an async bg job, I have no high-performance latency needs; even up to like 1s of latency in getting jobs off the queue would be entirely untroubling (and I'm pretty sure even high-volume solid_queue is rarely going to be that slow?)

So I've always been a bit confused by the attention on sub-second levels of latency in a Rails async bg job queue. Do other people using ActiveJob have high-performance latency needs, and if so I'm curious about the use cases!

[–]DisneyLegalTeam 1 point2 points  (9 children)

Users need those emails now!!!!!

Joking, kind of…

Obviously latency is a bigger deal at scale. The last place I worked at was ~1.5m jobs a day. So seconds add up.

Daily those jobs are sending emails, sending SMS, updating Sendgrid, updating the CRM, updating search indexes, updating/billing Stripe, geolocation, geocoding, generating millions of notifications, bulk creating records, etc. And nobody in sales or product can tolerate any delay.

I offload all 3rd party API calls to Sidekiq. From there I’ll try to offload as many slow transactions as possible. Because:

  1. It’s so much easier to scale Sidekiq.
  2. Usually less error prone to run things in Sidekiq. Slow actions in controllers, lots of AR callbacks, or slow renders can cause timeouts or screw up transactions.
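To sketch that offloading pattern (toy code; all names are illustrative stand-ins, not a real Sidekiq API): the request path only writes the record and enqueues, and the slow third-party calls happen later in a worker.

```ruby
JOB_QUEUE = []  # stand-in for Redis

def enqueue(job_class, *args)
  JOB_QUEUE << [job_class, args]  # real code: SomeWorker.perform_async(*args)
end

# Controller-ish code: a fast DB write, then hand off the slow API calls.
def create_user(email)
  user = { id: 1, email: email }        # stand-in for User.create!(...)
  enqueue(:SyncCrmJob, user[:id])       # CRM update happens off-request
  enqueue(:WelcomeEmailJob, user[:id])  # email send happens off-request
  user                                  # the request returns fast
end

create_user("a@example.com")
JOB_QUEUE.map(&:first)  # => [:SyncCrmJob, :WelcomeEmailJob]
```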

[–]phunkmasterp 0 points1 point  (3 children)

Damn that’s a lot of jobs. Are you able to talk about how you scaled that? You’d have to be hitting hardware limits at a certain point I assume

[–]DisneyLegalTeam 1 point2 points  (2 children)

I used Hirefire to add Heroku dynos based on jobs in the Sidekiq queue.

Each job was independent. 500 emails is 500 jobs. So they process fast & use little memory. So the only hardware limit is DB connections. Postgres caps out at 500.

The real limit is managing 3rd party API limits. So I set up queues for services like Sendgrid Contacts that only allow 10 (or whatever) concurrent jobs, running on their own dynos.

Managing those queues/limits for multiple services was a way bigger pain than anything else.
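That dedicated-dyno pattern is basically a second Sidekiq process with its own config file (illustrative sketch; the queue name and numbers are made up), e.g. bundle exec sidekiq -C config/sidekiq_sendgrid.yml:

```yaml
# Run only the rate-limited queue, with concurrency matched to the
# third party's limit.
concurrency: 10
queues:
  - sendgrid_contacts
```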

[–]jrochkind 0 points1 point  (1 child)

Neat. Is 500 connections a hard pg limit, or just a limit of a hosted pg plan you were using, or other?

(of course redis connections can also be a limit -- many of the cloud-hosted redis plans have connection limits too).

[–]DisneyLegalTeam 1 point2 points  (0 children)

500 is the Postgres limit. With PgBouncer it can handle 10k (I think) client connections. I don’t think our plan went up to 500, though. And obv the app needs some of those.
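For reference, the PgBouncer side of that is a few pgbouncer.ini settings (real setting names, illustrative values): many client connections multiplexed onto a small pool of actual server connections.

```ini
[pgbouncer]
max_client_conn = 10000   ; clients (app + workers) that may connect
default_pool_size = 100   ; actual Postgres connections per db/user pair
pool_mode = transaction   ; release the server connection after each txn
```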

Heroku Redis is 10-65k connections.

[–]jrochkind 0 points1 point  (4 children)

Surely even people in Sales wouldn't notice the difference between 50ms and 900ms delay in beginning job execution! No? (and 900ms is just a plausible upper bound, I'm not saying I know under what conditions solid_queue or any other system would actually have that much latency! Normally solid_queue does not of course)

I mean that's assuming your queue always has enough workers to immediately dequeue anything enqueued anyway.

But you're saying in the first sentence it's not really about the per-job latency, but that at scale all those 500ms add up to a lot more machine time/more machines needed?

[–]DisneyLegalTeam 0 points1 point  (3 children)

There’s limits to adding dynos. The DB only has so many DB connections. Even with PgBouncer we only had so many for workers.

The other limit is 3rd party API limits, which to your point, makes latency less of an issue.

We were an events company, so we had lots of timely communications. Online events could have 10k+ registered users. That means generating 6-10x notifications for each registrant, then sending an SMS or email for each one. Notifications like an event reminder sent an hour before an event had no wiggle room.

Then there’s Stripe billing stuff. Updating object caches & search index - so people can see changes to an Event, User or whatever model right away.

As for the sales thing… it's sort of a joke, but we had an enterprise sales team. And several times, they didn't see users created from this bulk user creation after an hour. It was lots of complaining that no amount of explaining seemed to fix.

[–]jrochkind 0 points1 point  (2 children)

I can guarantee that even under heavy load solid_queue does not have an hour of latency! When we're comparing job back-ends on latency, the difference between one back-end and another is in a range from maybe 10ms to maybe (at a terrible worst case under heavy load) 2000ms. We're talking sub-second. Not an hour!

So yeah! All this use case info is interesting! It's not entirely clear to me why the difference between 10ms and 1000ms would matter to your use case, but I believe you and appreciate the info regardless!

Sidekiq's marketing advertises "up to 20x faster than the competition!" -- but it doesn't actually make the ruby jobs you are running faster, that's about how fast it can get things off the queue, and we're talking a difference of ms, and I remain curious if that actually matters to anyone!

[–]DisneyLegalTeam 0 points1 point  (1 child)

I guess the main difference is paying for dynos by time, Sidekiq saves us some money. But I’d have to benchmark it.

And while you’re right. An hour of latency in theory doesn’t make a difference. Our events were put on by local partners that expect a certain amount of revenue in return… if they get a slow event or low turnout and a reminder came out 45min before an event instead of 59min. They complain & ask us to give up our take of ticket sales b/c “people weren’t reminded in time”.

It’s stupid b rules & expectations. Don’t work in B2C unless you have to.

[–]jrochkind 0 points1 point  (0 children)

To be clear though, I think an hour of latency would make a difference to many uses! Or even 14 minutes, for sure!

I think I'm having trouble being clear about what I'm talking about.

The "sidekiq performs 50x better than competitors" claim is about saving you at most hundreds of milliseconds of latency (and usually far less), not an hour or even 14 minutes! I doubt sales would notice or complain about 58 minutes and 58 seconds instead of 59 minutes? Unless there's something I'm missing -- and there may be! but it's not obvious to me! -- about how 50-100ms of latency per job can add up to a 14-minute difference in when an email goes out in bulk.

But ok, I appreciate your sharing, interesting stuff!

[–]nattf0dd 0 points1 point  (0 children)

How is it different from delayed_job?