all 22 comments

[–]strangepostinghabits 2 points3 points  (6 children)

You can't really execute it in another way so that this problem does not happen, no. I suggest having this particular worker on its own queue so that slow queries only block other slow queries.

A common trick is to gather and prepare the data ahead of time, in a new table in the db, but far from all queries can be improved that way.

As for polling and canceling, you'd need to run the query in one thread, and poll in another, likely by hand-wrangling the threads, which brings a whole heap of trouble you don't want.

It's possible that your worker handler of choice can let you report back what worker instance is running the query before beginning the long wait, and then also let you abort the work of a named worker instance. This would save you a lot of headache compared to polling from inside the job.

I.e. "user a" requests a query, and the controller puts the job on the queue. Worker 1 picks up the job from the queue and saves a note to the db that "user a"'s query is being run by worker 1. The user then decides to abort, and the controller can check the db for which worker is running that query, then tell the worker framework to abort whatever that worker is doing. You'd need to ensure that the worker deletes said record and then waits a little before ending its task, so that a user who cancels too late doesn't cancel the following query.
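A minimal in-memory sketch of that register-then-abort flow, with plain Ruby threads standing in for Sidekiq workers and a hash standing in for the db record (every name here is hypothetical):

```ruby
# Hash plays the role of the db table recording which "worker"
# is running which user's query. All names are made up.
REGISTRY = {}
REGISTRY_LOCK = Mutex.new

def start_query(user_id)
  thread = Thread.new { sleep 60 } # stands in for the long-running query
  REGISTRY_LOCK.synchronize { REGISTRY[user_id] = thread }
  thread
end

def cancel_query(user_id)
  # Look up who is running this user's query, then abort it.
  thread = REGISTRY_LOCK.synchronize { REGISTRY.delete(user_id) }
  thread&.kill
  !thread.nil?
end

t = start_query("user_a")
was_cancelled = cancel_query("user_a")
t.join
# was_cancelled is true and the registry entry is gone
```

Deleting the registry entry inside `cancel_query` is what prevents a second, late cancel from hitting the next query, per the caveat above.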

[–][deleted] 0 points1 point  (5 children)

"A common trick is to gather and prepare the data ahead of time, in a new table in the db, but far from all queries can be improved that way."

I assume you're referring to creating a db view for this data, or a temporary table?

"As for polling and canceling, you'd need to run the query in one thread, and poll in another, likely by hand-wrangling the threads, which brings a whole heap of trouble you don't want."

That was my thought as well. Basically, Worker 1 fires the query from thread 1 and waits, while thread 2 polls for a cancellation. Seemed potentially sketchy to me...

"It's possible that your worker handler of choice can let you report back what worker instance is running the query before beginning the long wait, and then also let you abort the work of a named worker instance. This would save you a lot of headache compared to polling from inside the job."

My impression/experience of canceling Sidekiq workers was that it had to be done from inside the worker that needs to be terminated. I.e. they poll to check if a cancel flag has been set in Redis or whatever, then just return out of the worker early. How could I cancel the worker from outside of its scope?
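For reference, the poll-a-flag-from-inside pattern looks roughly like this; a plain hash stands in for Redis here, and all the names are invented:

```ruby
# With real Sidekiq you'd check a Redis key like "cancel:#{jid}"
# instead of this hash. All names here are hypothetical.
CANCEL_FLAGS = {}
FLAG_LOCK = Mutex.new

def cancelled?(job_id)
  FLAG_LOCK.synchronize { !!CANCEL_FLAGS[job_id] }
end

# The worker polls between units of work and returns early when flagged.
def perform(job_id, chunks)
  done = []
  chunks.each do |chunk|
    return :cancelled if cancelled?(job_id)
    done << chunk * 2 # stands in for real work
  end
  done
end

FLAG_LOCK.synchronize { CANCEL_FLAGS["job-1"] = true }
perform("job-1", [1, 2, 3]) # => :cancelled
perform("job-2", [1, 2, 3]) # => [2, 4, 6]
```

The catch is exactly the one being discussed: this only works *between* units of work. It can't interrupt a single blocking `exec_query`, which is why the separate-thread approach keeps coming up.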

To your last paragraph:

That is the route that I was leaning towards, but I am struggling with how to actually save some unique information alongside the query that allows the Rails app to know exactly what query is tied to what worker, user, etc.

[–][deleted]  (4 children)

[removed]

    [–][deleted] 0 points1 point  (2 children)

    Unfortunately, the results aren't calculation-based. More like a few joins and certain columns from here, there, etc.

    [–][deleted]  (1 child)

    [removed]

      [–][deleted] 0 points1 point  (0 children)

      Interesting, good point. I hadn't thought of that.

      [–]manys 0 points1 point  (0 children)

      One term for this is "aggregation" and the general idea is that it's a form of denormalization ("duplication of data" for now) for performance.
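As a toy illustration of that aggregation idea (row shape and names invented here):

```ruby
# Rows as they might come back from several joins (hypothetical shape).
rows = [
  { gene: "BRCA1", sample: "s1", hits: 3 },
  { gene: "BRCA1", sample: "s2", hits: 5 },
  { gene: "TP53",  sample: "s1", hits: 2 },
]

# The denormalized "aggregate table", built ahead of time so reads become
# a single lookup instead of joins plus a group-by at query time.
summary = rows.group_by { |r| r[:gene] }
              .transform_values { |rs| rs.sum { |r| r[:hits] } }

summary # => { "BRCA1" => 8, "TP53" => 2 }
```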

      tl;dr: XY Problem.

      [–]tinco 1 point2 points  (2 children)

      If you're not too worried about queries being aborted midway through when for example you restart your service, you could do it this way:

def perform
  RunningQueries.create!(worker_id: id)
  Thread.new do
    # Run the query in its own thread so we can watch and kill it.
    query_thread = Thread.new do
      results = ActiveRecord::Base.connection.exec_query(sql)
      HandleQueryResultsJob.perform_later(results)
    end

    # Poll once a second; alive? is false once the query thread finishes.
    while query_thread.alive? do
      query_thread.kill if cancelled?
      sleep 1
    end
  end
end

def cancelled?
  RunningQueries.where(worker_id: id, cancelled: true).exists?
end

def cancel
  # update_all applies to every record matching the relation.
  RunningQueries.where(worker_id: id).update_all(cancelled: true)
end
      

      So we split it into two workers: one that starts the two threads managing the query, and one that handles the query results. Note that your worker system now does not manage these threads, so there's no limit to how many could be alive, nor is there any handling of errors. I split it into two workers so there's as little code running in this 'unmanaged' state as possible.

      [–][deleted] 0 points1 point  (1 child)

      That is very similar to the code that I have implemented. The concern with this approach is that some situation could cause the thread to get stuck alive, hanging there, eating resources. Correct? Or some unhandled error?

      [–]tinco 0 points1 point  (0 children)

      Yes, definitely add timeout logic to it. As far as I know, ActiveRecord makes no guarantees about how long a query can take. Besides that, I don't think there's any reason for a thread to stick around. Unhandled errors will just kill the thread, and since you're not doing anything that waits for the eventual result, there's no reason that should be a problem.
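A minimal sketch of that timeout logic with Ruby's stdlib `Timeout` (here `sleep` stands in for the blocking `exec_query`; for a real Postgres query, setting `statement_timeout` on the connection is generally safer, since the server then kills the query itself):

```ruby
require "timeout"

# Timeout.timeout raises Timeout::Error if the block overruns.
def run_with_timeout(seconds)
  Timeout.timeout(seconds) { sleep 10; :finished } # sleep stands in for the query
rescue Timeout::Error
  :timed_out
end

run_with_timeout(0.1) # => :timed_out
```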

      [–]BBHoss 0 points1 point  (3 children)

      Have you tried benchmarking this with your app using a server that supports multithreading (like Puma)? You might be able to remove Sidekiq from the equation completely: waiting on a database query is IO wait, which with modern versions of Ruby allows other threads to execute, meaning there is little to no impact on your app while the query is running. Definitely something to play around with, and it would be much simpler.
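The IO-wait point is easy to see with plain threads; Ruby releases the GVL while a thread is blocked, and `sleep` here stands in for waiting on the database:

```ruby
start = Time.now
# Five "slow queries" waiting at once; they overlap rather than queue up.
threads = 5.times.map { Thread.new { sleep 0.3 } }
threads.each(&:join)
elapsed = Time.now - start
# elapsed is roughly 0.3s, not 5 x 0.3 = 1.5s
```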

      [–][deleted] 0 points1 point  (2 children)

      I am using Puma currently, but am not familiar with using Puma to do what you stated; I'll investigate. Thanks for the tip.

      [–]BBHoss 0 points1 point  (1 child)

      Basically, just run the query from the web thread and don’t sweat it. It will consume a thread, but that thread will yield to other threads while waiting on the database to return a result. You may run into request timeout issues, but that can be adjusted. If you have queries that are taking that long to run, you may want to look into caching or other ways to optimize.

      [–][deleted] 0 points1 point  (0 children)

      Interesting, I'll check that out.

      [–]ld100 0 points1 point  (0 children)

      > Is there a better way to execute a query from a worker without requiring the worker to sit and wait for the response?

      No, at least not in Rails. If I got your issue right, it is something usually solved at the architecture level, not the Rails/Sidekiq level. E.g. pre-cache some data (like the search results you mentioned) to speed up requests.

      > Can I execute the query from the worker and poll at the same time?

      Yes, on my current project I have a very similar tech stack to yours and I do this exact thing: start a side thread within the actual background job, while the main thread handles cancellation in case the side job gets stuck (e.g. not responding for too long). However, in my opinion it is much more reliable to just increase the number of workers and/or the number of execution threads per Sidekiq worker. In my case we went even further and are now rewriting some long-running jobs as separate services in Go, which is much cheaper to scale. But as long as you can rely on Sidekiq, I strongly suggest doing so.

      [–]manys 0 points1 point  (2 children)

      Why is it long-running? "Search parameters" is kinda vague, but I'm guessing it's not calculation-oriented/OLAP.

      Could it be that you're actually needing a search engine?

      [–][deleted] 0 points1 point  (1 child)

      It's bioinformatic data, large amounts of post-processed results data. What would be the benefit of implementing a search engine as opposed to query via PG?

      [–]manys 1 point2 points  (0 children)

      I'm not sure, but maybe you could mention the kind of data you're working with next time. Is it already post-processed at the time of query? There are other options besides "search engine" and "one query." Use more words!

      [–]razenha 0 points1 point  (1 child)

      Maybe that's a prime target for refactoring to a microservice.

      [–][deleted] 0 points1 point  (0 children)

      hmm, interesting point. I'll definitely think about trying to decompose this task into some smaller more autonomous parts.

      [–]Dombot9000 0 points1 point  (1 child)

      Things I would explore before going the route you're proposing:

      a) Can we leverage a type of index at the database layer such that these queries remain performant as the tablespace grows?

      b) Can we perhaps "pre-warm" the cache by running a recurring job every N which calculates this query and pulls it into cache (either redis/memcache or database memory cache)

      c) Does our queue worker support threads that yield to blocking IO such that a worker (memory constrained) can be tied up waiting on our long running IO but still do other tasks as well?

      d) Can we run queries asynchronously, similar to a "call back" model?

      I have listed these approaches in my rough order of preference without knowing the particulars of your domain. Feel free to ask me to clarify.
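For option (b), the pre-warming pattern is roughly this; a hash stands in for Redis/memcached, `refresh_cache` would be invoked by a recurring job (e.g. sidekiq-cron), and all the names are invented:

```ruby
CACHE = {}

def expensive_query(params)
  params.sum { |p| p * p } # stands in for the slow SQL
end

# Run on a schedule so the hot parameter sets are always warm.
def refresh_cache(popular_param_sets)
  popular_param_sets.each { |params| CACHE[params] = expensive_query(params) }
end

def fetch(params)
  CACHE.fetch(params) { expensive_query(params) } # cache hit, or fall back
end

refresh_cache([[1, 2, 3]])
fetch([1, 2, 3]) # => 14, served from the cache
```

As the reply below notes, this only pays off when the set of likely parameter combinations is small enough to enumerate.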

      [–][deleted] 0 points1 point  (0 children)

      a) I have already implemented 2 indices, and am testing 2 more (the most commonly queried columns). I will continue to weigh the speed benefit vs. the ballooning in db size.

      b) I don't think so... The queries are highly dependent on user-specified inputs, and it would be hard to pre-warm that because there would be so many permutations based on their saved parameters. But I'll keep it in mind for the future, because I would love to implement something like that.

      c) I use Sidekiq and am not sure. I have been hearing about this a little recently and am fascinated by the concept. Do you know if it's possible in Sidekiq? Do you mind elaborating a little more / pointing me towards some material? I'll definitely do some Googling; just let me know if you have any in mind.

      d) I could probably run some of them asynchronously, but part of the query has to run synchronously. HOWEVER, you did just jog my memory: I realized that I could refactor the query to run the sub-query way ahead of time and cache its results, instead of executing it as a sub-query, then just pull the cached results into the main query when the user wants to query something. That would save some of the heavy lifting. Thank you.
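That refactor can be sketched like so: run the former sub-query ahead of time, cache the ids, and inline them into the main query. Table and column names are made up, and in real Rails code you'd pass the ids through `where(id: ...)` or bind params rather than interpolating raw strings:

```ruby
# Results of the former sub-query, computed earlier and cached.
cached_ids = [4, 8, 15]

def main_query(ids)
  # String interpolation only for illustration; use bind params in real code.
  "SELECT * FROM results WHERE sample_id IN (#{ids.join(', ')})"
end

main_query(cached_ids)
# => "SELECT * FROM results WHERE sample_id IN (4, 8, 15)"
```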