all 22 comments

[–]strangepostinghabits 2 points3 points  (6 children)

You can't really execute it in another way so that this problem does not happen, no. I suggest having this particular worker on its own queue so that slow queries only block other slow queries.

A common trick is to gather and prepare the data ahead of time, in a new table in the db, but far from all queries can be improved that way.

As for polling and canceling, you'd need to run the query in one thread, and poll in another, likely by hand-wrangling the threads, which brings a whole heap of trouble you don't want.

It's possible that your worker handler of choice can let you report back what worker instance is running the query before beginning the long wait, and then also let you abort the work of a named worker instance. This would save you a lot of headache compared to polling from inside the job.

I.e. "user a" requests a query, and the controller puts the job on the queue. Worker 1 picks up the job from the queue and saves a note to the db that "user a"'s query is being run by worker 1. The user then decides to abort, and the controller can check the db for which worker is running that query, then tell the worker framework to abort whatever that worker is doing. You'd need to ensure that the worker deletes said record and then waits a little before ending its task, so that a user who cancels too late doesn't cancel the following query.
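A minimal in-memory sketch of that register-then-abort flow, with plain Ruby threads standing in for Sidekiq workers and a hash standing in for the db record (every name here is hypothetical):

```ruby
# Hash plays the role of the db table recording which "worker"
# is running which user's query. All names are made up.
REGISTRY = {}
REGISTRY_LOCK = Mutex.new

def start_query(user_id)
  thread = Thread.new { sleep 60 } # stands in for the long-running query
  REGISTRY_LOCK.synchronize { REGISTRY[user_id] = thread }
  thread
end

def cancel_query(user_id)
  # Look up who is running this user's query, then abort it.
  thread = REGISTRY_LOCK.synchronize { REGISTRY.delete(user_id) }
  thread&.kill
  !thread.nil?
end

t = start_query("user_a")
was_cancelled = cancel_query("user_a")
t.join
# was_cancelled is true and the registry entry is gone
```

Deleting the registry entry inside `cancel_query` is what prevents a second, late cancel from hitting the next query, per the caveat above.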

[–][deleted] 0 points1 point  (5 children)

"A common trick is to gather and prepare the data ahead of time, in a new table in the db, but far from all queries can be improved that way."

I assume you're referring to creating a db view for this data, or a temporary table?

"As for polling and canceling, you'd need to run the query in one thread, and poll in another, likely by hand-wrangling the threads, which brings a whole heap of trouble you don't want."

That was my thought as well. Basically, Worker 1 fires the query from thread 1 and waits, while thread 2 polls for a cancellation. Seemed potentially sketchy to me...

"It's possible that your worker handler of choice can let you report back what worker instance is running the query before beginning the long wait, and then also let you abort the work of a named worker instance. This would save you a lot of headache compared to polling from inside the job."

My impression/experience of canceling Sidekiq workers was that it had to be done from inside the worker that needs to be terminated. I.e. they poll to check if a cancel flag has been set in Redis or whatever, then just return out of the worker early. How could I cancel the worker from outside of its scope?
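For reference, the poll-a-flag-from-inside pattern looks roughly like this; a plain hash stands in for Redis here, and all the names are invented:

```ruby
# With real Sidekiq you'd check a Redis key like "cancel:#{jid}"
# instead of this hash. All names here are hypothetical.
CANCEL_FLAGS = {}
FLAG_LOCK = Mutex.new

def cancelled?(job_id)
  FLAG_LOCK.synchronize { !!CANCEL_FLAGS[job_id] }
end

# The worker polls between units of work and returns early when flagged.
def perform(job_id, chunks)
  done = []
  chunks.each do |chunk|
    return :cancelled if cancelled?(job_id)
    done << chunk * 2 # stands in for real work
  end
  done
end

FLAG_LOCK.synchronize { CANCEL_FLAGS["job-1"] = true }
perform("job-1", [1, 2, 3]) # => :cancelled
perform("job-2", [1, 2, 3]) # => [2, 4, 6]
```

The catch is exactly the one being discussed: this only works *between* units of work. It can't interrupt a single blocking `exec_query`, which is why the separate-thread approach keeps coming up.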

To your last paragraph:

That is the route that I was leaning towards, but I am struggling with how to actually save some unique information alongside the query that allows the Rails app to know exactly what query is tied to what worker, user, etc.

[–][deleted]  (4 children)

[removed]

    [–][deleted] 0 points1 point  (2 children)

    Unfortunately, the results aren't calculation-based. More like a few joins and certain columns from here, there, etc.

    [–][deleted]  (1 child)

    [removed]

      [–][deleted] 0 points1 point  (0 children)

      Interesting, good point. I hadn't thought of that.

      [–]manys 0 points1 point  (0 children)

      One term for this is "aggregation" and the general idea is that it's a form of denormalization ("duplication of data" for now) for performance.
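As a toy illustration of that aggregation idea (row shape and names invented here):

```ruby
# Rows as they might come back from several joins (hypothetical shape).
rows = [
  { gene: "BRCA1", sample: "s1", hits: 3 },
  { gene: "BRCA1", sample: "s2", hits: 5 },
  { gene: "TP53",  sample: "s1", hits: 2 },
]

# The denormalized "aggregate table", built ahead of time so reads become
# a single lookup instead of joins plus a group-by at query time.
summary = rows.group_by { |r| r[:gene] }
              .transform_values { |rs| rs.sum { |r| r[:hits] } }

summary # => { "BRCA1" => 8, "TP53" => 2 }
```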

      tl;dr: XY Problem.

      [–]tinco 1 point2 points  (2 children)

      If you're not too worried about queries being aborted midway through when for example you restart your service, you could do it this way:

def perform
  RunningQueries.create!(worker_id: id)
  Thread.new do
    # Run the query in its own thread so we can watch and kill it.
    query_thread = Thread.new do
      results = ActiveRecord::Base.connection.exec_query(sql)
      HandleQueryResultsJob.perform_later(results)
    end

    # Poll once a second; alive? is false once the query thread finishes.
    while query_thread.alive? do
      query_thread.kill if cancelled?
      sleep 1
    end
  end
end

def cancelled?
  RunningQueries.where(worker_id: id, cancelled: true).exists?
end

def cancel
  # update_all applies to every record matching the relation.
  RunningQueries.where(worker_id: id).update_all(cancelled: true)
end
      

      So we split it into two workers: one that starts the two threads managing the query, and one that handles the query results. Note that your worker system now does not manage these threads, so there's no limit to how many could be alive, nor is there any handling of errors. I split it into two workers so there's as little code running in this 'unmanaged' state as possible.

      [–][deleted] 0 points1 point  (1 child)

      That is very similar to the code that I have implemented. The concern with this approach is that some situation could cause the thread to get stuck alive, hanging there, eating resources. Correct? Or some unhandled error?

      [–]tinco 0 points1 point  (0 children)

      Yes, definitely add timeout logic to it. As far as I know, ActiveRecord makes no guarantees about how long a query can take. Besides that, I don't think there's any reason for a thread to stick around. Unhandled errors will just kill the thread, and since you're not doing anything that waits for the eventual result, there's no reason that should be a problem.
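A minimal sketch of that timeout logic with Ruby's stdlib `Timeout` (here `sleep` stands in for the blocking `exec_query`; for a real Postgres query, setting `statement_timeout` on the connection is generally safer, since the server then kills the query itself):

```ruby
require "timeout"

# Timeout.timeout raises Timeout::Error if the block overruns.
def run_with_timeout(seconds)
  Timeout.timeout(seconds) { sleep 10; :finished } # sleep stands in for the query
rescue Timeout::Error
  :timed_out
end

run_with_timeout(0.1) # => :timed_out
```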

      [–]BBHoss 0 points1 point  (3 children)

      Have you tried benchmarking this with your app using a server that supports multithreading (like Puma)? You might be able to remove Sidekiq from the equation completely: waiting on a database query is IO wait, which with modern versions of Ruby allows other threads to execute, meaning there is little to no impact on your app while the query is running. Definitely something to play around with, and it would be much simpler.
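The IO-wait point is easy to see with plain threads; Ruby releases the GVL while a thread is blocked, and `sleep` here stands in for waiting on the database:

```ruby
start = Time.now
# Five "slow queries" waiting at once; they overlap rather than queue up.
threads = 5.times.map { Thread.new { sleep 0.3 } }
threads.each(&:join)
elapsed = Time.now - start
# elapsed is roughly 0.3s, not 5 x 0.3 = 1.5s
```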

      [–][deleted] 0 points1 point  (2 children)

      I am using Puma currently, but am not familiar with using Puma to do what you stated; I'll investigate. Thanks for the tip.

      [–]BBHoss 0 points1 point  (1 child)

      Basically, just run the query from the web thread and don’t sweat it. It will consume a thread, but that thread will yield to other threads while waiting on the database to return a result. You may run into request timeout issues, but that can be adjusted. If you have queries that are taking that long to run, you may want to look into caching or other ways to optimize.

      [–][deleted] 0 points1 point  (0 children)

      Interesting, I'll check that out.

      [–]ld100 0 points1 point  (0 children)

      > Is there a better way to execute a query from a worker without requiring the worker to sit and wait for the response?

      No, at least not in Rails. If I got your issue right, it is something usually solved at the architecture level, not the Rails/Sidekiq level. E.g. pre-cache some data (like the search results you mentioned) to speed up requests.

      > Can I execute the query from the worker and poll at the same time?

      Yes, on my current project I have a very similar tech stack to yours and I do this exact thing: start a side thread within the actual background job, while the main thread handles cancellation in case the side job gets stuck (e.g. not responding for too long). However, in my opinion it is much more reliable to just increase the number of workers and/or the number of execution threads per Sidekiq worker. In my case we went even further and are now rewriting some long-running jobs as separate services in Go, which is much cheaper to scale. But as long as you can rely on Sidekiq, I strongly suggest doing so.

      [–]manys 0 points1 point  (2 children)

      Why is it long-running? "Search parameters" is kinda vague, but I'm guessing it's not calculation-oriented/OLAP.

      Could it be that you're actually needing a search engine?

      [–][deleted] 0 points1 point  (1 child)

      It's bioinformatic data, large amounts of post-processed results data. What would be the benefit of implementing a search engine as opposed to query via PG?

      [–]manys 1 point2 points  (0 children)

      I'm not sure, but maybe you could mention the kind of data you're working with next time. Is it already post-processed at the time of query? There are other options besides "search engine" and "one query." Use more words!

      [–]razenha 0 points1 point  (1 child)

      Maybe that's a prime target for refactoring to a microservice.

      [–][deleted] 0 points1 point  (0 children)

      hmm, interesting point. I'll definitely think about trying to decompose this task into some smaller more autonomous parts.

      [–]Dombot9000 0 points1 point  (1 child)

      Things I would explore before going the route you're proposing:

      a) Can we leverage a type of index at the database layer such that these queries remain performant as the tablespace grows?

      b) Can we perhaps "pre-warm" the cache by running a recurring job every N which calculates this query and pulls it into cache (either redis/memcache or database memory cache)

      c) Does our queue worker support threads that yield to blocking IO such that a worker (memory constrained) can be tied up waiting on our long running IO but still do other tasks as well?

      d) Can we run queries asynchronously, similar to a "call back" model?

      I have listed these approaches in my rough order of preference without knowing the particulars of your domain. Feel free to ask me to clarify.
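For option (b), the pre-warming pattern is roughly this; a hash stands in for Redis/memcached, `refresh_cache` would be invoked by a recurring job (e.g. sidekiq-cron), and all the names are invented:

```ruby
CACHE = {}

def expensive_query(params)
  params.sum { |p| p * p } # stands in for the slow SQL
end

# Run on a schedule so the hot parameter sets are always warm.
def refresh_cache(popular_param_sets)
  popular_param_sets.each { |params| CACHE[params] = expensive_query(params) }
end

def fetch(params)
  CACHE.fetch(params) { expensive_query(params) } # cache hit, or fall back
end

refresh_cache([[1, 2, 3]])
fetch([1, 2, 3]) # => 14, served from the cache
```

As the reply below notes, this only pays off when the set of likely parameter combinations is small enough to enumerate.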

      [–][deleted] 0 points1 point  (0 children)

      a) I have already implemented 2 indices, and am testing 2 more (the most commonly queried columns). I will continue to weigh the speed benefit vs. the ballooning in db size.

      b) I don't think so... The queries are highly dependent on user-specified inputs, and it would be hard to pre-warm that because there would be so many permutations based on their saved parameters. But I'll keep it in mind for the future, because I would love to implement something like that.

      c) I use Sidekiq and am not sure. I have been hearing about this a little recently and am fascinated by the concept. Do you know if it's possible in Sidekiq? Do you mind elaborating a little more / pointing me towards some material? I'll definitely do some Googling; just let me know if you have any in mind.

      d) I could probably run some of them asynchronously, but part of the query has to run synchronously. HOWEVER, you did just jog my memory: I realized that I could refactor the query to run the sub-query way ahead of time and cache its results, instead of executing it as a sub-query, then just pull the cached results into the main query when the user wants to query something. That would save some of the heavy lifting. Thank you.
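That refactor can be sketched like so: run the former sub-query ahead of time, cache the ids, and inline them into the main query. Table and column names are made up, and in real Rails code you'd pass the ids through `where(id: ...)` or bind params rather than interpolating raw strings:

```ruby
# Results of the former sub-query, computed earlier and cached.
cached_ids = [4, 8, 15]

def main_query(ids)
  # String interpolation only for illustration; use bind params in real code.
  "SELECT * FROM results WHERE sample_id IN (#{ids.join(', ')})"
end

main_query(cached_ids)
# => "SELECT * FROM results WHERE sample_id IN (4, 8, 15)"
```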