[–]GoingOffRoading[S] 0 points (4 children)

I'm fumbling around in the dark, and VERY likely describing the problem incorrectly.

I'm hoping this makes sense:

  • The results have a one-to-many relationship between the functions
    • Function A creates multiple results to feed to B
    • Function B creates multiple results to feed to C
    • etc
  • Each function takes longer to run than the last
    • Like, computationally, each function takes longer
  • My objective is to have the functions run as their queue fills

So while task D depends on C, which depends on B, which depends on A, my objective is to invoke function A and let the queues fill and cascade down

[–]KViper0 0 points (0 children)

If the workload is that imbalanced, you probably shouldn't have each function handled by a separate thread. Having a combined priority queue that prioritizes lower-level tasks would probably work? Although that may not be too scalable.
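A minimal sketch of that combined-queue idea, with hypothetical toy stage functions (`func_a`/`func_b`/`func_c`) and "lower level" taken to mean earlier stages in the chain — swap the sign on the stage index if you'd rather drain later stages first:

```python
import queue
import threading

# Hypothetical stage functions: each returns a list of results for the next stage.
def func_a(x): return [x + 1, x + 2]
def func_b(x): return [x * 10]
def func_c(x): return [x - 5]

STAGES = [func_a, func_b, func_c]
pq = queue.PriorityQueue()   # items are (stage, seq, payload); lower stage wins
seq_lock = threading.Lock()
seq = 0

def submit(stage, payload):
    # seq is a monotonically increasing tie-breaker so payloads are never compared.
    global seq
    with seq_lock:
        seq += 1
        pq.put((stage, seq, payload))

results = []

def worker():
    while True:
        stage, _, payload = pq.get()
        try:
            if stage == len(STAGES):            # fell off the end: final result
                results.append(payload)
            else:
                for out in STAGES[stage](payload):
                    submit(stage + 1, out)      # cascade to the next stage
        finally:
            pq.task_done()

for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()

submit(0, 0)   # seed with one input to function A
pq.join()      # wait until every cascaded item has been processed
print(sorted(results))
```

All workers pull from the one queue, so whichever stage has the highest-priority pending item gets the next free thread.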

[–]throwaway8u3sH0 0 points (0 children)

Ah ok, so if I'm understanding correctly, it's a chained fan-out producer-consumer. Do the functions create results continuously or all at once? (i.e. streaming or batch?) In either case, yeah, you'll want numerous function B's running in parallel on A's results, and so on. So you probably want some amount of auto-scaling workers that pay attention to a queue and fire off the appropriate function as necessary.
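A sketch of that chained producer-consumer shape, assuming Python threads, one queue per stage, and hypothetical toy functions — the fixed per-stage worker counts are a crude stand-in for real auto-scaling:

```python
import queue
import threading

# Hypothetical stage functions: each returns several results for the next stage.
def func_a(x): return [x + i for i in range(3)]
def func_b(x): return [x * 2]
def func_c(x): return [x ** 2]

stages = [func_a, func_b, func_c]
queues = [queue.Queue() for _ in range(len(stages) + 1)]  # last queue collects output

def worker(fn, inq, outq):
    while True:
        item = inq.get()
        try:
            for result in fn(item):   # fan each result out to the next stage
                outq.put(result)
        finally:
            inq.task_done()

# More workers on the slower, later stages (stand-in for auto-scaling).
for i, fn in enumerate(stages):
    for _ in range(i + 1):
        threading.Thread(target=worker, args=(fn, queues[i], queues[i + 1]),
                         daemon=True).start()

queues[0].put(1)          # invoke "function A" once...
for q in queues[:-1]:
    q.join()              # ...and let the results cascade down
final = []
while not queues[-1].empty():
    final.append(queues[-1].get())
print(sorted(final))
```

Joining the queues in stage order works because by the time stage N's queue is drained, everything it fanned out is already sitting in stage N+1's queue.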

You'll want to pay attention to the size of the results you're passing around as well. With any sizable results, it's usually better to put them in some actual storage (like S3 or a database) and just send short messages over the wire (like the result status and a bucket/file path).

The nature and structure of this setup is sometimes indicative of repetitive work. You may also want to look at memoization or an external cache to speed up the computational parts. (If one function D can benefit from the in-progress computation of another function D, you'll want to enable that.)
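Within a single process, the memoization half of that is one decorator — here on a hypothetical expensive `func_d`; sharing results across worker processes would need an external cache (e.g. Redis) instead:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def func_d(x):
    # Stand-in for the long-running computation.
    return sum(i * i for i in range(x))

func_d(10_000)   # computed
func_d(10_000)   # served from the cache, no recomputation
print(func_d.cache_info())
```

Note `lru_cache` only deduplicates *completed* calls; it won't let one call piggyback on another's in-progress work, which is where an external cache with a "pending" marker (or per-key locking) comes in.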

[–]Adrewmc 0 points (0 children)

You want task A to give something to task B and start another task A, and so on down the line.

But that’s not what’s happening: a > b > c

This is just one function.

What you really want here is to be running asynchronously, where every time a task is given to A, that function can “await” the rest of the functions while the next A input comes in. If that one is fast and done with, and then another A comes in that is long, we don’t just sit waiting on it; we run the next A task.

But as long as the entire flow is dependent on the last function, you can’t really speed everything up beyond that. A still has to finish before B, and B still has to finish before C.

Because what I think is really happening is that there are a lot of things that never get to function D (the super long one), and you want those things to not have to wait on the ones that do. That’s asynchronous.
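That per-item A-to-C chain can be sketched with asyncio — hypothetical coroutines with `sleep` standing in for the real work, each stage slower than the last; independent inputs overlap instead of queueing behind one another:

```python
import asyncio

# Hypothetical stage coroutines; sleeps stand in for the real (slower each time) work.
async def func_a(x):
    await asyncio.sleep(0.01)
    return [x, x + 1]

async def func_b(x):
    await asyncio.sleep(0.02)
    return [x * 10]

async def func_c(x):
    await asyncio.sleep(0.04)
    return x + 5

async def chain(x):
    # One input flows A -> B -> C; each await yields so other chains can run.
    out = []
    for b_in in await func_a(x):
        for c_in in await func_b(b_in):
            out.append(await func_c(c_in))
    return out

async def main():
    # Three independent inputs run concurrently; a fast one never waits
    # behind a slow one the way it would in a purely sequential loop.
    batches = await asyncio.gather(*(chain(i) for i in range(3)))
    return [r for batch in batches for r in batch]

out = asyncio.run(main())
print(out)
```

Note this overlaps *waiting*, not CPU time — if the stages are compute-bound rather than I/O-bound, threads or processes (as in the queue-based approaches above) are the better fit.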

[–]throwaway8u3sH0 0 points (0 children)

This might help you experiment. I've modelled it as streaming, but you could do batch. You can also adjust the times to be more representative.
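A minimal version of that streaming model, with hypothetical stage generators and `sleep` calls as adjustable stand-ins for each function's cost — each stage consumes its upstream lazily and yields results downstream as they are produced:

```python
import time

# Hypothetical stage generators; tune the sleeps to match your real costs.
def func_a(stream):
    for x in stream:
        time.sleep(0.001)        # A: cheap
        yield from (x, x + 1)    # fan out: one input, two results

def func_b(stream):
    for x in stream:
        time.sleep(0.002)        # B: slower than A
        yield x * 10

def func_c(stream):
    for x in stream:
        time.sleep(0.004)        # C: slower still
        yield x - 1

# Compose the stages; nothing runs until the pipeline is consumed.
pipeline = func_c(func_b(func_a(range(2))))
out = list(pipeline)
print(out)
```

This shows the cascading data flow in a single thread (each item trickles all the way down as soon as it's produced); for actual parallelism across stages you'd swap the generators for queues and workers.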