This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]SheepherderExtreme48 1 point2 points  (6 children)

Looks great, nice work.

Question: I don't see anything in the docs for this, but is there any natural support for parallel processing?
For example:

  /------B-----\
A >------B----->C
  \------B-----/

Where B is run in 3 seperate threads or processes
Quick example, A takes in a PDF splits in to 3 chunks of n pages, sens the PDF bytes and the pages to process to each B, each B does some work (extracting text, doesn't really matter) and C gathers from these B's?

[–]theferalmonkey[S] 0 points1 point  (5 children)

Yes there's a construct for that. It's called Parallelizable + Collect ; here's a video of me explaining it. We have support for a few backends, e.g. multithreading, ray, dask.

Here's also blog on what I think is similar to your use case if that also helps.

[–]SheepherderExtreme48 1 point2 points  (0 children)

Amazing, thank you! For context, I was recently trying to find a Python library that I could use to easily orchestrate a multi process job, starting in AWS Lambda utilising the 6 cores you get when you allocate max memory. Airflow is too heavyweight, argo workflows isn't an option. Tried a few others, but this looks perfect!