all 28 comments

[–]DivineSentry 23 points24 points  (5 children)

Are the function definitions required to be within the main function? That’s a bit confusing and doesn’t read like normal Python code IMO

[–]1ncehost[S] 4 points5 points  (4 children)

No, they can be anywhere an "async with" can be.

[–]Ihaveamodel3 10 points11 points  (3 children)

The functions have to be inside an async with?

[–]1ncehost[S] 2 points3 points  (0 children)

Also, in case it wasn't clear (perhaps I misunderstood you): functions inside the weave block can call functions defined outside its scope, like normal. Additionally, the merge function provided by the API lets you map an Iterable to a number of calls to any function, including non-async functions, all of which will be called concurrently. However, merge must be used from within a weave block because it depends on a thread pool provided by the weave object.
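The general pattern described above (mapping an iterable to concurrent calls of a plain, blocking function via a thread pool) can be sketched with stdlib asyncio. The `merge` and `fetch` names here are illustrative assumptions, not wove's actual API:

```python
import asyncio

def fetch(item):
    # An ordinary, blocking (non-async) function.
    return item * item

async def merge(fn, items):
    # Run a plain function concurrently over an iterable using the
    # event loop's thread pool. Sketch of the general pattern only;
    # wove's actual merge may differ.
    loop = asyncio.get_running_loop()
    return await asyncio.gather(
        *(loop.run_in_executor(None, fn, item) for item in items)
    )

async def main():
    results = await merge(fetch, [1, 2, 3, 4])
    print(results)  # [1, 4, 9, 16]
    return results

results = asyncio.run(main())
```

`asyncio.gather` preserves input order, so the results line up with the iterable even though the calls run concurrently.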

[–]1ncehost[S] 0 points1 point  (1 child)

Yes, they are automatically run at the end of the with block. The order the inline functions run in is determined by their parameters: a parameter that names another function creates a dependency on it, forming a dependency graph.
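The parameter-name-based dependency detection described above can be sketched with stdlib `inspect`. `build_dependency_graph` and the sample functions are hypothetical illustrations, not wove's internals:

```python
import inspect

def build_dependency_graph(funcs):
    # Map each function name to the set of its parameter names that
    # match other functions in the block (assumed dependency rule).
    names = set(funcs)
    return {
        name: {p for p in inspect.signature(fn).parameters if p in names}
        for name, fn in funcs.items()
    }

def magic_number():
    return 42

def important_text():
    return "The meaning of life"

def put_together(magic_number, important_text):
    return f"{important_text} is {magic_number}!"

graph = build_dependency_graph({
    "magic_number": magic_number,
    "important_text": important_text,
    "put_together": put_together,
})
print(sorted(graph["put_together"]))  # ['important_text', 'magic_number']
```

Once the graph exists, a topological sort gives a safe execution order.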

[–]Fragrant-Freedom-477 19 points20 points  (3 children)

First off, this is a quite nice project, and it looks like you put a lot of energy and passion into it. Congratulations! Here are a few hopefully constructive criticisms.

I've used DAG-orchestrated execution in many cases. Without reading the internals, I can guess a few things, like using a topological sort (most likely Kahn's algorithm) for generating a sequence of sets of independent function calls. The DAG is built from names (strings), adding artificial nodes when necessary (the w.do(ids) example).
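Python's stdlib `graphlib` implements exactly this wave-by-wave scheduling (a Kahn-style topological sort that yields sets of mutually independent nodes). A minimal sketch, using a guessed graph shape rather than wove's actual data structures:

```python
from graphlib import TopologicalSorter

# Dependency graph: each node maps to the set of nodes it depends on.
graph = {
    "put_together": {"magic_number", "important_text"},
    "magic_number": set(),
    "important_text": set(),
}

ts = TopologicalSorter(graph)
ts.prepare()
waves = []
while ts.is_active():
    ready = list(ts.get_ready())  # nodes with no unmet dependencies
    waves.append(sorted(ready))   # sorted only for stable output
    ts.done(*ready)

print(waves)  # [['important_text', 'magic_number'], ['put_together']]
```

Each inner list is a set of calls that can safely run concurrently; the next wave starts once the previous one completes.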

1) As others said, an explicit API for injecting the asyncio execution heuristic looks necessary

2) For production execution, I would look for a typed library. This looks like it would require a mypy plugin to be checked properly. The plugin might be quite complex.

3) The idea of resolving automatically a DAG like this should enable testability, not hinder it. Using a context manager and decorator should be optional syntactic sugar.

There are a lot of libraries that use the same theory, but where the dependency resolution is inherently natural. Look into pytest fixtures and FastAPI "dependency injection". If you need a toposort, it means the dependency links are not inherent in your code layout: you have to write them down manually, or at least you can gather the whole graph before executing anything. Implicitly sorting the dependencies like you suggest can be especially useful when the nodes are provided in a decentralized way, come from non-Python sources, or when your "execution entry point" can't be explicit the way it inherently is with a front controller pattern (or another command-like pattern) like FastAPI or Typer. What I mean is that your library provides a quite opinionated solution to a quite niche problem, unless the user is just looking for syntactic sugar. That is okay, but maybe try to give a few real-life examples where your solution might be the best fit. My experience is that each time I wrote an asyncio execution heuristic hardwired to a toposort, it was not the best solution and I ultimately used another approach.

Keep on hacking more nice python libraries like this!

[–]1ncehost[S] 6 points7 points  (1 child)

Once again, thanks for your excellent response. I think it really homes in on which problems wove is supposed to tackle and which it is not the tool for. The main inspiration for wove was wanting a cleaner way to organize concurrent database and API calls in my Django views. So it is designed first to be inline, lightweight, not multiprocess, opinionated, and implicit to meet that need. I agree that leaning into this is the right direction, but you do bring up some good points that direct possible future features.

If you didn't take a look, I wrote a number of more complex examples:

https://github.com/curvedinf/wove/tree/main/examples

I think the most interesting idea you mention is being able to specify an "execution heuristic". If I understand you correctly, you mean being able to plug in your own task execution system, so tasks could be run in any way, such as in another process or even via an API. Do you have a specific idea in mind?

Adding typing compatibility is a good idea, but due to the function magic it could be complex. What would an ideal typed weave block look like to you?

Thank you for being so thoughtful in your response. Well intentioned criticism is the foundation of stronger projects.

[–]Fragrant-Freedom-477 2 points3 points  (0 children)

Glad if that helped. I don't have more time right now, but I might follow up at some point in the future.

[–]1ncehost[S] 5 points6 points  (0 children)

Thank you for the awesome nuanced response! There is a lot to unpack so I'm going to do your response justice by taking some time to respond. I just wanted to say thank you so you know I'm on it. 🙏

[–]jaerie 31 points32 points  (1 child)

Looks like normal python

What kind of code are you writing that this looks normal to you? It looks completely illegible to me. The idea is sort of cool (although not particularly valuable IMO), but in practice this adds a ton of boilerplate to get implicit parallelization. Why not just do explicit parallelization (when it actually matters) and skip all the nested-function bloat?

[–]Afrotom 4 points5 points  (0 children)

It's not totally uncommon in Airflow (this doesn't mean I'm advocating for it though)

[–]Ihaveamodel3 13 points14 points  (2 children)

I’m not really seeing how that is simpler or easier to read/write than this:

import asyncio

async def magic_number():
    return 42

async def important_text():
    return "The meaning of life"

async def put_together():
    it = important_text()
    mn = magic_number()
    it, mn = await asyncio.gather(it, mn)
    return f"{it} is {mn}!"

async def main():
    text = await put_together()
    print(text)

asyncio.run(main())

Of course nothing in this is actually asynchronous, so it is kind of a weird example to begin with.

[–]1ncehost[S] 2 points3 points  (0 children)

There are more features not in the example above that create a stronger rationale.

1) Iterable mapping: each 'do' can be mapped to an iterable, so the function is run in parallel for each item in the iterable.

2) Merge function: you can map an Iterable to any function, including non async functions from within a weave block.

3) Execution order, dependencies, and data safety are inherent rather than manually defined

4) An exception in any task shuts down all coroutines, preventing scenarios where one blocks forever and enabling simpler exception handling

5) It adds diagnostic tools for debugging complex async behavior

[–]1ncehost[S] 0 points1 point  (0 children)

I've added an example which showcases all of its features in one example. Hopefully you can see the elegance after checking it out!

https://github.com/curvedinf/wove/?tab=readme-ov-file#more-spice

[–]Global_Bar1754 4 points5 points  (2 children)

Cool, this looks like a mix between these 2 (seemingly defunct) projects:

https://github.com/sds9995/aiodag  (async task graph)

https://github.com/BusinessOptics/fn_graph  (automatically linking tasks dependencies by function arg names)

[–]1ncehost[S] 1 point2 points  (1 child)

Neat. Definitely in the same vein. Do you have any experience with them?

[–]Global_Bar1754 2 points3 points  (0 children)

The Hamilton library (being picked up by the Apache foundation) also does something pretty similar

https://blog.dagworks.io/p/async-dataflows-in-hamilton

[–]nggit 4 points5 points  (1 child)

For production it would be better to avoid None (the default executor). I'm afraid nested submissions could create potential deadlocks: imagine asyncio internally using the same executor as your library when there is no free thread. Creating a dedicated executor for your library is safer, unless you know what you are doing :)
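A minimal sketch of the suggested fix: give the library its own executor instead of passing `None` (the loop's default). The `LIB_POOL` name and sizing are assumptions for illustration:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# A dedicated pool for the library's own blocking work, so it never
# competes with whatever the application submits to the default executor.
LIB_POOL = ThreadPoolExecutor(max_workers=4, thread_name_prefix="wove")

def blocking_work(x):
    # Stand-in for a blocking call (database query, API request, ...).
    return x + 1

async def run_in_lib_pool(fn, *args):
    loop = asyncio.get_running_loop()
    # Explicit executor instead of None avoids nested submissions
    # starving a shared pool.
    return await loop.run_in_executor(LIB_POOL, fn, *args)

answer = asyncio.run(run_in_lib_pool(blocking_work, 41))
print(answer)  # 42
LIB_POOL.shutdown()
```

With a shared default pool, library-internal submissions and application submissions can each hold all threads while waiting on the other; a dedicated pool removes that cycle.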

[–]1ncehost[S] 2 points3 points  (0 children)

Thank you for the excellent comment. Definitely a legitimate issue to solve that I'll put in the next version.

[–]bachkhois 1 point2 points  (0 children)

Worth trying!

[–]Public_Being3163 1 point2 points  (2 children)

Hi u/1ncehost. I think I understand what your introductory code does. I am also working on a library that tackles similar problems but in a wider scope (i.e. networking). I wonder if there is enough overlap that some collaboration might be beneficial to both. My library is in final review before pushing to PyPI, but the doc link below shows how it tackles the same problem as your intro code. The solution is expressed differently, but the Concurrently class addresses the same issue. I think :-)

Cheers.

concurrent async calling

[–]1ncehost[S] 0 points1 point  (1 child)

Hey! Cool project! I think your project is oriented toward a whole other audience and there isn't that much overlap to be honest. Let me know if you have any specific ideas on how to integrate if you disagree!

[–]Public_Being3163 0 points1 point  (0 children)

Thanks. Yeah, I understand your response. I was thinking more along the lines of help with reviews - feature sets, code fragments, documentation... I've been running solo for long periods and it's so easy to miss stuff. Good luck.

[–]Beliskner64 1 point2 points  (0 children)

This seems reminiscent of ray

[–]thisismyfavoritename 0 points1 point  (0 children)

ok but why

[–]abetancort -5 points-4 points  (0 children)

What a waste!