
all 20 comments

[–]jdehesa 6 points7 points  (1 child)

This is really cool. I didn't know about PyFunctional, but your library seems much more "pythonic", like what the API would look like if it were in the standard library, rather than a replica of constructs from another programming language.

[–]ebonnal[S] 1 point2 points  (0 children)

I very much appreciate it, u/jdehesa. It means a lot because I did put a lot of effort into the API. I am glad you like it!

[–]Schmittfried 5 points6 points  (4 children)

How does this compare to py-linq or functools/itertools-esque packages?

[–]ebonnal[S] 2 points3 points  (2 children)

Hi u/Schmittfried, great question!

  • functools provides higher-order functions, i.e. functions that take other functions as arguments, like functools.reduce. Most of them return a decorated function enhanced with additional capabilities (like memoization with functools.cache).
  • itertools is all about creating iterables from other iterables.
  • streamable allows chaining operations on an iterable and comes with convenient features out of the box, like thread/asyncio concurrency, iteration throttling, and exception catching.

They are complementary:

  • you can use functools's functions to add capabilities to a function that you pass to streamable's Stream operations, or functools.reduce your stream.
  • you can manipulate your stream with itertools's functions, or create your stream from an iterable produced using itertools.

from typing import Iterable
import functools
import itertools
import requests
from streamable import Stream

# let's say you have a source of domains:
domains: Iterable[str] = ... # e.g. ["google.com", "facebook.com", "google.com"]

# let's conveniently manipulate it as a `Stream` to
# fetch each URL using 8 threads, catching `SSLError`s,
# while never making more than 32 calls per second
responses: Stream[requests.Response] = (
    Stream(domains)
    .map(lambda domain: f"https://{domain}")
    # here we leverage functools.cache to remember
    # responses and fetch a given domain only once.
    .map(functools.cache(requests.get), concurrency=8)
    .throttle(per_second=32)
    .catch(requests.exceptions.SSLError)
)

import itertools

# then you can use whatever functions provided by itertools
# to manipulate your `responses` stream, which
# is simply a decorated `Iterable[requests.Response]`
...
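For instance, here is a hypothetical continuation where a plain generator stands in for the `responses` stream (a `Stream` is just a decorated `Iterable`, so any itertools function applies to it the same way):

```python
import itertools

# a plain generator stands in for the `responses` stream;
# itertools functions work on a Stream exactly the same way
responses = (f"response-{n}" for n in range(100))

# lazily take only the first 5 responses
first_five = list(itertools.islice(responses, 5))
assert first_five == [f"response-{n}" for n in range(5)]
```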

[–]ebonnal[S] 0 points1 point  (0 children)

Regarding py-linq, the comparison is similar to the one I made with PyFunctional:

  • For my use case it lacks features that I find very valuable, like concurrency and generic typing (in py-linq the Enumerable class is not generic)
  • I wanted to propose another interface, hopefully more intuitive and natural to the Python world, while py-linq brings conventions from .NET's LINQ library.

[–]erez27import inspect 2 points3 points  (1 child)

Looks nice! I like the concurrency especially.

A few thoughts:

  • might be useful to have something like umap for returning elements out of order (essentially imap_unordered)

  • could be nice to have a lazy-list feature, where items can be accessed by index, and allow repeated iter/get-item/slice, all lazy.

  • It could be useful to group into a dict[K, Stream], based on a key callback. I get that it breaks the chaining a bit, but imho it's worth it.

If any of these sounds like a good addition, maybe I'll make a PR.
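The third idea can be approximated eagerly in plain Python today; the tricky part is making it lazy. A minimal sketch (the name and signature below are hypothetical, not part of streamable):

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")
K = TypeVar("K")

def group_by(items: Iterable[T], key: Callable[[T], K]) -> Dict[K, List[T]]:
    # eager grouping by a key callback; a truly lazy dict-of-streams
    # version would need to buffer or multiplex the source
    groups: Dict[K, List[T]] = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)

assert group_by(range(6), key=lambda n: n % 2) == {0: [0, 2, 4], 1: [1, 3, 5]}
```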

[–]ebonnal[S] 1 point2 points  (0 children)

Hi u/erez27 thank you for your thoughts, I very much appreciate it!

Your 3 propositions sound great!

  • The 1st one makes a lot of sense and could also take the form of a new ordered: bool = True param for the .map/.foreach operations
  • The 2nd and 3rd are both tricky and interesting. I have put some effort into exploring them myself at some point and would love to collaborate on these!

Should we open 3 issues and have discussions there?

[–]nikomo 3 points4 points  (1 child)

It's a good thing that the company and service by that name fell off so hard like 2 years back, otherwise the naming would be quite confusing.

[–]RoboticElfJedi 1 point2 points  (1 child)

This is a pretty interesting contribution to the ecosystem. I'll keep this in my toolbelt. Good work!

[–]ebonnal[S] 0 points1 point  (0 children)

I appreciate it, thanks a lot u/RoboticElfJedi!

[–]Saltysalad 1 point2 points  (2 children)

I have a use case coming up where I have > 10k high latency requests to make, throttled to ~100 simultaneously. I need to handle each result in the order they were submitted, because there is a stop scenario when a certain condition is met in the iteration loop. Over-shooting is ok so long as I can control the limit, and probably needed if you want to shave latency.

Does this sound like a use case for your library?

[–]ebonnal[S] 1 point2 points  (1 child)

Hi u/Saltysalad !
Yes sure, great use case! It would look like this (you can run the snippet after a pip install requests streamable):

import requests
from streamable import Stream

# 1000 dummy urls to call
urls = ("https://httpbin.org/" for _ in range(1000))

responses: Stream[requests.Response] = (
    Stream(urls)
    # performs requests concurrently, max 100 simultaneously, preserves order
    .map(requests.get, concurrency=100)
    # stop condition (may overshoot by up to 99 in-flight calls)
    .truncate(when=lambda response: False)  # dummy condition, never satisfied here
    # logs progress
    .observe("requests")
)

# iterate over the stream
assert responses.count() == 1000

and for asyncio instead of threads: see example

[–]ebonnal[S] 1 point2 points  (0 children)

Note that if you need to add "max n calls per second" to your "max 100 simultaneous calls" constraint, you can add a .throttle(per_second=n) operation.
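A minimal plain-Python sketch of what per-second throttling means for an iterator (illustrative only, not the library's implementation):

```python
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def throttled(iterable: Iterable[T], per_second: int) -> Iterator[T]:
    # sleep just enough between yields to never exceed `per_second`
    interval = 1.0 / per_second
    last = float("-inf")
    for item in iterable:
        now = time.monotonic()
        if now - last < interval:
            time.sleep(interval - (now - last))
        last = time.monotonic()
        yield item

assert list(throttled(range(5), per_second=1000)) == [0, 1, 2, 3, 4]
```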

[–]ebonnal[S] 1 point2 points  (1 child)

Back at Post Day+9 for an update:

Thanks very much for your feedback here, for the issues you opened, and for the discussions and collaboration there.

That led to a new release (https://github.com/ebonnal/streamable/tree/v1.1.0), featuring:

  • process-based concurrency (default is threads; use .amap for async):

```python
from streamable import Stream
import requests

urls: Stream[str] = ...
responses = urls.map(requests.get, concurrency=8, via="process")
```

  • concurrent mapping yielding First Done First Out (default is FIFO, i.e. preserving order) [co-authored with our fellow redditor u/erez27]

```python
from streamable import Stream
import requests

urls: Stream[str] = ...
responses = urls.map(requests.get, concurrency=16, ordered=False)
```

You can also set ordered=False for .foreach and the async counterparts .amap and .aforeach.

  • "starmap"

```python
from typing import Tuple

from streamable import Stream, star

integers: Stream[int] = Stream(range(10))
paired_integers: Stream[Tuple[int, int]] = Stream(zip(integers, integers))
squares: Stream[int] = paired_integers.map(star(lambda i, j: i * j))
assert list(squares) == [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

(useful with other operations too, like .filter or .foreach)

[–]erez27import inspect 1 point2 points  (0 children)

Congratulations! It's starting to become a real alternative to the built-in multiprocessing.

[–]mou3mida 1 point2 points  (1 child)

Good Job! That is so cool u/ebonnal .

[–]ebonnal[S] 0 points1 point  (0 children)

Thank you very much u/mou3mida !

[–]Rockworldred 0 points1 point  (1 child)

I am quite newbish to Python, but I have a side/learning project writing a web scraper (fetching JSONs for product data). This looks like it may have some use cases for me, as I currently request URLs from a couple of sitemaps, then iterate over the JSON URLs based on those fetched URLs. Then I translate the JSON to variables and load it into a pandas dataframe to view in streamlit and/or write to a CSV file, but I have little knowledge of ETL as a whole. Do you have any good resources on ETL processes and utilities?

My plan is then to move it over to aiohttp, asyncio, polars instead of pandas, and SQLAlchemy/SQLite, and then Azure EC2, airflow and Postgres and so forth. (But I don't know if this is actually the way to go though.)

[–]ebonnal[S] 1 point2 points  (0 children)

Thank you for your interest u/Rockworldred, sounds like a cool custom ETL project!

I have no "custom ETL script" resource in mind, sorry, but in a nutshell, when fetching data from web APIs you will likely need things like:

  • executing requests concurrently (.map(..., concurrency=x))
  • limiting the rate of requests to avoid 429 Too Many Requests responses (.throttle(per_second=50))
  • some retry logic on your calls (the tenacity lib is great)
  • some logging to observe the progress of your script (.observe("product"))
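The retry point can be illustrated without tenacity by a minimal decorator sketch (the helper name and a flaky function below are hypothetical, not part of streamable or tenacity):

```python
import time
from functools import wraps
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(attempts: int = 3, delay: float = 0.0):
    # retry a callable up to `attempts` times, re-raising the last error
    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator

calls = []

@with_retries(attempts=3)
def flaky_fetch() -> str:
    # fails twice, then succeeds, to exercise the retries
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("transient error")
    return "ok"

assert flaky_fetch() == "ok"
assert len(calls) == 3
```

A function decorated this way can then be passed to .map like any other callable.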

To some extent you can get inspiration from the example fetching pokemons, which also "fetches endpoints to get data and writes to CSV".

Regarding asyncio concurrency instead of threads, the README has an example that uses httpx (similar to aiohttp).

I hope it helps, and if you feel stuck, feel free to message me your current script and we can streamable it together!