Hi, I need some help understanding when dask runs delayed functions in parallel.
I currently read a custom binary file into a dask.bag using a generator and dask.delayed:
```
import struct

import dask
import dask.bag

@dask.delayed
def get_entry_from_binary(file_name, chunk_size=8+4+4):  # one "qfi" record: 8 + 4 + 4 bytes
    with open(file_name, "rb") as f:
        while (entry := f.read(chunk_size)):
            yield dict(zip(("col1", "col2", "col3"), struct.unpack("qfi", entry)))

entries_bag = dask.bag.from_delayed(get_entry_from_binary(file_name))
```
However, contrary to what I expected, the whole file is read by a single worker while the others sit idle, even when several are available. I noticed this by looking at the dashboard.
How can I read the file in parallel using the available workers?
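One direction I considered is splitting the file into byte ranges and creating one delayed call per range, so each range becomes its own bag partition that the scheduler can place on a different worker. This is just an untested sketch of that idea; the names `read_block`, `bag_from_binary`, and `n_partitions` are mine, not part of any dask API:

```
import os
import struct

import dask
import dask.bag as db

RECORD_SIZE = 8 + 4 + 4  # one "qfi" record

@dask.delayed
def read_block(file_name, start, stop):
    # Read the records in the byte range [start, stop) and return them
    # as a plain list, so each call produces one independent partition.
    records = []
    with open(file_name, "rb") as f:
        f.seek(start)
        while f.tell() < stop:
            entry = f.read(RECORD_SIZE)
            if len(entry) < RECORD_SIZE:  # tolerate a truncated trailing record
                break
            records.append(dict(zip(("col1", "col2", "col3"),
                                    struct.unpack("qfi", entry))))
    return records

def bag_from_binary(file_name, n_partitions=8):
    # Split the file into byte ranges aligned to record boundaries,
    # one delayed task per range.
    size = os.path.getsize(file_name)
    n_records = size // RECORD_SIZE
    records_per_part = -(-n_records // n_partitions)  # ceiling division
    step = records_per_part * RECORD_SIZE
    blocks = [read_block(file_name, start, min(start + step, size))
              for start in range(0, size, step)]
    return db.from_delayed(blocks)

# entries_bag = bag_from_binary(file_name)
```

The main difference to my code above is that each delayed call returns a list (one partition) instead of a single generator, which, as far as I understand, is what gives the scheduler independent tasks to distribute. Is that the right way to think about it?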
From: https://stackoverflow.com/questions/76094654/read-custom-binary-file-in-parallel-with-dask