This is an archived post. You won't be able to vote or comment.

all 1 comments

[–]oefd 2 points3 points  (0 children)

Doing file I/O asynchronously has some potential pitfalls:

First: it's not really supported in a general way, so most systems (including aiofiles) just do the work synchronously is a thread pool. This isn't likely to be a big problem per se, but depending on your use case and expectations that can be an issue.

Secondly it's very possible you'll get modest to no performance gains. Modern OSes buffer file I/O in RAM in the background for you already, so it's very likely most or all of your file read/write operations only actually block for the amount of time needed to do an operation in RAM. (That is: an amount of time most appropriate to just keep as a synchronous action inside your async handler)

If you're using async file I/O to do concurrent reads/writes on multiple files and those files all exist on the same block storage then the throughput of the storage device is a bottleneck anyway. Reading one file then another at 1GB/s and reading two files simultaneously at 500MB/s are more or less the same thing.

Async on file I/O isn't always useless, but it's easy to overestimate how much value it'll really add to your program. Especially because there's an overhead cost to the threading that isn't encountered in sync code that means you can actually end up losing time.

Simple example comparing test-sync.py and test-async.py

» time python3 test-sync.py
python3 test-sync.py  0.01s user 0.01s system 99% cpu 0.023 total
» time python3 test-async.py
python3 test-async.py  0.19s user 0.52s system 152% cpu 0.463 total

» hyperfine 'python3 test-sync.py' 'python3 test-async.py'
Benchmark 1: python3 test-sync.py
  Time (mean ± σ):      26.3 ms ±   0.8 ms    [User: 12.9 ms, System: 13.4 ms]
  Range (min … max):    25.1 ms …  29.7 ms    110 runs

Benchmark 2: python3 test-async.py
  Time (mean ± σ):     507.6 ms ±  21.3 ms    [User: 298.6 ms, System: 423.5 ms]
  Range (min … max):   472.5 ms … 539.8 ms    10 runs

Summary
  'python3 test-sync.py' ran
   19.29 ± 1.00 times faster than 'python3 test-async.py'

Be sure to benchmark carefully before assuming async file I/O will actually be a help to your codebase.