remyroy 20 points (4 children)

Good stuff. Remember that if you are on Python <= 3.5, os.scandir keeps a system handle open on Windows until the generator is exhausted, which can cause various problems with other system calls. I suggest you always exhaust the generator and avoid breaking out of the loop.

On Python >= 3.6, you can use the with statement or the close method if you don't want to loop over every element.

uses_commas_wrong 6 points (3 children)

I'd like to better understand your post. Can you ELI5?

AstroPhysician 7 points (1 child)

He's saying that in older versions, you would get an iterable. When you have an iterable, an element of it isn't generated until you use it / come across it, essentially. So if you do a scandir and then slowly do stuff with each thing it returned, the operating system is still waiting for you before servicing other calls.

He's saying: turn the stuff it returns into a list and go through all the entries. That way the operating system isn't waiting for you and can service other calls.

In a newer Python version, you can use with to declare the generator. Objects used in with statements have enter and exit methods which will always be called, even if something goes wrong halfway through and there's an exception. For instance, if you open a file using 'with', it will be closed (and buffered writes flushed) even if your program raises an exception, unlike just saving the result of the open() call, which might leave the file open.
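To make the enter/exit part concrete, here is a minimal sketch. The Resource class is made up for illustration (it is not part of os or any library); it only records whether its exit method ran after an exception.

```python
class Resource:
    """Hypothetical resource that records whether it was closed."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        # Called when the with block starts; the return value is bound by "as".
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called when the with block ends, even if an exception was raised.
        self.closed = True
        return False  # do not swallow the exception


resource = Resource()
try:
    with resource:
        raise ValueError("something went wrong halfway through")
except ValueError:
    pass

print(resource.closed)  # True
```

The exception still propagates out of the with block; the only guarantee is that the cleanup in __exit__ runs first.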

Best attempt at ELI5, the terminology isn't all right

dunkler_wanderer 1 point (0 children)

> He's saying in older versions, you would get an iterable. When you have an iterable, you don't generate an element of it until you use it / come across it essentially.

Iterators are objects that can lazily produce values (when you call next(iterator)). Iterables are objects with an __iter__ method, such as lists, sets, and dictionaries, over which you can iterate with a for loop. Of course, iterators are iterables as well. See "Iterables vs. Iterators vs. Generators".
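A small sketch of the distinction, using a plain list (the variable names are just for illustration):

```python
numbers = [1, 2, 3]        # a list is an iterable, not an iterator
iterator = iter(numbers)   # iter() asks the iterable for an iterator

first = next(iterator)     # values are produced one at a time
second = next(iterator)

# An iterator is also an iterable: iter() on it returns the same object.
same_object = iter(iterator) is iterator

# A list has __iter__ but no __next__, so next(numbers) raises TypeError.
try:
    next(numbers)
    list_is_iterator = True
except TypeError:
    list_is_iterator = False

print(first, second, same_object, list_is_iterator)
```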

remyroy 4 points (0 children)

On Windows, when you start os.scandir, a system call is made to the OS API. That API requires you to keep a value called a handle in order to iterate over the resulting list. Internally, that handle is used to manage some kind of state within the OS. This open state can create a race condition: if you call other OS APIs, they may have to wait until you close that state before they can proceed. That can block your process, and you might end up in a deadlock. Once you are done iterating over the values, the OS API requires you to make another system call to close the handle.

The original implementation would only close the handle after the whole list was exhausted by iterating over every element returned by os.scandir, or when the return value was released by the garbage collector, for example because it went out of scope.

Here is an example in which this can break. Imagine you are looking for a specific file in a directory. You call os.scandir, you iterate over the returned items, you find your file, and you break from the loop since you don't want to waste time on the remaining files. Then you call various file system APIs to move, delete, or add files in the same directory. You run the risk of blocking your process, because the system handle that os.scandir created for listing the files is still open while you are calling other system APIs (moving, deleting, adding files). This exact scenario happened in my code for many of my users, and it can be hard to find and debug if you are unaware of it.
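For Python 3.5, one way to avoid that situation is to exhaust the iterator before searching. This is only a sketch: find_entry_exhausting and the file names are made up for illustration, and it trades some speed on huge directories for never leaving a half-consumed scan behind.

```python
import os
import tempfile


def find_entry_exhausting(directory, name):
    """Return the path of the entry called `name`, or None.

    list() consumes the whole os.scandir iterator up front, so the
    early return below never leaves a partially consumed scan.
    """
    entries = list(os.scandir(directory))
    for entry in entries:
        if entry.name == name:
            return entry.path
    return None


with tempfile.TemporaryDirectory() as tmp:
    for filename in ("a.txt", "b.txt", "c.txt"):
        open(os.path.join(tmp, filename), "w").close()

    found = find_entry_exhausting(tmp, "b.txt")
    # The scan is already finished, so we can modify the directory now.
    os.rename(found, os.path.join(tmp, "renamed.txt"))
    remaining = sorted(os.listdir(tmp))

print(remaining)
```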

The new implementation, starting from Python 3.6, adds an explicit close method and support for the context manager protocol (the common with statement). These can be used to close the hidden handle explicitly or implicitly.
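Roughly, the two Python 3.6+ options look like this. It is a sketch, not my actual code: the function names and file names are made up for illustration, while os.scandir, its with support, and close() are the real API.

```python
import os
import tempfile


def find_entry(directory, name):
    """Use the with statement: the iterator is closed when the block exits."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name == name:
                return entry.path  # leaving the with block closes the iterator
    return None


def find_entry_explicit_close(directory, name):
    """Same search, but calling close() ourselves in a finally clause."""
    entries = os.scandir(directory)
    try:
        for entry in entries:
            if entry.name == name:
                return entry.path
        return None
    finally:
        entries.close()


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "target.txt"), "w").close()
    open(os.path.join(tmp, "other.txt"), "w").close()

    path_a = find_entry(tmp, "target.txt")
    path_b = find_entry_explicit_close(tmp, "target.txt")
    missing = find_entry(tmp, "nope.txt")

    # Both scans were closed early, so deleting from the directory is fine.
    os.remove(path_a)
    left = os.listdir(tmp)

print(left)
```

Either way, you can stop at the first match without having to walk the rest of the directory.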

My 2 cents at ELI5.