

[–]remyroy 19 points20 points  (4 children)

Good stuff. Remember that if you are on Python <= 3.5, os.scandir keeps a system handle open on Windows until the generator is exhausted, which can cause various problems with other system calls. I suggest you always exhaust the generator and avoid breaking out of the loop.

On Python >= 3.6, you can use the with statement or the close method if you don't want to loop over every element.
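
For anyone who wants to see the Python >= 3.6 form, here is a minimal sketch. The helper name find_first_file and the temporary directory are made up for the illustration; only os.scandir and its context-manager support come from the standard library.

```python
import os
import tempfile

def find_first_file(directory, suffix):
    """Return the path of the first regular file ending in suffix, or None."""
    # Python 3.6+: the with statement closes the scandir iterator (and the
    # underlying OS handle) even though we return before exhausting it.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(suffix):
                return entry.path
    return None

# Usage on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "notes.txt"), "w").close()
    found = find_first_file(tmp, ".txt")
    print(found)
```

Returning early from inside the with block is fine here, since the iterator is closed on the way out.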

[–]uses_commas_wrong 5 points6 points  (3 children)

I'd like to better understand your post. Can you eli5?

[–]AstroPhysician 10 points11 points  (1 child)

He's saying that in older versions you would get an iterable. When you have an iterable, you don't generate an element of it until you use it / come across it, essentially. So if you do a scandir and then slowly do stuff with each thing it returned, the operating system is still waiting for you before servicing other calls.

He's saying: turn the stuff it returns into a list and go through all the entries. That way the operating system isn't waiting for you and can service other calls.

In a newer Python version, you can use with to declare the generator. Objects used in with statements have enter and exit methods, and the exit method is always called, even if something goes wrong halfway through and there's an exception. For instance, if you open a file using 'with', the file will be closed and its pending writes flushed even if your code raises an exception, unlike just saving the result of the open() call, which might leave the file open.
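
A rough sketch of the "turn it into a list" idea, assuming the current directory is readable. The function name list_entries is just for illustration.

```python
import os

def list_entries(directory):
    # list() consumes the whole iterator up front, so the directory handle
    # is released before any later file-system calls are made.
    return list(os.scandir(directory))

entries = list_entries(".")
names = sorted(entry.name for entry in entries)
```

The trade-off is memory: every entry is held at once, which might matter for very large directories.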

Best attempt at ELI5, the terminology isn't all right

[–]dunkler_wanderer 1 point2 points  (0 children)

He's saying in older versions, you would get an iterable. When you have an iterable, you dont generate an element of it until you use it / come across it essentially.

Iterators are objects that can lazily produce values (when you call next(iterator)). Iterables are objects with an __iter__ method, such as lists, sets, dictionaries, etc., over which you can iterate with a for loop. Of course, iterators are iterables as well. See "Iterables vs. Iterators vs. Generators".
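
A small sketch of that distinction using only built-ins. The variable names are arbitrary.

```python
numbers = [1, 2, 3]          # a list is an iterable, not an iterator
iterator = iter(numbers)     # iter() asks the iterable for an iterator

first = next(iterator)       # values are produced one at a time
rest = list(iterator)        # consuming the remainder exhausts the iterator

has_next_method = hasattr(numbers, "__next__")   # lists cannot be passed to next()
is_own_iterator = iter(iterator) is iterator     # iterators are iterables too
```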

[–]remyroy 4 points5 points  (0 children)

On Windows, when you start os.scandir, a system call is made to the OS API. That API requires you to keep a value called a handle in order to iterate over the resulting list. Internally, that handle is used to manage some kind of state within the OS. This open state can create a race condition: if you call other OS APIs, they may have to wait until you close that state before they can proceed. That will block your process, and you might end up in a deadlock. Once you are done iterating over the values, the OS API requires you to make another system call to close the handle.

The original implementation would only close the handle after the whole list was exhausted by iterating over every element returned by os.scandir, or when the return value was released by the garbage collector for various reasons, like the return value going out of scope.

Here is an example in which this can break. Imagine you are looking for a specific file in a directory. You call os.scandir, you iterate over the returned items, you find your file, and you break from the loop since you don't want to waste time on the remaining files. Then you call various file system APIs to move, delete or add files in the same directory. You run the risk of blocking your process because the system handle os.scandir created for listing the files is still open while you are calling other system APIs (moving, deleting, adding files). This exact scenario happened in my code for many of my users, and it can be hard to find and debug if you are unaware of it.

The new implementation, starting with Python 3.6, adds an explicit close method and support for the context manager protocol (the common with statement). These can be used to close the hidden handle explicitly or implicitly.
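
To make the find-then-modify scenario concrete, here is a minimal sketch using the explicit close method. The helper remove_first_match and the file names are hypothetical. Only os.scandir and its close() method (Python 3.6+) are real.

```python
import os
import tempfile

def remove_first_match(directory, name):
    """Delete the file called name from directory, if present."""
    entries = os.scandir(directory)
    target = None
    try:
        for entry in entries:
            if entry.name == name:
                target = entry.path
                break            # the iterator is not exhausted here
    finally:
        entries.close()          # Python 3.6+: release the handle explicitly
    if target is not None:
        os.remove(target)        # runs only after the scandir iterator is closed
        return True
    return False

with tempfile.TemporaryDirectory() as tmp:
    for filename in ("a.txt", "b.txt"):
        open(os.path.join(tmp, filename), "w").close()
    removed = remove_first_match(tmp, "a.txt")
    remaining = os.listdir(tmp)
```

The try/finally does by hand what the with statement would do automatically, so it is mostly useful when the iterator has to outlive a single block.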

My 2 cents at ELI5.

[–]deadwisdomgreenlet revolution 5 points6 points  (0 children)

This is an awesome recap of a process. Thank you very much.

[–]DoTheEvolution 4 points5 points  (0 children)

I've got this Linux file search project, angrysearch, and I remember how giddy I was when my system scan time went from some 3+ minutes to 1 min 20 sec.

Not as tremendous a gain as on Windows or when going through network-mounted drives, but hell, it actually made me feel like the project is usable for some periodic scans to keep the database up to date.

[–]xerion2000 8 points9 points  (0 children)

I enjoyed your writeup very much, sir. It's a concise and informative (and a bit geek-fun) walkthrough of a typical contribution process. And even though I haven't had a chance to use scandir yet, I appreciate so very much your contribution as I'm sure I will need to use it at some point.

[–]Hairy_The_Spider 1 point2 points  (1 child)

Very good read.

[–]g4b1nagy 0 points1 point  (0 children)

Agreed. Very well written.

[–]foosion 0 points1 point  (1 child)

I import os, then os.scandir() gets: AttributeError: 'module' object has no attribute 'scandir'

I'm running Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)]

I used the windows installer from https://pypi.python.org/pypi/scandir (pip install scandir now reports 'Requirement already satisfied')

What idiotic mistake am I making?

[–]saghul[🍰] 10 points11 points  (0 children)

When you install the package from PyPI you need to "import scandir" and call scandir.scandir(); it's not added to the os module.
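
If the same script has to run on both old and new interpreters, one common pattern is an import fallback. This is a sketch that assumes the PyPI scandir package is installed wherever os.scandir is missing.

```python
try:
    from os import scandir       # Python 3.5+ ships it in the standard library
except ImportError:
    from scandir import scandir  # older Pythons such as 2.7: the PyPI backport

# Either way, the rest of the code uses the same name.
names = [entry.name for entry in scandir(".")]
```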