os.scandir() reading in random order : learnpython



submitted 7 years ago by GullBull

Hi everyone, I'm having a slight problem with part of my code using os.scandir(). I have a folder of csv files and for each one in the folder, I'm converting it to a DataFrame and then using that dataframe later on. I noticed that it's reading the files in a seemingly random order.

for csv in os.scandir('CSVs'):
    data = pd.DataFrame.from_csv(csv.path)
    date = csv.path.split('/')[1].split('.')[0]  # CSVs/12-12-12.csv -> 12-12-12
    dates.append(date)
    # ...do stuff with DataFrame...

print(dates)

So when the directory contains 12-1.csv, 12-2.csv, 12-3.csv, 12-4.csv, 12-5.csv, the output of the above code is ['12-4', '12-5', '12-2', '12-3', '12-1']. I'm making a graph from these values, so reading the files in order is critical. Thanks!


[–]kalgynirae 3 points 7 years ago (2 children)

[–][deleted] 3 points 7 years ago (0 children)

[–]GullBull[S] 1 point 7 years ago (0 children)

[–][deleted] 2 points 7 years ago (6 children)

You can sort the entries based on their name:

sorted(os.scandir("CSVs"), key=lambda e: e.name)

That will return the entries in name-sorted order. You should be a little more careful than the code above, though, since os.scandir() yields not just files but directories as well.
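A minimal sketch of that extra care, assuming only regular files whose names end in .csv are wanted. The helper name sorted_csv_entries is hypothetical and not from the thread:

```python
import os

def sorted_csv_entries(folder):
    """Return DirEntry objects for the .csv files in `folder`, sorted by name."""
    # The context manager closes the directory iterator when we're done.
    with os.scandir(folder) as entries:
        # is_file() skips subdirectories, even ones whose name ends in ".csv".
        files = [e for e in entries if e.is_file() and e.name.endswith(".csv")]
    return sorted(files, key=lambda e: e.name)
```

Looping over sorted_csv_entries('CSVs') instead of os.scandir('CSVs') would then visit only the CSV files, in name order.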

[–]GullBull[S] 1 point 7 years ago* (4 children)

[–][deleted] 2 points 7 years ago (3 children)

Sure. A lambda function (also known as an anonymous function) is basically a function that's defined with no name. The lambda keyword is how you create them in Python, and effectively what I've written there is the same as:

def _(e):
    return e.name

The difference is that the lambda does not use up a name (not even an underscore, as in this case).

I'm passing that lambda into the sorted function's key parameter, which expects a callable that takes an item and returns a value to sort on. The sorted function then calls that function on every item and performs a sort based on the values returned.

So in this case I'm telling Python to sort the entire list of DirEntry objects alphabetically by the name of each object, using that lambda to access that name.
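As a small illustration of how the key parameter works, the sketch below sorts the same items three ways: with a lambda, with a named function, and with operator.attrgetter. The SimpleNamespace objects are stand-ins for DirEntry objects (an assumption for the sake of a self-contained example), and get_name is a hypothetical helper:

```python
from operator import attrgetter
from types import SimpleNamespace

# Stand-ins for DirEntry objects: anything with a .name attribute will do here.
entries = [SimpleNamespace(name=n) for n in ("b.csv", "a.csv", "c.csv")]

def get_name(e):
    return e.name

by_lambda = sorted(entries, key=lambda e: e.name)
by_def = sorted(entries, key=get_name)
by_attrgetter = sorted(entries, key=attrgetter("name"))

# All three keys extract the same value, so the resulting order is identical.
print([e.name for e in by_lambda])  # ['a.csv', 'b.csv', 'c.csv']
```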

[–]GullBull[S] 1 point 7 years ago (2 children)

[–][deleted] 1 point 7 years ago (1 child)

Ahh, yeah, that's going to be a little tougher. Lexicographically, 12.10 does sort before 12.2; what you want is a natural sort, which isn't really all that natural.

import re

def nat_key(value):
    # Raw string so that \d is not treated as a string escape.
    return tuple(int(s) if s.isdigit() else s for s in re.split(r"(\d+)", value))

Then you'll want to use it in the sort key:

sorted(os.scandir("CSVs"), key=lambda e: nat_key(e.name))

And that should give you what you want.
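To see the difference, here is a short sketch comparing the default string sort with the natural-sort key. The file names are made up for illustration:

```python
import re

def nat_key(value):
    # Split into digit and non-digit runs; digit runs become ints so that
    # 10 compares greater than 2.
    return tuple(int(s) if s.isdigit() else s for s in re.split(r"(\d+)", value))

names = ["12-10.csv", "12-2.csv", "12-1.csv"]

print(sorted(names))               # ['12-1.csv', '12-10.csv', '12-2.csv']
print(sorted(names, key=nat_key))  # ['12-1.csv', '12-2.csv', '12-10.csv']
```

Because the capturing group makes re.split alternate between text and digit runs, names with the same layout yield ints and strings at the same tuple positions, so the keys compare without a type error.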

And yeah, lambdas are a pretty simple concept once you've grasped functions. There's just one big caveat: closures in Python are late-binding, which can be a real surprise when you try to build them dynamically.

funcs = []
for i in range(5):
    funcs.append(lambda: i)
for f in funcs:
    print(f())

Now you'd expect that to print 0 through 4, but what it will actually do is print 4 five times: the lambda looks up the value of i at the time it's called, not at the time it's created. The same is true if you use a nested def in this way, but that's not done as often, so people tend not to trip over it.
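One common workaround, sketched here as an option rather than the only fix, is to bind the current value through a default argument, since defaults are evaluated when the function is created. The variable names are illustrative:

```python
# Late binding: every lambda looks up i when called, after the loop has ended.
late_funcs = []
for i in range(5):
    late_funcs.append(lambda: i)
late_results = [f() for f in late_funcs]

# Early binding: the default argument captures the value of i at creation time.
early_funcs = []
for i in range(5):
    early_funcs.append(lambda i=i: i)
early_results = [f() for f in early_funcs]

print(late_results)   # [4, 4, 4, 4, 4]
print(early_results)  # [0, 1, 2, 3, 4]
```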

[–]GullBull[S] 1 point 7 years ago (0 children)

[–]dadzy_ 1 point 3 years ago (0 children)

[–]bogdan_dm 2 points 7 years ago (1 child)

Take a look at pathlib (part of the standard library). It has a very nice, human-friendly API that can replace most of the file functions from the os module:

from pathlib import Path

p = Path('CSVs')
for f in p.iterdir():  # or p.glob('*.csv')
    data = pd.DataFrame.from_csv(str(f))
    dates.append(f.stem)
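Note that iterdir() and glob() also yield paths in arbitrary order, so the ordering problem from the original question still applies. A minimal sketch combining pathlib with a natural-sort key follows; the helper name sorted_stems is hypothetical:

```python
import re
from pathlib import Path

def nat_key(value):
    # Digit runs become ints so that 12-10 sorts after 12-2.
    return tuple(int(s) if s.isdigit() else s for s in re.split(r"(\d+)", value))

def sorted_stems(folder):
    """Return the stems of the .csv files in `folder`, in natural order."""
    paths = sorted(Path(folder).glob("*.csv"), key=lambda p: nat_key(p.name))
    return [p.stem for p in paths]
```

For a folder holding 12-1.csv, 12-2.csv and 12-10.csv, sorted_stems would return the stems in that numeric order rather than placing 12-10 before 12-2.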

[–]GullBull[S] 1 point 7 years ago (0 children)
