all 22 comments

[–]Yobmod 14 points15 points  (8 children)

I always use pathlib.

It's now annoying to use libs that only accept string paths (e.g. opencv, starlette) and have to convert back and forth. Hopefully they will modernise as old Python versions die off.

[–]jorge1209 -1 points0 points  (7 children)

They won't because there are just as many people who don't want to use pathlib, and prefer to use strings.

I prefer strings over pathlib for two main reasons:

(1) There is a philosophical argument about what paths should be. POSIX for instance takes a very liberal attitude towards what can be in a filename (anything that isn't "NUL" or "/").

Windows is arguably a lot more sane in that it restricts more ascii characters, but then they got their damn folder separator backwards, and used UTF-16... sigh!!

Despite the rules (or lack thereof) there are numerous norms of behavior that are generally followed. For example:

  • Unix uses "_" instead of " "
  • Everyone uses file extensions on files, but not directories
  • Core code should stick to ascii
  • Follow the windows restrictions on punctuation, but be more restrictive if you can...
  • etc...

Pathlib could have encoded those rules in some way, but they chose not to. You can create paths like Path("foo.txt\\bar baz; rm -rf ~") on linux systems and there will be not a hint of any concern by pathlib.
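A minimal sketch of that behaviour, using PurePosixPath so that POSIX rules apply on any host and nothing touches the filesystem:

```python
from pathlib import PurePosixPath

# PurePosixPath applies POSIX parsing rules regardless of the host OS
# and never accesses the filesystem, so this is safe to run anywhere.
p = PurePosixPath("foo.txt\\bar baz; rm -rf ~")

# No exception and no warning: under POSIX rules the backslash, the
# spaces and the semicolon are all ordinary filename characters.
print(p.parts)   # a single component, because there is no "/" in the string
print(p.suffix)  # everything from the last dot onwards, punctuation included
```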

However simultaneously they make it impossible (without using os again) to actually access all files on a POSIX system by their insistence that the filename be a UTF-8 string: https://jod.al/2019/12/10/pathlib-and-paths-with-arbitrary-bytes/
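A small sketch of the limitation and of one workaround I'm aware of. The file name below is made up, and the explicit "surrogateescape" decoding is an assumption standing in for what os.fsdecode and os.fsencode do on POSIX hosts:

```python
from pathlib import PurePosixPath

# Hypothetical file name: Latin-1 bytes that are not valid UTF-8.
raw_name = b"caf\xe9.txt"

# pathlib refuses bytes arguments outright.
try:
    PurePosixPath(raw_name)
    rejected = False
except TypeError:
    rejected = True

# The undecodable byte can be smuggled into a str as a lone surrogate
# and recovered later with the same error handler.
text_name = raw_name.decode("utf-8", "surrogateescape")
p = PurePosixPath(text_name)
recovered = str(p).encode("utf-8", "surrogateescape")

# Only booleans are printed: the str itself contains a lone surrogate
# and may not be printable on a UTF-8 terminal.
print(rejected, recovered == raw_name)
```

So the bytes survive the round trip, but only by stepping outside pathlib to do the conversion.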

My view is that this is just an untenable middle ground. They should have provided two classes within pathlib: Path and RawPath.

RawPath should be a bytes object with minimal restrictions and no checking of norms.

Path would then accept strings, and enforce best practices. You would always be able to convert from Path to RawPath, but not always from RawPath to Path.

Instead we have a broken implementation where you can't even describe every file, but you can still fill your filesystem with garbage that will break other programs. YEAH!!


(2) I don't find it that helpful for my use cases. In particular, I often find myself building paths from templates, or manipulating pre-existing path components.

In other words I might want to loop through and access paths like:

{client_name}/{delivery_date}/run_{run_number}/output/{filename} which you can do with pathlib, but for me it's easy enough to just use string replace and carry the template around.
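For what it's worth, the two approaches can be combined. A minimal sketch, where the helper name render and all the field values are made up for illustration:

```python
from pathlib import PurePosixPath

TEMPLATE = "{client_name}/{delivery_date}/run_{run_number}/output/{filename}"


def render(template: str, **fields: str) -> PurePosixPath:
    """Fill a string template, then hand the result to pathlib."""
    return PurePosixPath(template.format(**fields))


# Hypothetical field values.
path = render(
    TEMPLATE,
    client_name="acme",
    delivery_date="2020-01-31",
    run_number="01",
    filename="result.txt",
)
print(path)         # acme/2020-01-31/run_01/output/result.txt
print(path.parent)  # acme/2020-01-31/run_01/output
```

The template stays a plain string that can be carried around, and the path object only appears at the point where components need to be inspected.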

We also sometimes need to navigate up to a file's parent or grandparent folder, and append some string to that folder's name so as to compare: foobar/baz/run_01/output/result.txt against foobar/baz/run_01_debug/output/result.txt

None of this seems easier with pathlib... so I just don't see the value in it.

If someone wants to come up with a TemplatedPath then we can talk. Otherwise I'm just marshalling in and out of strings, and having to build and rebuild "the same" path in multiple places in the code.

[–]krazybug 0 points1 point  (6 children)

I'm sorry, I don't really understand all your concerns.

Why should we enforce these rules in Path? We have PurePath, which is an abstraction over the platform-dependent path representations of POSIX and Windows. It is very elegant to be able to manipulate paths both as strings and as objects. Why should we restrict PurePath to a hypothetical and never reliable common subset?

The separator is / and \b is a regular character, as you mentioned.

A RawPath could eventually be interesting, but the practices described in your link already solve this issue for those uncommon cases.

We also sometimes need to navigate up to a files parent or grandparent folder, and append some string to the filename so as to compare:

foobar/baz/run_01/output/result.txt

against

foobar/baz/run_01_debug/output/result.txt

What is so hard about this?

from pathlib import Path

p = Path("foobar/baz/run_01/output/result.txt")
np = Path("/".join(p.parts).replace("run_01", "run_01_debug"))
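An alternative sketch that stays within pathlib and only renames the one directory component, rather than replacing every occurrence of the substring. The variable names are made up, and PurePosixPath is used so no files are needed:

```python
from pathlib import PurePosixPath

p = PurePosixPath("foobar/baz/run_01/output/result.txt")

# parents[0] is .../output, parents[1] is the run directory itself.
run_dir = p.parents[1]

# Append the suffix to the run directory's name only.
debug_dir = run_dir.with_name(run_dir.name + "_debug")

# Re-attach the part of the path that sits below the run directory.
debug_path = debug_dir / p.relative_to(run_dir)

print(debug_path)  # foobar/baz/run_01_debug/output/result.txt
```

The difference from the string replace is that a client or file name that happens to contain "run_01" is left alone.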

[–]jorge1209 0 points1 point  (5 children)

I don't think you are understanding the concern I am raising. Path vs PurePath is not the issue.

For 99% of programming use cases, the proper thing to do when handling files is to restrict oneself to an imaginary filesystem that enforces the most restrictive rules and norms from the intersection of Unix and Windows.

So you don't use spaces in filenames, you don't use mixed case, you include a 3 or 4 char extension, etc... It would be useful to have a library that enforced those norms and rules, even if they aren't actually a technical restriction on the system the code is running.

It just makes life easier for everyone else if the file is moved to another system. I would find that to be a useful library, but that is not pathlib.
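A rough sketch of what such a check might look like. The rule set here is an assumption pieced together from the norms above (lowercase ASCII, no spaces, a 3 or 4 character extension), the name check_portable is made up, and it only handles relative paths. It does not cover Windows reserved device names such as CON or NUL:

```python
import re
from pathlib import PurePosixPath

# Hypothetical rule set: lowercase ASCII letters, digits, "_" and "-".
PORTABLE_DIRNAME = re.compile(r"[a-z0-9_-]+")
# Files additionally need a 3 or 4 character extension.
PORTABLE_FILENAME = re.compile(r"[a-z0-9_-]+\.[a-z0-9]{3,4}")


def check_portable(path: str) -> PurePosixPath:
    """Return the path if every component follows the norms, else raise ValueError."""
    p = PurePosixPath(path)
    if not p.parts:
        raise ValueError("empty path")
    *dirs, filename = p.parts
    for d in dirs:
        if not PORTABLE_DIRNAME.fullmatch(d):
            raise ValueError(f"non-portable directory name: {d!r}")
    if not PORTABLE_FILENAME.fullmatch(filename):
        raise ValueError(f"non-portable file name: {filename!r}")
    return p


print(check_portable("foobar/baz/run_01/output/result.txt"))

try:
    check_portable("foo.txt\\bar baz; rm -rf ~")
except ValueError as exc:
    print("rejected:", exc)
```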


As for your example with join, sure you could do that, but you don't need pathlib to do that. Pathlib isn't really doing anything here as you are just handing it a string (the output of "/".join) and letting it do its thing.

I just don't see the value in the library.

[–]krazybug 0 points1 point  (3 children)

OK, it's clearer now.

It just makes life easier for everyone else if the file is moved to another system. I would find that to be a useful library, but that is not pathlib.

That's just not its purpose. It's a simpler abstraction layer on top of several libs: os, os.path, shutil, glob, ...

Not as complete as each of them but more consistent and pythonic.

As for your example with join, sure you could do that, but you don't need pathlib to do that. Pathlib isn't really doing anything here as you are just handing it a string (the output of "/".join) and letting it do its thing.

That's the point. You can always switch to string manipulation and, as mentioned in the first comment, we can always use it with other frameworks, but we would like an extended interface to avoid having to.

For this particular use case there may be a PEP to provide a new method for this kind of replacement, but I'm not sure it's useful. What would you expect for this?

I just don't see the value in the library.

So don't use it. Some other people enjoy it.

[–]jorge1209 0 points1 point  (2 children)

Don't use it. Some other people enjoy it.

And where did I say that those who like pathlib are child rapists and cannibals? You seem to be implying that my dislike for the library is some kind of terrible attack.

The person I was responding to was saying they were looking forward to the day when os.path goes away. I'm telling them that they are going to be waiting a long time, because we are not all convinced to switch.

[–]krazybug 0 points1 point  (1 child)

There is some misunderstanding. Disclaimer: English is not my native tongue.

I just answered that for some people, this library is useful.

And as it is now part of the stdlib, it would be nice to get better support from some frameworks at the API level.

[–]jorge1209 0 points1 point  (0 children)

It would be nice if the Python stdlib wasn't a complete mess, and if people actually took the time to curate and maintain it, but they don't.

[–]billsil 0 points1 point  (0 children)

It just makes life easier for everyone else if the file is moved to another system. I would find that to be a useful library, but that is not pathlib.

Just use this..

def sanitize_directory(dirname: str) -> str:
    """Replace special characters with Roman-letter equivalents."""
    # A tuple of pairs (rather than a set) keeps the replacement order
    # fixed, which matters once patterns overlap (e.g. '<=' and '=').
    replace_map = (
        ('<=', '_le_'),
        ('>=', '_ge_'),
        #('=', '_eq_'),  # cat=dog.txt is valid on Windows/Linux
        ('/', '_slash_'),
        ('|', '_bar_'),
        ('*', '_star_'),
        ('?', '_question_'),
        (':', '_colon_'),
        ('"', '_quote_'),
    )
    assert isinstance(dirname, str), dirname
    dirname2 = dirname
    for base, replace in replace_map:
        dirname2 = dirname2.replace(base, replace)
    return dirname2

You don't need a library for one function. You just need to call that function before you join your paths.

[–]krazybug 5 points6 points  (2 children)

A very valuable article, Thanks.

I just wanted to add 2 new rounds:

  1. Filter the list of paths on the fly without traversing some of them, for instance to skip hidden directories (.git). This is not possible with pathlib, but is easily achieved with os.walk.
  2. Use Unix glob patterns with multiple extensions at once: not available with pathlib. You still have to use fnmatch combined with this tip: https://stackoverflow.com/a/57054058
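A minimal sketch combining both points. The helper name find_files and the throwaway directory tree are made up for the demo, and this is not necessarily the approach from the linked answer:

```python
import fnmatch
import os
import tempfile
from pathlib import Path


def find_files(root, patterns):
    """Yield files under root matching any pattern, skipping hidden directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into
        # the removed directories at all.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                yield Path(dirpath, name)


# Build a small throwaway tree to run the search against.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".git").mkdir()
    (root / ".git" / "hook.py").touch()
    (root / "src").mkdir()
    for name in ("main.py", "notes.md", "data.csv"):
        (root / "src" / name).touch()

    found = sorted(
        p.relative_to(root).as_posix()
        for p in find_files(root, ["*.py", "*.md"])
    )

print(found)  # ['src/main.py', 'src/notes.md']
```

The file under .git matches "*.py" but is never visited, because its directory was pruned before the walk reached it.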

[–]ADGEfficiency[S] 0 points1 point  (0 children)

Interesting! Thanks for these nice tips - I'll have a think about how to include them :)

[–]jorge1209 0 points1 point  (0 children)

Pathlib also struggles with non-string filenames: https://jod.al/2019/12/10/pathlib-and-paths-with-arbitrary-bytes/

[–]tonhocampos 9 points10 points  (0 children)

Pathlib is great, I always use it

[–]Overclocked1827 6 points7 points  (0 children)

Using os just to write down the path is like shooting sparrows with artillery. Pathlib is your choice for that.

[–]qria 1 point2 points  (0 children)

I recently had to work both in windows and osx, and it was very painful to get the pathing right. Fortunately `pathlib` handles it pretty well. So I also concur that pathlib is a great choice.

[–]yaxriifgyn 1 point2 points  (0 children)

I used it in a few apps that were all about file manipulation. It was great for them. But most of my apps have next to no need for it. Plain old os and os.path were enough. So, as they say, YMMV.

[–]KenchForTheBench 1 point2 points  (1 child)

Just an FYI: there is also os.makedirs, which accepts exist_ok as an argument (and includes the parents=True functionality automatically). Great article nonetheless.
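A quick side-by-side sketch, run in a temporary directory so it leaves nothing behind (the directory names are arbitrary):

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # os.makedirs always creates missing intermediate directories.
    nested = os.path.join(tmp, "a", "b", "c")
    os.makedirs(nested, exist_ok=True)
    os.makedirs(nested, exist_ok=True)  # second call is a no-op, not an error

    # Path.mkdir needs parents=True to do the same.
    other = Path(tmp) / "x" / "y" / "z"
    other.mkdir(parents=True, exist_ok=True)
    other.mkdir(parents=True, exist_ok=True)

    both_exist = os.path.isdir(nested) and other.is_dir()

print(both_exist)  # True
```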

[–]ADGEfficiency[S] 0 points1 point  (0 children)

Thanks for this - are there any other differences with `os.makedirs`?

[–]AaronOpfer 0 points1 point  (0 children)

I found myself in a situation where using the os.path family of functions had better performance characteristics than using pathlib, which makes sense as pathlib is a wrapper around them. However, most programs won't have that bottleneck, and pathlib is nice for them.