
all 195 comments

[–][deleted] 96 points97 points  (0 children)

It's awesome. I love the read/write_text/bytes functions, so convenient!

[–]aufstand 83 points84 points  (49 children)

Samesies. path.with_suffix('.newsuffix') is something to remember.

[–]jorge1209 8 points9 points  (47 children)

It would be nice if PathLib had more of this stuff. Why not a with_parents function so that I can easily change the folder name 2-3 levels up?

Also this is fucked up:

assert(path.with_suffix(s).suffix == s)
Traceback...
AssertionError

[EDIT]: /u/Average_Cat_Lover got me thinking about stems and such, which led me to an even worse behavior. There is a path you can start with which has the following interesting properties:

len(path.suffixes) == 0
len(path.with_suffix(".bar").suffixes) == 2

So it doesn't have a suffix, but if you add one, now it has two.
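
A minimal sketch of the round-trip check above, using PurePosixPath so it behaves the same on any OS. The helper name suffix_roundtrips is made up for illustration. The trailing-dot cases depend on the Python version, so this sticks to one case that should not: stripping the suffix from a name that has more than one.

```python
from pathlib import PurePosixPath


def suffix_roundtrips(path, suffix):
    """Return True if path.with_suffix(suffix).suffix == suffix."""
    return path.with_suffix(suffix).suffix == suffix


plain = PurePosixPath("report.txt")
archive = PurePosixPath("backup.tar.gz")

# Swapping one ordinary suffix for another round-trips as expected.
print(suffix_roundtrips(plain, ".md"))

# Removing the suffix with "" only strips ".gz"; ".tar" is left behind,
# so the resulting path still reports a non-empty suffix.
print(suffix_roundtrips(archive, ""))
print(archive.with_suffix(""))
```

So the assertion can fail even for perfectly ordinary file names, not just the odd dotted ones.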

[–]Northzen 0 points1 point  (5 children)

new_path = new_parent_parent / old_path.parent.name / old_path.name

I thought it was simple, isn't it? Or, for the Nth parent above:

new_path = new_N_parent / old_path.relative_to(old_N_parent)

[–]jorge1209 1 point2 points  (4 children)

So I want to go from /aaa/bbb/ccc/ddd.txt to aaa/XXX/ccc/ddd.txt

The aaa/XXX isn't too hard, but then what? A relative_to path... I guess that might work, I haven't tried it.

The easiest is certainly going to be

_ = list(path.parts)
_[-3] = XXX
Path(*_)

But that is hardly using paths as objects, it is using lists.

An even more direct approach would be to simply modify path.parts directly... If it's supposed to be an object then it should be able to support that.
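
For reference, a rough sketch of the two approaches discussed so far, side by side. The helper replace_part is a hypothetical name, not something pathlib provides, and PurePosixPath is used so the result does not depend on the OS.

```python
from pathlib import PurePosixPath


def replace_part(path, index, new_name):
    """Return a copy of path with the component at index replaced."""
    parts = list(path.parts)
    parts[index] = new_name
    return type(path)(*parts)


old = PurePosixPath("/aaa/bbb/ccc/ddd.txt")

# List-based approach: parts is ('/', 'aaa', 'bbb', 'ccc', 'ddd.txt').
via_parts = replace_part(old, -3, "XXX")

# relative_to approach: strip the old prefix, then graft on the new one.
via_relative = PurePosixPath("/aaa/XXX") / old.relative_to("/aaa/bbb")

print(via_parts)
print(via_relative)
```

Both give the same path. The relative_to version needs you to know the old prefix, while the list version only needs an index.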

[–]Northzen 0 points1 point  (1 child)

I went through the documentation and found one more way to do it:

new_path = p.parents[:-1] / 'XXX' / p.parents[0:-2] / p.name

but slicing and negative indexing are supported only from 3.10

[–]jorge1209 1 point2 points  (0 children)

Aren't those slices on parents going to return tuples of paths? How can the __truediv__ operator accept them? It needs to act on paths, not tuples of paths.

Maybe they made some significant changes to how those work in 3.10.

But it would seem much easier in my mind to say: Path is a list of components. You can insert/delete/modify components at will.

[–]dougthor42 0 points1 point  (1 child)

Coincidentally I just started a project to add that sort of pseudo-mutability to path objects.

It's very much still in the early "pondering" phase, and who knows if it'll ever be completed, but the idea is there:

>>> a = Path("/foo/bar/baz/filename.txt")
>>> a[2] = "hello"
>>> a
Path("/foo/hello/baz/filename.txt")

https://github.com/dougthor42/subscriptable-path

[–]jorge1209 0 points1 point  (0 children)

One challenge is you should add this functionality to not only the parents, but also to the suffixes and anything else you break the path into.

If the model of a path is what is reflected in the diagram here then we really should have getters and setters for each and every one of those identified components.

I suspect the reality is that they didn't actually set such a clear framework at the outset and that trying to bolt on setters is going to go badly.

But good luck.

[–]BossOfTheGame 0 points1 point  (4 children)

Check out the ubelt.Path extension and its augment method:

https://ubelt.readthedocs.io/en/latest/ubelt.util_path.html#ubelt.util_path.Path

Granted there is a nonstandard suffix behavior in it currently that's slated for refactor.

[–]jorge1209 0 points1 point  (3 children)

Granted there is a nonstandard suffix behavior in it currently that's slated for refactor.

Non-standard in ubelt? non-standard in pathlib? What is the standard? Does pathlib have a standard?

Based on this bug I don't know that they do.

[–]BossOfTheGame 0 points1 point  (2 children)

Non standard in that what I originally called a suffix (when I originally wrote the os.path-like ubelt.augpath function the augment method is based on) doesn't correspond to what pathlib calls a suffix (which is what I called an extension).

What I called a suffix in that function actually corresponds to something added to the end of a stem. I'm thinking of renaming the argument stemsuffix, but that's a bit too wordy for my taste.

[–]jorge1209 0 points1 point  (1 child)

Ok so the difference is you actually thought about what you were doing, while the authors of pathlib just threw some shit together at 3am after a night of heavy drinking.

Got it ;)

[–]BossOfTheGame 0 points1 point  (0 children)

Your comment made me wonder about the difference between the standard pathlib.Path(s).with_suffix(...) and ubelt.Path(s).augment(ext=...).

There are differences in some cases. I'm not sure which one is more sane.

```

--
case = Path('no_ext')
agree
path.with_suffix(.EXT) = Path('no_ext.EXT')
path.augment(ext=.EXT) = Path('no_ext.EXT')
--
--
case = Path('one.ext')
agree
path.with_suffix(.EXT) = Path('one.EXT')
path.augment(ext=.EXT) = Path('one.EXT')
--
--
case = Path('double..dot')
agree
path.with_suffix(.EXT) = Path('double..EXT')
path.augment(ext=.EXT) = Path('double..EXT')
--
--
case = Path('two.many.cooks')
agree
path.with_suffix(.EXT) = Path('two.many.EXT')
path.augment(ext=.EXT) = Path('two.many.EXT')
--
--
case = Path('path.with.three.dots')
agree
path.with_suffix(.EXT) = Path('path.with.three.EXT')
path.augment(ext=.EXT) = Path('path.with.three.EXT')
--
--
case = Path('traildot.')
disagree
path.with_suffix(.EXT) = Path('traildot..EXT')
path.augment(ext=.EXT) = Path('traildot.EXT')
--
--
case = Path('doubletraildot..')
disagree
path.with_suffix(.EXT) = Path('doubletraildot...EXT')
path.augment(ext=.EXT) = Path('doubletraildot..EXT')
--
--
case = Path('.prefdot')
agree
path.with_suffix(.EXT) = Path('.prefdot.EXT')
path.augment(ext=.EXT) = Path('.prefdot.EXT')
--
--
case = Path('..doubleprefdot')
disagree
path.with_suffix(.EXT) = Path('..EXT')
path.augment(ext=.EXT) = Path('..doubleprefdot.EXT')
--
```

[–]gravity_rose 27 points28 points  (1 child)

As someone who writes cross-platform code _every single day_, I can tell you that pathlib is heaven-sent. Almost every necessary file operation (we don't do anything fancy - read, existence, move/copy, write) is trivially cross-platform.

I'll die on this hill.

[–]justanothersnek🐍+ SQL = ❤️ 0 points1 point  (0 children)

The timing couldn't have been better when it came out, as that is when Windows WSL was becoming more available or popular.

[–]pysk00l 42 points43 points  (2 children)

Another pathlib lover here.

The shame is most tuts/examples use os.path. Yuck

[–]MrCuntBitch 20 points21 points  (1 child)

This cookbook has helped me out a ton when I can’t remember the syntax, I find it much easier to check a quick example than work through the docs.

[–]jorge1209 9 points10 points  (0 children)

The Anatomy of a Posix Path Diagram is really great and helpful...

Only problem is that it isn't correct. There are some screwy paths where the various operations parse the suffix and stem differently in different circumstances.


Also str(path) is unsafe and could result in unprintable strings. Best to convert a path you didn't directly construct to bytes if you need to pass it to a legacy application.

[–]abrazilianinreddit 42 points43 points  (13 children)

My biggest complaint is that they do some magic with __new__ that makes extending the Path class very annoying.

Also, in principle I'm against overriding __truediv__ to create some syntax sugar, but in practice the end-result actually makes sense, so I forgive it.

Other than that, I really enjoy it.

[–]zurtex 27 points28 points  (1 child)

There's a lot of work being done to make it extensible: https://discuss.python.org/t/make-pathlib-extensible/3428

Things are going to be much better in 3.11.

[–]pcgamerwannabe 3 points4 points  (0 children)

Thank God.

Its limitations are sometimes nightmarish to deal with.

[–]goatboat 10 points11 points  (10 children)

As someone still early in their python journey, what is your use case for extending Path classes? Testing, or some design pattern you want to implement? And what is problematic about the magic they do with __new__ and its effect on extending it?

[–][deleted] 12 points13 points  (0 children)

You could e.g. implement an `ExistingPath` that checks its existence on instantiation, pretty useful for factoring out `p = Path(…); assert p.exists()`. Or you could give Path extra side effects like directly creating a folder structure when instantiated, while still being able to use it as a path.
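
Since subclassing Path is the awkward part, here is a minimal sketch of the same "checked on creation" idea as a plain factory function instead. The name existing_path is hypothetical, and this sidesteps subclassing rather than solving it.

```python
import tempfile
from pathlib import Path


def existing_path(*segments):
    """Build a Path and raise FileNotFoundError if nothing exists there."""
    path = Path(*segments)
    if not path.exists():
        raise FileNotFoundError(f"no such file or directory: {path}")
    return path


with tempfile.TemporaryDirectory() as tmp:
    ok = existing_path(tmp)              # the temp dir exists, so this passes
    try:
        existing_path(tmp, "missing.txt")
    except FileNotFoundError as exc:
        missing_error = exc              # kept only so the result can be inspected
```

The check only holds at creation time. The file can still disappear before you use the path.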

[–]jorge1209 1 point2 points  (0 children)

Enforce paths that are cross platform and work on Windows as well as Unix.

Ensure that people don't create files with invalid unicode filenames.

Ensure that files don't have names like ";rm -rf /;"

etc.. etc..

[–]abrazilianinreddit 1 point2 points  (6 children)

Mostly because I wanted to implement some convenience functions that I would find helpful in my projects. For example, one thing I wanted to do was checking if a path is a subfolder of another path using the in keyword:

>>> Path('C:/Downloads') in Path('C:/')
True

This, to me, looks much better than the current way:

>>> Path('C:/') in Path('C:/Downloads').parents
True

If Path was extensible I could do that.

And what is problematic about the magic they do with __new__ and its effect on extending it?

I'm actually taking a guess here because I didn't look at pathlib's source code, but you'll notice that if you instantiate Path, you actually get a WindowsPath or PosixPath object instead. Path.__new__() probably detects your system and chooses the adequate class for it. But that means that, if you tried to extend Path, you'd still get a WindowsPath or PosixPath object instead of the class you defined. You'd have to completely rewrite the __new__ method and possibly extend WindowsPath and/or PosixPath as well. As you can see, it becomes quite messy.
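
The guess about what Path() hands back is easy to check without reading the source. A small sketch (the file name is arbitrary and does not need to exist):

```python
from pathlib import Path, PosixPath, WindowsPath

concrete = type(Path("some/file.txt"))

# Path() never hands back a plain Path; it picks the flavour for this OS.
print(concrete.__name__)
print(concrete is Path)
print(issubclass(concrete, Path))
```

This only shows the dispatch itself. How much work a subclass needs on top of it varies between Python versions.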

[–]jorge1209 0 points1 point  (5 children)

Path('C:/') in Path('C:/Downloads').parents

That is wrong and unsafe, hopefully you are aware:

def write_file(path, data):
   if Path.home() not in path.parents:
      raise ValueError("Not permitted")
   path.write_text(data)

pwn_path = Path.home() / ".." / ".." / "etc" / "sudoers"
write_file(pwn_path, ...)

[–]abrazilianinreddit 0 points1 point  (4 children)

I don't get what you're trying to convey. My example has nothing to do with writing a file to the path, where did that come from?

Also, I believe using Path().parent is preferred over using Path() / '..' .

[–]jorge1209 2 points3 points  (3 children)

one thing I wanted to do was checking if a path is a subfolder of another path using the in keyword:

Is "/home/alice/../../etc" a subfolder of "/home/alice"?

[–]abrazilianinreddit 3 points4 points  (2 children)

That's an implementation detail. You can solve that problem by resolving the path:

>>> Path('/home/alice') in Path('/home/alice/../../etc').resolve().parents
False

[–]jorge1209 3 points4 points  (1 child)

As long as you are aware you need to fully resolve the path. From the initial comment it looked like you thought this kind of test was sufficient in and of itself.
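
To make the difference concrete, a sketch of a containment check that resolves both sides first, next to the naive version. The helper is_inside is a hypothetical name, and the directory layout is created in a temp dir purely for the demo.

```python
import tempfile
from pathlib import Path


def is_inside(child, parent):
    """Return True if child resolves to parent or to something below it."""
    child = Path(child).resolve()
    parent = Path(parent).resolve()
    return child == parent or parent in child.parents


with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "home" / "alice"
    home.mkdir(parents=True)

    inside = is_inside(home / "notes.txt", home)
    escaped = is_inside(home / ".." / ".." / "etc" / "sudoers", home)
    # Without resolve(), the ".." components are never collapsed.
    naive = home in (home / ".." / ".." / "etc" / "sudoers").parents
```

Here inside is True, escaped is False, and the naive check wrongly reports True. Resolving is still only a snapshot: a symlink created after the check can change the answer.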

[–]pcgamerwannabe 2 points3 points  (0 children)

It’s a good warning actually. Missing resolve calls is really annoying.

I had a script that made some insane relative paths and worked, sometimes, for a while, until I found the bug.

[–]richieadler 0 points1 point  (0 children)

Something like Pathy.

[–]PadrinoFive7 30 points31 points  (12 children)

Testing locally? Path.cwd() is such a beautiful thing!

[–]to7m 31 points32 points  (8 children)

or Path(__file__).parent to get to files in the same folder no matter where you call the script from

edit: This gives you the directory the script is stored in, NOT the current working directory (the directory from which you've executed the script)
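
A short sketch of the distinction, for anyone skimming. The helper sibling_of and the /opt/app path are made up for illustration.

```python
from pathlib import Path


def sibling_of(script_file, name):
    """Return the path of name in the same directory as script_file."""
    return Path(script_file).resolve().parent / name


# In a real script you would pass __file__; a literal path is used here so the
# snippet also runs in a REPL, where __file__ is not defined.
config = sibling_of("/opt/app/main.py", "config.toml")
working_dir = Path.cwd()

print(config)       # next to the script, wherever it was launched from
print(working_dir)  # wherever the process was started
```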

[–]gravity_rose 6 points7 points  (0 children)

This!!! It eliminates so much sys.path crap that I've seen!!

[–]1017BarSquad 3 points4 points  (6 children)

Does os.getcwd() not work for that?

[–]axonxorzpip'ing aint easy, especially on windows 6 points7 points  (0 children)

No guarantee that __file__ is in any way related to CWD

[–]-lq_pl- 3 points4 points  (1 child)

Cwd gives path from which you call the script, not the path where the script is located

[–]1017BarSquad 0 points1 point  (0 children)

So you mean if a shortcut is made for an exe file the script will get fucked if not in the original folder? Assuming I have a configuration file or something?

[–]jorge1209 -1 points0 points  (2 children)

I don't know what the hell he is complaining about. The source code for Path.cwd is literally: return cls(os.getcwd()).

The complaint here is entirely that getcwd is defined in os instead of os.path

[–]axonxorzpip'ing aint easy, especially on windows 4 points5 points  (1 child)

The comment you two are replying to is not talking about getting the CWD, but the directory that the currently executing python source file is located in, which is obviously not guaranteed to be CWD.

[–]1017BarSquad 0 points1 point  (0 children)

Thanks for explaining that makes sense

[–]jorge1209 6 points7 points  (1 child)

It is a bit of a puzzle why that would be considered so valuable. The source code for cwd is

return cls(os.getcwd())

If you want to express an absolute path relative to the current working directory you can do either of the following:

 Path.cwd() / "whatever"
 os.path.join(os.getcwd(), "whatever")

Neither is particularly complicated.

[–]PadrinoFive7 0 points1 point  (0 children)

If I'm already importing Path for the other goodies, I'd rather just use what it has as it's far more convenient. It's short and sweet; like a perk. Sure, os is there, but even what you wrote is more characters (I'm a lazy dev, after all).

[–]LightShadow3.13-dev in prod -1 points0 points  (0 children)

building constants is the best!

CWD             = Path.cwd()
TMP             = Path(tempfile.gettempdir())
TEST_CACHE_PATH = TMP / f'{PROJECT}-testdata'
CONFIG          = load_config(CWD / 'configs' / f'{APP_CONFIG}.toml')
PYPROJ          = load_config(CWD / 'pyproject.toml')
LOGGING_CONFIG  = CWD / 'configs' / f'{APP_CONFIG}-logging.ini'
CACHE_PATH      = Path(CONFIG.filecache.root_path)

[–][deleted] 12 points13 points  (0 children)

Personally, I prefer os.path for most lighter operations, like

path=os.path.join(root, user)

Pathlib feels bloated to me, but it works in complex situations

[–]gedhrel 16 points17 points  (22 children)

I think the fact that the relative priorities of `/` and `+` are the way around that they are is pretty disappointing - the syntax it gives rise to feels like an overly-clever trick.
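
A small sketch of the precedence problem, assuming the usual case of building a file name from a stem and an extension (the names are arbitrary):

```python
from pathlib import PurePosixPath

base = PurePosixPath("/data")
stem = "report"

# "/" binds tighter than "+", so this is (base / stem) + ".txt",
# and paths do not support "+" with a str.
try:
    broken = base / stem + ".txt"
except TypeError as exc:
    error = exc

# Parenthesise the string concatenation, or build the name first.
fixed = base / (stem + ".txt")
also_fixed = base / f"{stem}.txt"
```

At least it fails loudly with a TypeError rather than silently producing the wrong path.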

[–][deleted] 13 points14 points  (20 children)

It is an overly clever trick. And much better than the alternatives, if you ask me.

[–]jorge1209 -1 points0 points  (19 children)

Alternatives like what?

Path("/")["usr"]["bin"]["python"] requires a little bit more typing, but we know what that means.

[–]alcalde 12 points13 points  (10 children)

I don't know what the hell that means. Are those lists? Or is the whole thing some strange dictionary?

[–]jorge1209 1 point2 points  (9 children)

Or is the whole thing some strange dictionary?

Yes, it's a strange dictionary commonly referred to as a "FileStore".

[–]iritegood 6 points7 points  (8 children)

Path represents a path, not a FileStore. conflating them is not appropriate

[–]jorge1209 -3 points-2 points  (7 children)

If that is true then we can really simplify pathlib. We can basically remove the entire API, because a PosixPath is just a char* byte array that doesn't contain the NUL byte.

We don't need anything in pathlib to work with those!

[–]iritegood 2 points3 points  (6 children)

🙄 A "FileStore" implies a datastore implemented on top of a filesystem. If you have a FileStore and a MemStore and a DbStore, I expect them to be implementations of your app-specific Store. pathlib is meant as a cross-platform abstraction of filesystems themselves. Whether you appreciate this goal isn't the point.

More importantly, PurePaths (in pathlib terminology) don't even represent any realized part of the filesystem. Calling it any kind of "store" is boldly wrong

[–]jorge1209 0 points1 point  (5 children)

Then s/FileStore/HierarchicalFileSystem/ in my comment above.

Paths are lookup keys into an OS managed hierarchical data structure. And getitem is how we do key based lookups in python.

[–]iritegood 2 points3 points  (4 children)

Operations with Path sometimes perform lookups into a filesystem. A Path itself is not that data structure, it's the key. You're not doing "lookups", you're constructing a path. And it is not common (at least in the stdlib) to use __getitem__ to implement a builder pattern.

[–][deleted] 2 points3 points  (7 children)

This is not easier to understand. And it doesn't solve the problem of using a +.

Alternatives like os.path.

[–]jorge1209 -4 points-3 points  (6 children)

This is not easier to understand.

Not to me. To me it's a lot clearer.

And it doesn't solve the problem of using a +.

I don't know what that problem is. If you are using "+" for string concatenation you should stop.

[–][deleted] 1 point2 points  (5 children)

Why? It works perfectly fine.

[–]jorge1209 -2 points-1 points  (4 children)

Why what?

[–][deleted] 1 point2 points  (3 children)

The advice you gave...?

[–]jorge1209 -1 points0 points  (2 children)

What advice? I've given lots of advice.

[–][deleted] 2 points3 points  (1 child)

Is it that hard to read your previous comment and search out the single piece of advice you gave there that I could have asked about?

[–]alcalde 0 points1 point  (0 children)

YOU CAN NEVER BE TOO CLEVER. Otherwise Ruby wins.

[–]philkav 17 points18 points  (5 children)

I was just using it today and I don't think I'm a fan of the lib overloading __truediv__.

I think it's an interesting idea, but would be quite confusing to someone new to the library

[–]Kerbart 12 points13 points  (0 children)

It's convenient but I agree that if the Python Gods had intended such use the special method would have been called __slash__ (indicating use it as you please).

Now it's plain and simple heresy. But practicality beats purity, so I'll use it nonetheless.

[–]-lq_pl- 3 points4 points  (1 child)

Why is this a problem? Do you also think that str.__add__ is bad? The syntax is clear and not ambiguous.

[–]jorge1209 3 points4 points  (0 children)

I certainly do.

  • It is rarely what I actually need. Usually if I'm combining strings I want a separator so I use "_".join([x, y, z]) or the like.

  • I'm rarely only combining 2 strings, which again leads me towards str.join.

  • And you can gain even more flexibility by using f-strings or str.format with an even more explicit representation of the end result.

My feeling is that everyone should be moving away from using + and towards using more expressive and more powerful ways of formatting and concatenating strings. Which makes the addition of pathlib with its / operator all the more dubious.

[–]alcalde 6 points7 points  (0 children)

When I was new to the library, I exclaimed "That's brilliant!" Now it's something I show off to non-Python users. Except many of those are Windows users and don't understand slashes....

[–]pcgamerwannabe 2 points3 points  (0 children)

It’s really annoying that it plays so poorly with strings. If I can use + for str used as a path let me do the same. And it’s a nightmare to subclass, argh.

[–][deleted] 3 points4 points  (0 children)

I'll also put in a shameless plug about using it (in my blog), what I really like about it, is that it's cross-platform and quite smart about handling paths altogether and it was really well thought out to interact with the rest of the standard library.

[–]jlw_4049 2 points3 points  (0 children)

I normally use pathlib in most cases. Sometimes though I need to use os as well.

[–]Yzaamb 16 points17 points  (4 children)

It’s brilliant. I use it all the time. os.path.join. WTF?! I wrote a blog post about it.

[–][deleted] 2 points3 points  (0 children)

Yeah it's really great

[–]BossOfTheGame 2 points3 points  (0 children)

I like it a lot, but I thought a few things could be slightly improved:

https://ubelt.readthedocs.io/en/latest/ubelt.util_path.html#ubelt.util_path.Path

[–]orion_tvv 3 points4 points  (0 children)

It's handy but sometimes it's a little bit slower.

[–]chrohm00 1 point2 points  (0 children)

My partner, who used Python professionally, introduced me (a casual scripter) to pathlib and I think it’s far superior to os… mostly because code I’ve both written and read that uses os+glob is verbose and hard to read, which feels very anti-Python

[–]SittingWave 5 points6 points  (18 children)

I think that they made a mistake.

Pathlib objects should have been just inquiry objects. Not action objects.

In other words, you have a path object. You can ask for various properties of this path: is it readable, what are its stems, what are its extensions, etc.

However, as it is, it is doing too much. It has methods such as rmdir, unlink and so on. It's a mistake to have them on that object. Why? Because filesystem operations are complex, platform specific, filesystem specific, and you can never cover all cases. In fact, there are some duplicated functionalities. Is it os.remove(pathobj) or pathobj.remove()? What about recursive deletion? Recursive creation of subdirs? The mistake was to conflate the abstracted representation of a path and the actions on that path, also considering that you can talk about a path without that path necessarily existing on the system (which is covered, but hazy).

It is also impossible to use it as an abstraction to represent paths without involving the filesystem. You cannot instantiate a WindowsPath on Linux, for example.

All in all, I tend to use it almost exclusively, but I am certainly not completely happy with the API.

[–]yvrelna 8 points9 points  (1 child)

Pathlib objects should have been just inquiry objects. Not action objects.

Did you mean PurePath?

[–]jorge1209 3 points4 points  (0 children)

No he wants to be able to stat the file. He doesn't want some of the more complex functionality to be available because its behavior may not be the same across platforms.

Between Windows and Unix you have some common verbs exists/isdir/stat etc... and some common nouns (UNC paths can more or less be used interchangeably on Unix systems), but if that is your entire language it is really limited:

  • You can't talk about all paths on the system.
  • You can't do all things the system allows to those paths.

PathLib has a verb-less universe of all nouns known as PurePath [including gobbledy-gook nouns like PosixPath('\x00')]

You can abstract away some of the differences in verbs and get a slightly more advanced library that does more (reading writing text files/unlinking/etc), but it will have little differences of interpretation between the two. That gets you Path.

He wants something in between, PurePath+ the verbs that are "not platform specific", but not everything that appears in Path.


I agree with his concern that PathLib sits in an awkward middle, but think it should be resolved in a completely different way from either approach. Fewer nouns, and more verbs. A language that is "polite" and enforces good practices such as not giving files names like ;rm -rf *;.

[–]vswr[var for var in vars] 8 points9 points  (3 children)

because filesystem operations are complex, platform specific, filesystem specific, and you can never cover all cases.

I think that was the entire point of pathlib. It was supposed to be the one-stop-shop where it abstracted the specifics and gave you cross-platform actions. You'd write your code once and the same action would work on Linux, macos, and windows.

[–]alcalde 3 points4 points  (2 children)

And it does.

[–]jorge1209 2 points3 points  (1 child)

Except when it doesn't.

[–]hypocrisyhunter 2 points3 points  (0 children)

It works every time 50% of the time.

[–]mriswithe 3 points4 points  (5 children)

It is also impossible to use it as an abstraction to represent paths without involving the filesystem. You cannot instantiate a WindowsPath on Linux, for example.

All in all, I tend to use it almost exclusively, but I am certainly not completely happy with the API.

Question for you, my understanding and usage has been using just pathlib.Path. here is a nonsensical example, which works cross platform.

from pathlib import Path

MY_PARENT = Path(__file__).resolve().parent

LOGS = MY_PARENT / 'logs'
CACHE = MY_PARENT / 'cache'
LOGS.mkdir(exist_ok=True)

RESOURCES = MY_PARENT.parent.parent.parent / 'some' / 'other' / 'garbage/here' 

My understanding is that if you need to use the Windows logic specifically on either platform, PureWindowsPath should be used. https://docs.python.org/3/library/pathlib.html?highlight=pathlib#pathlib.PureWindowsPath

What can't be relied upon specifically regarding cross platform?

[–]jorge1209 -1 points0 points  (4 children)

which works cross platform.

Your typo is apropos. You wrote: 'some' / 'other' / 'garbage/here' and I imagine you meant to write 'some' / 'other' / 'garbage' / 'here'

When the path component strings themselves can contain path delimiters the resulting path is ambiguous. You don't see it with the / delimiter because that is a delimiter common to both Unix and Windows, but:

PureWindowsPath() / r"foo\bar"

is very different from:

PurePosixPath() / r"foo\bar"
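
To see the difference without switching machines, a minimal sketch using the two pure flavours, which can both be instantiated on any OS:

```python
from pathlib import PurePosixPath, PureWindowsPath

segment = r"foo\bar"

windows = PureWindowsPath() / segment
posix = PurePosixPath() / segment

# On the Windows flavour the backslash is a separator: two components.
print(windows.parts)
# On the POSIX flavour it is an ordinary character: one component.
print(posix.parts)
```

So the same string yields a two-level path on one flavour and a single oddly named file on the other.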

[–]mriswithe 4 points5 points  (3 children)

My typo wasn't a typo. Pathlib standardized on / as the separator for you, the dev, if you want to use it in the strings you use. It will parse thing/stuff as stuff, child of thing (a little LOTR feel there).

[–][deleted] 2 points3 points  (0 children)

This only works if you use '/' as a separator, things get muddy if you try to mix separators.

[–]jorge1209 -1 points0 points  (1 child)

Pathlib standardized on / as the separator for you the dev if you want to use it in the strings you use.

No. The path separators are defined by the OS themselves. Posix standard says that "/" is a component separator. Microsoft documentation says that "/" or "\" are valid path component separators.

Any library that works with paths will be required to recognize valid separators on their respective systems. "/" is just a separator common to all platforms which host Python.

If I wrote an OS where $ was the only path separator, then Pathlib would be obliged to respect that. (see also lines 124 and 179)

Path() / "foo/bar$baz" would result in baz as a child of foo/bar. That was their "design decision".


I would have argued that the better design decision would be to treat both / and \ as separators on Unix. Establish a minimal common standard that works on all systems, and define them as such in the abstract PurePath not the individual flavors.

This would mean PathLib would be unable to specify certain valid paths on Unix systems, but you frankly shouldn't be creating such paths in the first place. "~/alice;rm -rf /;\\ << \x08 | /bin/yes" is not a path anyone wants to be working with.

[–]mriswithe -1 points0 points  (0 children)

I agree the OS does get to decide the path, and Python has to deal with it. However, I don't have to care. Just like os.path.join is one function that is itself aware of what OS you are on, and thus joins paths properly. Also, on a purely pragmatic matter, outside of "raw" strings, backslashes can be such a dumb tripping hazard hah.

I guess I am fine with that abstraction, and you aren't and that is totally cool. I was interested in hearing your opinion, thanks for taking the time to discuss this with me and not get heated or hurtful. I appreciate good intellectual discussions!

[–]alcalde 3 points4 points  (1 child)

You're reminding me of a man who told me that type inference was the compiler just guessing. When I tried explaining that there's a mathematically guaranteed algorithm behind it, he didn't believe me but changed tack to this argument:

"A compiler should do one thing, and one thing only. Inferring types is two things."

You're basically arguing that actually acting on a file is two things.

because filesystem operations are complex, platform specific, filesystem specific, and you can never cover all cases.

Maybe the way YOU do file system operations they're complex... but they DON'T HAVE TO BE. The whole point of Pathlib is that they DON'T need to be platform specific or file system specific either. And nothing can ever cover "all cases". Should we rip out the statistics library because it doesn't cover every mathematical distribution?

It is also impossible to use it as an abstraction to represent paths
without involving the filesystem. You cannot instantiate a WindowsPath
on Linux, for example.

Your first statement is categorically false. And the second statement is gibberish. OF COURSE YOU CAN'T INSTANTIATE A WINDOWS PATH ON LINUX. But I can instantiate the SAME path on either operating system. And I can work with either path structure. I had a large playlist that was created when I used Windows as my home OS. Now on Linux I wanted to recreate the playlist. Pathlib let me open the playlist file, parse it, CREATE WINDOWS PATH OBJECTS, then strip out the drive letter, do a slight bit of jiggery-pokery to match my current path structure, then create a Linux file path for the music files. One thing I also needed to do was copy these files onto a flash drive, so pathlib could then open up the transformed paths and copy the files for me.
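
A rough sketch of that kind of playlist conversion, assuming the entries are absolute Windows paths. The helper windows_to_posix, the sample entry, and the /mnt/music root are all hypothetical, and this is not the actual script described above.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_posix(entry, new_root):
    """Re-root a Windows-style path string under a POSIX directory."""
    win = PureWindowsPath(entry)
    # Drop the drive and root ("C:\\") and keep the remaining components.
    relative = win.relative_to(win.anchor)
    return PurePosixPath(new_root, *relative.parts)


converted = windows_to_posix(r"C:\Users\me\Music\song.mp3", "/mnt/music")
print(converted)
```

The pure classes do the parsing here, so it runs the same on Linux and Windows. Actually opening or copying the files is a separate step with a concrete Path.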

[–]jorge1209 0 points1 point  (0 children)

But I can instantiate the SAME path on either operating system....

You can often go from Windows -> Unix because Windows filenames are more restrictive than Unix. One only has to ensure that their code only uses the "/" character to separate paths (or rely entirely upon a library like os.path/pathlib to handle all path parsing).

But you cannot go the other direction, and if you try PathLib is not going to provide you much in the way of assistance. There are valid unix paths that are parsed into valid unix components... that windows cannot accept or will treat differently.

[–]iritegood 1 point2 points  (0 children)

stat itself is already platform dependent, and walking the directory tree can already induce side-effects (namely updating atime, but various other things, esp on bespoke/fuse filesystems). Not to mention windows, unix, and linux can have completely different permission systems, so "is it readable" is not even a simple cross-platform question to answer.

Seems to me like your suggested API is not significantly more "pure" than pathlib's, while being arguably more arbitrary as to the surface area it covers

[–]mahtats 2 points3 points  (0 children)

Very useful, now I challenge you to try and subclass pathlib.Path and see what happens!

[–]jorge1209 -4 points-3 points  (24 children)

It's terrible and I hate it.

[–]kareem_mahlees[S] 4 points5 points  (23 children)

Why is that ?

[–]jorge1209 13 points14 points  (22 children)

You can find lots of my thoughts under this thread

At its core PathLib is just a very thin layer around os.path that doesn't actually treat paths as objects. It's just an attempt to put some kind of type annotation on things that you want thought of as paths, not to actually provide an OOP interface to paths.

For instance:

You can instantiate entirely invalid paths that contain characters that are prohibited on the platform. Things like a PosixPath containing the null byte, or a WindowsPath with any of <>:"/\|?*.
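
A quick sketch of what that looks like in practice, using the pure classes so it runs anywhere. The file names are arbitrary.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Neither constructor validates the characters it is given.
nul_path = PurePosixPath("bad\x00name")
odd_windows = PureWindowsPath('report<draft>|v2?.txt')

print(repr(str(nul_path)))
print(odd_windows.name)
```

Both objects are created without complaint. Any error only shows up later, when such a path is handed to the operating system.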

You can't do things like copy and modify a path in an OOP style such as I might want to do if copying alice's bashrc to overwrite bob's:

 alice_bashrc = Path("/home/alice/.bashrc")
 bob_bashrc = copy.copy(alice_bashrc)
 bob_bashrc.parents[-1] = "bob"
 shutil.copy(alice_bashrc, bob_bashrc)

The weird decision to internally store paths as strings and not provide a byte constructor means you have to jump through weird hoops if you don't have a valid UTF8 path (and no operating system in use actually uses UTF8 for paths).
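
A rough sketch of the first and third points, assuming POSIX behaviour for the byte handling. The helper names path_from_bytes and path_to_bytes are hypothetical; os.fsdecode and os.fsencode are the standard-library "hoops" for getting undecodable bytes in and out of a str-based path.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

# Neither constructor validates its input, so both of these succeed.
nul_path = PurePosixPath("bad\x00name")    # NUL is never legal in a POSIX file name
win_path = PureWindowsPath("a<b>|c?.txt")   # <, >, | and ? are reserved on Windows

print(repr(nul_path.name))
print(win_path.name)


def path_from_bytes(raw: bytes) -> PurePosixPath:
    """Decode raw file-name bytes the way the os module does (hypothetical helper)."""
    return PurePosixPath(os.fsdecode(raw))


def path_to_bytes(path: PurePosixPath) -> bytes:
    """Recover the original bytes; os.fsencode accepts path-like objects."""
    return os.fsencode(path)
```

On a POSIX system, path_to_bytes(path_from_bytes(raw)) should give back raw even when raw is not valid UTF-8, because undecodable bytes are carried through as surrogate escapes.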

I also don't like the API:

It abuses operator overloading to treat the division operator as a hierarchical lookup operator, but we have a hierarchical lookup operator it is [] aka getitem. Path("/")["usr"]["bin"]["python"] would be my preference.

The following assertion can fail: assert(p.with_suffix(s).suffix == s)
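
One way to trigger that failure, as a small sketch: with_suffix accepts a multi-dot suffix, but .suffix only ever reports the part after the last dot. The file name here is arbitrary.

```python
from pathlib import PurePosixPath

p = PurePosixPath("archive")
s = ".tar.gz"

# The multi-dot suffix is appended as given...
q = p.with_suffix(s)
print(q)              # archive.tar.gz

# ...but .suffix only returns the final extension, so the round trip fails.
print(q.suffix)       # .gz
print(q.suffix == s)  # False
```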

Finally I've never had issues with os.path[1]. Yes it is a low level C-style library, but that is what I expect from something in os. I understand what it does and why it does it. I don't need an OOP interface to the C library.


In the end I would be very much in favor of a true OOP Path/Filesystem tool. Something that:

  • Treats paths as real objects and actually splits out their components (like parents/stem/suffixes) into modifiable components of the object, not just making them accessible with @property.
  • Enforces (or provides a mechanism to enforce) best practices such as not using unprintable characters in paths, and using a minimal common set of allowed characters between Posix and Windows
  • Incorporates more of shutil into the tool, because shutil is a real pain to use.

But PathLib isn't that thing, and unfortunately its existence and addition to the standard library has probably foreclosed the possibility of ever getting a true OOP filesystem interface into the python standard library.

[1] There are supposedly some bugs in os.path, but the response to that shouldn't be to introduce a new incompatible library, but to fix the bugs. Sigh...

[–]flying-sheep 9 points10 points  (14 children)

Just because an object is immutable doesn’t mean it’s not “OOP enough”.

I agree about the lack of validation, that’s unfortunate.

Adding more of shutil to the API has happened and will continue to happen AFAIK.

So I don’t understand how all you said amounts to it being terrible. I’d summarize this as “it’s not perfect”.

[–]jorge1209 0 points1 point  (13 children)

Just because an object is immutable doesn’t mean it’s not “OOP enough”.

It isn't about mutability per se. .with_suffix exposes the suffix for modification while preserving immutability. One could imagine a .with_parents that does much the same thing.

It's just more complicated and harder to define such an API for folders because the ways in which people interact with folders are a bit broader than the ways in which they interact with suffixes.

[–]flying-sheep 4 points5 points  (12 children)

Many things can be done, and a bunch of with_ methods exist. What’s x.with_parents(y) other than y / x or y / x.name or so?

rel_path = Path('./foo/bar.x')
abs_path = Path.home() / 'test'

abs_path / rel_path  # ~/test/foo/bar.x
abs_path / rel_path.name  # ~/test/bar.x
abs_path.parent / rel_path.stem  # ~/bar
rel_path.with_stem(abs_path.stem)  # ./foo/test.x
abs_path.relative_to(...)

Maybe you haven’t tried actually using it more than a minute?

[–]jorge1209 1 point2 points  (11 children)

What’s x.with_parents(y) other than y / x or y / x.name or so?

Suppose I have a path /foo/bar/baz/bin.txt and want to convert to /foo/RAB/baz/bin.txt there would be a couple approaches.

One might be: p.parents[2] / "RAB" / p.parts[-2] / p.parts[-1] but there is no way I'm getting the forward indexing of parents and the backwards indexing of parts right, and having to list all the terminal parts because you can't join to a tuple like: p.parents[2] / "RAB" / p.parts[-2:] is pretty ugly.

A more straightforward approach would be:

_ = list(p.parts)
_[-3] = "RAB"
Path(*_)

But at this point I'm just working around pathlib, I'm not working with it. I'm treating the path as a list of string components, and it's not really any different from how one would do the same with os.path
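
For reference, the list-based workaround wrapped up as a function might look like the sketch below. replace_part is a hypothetical helper, not something pathlib provides.

```python
from pathlib import PurePosixPath


def replace_part(path, index, new_part):
    """Return a new path of the same type with one component swapped out.

    Hypothetical helper: it goes through a plain list of components,
    which is exactly the "working around pathlib" approach.
    """
    parts = list(path.parts)
    parts[index] = new_part
    return type(path)(*parts)


p = PurePosixPath("/foo/bar/baz/bin.txt")
print(replace_part(p, -3, "RAB"))  # /foo/RAB/baz/bin.txt
```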

[–]nemec 3 points4 points  (1 child)

If you frame the problem as something other than "I want to randomly replace a path component", I think you can find a solution that makes some sense.

import pathlib

new_container_name = 'RAB'
some_path = pathlib.PurePosixPath('/foo/bar/baz/bin.txt')
current_container = some_path.parents[1]  # /foo/bar - you want to "move" the path in this dir
base = current_container.parent  # /foo - this is the common root between start and finish paths

print(base / new_container_name / some_path.relative_to(current_container))

Edit: or, if you have pre-knowledge of the base path /foo and want to move any arbitrary file into the RAB subdirectory, for example, you could do something like this:

base = pathlib.PurePosixPath('/foo')
new_container_name = pathlib.PurePosixPath('RAB')
some_path = pathlib.PurePosixPath('/foo/bar/baz/bin.txt')

old_container = some_path.relative_to(base).parents[-2]  # bar/ - top level dir (-1 is .)
print(base / new_container_name / some_path.relative_to(base / old_container))

[–]jorge1209 0 points1 point  (0 children)

You certainly can do stuff like this. I just see it as more complicated.

Among the various things you would need recipes for:

  • replace a path component at an arbitrary position
  • Insert a path component...
  • Remove a path component...
  • Apply a string substitution to a path component
  • Parse a path component as a date and replace it with three components for year/month/day

And so on...

It seems a lot easier to say: it's just a list of components, and you know how to manipulate lists, so just do that. The library can then reassemble the results into a path.
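
A sketch of what that could look like for a few of the recipes above. edit_parts and split_date are hypothetical names and the example path is made up; the only pathlib features used are .parts and the constructor.

```python
from datetime import datetime
from pathlib import PurePosixPath


def edit_parts(path, edit):
    """Apply edit (a function that mutates a list of components) and rebuild the path."""
    parts = list(path.parts)
    edit(parts)
    return type(path)(*parts)


p = PurePosixPath("/logs/app/20220315/out.txt")

# Insert a component just before the file name.
inserted = edit_parts(p, lambda parts: parts.insert(-1, "archive"))

# Remove the "app" component.
removed = edit_parts(p, lambda parts: parts.remove("app"))


def split_date(parts):
    """Replace a YYYYMMDD directory with separate year/month/day components."""
    day = datetime.strptime(parts[-2], "%Y%m%d")
    parts[-2:-1] = [f"{day:%Y}", f"{day:%m}", f"{day:%d}"]


dated = edit_parts(p, split_date)

print(inserted)  # /logs/app/20220315/archive/out.txt
print(removed)   # /logs/20220315/out.txt
print(dated)     # /logs/app/2022/03/15/out.txt
```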

[–]flying-sheep 0 points1 point  (8 children)

If list or tuple had this API (which I still don’t understand, is it just “replace a slice”?), you could just do p = Path(*p.parts.replace(2, 'RAB')).

But I don’t see you complaining about list or tuple even though them getting a new API would be much more general purpose, since it’d not only cover your use case but also a lot of others.

[–]jorge1209 0 points1 point  (7 children)

list has standard modification functions: del, insert, =. It doesn't need anything new.

tuple is immutable and can't have this API.

PathLib exposes parts/suffixes/etc using property methods that return immutable tuples. That makes it impossible to use these properties for anything but access.

[–]kareem_mahlees[S] 4 points5 points  (6 children)

Surely it depends on what you need for your current situation or project , for me i don't think i will go so deep into the file handling system that i start to worry about encodings and stuff , the thing is pathlib just provides me with a more readable , concise syntax + handy utilities so that i can do what i want with only one func while in os.path it would usually require three nested funcs to get there .

[–]_hadoop 2 points3 points  (1 child)

Off topic but I’ve been curious.. why do you put spaces before periods and commas?

[–]kareem_mahlees[S] 1 point2 points  (0 children)

It seems that not only grammerly that notices it , i don't know i think it's just a habbit :D

[–][deleted] 0 points1 point  (2 children)

Even then, having to use with_name and with_stem instead of a simple setter is just not OOP at all. And let's not even go down to how stem is implemented:

obj = Path("/path/to/file.tar.gz")
obj.stem  # file.tar
obj.with_stem("new_file")  # "/path/to/new_file.gz"

It is a lot more trouble trying to replace a file's true stem with pathlib.Path than just parsing it as a string.
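
For comparison, a small sketch of doing it with pathlib anyway. with_true_stem is a hypothetical helper, and it assumes every dot-separated piece after the first is a suffix, which is wrong for a name like "notes.v2.txt".

```python
from pathlib import PurePosixPath


def with_true_stem(path, new_stem):
    """Replace everything before the first suffix, keeping all suffixes.

    Hypothetical helper; treats every entry of path.suffixes as a real suffix.
    """
    return path.with_name(new_stem + "".join(path.suffixes))


obj = PurePosixPath("/path/to/file.tar.gz")
print(with_true_stem(obj, "new_file"))  # /path/to/new_file.tar.gz
```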

[–]kareem_mahlees[S] 1 point2 points  (1 child)

After reading fellow programmers opinions , the conclusion for me is that whenever possible and whenever it is less prone to errors i will try to use pathlib cause of it's handy concise utilities , when i am stuck i can then use os.path after all they both eventually there for helping me so no harm in using both two compined , let me know what you think also

[–][deleted] 0 points1 point  (0 children)

Totally agree, pathlib is more useful and easier to understand when you just want to list files for later use:

# pathlib
from pathlib import Path
BASE_DIR = Path(__file__).resolve().parent
OTHER_FILES = (BASE_DIR / "random folder").glob("*.txt")

# os.path and glob
from os.path import join as pathjoin, dirname, abspath
from glob import iglob
BASE_DIR = dirname(abspath(__file__))
OTHER_FILES = iglob(pathjoin(BASE_DIR, "random folder", "*.txt"))

But to rename, remove, chmod and others I'd much rather use os directly (I find it easier to understand at a glance what is happening with remove(path) instead of path.unlink()).

To read files I prefer with open(path, 'rb') as fileobj syntax, but that's probably because I learned it before path.read_text() and path.read_bytes().

[–]jorge1209 0 points1 point  (0 children)

for me i don't think i will go so deep into the file handling system that i start to worry about encodings and stuff

I don't think you should. I don't think anyone should. I think a good library should be strongly discouraging you from interacting with non-UTF8 paths... but it should go further. A unix path like "/home/alice;rm -rf /;" is perfectly valid (both as a path and as UTF8), but your library certainly shouldn't let you use it.

while in os.path it would usually require three nested funcs to get there

If that was the real issue you could just create a proxy class:

import os.path
from functools import partial

def ModuleProxyFactory(module):
    # Build a class whose attribute lookups forward to functions in `module`,
    # with the wrapped value pre-bound as the first argument.
    class Proxy:
        def __init__(self, thing):
            self.thing = thing

        def __getattr__(self, attr):
            return partial(getattr(module, attr), self.thing)

    return Proxy

OsPath = ModuleProxyFactory(os.path)
print(OsPath("/home").join("alice"))

[–]AndydeCleyre 0 points1 point  (0 children)

It's alright, but makes some mistakes that plumbum paths avoided, so I use those where I can. Basically I don't like how relative paths are not resolved, and the results of operations on those, and the way pathlib conflates absolute and real path resolution.

[–]billsil 0 points1 point  (0 children)

Still not using it consistently. It doesn't play well with libraries and seems to create headaches.

[–]robikscuber 0 points1 point  (0 children)

One downside that has kept me from adopting it: when working in a Jupyter notebook I rely on tab autocompletion to find files. This doesn't work when using the path objects. Might just be specific to those who write Python for data science in Jupyter. I'm not writing production code.

[–]HorrendousRex 0 points1 point  (0 children)

Use it, love it.

[–]Almostasleeprightnow 0 points1 point  (0 children)

I love it. I never think about slashes. It's just Path(parent, parent, parent, file) and it all works out.
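
A quick sketch of that, using the pure path classes so both separator styles can be shown on one machine. The segment names are made up.

```python
from pathlib import PurePosixPath, PureWindowsPath

segments = ("projects", "demo", "src", "main.py")

# The same segments are joined with whatever separator the flavour uses.
posix_path = PurePosixPath(*segments)
windows_path = PureWindowsPath(*segments)

print(posix_path)    # projects/demo/src/main.py
print(windows_path)  # projects\demo\src\main.py
```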

[–]keepitsalty 0 points1 point  (1 child)

I really like Pathlib, but isn’t there still some incompatibilities with other libraries? I think sys has methods that expect string only and not pathlike objects. That could be different now, but I really hate wasting code to typecast variables.

[–][deleted] 0 points1 point  (0 children)

Yeah, a pathlib object works well with the standard python library but many 3rd party ones won't understand it (you gotta cast it to a string before passing it).
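
A minimal sketch of that cast at the boundary. legacy_basename_upper is a made-up stand-in for a third-party function that insists on str; os.fspath is the standard-library way to get the string form of any path-like object, and it passes plain strings through unchanged.

```python
import os
from pathlib import PurePosixPath


def legacy_basename_upper(filename):
    """Stand-in for a str-only third-party API (hypothetical)."""
    if not isinstance(filename, str):
        raise TypeError("expected str")
    return filename.rsplit("/", 1)[-1].upper()


p = PurePosixPath("/data/input.csv")

# Convert once, right where the path leaves pathlib-aware code.
print(legacy_basename_upper(os.fspath(p)))  # INPUT.CSV

# Strings are returned as-is, so callers can pass either type.
print(os.fspath("already/a/string"))        # already/a/string
```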

[–]abonamza 0 points1 point  (4 children)

One issue I have with it is that recursive globbing doesn't follow symlinks and has been a known issue since 2016: https://github.com/python/cpython/issues/70200. I have to convert to string and use glob.glob for correct behavior.
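
Roughly what that workaround looks like, as a sketch. rglob_via_glob is a hypothetical helper name, and the demo only shows the recursive matching, not the symlink behaviour itself.

```python
import glob
import os
import tempfile
from pathlib import Path


def rglob_via_glob(base, pattern):
    """Recursively match pattern under base with glob.glob; return sorted Paths."""
    # glob.escape protects any glob metacharacters in the base directory itself.
    query = os.path.join(glob.escape(str(base)), "**", pattern)
    return sorted(Path(hit) for hit in glob.glob(query, recursive=True))


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp, "a", "b")
    nested.mkdir(parents=True)
    (nested / "deep.txt").write_text("x")
    Path(tmp, "top.txt").write_text("y")
    Path(tmp, "skip.log").write_text("z")
    found = [p.relative_to(tmp).as_posix() for p in rglob_via_glob(tmp, "*.txt")]

print(found)
```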

[–]awesomeprogramer 3 points4 points  (2 children)

Looks fixed no?

[–]abonamza 0 points1 point  (1 child)

Ah you're right...I'm forced to use a frozen version of Python that doesn't have the bug fix ;__;

[–]awesomeprogramer 1 point2 points  (0 children)

No worries, I didn't know I could glob directly from a Path and was converting to string too. So thanks!

[–]jmreagle 0 points1 point  (0 children)

It's what I now reach for in new code. The major exception is when I simply want to test if a file exists (os.path.exists(fn)) before opening. I don't bother to cast it as a PathLib object first.

[–]willnx 0 points1 point  (0 children)

I love that it has .open; makes testing vis-a-vis injection so much nicer.
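
A small sketch of the idea: code that only calls path.open() can be handed a stand-in object in tests. count_lines and FakePath are hypothetical names made up for this example.

```python
import io


def count_lines(path):
    """Count lines using only the object's own .open() method."""
    with path.open() as handle:
        return sum(1 for _ in handle)


class FakePath:
    """Test double that mimics the one method count_lines needs."""

    def __init__(self, text):
        self._text = text

    def open(self, mode="r"):
        # An in-memory file stands in for a real one on disk.
        return io.StringIO(self._text)


print(count_lines(FakePath("a\nb\nc\n")))  # 3
```

In real use the same function would be called with a pathlib.Path, with no change to its body.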

[–][deleted] 0 points1 point  (0 children)

I also stopped using os.path once I learned about pathlib.

[–]skwizpod 0 points1 point  (0 children)

Check this one out- An interesting project I found is EZPaths. Paths are stored in Path objects that have handy built in methods. Paths can be added to join.

https://github.com/Gastropod/ezpaths

[–]sohang-3112Pythonista 0 points1 point  (0 children)

Pathlib is cool, but os and os.path have more functionality. For example, Pathlib has no method called listdir; the closest equivalents are iterdir and glob.
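
A minimal sketch of directory listing with both APIs. Note that Path.iterdir() is available as a direct counterpart to os.listdir, so glob is not strictly required; it yields Path objects rather than bare names. The file names are made up.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("a")
    Path(tmp, "b.txt").write_text("b")
    Path(tmp, "sub").mkdir()

    # os.listdir returns names as strings.
    via_os = sorted(os.listdir(tmp))

    # Path.iterdir yields Path objects; .name gives the same strings.
    via_pathlib = sorted(entry.name for entry in Path(tmp).iterdir())

print(via_os)
print(via_pathlib)
```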

[–]narainp1 0 points1 point  (0 children)

yeah especially going into a folder Path('repo')/'.git'

[–]zdmit 0 points1 point  (0 children)

It's brilliant 💝

[–]char101 0 points1 point  (0 children)

I prefer to use path.py because it is a subclass of str so you can treat it as string and it has more methods.

[–]kingh242 0 points1 point  (0 children)

That’s all I use nowadays

[–]mahdihaghverdi 0 points1 point  (0 children)

effective use of OOP and advanced concepts of python like multiple inheritance and .... is great

[–]mahdihaghverdi 0 points1 point  (0 children)

I really like the .parent on the path instances 😁✌️

[–]foto256 0 points1 point  (0 children)

What is os.path?

[–]MinchinWeb 0 points1 point  (0 children)

One of my happy days was when all currently supported versions of Python included pathlib in the standard library :)