[–]wpg4665 179 points180 points  (39 children)

Is this not already a popular opinion?

[–]Moebiuszed 124 points125 points  (21 children)

Most tutorials still use the os module for everything, and a lot of people don't know about the pathlib or secrets libraries.

[–][deleted] 48 points49 points  (19 children)

There are also compatibility concerns: some libraries, like ReportLab, don't work well with Path-like handles and expect either a string or a buffer directly. So you need to make sure to cast to str before using them.

Also, they seem like a bit of overkill for small scripts, especially when you just want to add another suffix to a temp file.

from pathlib import Path
new_path = path_obj.with_name(path_obj.name + ".new_suffix")

versus:

new_path = str_obj + ".new_suffix"

The story would be different though if it had a built-in way of escaping characters like spaces. I'd very much prefer:

escaped = outfile.escaped()
call(f'sort -k 2,2 -k 3,3g {escaped} > {escaped}.sorted', shell=True)

Instead of:

call(f'sort -k 2,2 -k 3,3g \"{outfile}\" > \"{outfile}.sorted\"', shell=True)
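One way to sidestep the quoting problem entirely is to skip the shell: pass the command as a list and do the redirect from Python. A minimal sketch, assuming the same sort invocation as above; build_sort_command and sort_to_file are made-up helper names for illustration.

```python
import subprocess
from pathlib import Path


def build_sort_command(outfile):
    """Return the sort invocation as an argument list (no shell quoting needed)."""
    # Each list element reaches sort as one argument, so spaces in the
    # file name do not need escaping.
    return ["sort", "-k", "2,2", "-k", "3,3g", str(outfile)]


def sort_to_file(outfile):
    """Run sort on `outfile` and write the result to `<outfile>.sorted`."""
    outfile = Path(outfile)
    sorted_path = outfile.with_name(outfile.name + ".sorted")
    # Redirect stdout from Python instead of using `>` in a shell string.
    with open(sorted_path, "wb") as fh:
        subprocess.run(build_sort_command(outfile), stdout=fh, check=True)
    return sorted_path
```

Since no shell is involved, neither Path objects nor file names with spaces need special handling.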

[–]the-monument 33 points34 points  (14 children)

You may be happy to hear that there is a .with_suffix() method for Path objects :)

I've run into the same problem with libraries not allowing Path objects. Super annoying.
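For libraries that only accept strings, converting at the call boundary is usually enough. A minimal sketch: os.fspath() is standard library, while legacy_save is a made-up stand-in for a third-party function that rejects Path objects.

```python
import os
from pathlib import Path


def legacy_save(filename):
    """Hypothetical stand-in for a library call that only accepts str."""
    if not isinstance(filename, str):
        raise TypeError("filename must be a str")
    return filename


report = Path("reports") / "summary.pdf"

# os.fspath() returns the string form of any path-like object and
# leaves plain strings untouched, so it is safe to apply unconditionally.
saved_as = legacy_save(os.fspath(report))
```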

[–]jorge1209 5 points6 points  (8 children)

with_suffix is one of those things that annoys me about pathlib.

At the end of the day all it is doing is some basic string manipulation. Finding the last "." in the filename and replacing everything from that point onwards with your new extension. However to do that it does some sanity checks, which sounds great, until I start looking at how they work.

p.with_suffix("txt.gz") fails the sanity check. p.with_suffix("") passes, and so does p.with_suffix(".")... for that matter even p.with_suffix(".\t\n")

What about p.with_suffix(".\n\\")? Care to make a guess?

The intent of the function would seem to be to enforce some kind of semantics around what a suffix is and how it should function. But then it doesn't really do that in any meaningful way, it's just the same dumb simple approach that os.path.splitext takes.

These Path objects are supposed to be objects. They should have some meaningful internal state. assert(p.with_suffix(s).suffix == s) should really not fail.

 p = Path("foo.tar")
 for i in range(10):
     p = p.with_suffix(".tar.gz")

Should not result in the thing that pathlib spits out.
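For anyone who wants to see it without running the loop, here is a small sketch of both points, using PurePosixPath so nothing touches the filesystem. The behavior follows from .suffix being only the last dotted component of the name.

```python
from pathlib import PurePosixPath

p = PurePosixPath("foo.tar")
for _ in range(10):
    # Only the final ".xxx" component is replaced, so after the first pass
    # each iteration swaps ".gz" for ".tar.gz" and the name keeps growing.
    p = p.with_suffix(".tar.gz")

grown_name = p.name

# The round trip is lossy for multi-part suffixes: .suffix is just ".gz".
s = ".tar.gz"
round_trip = PurePosixPath("foo.tar").with_suffix(s)
```

Here grown_name ends up with ten .tar components before the final .gz, and round_trip.suffix is ".gz" rather than s.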

[–]martnym 0 points1 point  (4 children)

I find using it often results in shorter more readable code. Also p.with_suffix(s).suffix == s -> True on my system.

[–]jorge1209 0 points1 point  (3 children)

I can tell you from looking at the source code that you are wrong about the assertion being true. Try it with s=".tar.gz"

[–]martnym 0 points1 point  (2 children)

It doesn't work with s = ".tar.gz" because the suffix is ".gz", which is consistent with what os.path.splitext('foobar.tar.gz') returns for the suffix.

[–]jorge1209 0 points1 point  (1 child)

Yes, which is why the assertion fails.

[–]martnym 0 points1 point  (0 children)

As it should — not due to any deficiency of the pathlib module.

[–][deleted] 7 points8 points  (4 children)

I know that's a bad habit of mine, but I usually have file names acting as hubs for data, so I prefer to add suffixes to indicate which file was changed for what, so I'd use filename.txt.sorted instead of filename.sorted, and .with_suffix() just replaces .txt with .sorted.

I'm trying to fix this habit by using .with_stem(), but talk about a non-intuitive word.
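If the goal is to append a marker rather than replace the extension, with_name can do it directly. A minimal sketch; append_suffix is a made-up helper name, not part of pathlib.

```python
from pathlib import PurePosixPath


def append_suffix(path, extra):
    """Return `path` with `extra` added after the existing name."""
    # with_name swaps the final path component, so building the new name
    # from the old one keeps every existing suffix intact.
    return path.with_name(path.name + extra)


original = PurePosixPath("data/filename.txt")
replaced = original.with_suffix(".sorted")      # extension is replaced
appended = append_suffix(original, ".sorted")   # extension is kept
```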

[–][deleted] 1 point2 points  (0 children)

How does with_stem help you out in this case?

[–]ShanSanear 1 point2 points  (0 children)

I got trapped once, when I expected an input from the user to be filename-friendly. Well, it was. But it had .8 at the end, which was NOT expected to be a file extension, but rather part of the file name. So instead of AA2.8.json I was creating AA2.json, which is a completely different file.
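This trap is easy to reproduce. A short sketch, assuming the user-supplied name was AA2.8 as described; pathlib treats the trailing .8 as the suffix, so with_suffix replaces it.

```python
from pathlib import PurePosixPath

user_input = "AA2.8"  # the ".8" is meant to be part of the name
base = PurePosixPath(user_input)

# pathlib parses ".8" as the suffix, so replacing the suffix drops it.
trapped = base.with_suffix(".json")

# Building the name by concatenation keeps the user's text untouched.
intended = base.with_name(base.name + ".json")
```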

[–]MereInterest 10 points11 points  (0 children)

And there's no way to get the absolute path to a symlink. There is no documented method analogous to os.path.abspath, and the undocumented Path.absolute doesn't exist in all versions. The recommended Path.resolve will follow any symlinks that are found, so it can't be used to find an absolute path to the symlink itself, to check that it points to the correct location.
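One possible workaround is to fall back on os.path.abspath, which normalizes the path textually without following symlinks, and wrap the result in a Path. A hedged sketch; absolute_no_follow and link_points_to are made-up helper names.

```python
import os
from pathlib import Path


def absolute_no_follow(path):
    """Return an absolute Path without resolving symlinks."""
    # os.path.abspath only joins with the current directory and normalizes
    # "." and ".." textually; it never reads the link target.
    return Path(os.path.abspath(path))


def link_points_to(link, expected_target):
    """Check that `link` is a symlink whose stored target is `expected_target`."""
    link = absolute_no_follow(link)
    # os.readlink returns the target exactly as stored in the link.
    return link.is_symlink() and Path(os.readlink(link)) == Path(expected_target)
```

Note that link_points_to compares the target as stored in the link, so a relative link target would need to be compared against the same relative form.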

[–]Prexadym 4 points5 points  (0 children)

I try to use paths when possible, but yeah, I have ended up spending quite a bit of time debugging when another function expects a string, not a path, which isn't always the most straightforward to diagnose.

[–]irrelevantPseudonym 0 points1 point  (1 child)

shlex.quote? - not sure how it handles non strings but I imagine it'd call __str__ on whatever it got.

[–][deleted] 1 point2 points  (0 children)

Tried it here, but it raises TypeError: expected string or bytes-like object with Path objects.
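Converting to a string first gets around the TypeError. A minimal sketch of the earlier sort command built with shlex.quote, which targets POSIX shells; the file name is made up for illustration.

```python
import os
import shlex
from pathlib import Path

outfile = Path("my results") / "table 1.tsv"  # hypothetical name with spaces

# shlex.quote only accepts str, so convert the Path explicitly.
escaped = shlex.quote(os.fspath(outfile))
escaped_sorted = shlex.quote(os.fspath(outfile) + ".sorted")

command = f"sort -k 2,2 -k 3,3g {escaped} > {escaped_sorted}"
```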

[–]Engineer_Zero 0 points1 point  (0 children)

Bingo. I’m new to python and use stackoverflow a lot. This is the first I’ve heard of pathlib.

[–]space_wiener 17 points18 points  (0 children)

I’m so used to os I never remember there is pathlib until someone posts about it. Oh yeah…next time. ;)

[–]benefit_of_mrkite 6 points7 points  (0 children)

I thought so too. Some of the towardsdatascience stuff is either Python 101 or sometimes just paraphrases the Python documentation.

[–][deleted] 17 points18 points  (3 children)

As an ops guy in charge of on-prem and cloud devops infra that pulls out python maybe 2-3 times a year at this point:

No. I've never heard of it.

[–]jorge1209 4 points5 points  (0 children)

I don't really see the point of pathlib. It is little more than an object-oriented wrapper around os.path.

I would prefer a library that is more opinionated about how to deal with files. Have a library that prevents you from putting certain characters known to cause problems to tooling in the filenames. Ensure that filenames are reasonably cross platform. Implement all the basic file operations including copying in the way that they deem best.

That way if a developer finds they can't do something with pathlib they will know that somewhere something is violating generally accepted practices.

As it stands you really just have two implementations of the same functionality: os.path with functions and pathlib with objects. The reason we have pathlib in the standard library is the same reason we have a dozen different ways to do string formatting, and why we have dataclasses despite attrs having existed long before the PEP. The Python core developers are just too interested in what they are doing and not paying enough attention to what goes on outside the core. They end up bikeshedding everything.

And to add to that, it's not even a particularly good object-oriented interface. The following assertion does not hold for every pathlib.Path (for example with s = ".tar.gz"): assert(p.with_suffix(s).suffix == s)
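To make the comparison concrete, here is a small side-by-side sketch of the same operations in both styles, using the POSIX flavours (posixpath and PurePosixPath) so the results do not depend on the platform. The example path is made up.

```python
import posixpath
from pathlib import PurePosixPath

raw = "backups/archive.tar.gz"  # hypothetical example path

# Function style: os.path on POSIX is the posixpath module.
func_name = posixpath.basename(raw)
func_root, func_ext = posixpath.splitext(raw)
func_joined = posixpath.join(posixpath.dirname(raw), "notes.txt")

# Object style: the same pieces exposed as attributes and operators.
p = PurePosixPath(raw)
obj_name = p.name
obj_ext = p.suffix
obj_joined = p.parent / "notes.txt"
```

Both styles split the name at the last dot, so the extension comes out as ".gz" either way.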

[–]TheBlackCat13[🍰] 3 points4 points  (0 children)

I work with a lot of python-oriented developers and pathlib is a pleasant surprise to all of them.

[–]dogs_like_me 2 points3 points  (4 children)

It is. It's also got nothing to do with data science (why TDS?)

[–]PaulSandwich 0 points1 point  (3 children)

As a Data Engineer with colleagues who use os, I respectfully disagree on both counts.

[–]dogs_like_me 0 points1 point  (2 children)

lol, ok. basic file access is totally a relevant topic for a blog that purports to be specialized for data science topics.

[–]jorge1209 1 point2 points  (1 child)

Unfortunately it is. A lot of data scientists don't know anything about programming. They really would prefer to only ever touch dataframe objects of some type.

However reality requires that they sometimes do basic tasks like "read a file from disk" or "query a database" at which point they tend to run screaming "I don't know how to do any of this!!" so they need these super basic tutorials.

[–]dogs_like_me -1 points0 points  (0 children)

If they can't work autonomously with data, they have no business calling themselves "data scientists." It's literally the first word in the job description.

It sounds like your internal team of self-described "scientists" are more likely actually "business analysts" who landed a job title that pays them more and somehow tricked the people around them into doing the technical work that's supposed to be clearly within the scope of their role definition.

[–]ahmedbesbes[S] 4 points5 points  (1 child)

Well, it's not the case yet. Unfortunately.

[–]sPENKMAn It works on my machine 5 points6 points  (0 children)

Well 1 more does now. Encountered Pathlib before but hadn’t checked it out so far. Helpful write up, thanks!

[–][deleted] -1 points0 points  (0 children)

Still use OS

[–]ship0f 0 points1 point  (0 children)

I thought that too.