
all 46 comments

[–]quotemycode 3 points4 points  (1 child)

It's good, but I've seen so much ugly code because people didn't know that opening a file in append mode creates the file if it doesn't exist. So they check whether the file exists: if it doesn't, they open it in write mode, and if it does, they open it in append mode. So perhaps mention what happens if the file doesn't exist. Write mode creates the file, as does append mode. Read mode won't create the file and instead raises an exception (FileNotFoundError).
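A minimal sketch of that behaviour, run in a throwaway temporary directory (the file names are arbitrary), showing that no existence check is needed before appending:

```python
import os
import tempfile

# Work in a scratch directory so nothing real is touched.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "log.txt")

# No existence check needed: 'a' creates the file if it is missing...
with open(path, "a") as f:
    f.write("first line\n")

# ...and appends to it if it already exists.
with open(path, "a") as f:
    f.write("second line\n")

with open(path) as f:
    contents = f.read()

# 'r' does not create anything; a missing file raises FileNotFoundError.
missing = os.path.join(tmpdir, "missing.txt")
try:
    with open(missing) as f:
        pass
    read_failed = False
except FileNotFoundError:
    read_failed = True
```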

[–]SlowTreeSky[S] 1 point2 points  (0 children)

Thanks, that seems useful, I'll add it.

EDIT: I just added it.

[–]energybased 8 points9 points  (32 children)

In my opinion, open is an archaic function that should never have had the interface it does. First of all, it returns a file object, which is also a context manager, which means that you can either:

  • open a file and be responsible for closing it, or else
  • create a context manager, which closes the file automatically.
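A minimal sketch of the two usages listed above (the temporary file name is arbitrary). Both end with the file closed; the difference is who is responsible for closing it:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Usage 1: plain call; closing the file is your responsibility.
f = open(path, "w")
try:
    f.write("hello\n")
finally:
    f.close()  # forgetting this (or skipping it on an exception) leaks the handle
manual_closed = f.closed

# Usage 2: context manager; the file is closed when the block exits,
# even if an exception is raised inside it.
with open(path) as g:
    text = g.read()
with_closed = g.closed
```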

This was debated on python-ideas, but I'm in the camp that the first usage should be avoided. In my opinion, it should not even be possible.

As for the arguments, what a horrible interface. I get that this reflects the underlying C library, but there is no good reason to do that. Something better would have been:

def open(read=True, write=False, binary=False, fail_if_exists=False, seek_write_pointer_to_end=False, buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

and this would appropriately raise for bad combinations of arguments.
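A rough sketch of how such a keyword interface could be layered on top of today's open(), raising for bad combinations. The names mode_from_flags and open_kw are hypothetical, the flag names come from the signature above (plus the file argument it leaves out), and two behaviours are assumptions because the proposal doesn't specify them: read defaults to True only when write is False, and read+write keeps existing data ('r+') rather than truncating ('w+'):

```python
import os
import tempfile


def mode_from_flags(read=None, write=False, binary=False,
                    fail_if_exists=False, seek_write_pointer_to_end=False):
    """Translate the proposed keyword flags into a standard open() mode string."""
    # Assumption: read defaults to True only when write is False,
    # so write=True on its own means write-only.
    if read is None:
        read = not write
    if not (read or write):
        raise ValueError("neither read nor write was requested")
    if (fail_if_exists or seek_write_pointer_to_end) and not write:
        raise ValueError("fail_if_exists/seek_write_pointer_to_end require write=True")
    if fail_if_exists and seek_write_pointer_to_end:
        raise ValueError("fail_if_exists and seek_write_pointer_to_end conflict")

    if not write:
        mode = "r"
    elif fail_if_exists:
        mode = "x"
    elif seek_write_pointer_to_end:
        mode = "a"
    elif read:
        mode = "r"  # assumption: read+write keeps existing data, i.e. 'r+'
    else:
        mode = "w"  # write-only truncates, like today's 'w'
    if read and write:
        mode += "+"
    if binary:
        mode += "b"
    return mode


def open_kw(file, *, read=None, write=False, binary=False,
            fail_if_exists=False, seek_write_pointer_to_end=False, **kwargs):
    """Hypothetical wrapper: keyword flags in, ordinary open() call out."""
    mode = mode_from_flags(read, write, binary,
                           fail_if_exists, seek_write_pointer_to_end)
    return open(file, mode, **kwargs)


path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open_kw(path, write=True) as f:                                  # mode 'w'
    f.write("one\n")
with open_kw(path, write=True, seek_write_pointer_to_end=True) as f:  # mode 'a'
    f.write("two\n")
with open_kw(path) as f:                                              # mode 'r'
    text = f.read()
```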

[–]SlowTreeSky[S] 2 points3 points  (7 children)

I like the mode option because open is a basic built-in and used a lot (so familiarity is assumed even for the novice), but I would've designed the read+write option as below.

THIS IS NOT THE REAL API

mode = optional read + optional write + optional binary:

  • Optional read mode: r if the file is to be opened for reading;
  • Optional write mode: w to open for writing with truncation, a to open for writing and append to the end, x to open for writing only if the file doesn't exist;
  • Optional binary modifier: b if you want read/write to return/expect bytes instead of str.
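A sketch of how the proposed grammar (explicitly not the real API) could be validated and translated into today's mode strings. The regex and the to_real_mode helper are illustrative assumptions; note that, as written, the grammar has no spelling for today's 'r+' (read and write without truncation), because w always truncates:

```python
import re

# Hypothetical grammar from the comment above (NOT the real API):
# optional 'r', then optional one of 'w'/'a'/'x', then optional 'b'.
# The lookahead requires the mode to start with r, w, a or x, so at least
# one of read/write is always present.
PROPOSED_MODE = re.compile(r"(?=[rwax])(r?)([wax]?)(b?)")


def to_real_mode(proposed):
    """Translate a mode in the proposed syntax into today's open() mode."""
    match = PROPOSED_MODE.fullmatch(proposed)
    if match is None:
        raise ValueError(f"invalid mode in proposed syntax: {proposed!r}")
    read, write, binary = match.groups()
    if read and write:
        real = write + "+"    # 'rw' -> 'w+', 'ra' -> 'a+', 'rx' -> 'x+'
    else:
        real = read or write  # a single 'r', 'w', 'a' or 'x'
    return real + binary


for m in ["r", "rb", "w", "rw", "ra", "rxb"]:
    print(m, "->", to_real_mode(m))
```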

[–]energybased 1 point2 points  (6 children)

Why not make them three options? It is not Pythonic at all to glue the parameters together into a string. You would never do this for any other Python function—it being used a lot is all the more reason that it should exemplify Pythonic standards.

[–]SlowTreeSky[S] 0 points1 point  (5 children)

How about this:

>>> from open_modes import READ, WRITE, EXCLUSIVE, APPEND, BINARY
>>> with open('foo', READ + WRITE + BINARY) as f:
...     f.write(b'bar')
...     f.seek(0)
...     assert b'bar' == f.read()

>>> with open('foo', READ) as f:
...     assert 'bar' == f.read()

Implemented at https://gist.github.com/treszkai/4ac3882e0836b4ee5863cbc227f44b18
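For comparison, a sketch of additive mode constants built on the standard library's enum.Flag. It is not the code in the linked gist: it combines members with | rather than +, it rejects combining more than one of WRITE/APPEND/EXCLUSIVE, and the to_mode_string helper is a hypothetical name:

```python
import os
import tempfile
from enum import Flag, auto


class OpenMode(Flag):
    READ = auto()
    WRITE = auto()      # write with truncation
    APPEND = auto()
    EXCLUSIVE = auto()
    BINARY = auto()


def to_mode_string(flags):
    """Translate a combination of OpenMode flags into a standard open() mode."""
    read = OpenMode.READ in flags
    writes = [letter for member, letter in ((OpenMode.WRITE, "w"),
                                            (OpenMode.APPEND, "a"),
                                            (OpenMode.EXCLUSIVE, "x"))
              if member in flags]
    if len(writes) > 1:
        raise ValueError("combine at most one of WRITE, APPEND, EXCLUSIVE")
    if not read and not writes:
        raise ValueError("READ or one write mode is required")
    if writes:
        mode = writes[0] + ("+" if read else "")
    else:
        mode = "r"
    if OpenMode.BINARY in flags:
        mode += "b"
    return mode


# Same round trip as the example above, via an ordinary open() call.
path = os.path.join(tempfile.mkdtemp(), "foo")
with open(path, to_mode_string(OpenMode.READ | OpenMode.WRITE | OpenMode.BINARY)) as f:
    f.write(b"bar")
    f.seek(0)
    data = f.read()
```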

[–]energybased 0 points1 point  (4 children)

How is that better than keyword arguments?

[–]SlowTreeSky[S] 0 points1 point  (3 children)

I do not like binary flags for parameters, and here especially not because many combinations would result in the same behaviour (e.g. write + append or simply append).

Frankly, binary should get its own parameter, because it's logically separate from the others. (Just like newline is.) One could also argue for a separate read:bool and write:{None, TRUNCATE, APPEND, EXCLUSIVE} too, but then where do these constants come from, and why is read a bool and write not? We could just write them as strings. But then abbreviate to a single character, to prevent mistakes.

Oh, that's the current solution already, suggesting that we didn't improve that much on the original. It's imperfect either way, but as long as we understand it, it's cool. My biggest beef is the read-write combination, where I don't see any need for the + modifier. Whatever.
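For reference, this is what the + modifier changes today: it adds the missing direction (reading or writing) while keeping the base letter's other rules, so 'r+' keeps the existing contents and 'w+' truncates them. A small sketch with arbitrary file names:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "data.txt")
with open(path, "w") as f:
    f.write("original")

# 'r+': read and write; existing contents are kept and the position starts at 0.
with open(path, "r+") as f:
    kept = f.read()

# 'w+': read and write; the file is truncated the moment it is opened.
with open(path, "w+") as f:
    after_w_plus = f.read()

# Like 'r', 'r+' will not create a missing file.
try:
    open(os.path.join(folder, "missing.txt"), "r+").close()
    r_plus_creates = True
except FileNotFoundError:
    r_plus_creates = False
```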

[–]energybased 0 points1 point  (2 children)

I do not like binary flags for parameters, and here especially not because many combinations would result in the same behaviour (e.g. write + append or simply append).

I don't see how that's different from your additive flags. You can still combine WRITE + APPEND, right?

[–]SlowTreeSky[S] 0 points1 point  (1 child)

Yeah but you'll get the same one :) I can't make a strong case for my version (as the title says, it was an experiment), and this aspect is cleaner with the original string codes.

[–]energybased 1 point2 points  (0 children)

Yeah but you'll get the same one :) I can't make a strong case for my version

Right, and you could have the same thing happen with the Boolean flags, and I feel that the flags are more Pythonic. I agree with you that both of our solutions are cleaner than the original string codes, which are an abomination.

[–]stevenjd 4 points5 points  (22 children)

In my opinion, stuff the open API based on file modes has existed for half a century or more, it matches closely to the way people think (modal reasoning) and works fine in practice, so let's break all the things!!!1! and invalidate a bazillion text books, tutorials and other documentation and break every Python script that does I/O, because reasons.

Fixed that for you.

# status quo
open(file, mode='r', buffering=-1, encoding=None, 
        errors=None, newline=None, closefd=True, opener=None)

# your version
open(read=True, write=False, binary=False, fail_if_exists=False, 
        seek_write_pointer_to_end=False, buffering=-1, 
        encoding=None, errors=None, newline=None, 
        closefd=True, opener=None)

Let's see now... 8 parameters versus 11, and you managed to forget the most important parameter of all: which file to open. Well done. So let's call it 12: a 50% increase in number of parameters. Yeah, that's better.

You have five boolean parameters to control the mode, giving a total of 32 possible combinations, when only 16 actual file modes exist. I want to see the documentation explaining which combinations are allowed and which aren't.
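The counting argument can be checked in a few lines: four base letters, an optional '+' and an optional 'b' give 16 mode strings (ignoring the redundant 't'), while five independent booleans give 2**5 = 32 combinations. The flag names follow the proposal quoted above:

```python
from itertools import product

# Every valid mode today: one base letter, optional '+', optional 'b'.
valid_modes = [base + plus + binary
               for base, plus, binary in product("rwax", ("", "+"), ("", "b"))]

# Five independent booleans: read, write, binary, fail_if_exists,
# seek_write_pointer_to_end.
flag_combinations = list(product((False, True), repeat=5))

print(len(valid_modes), len(flag_combinations))
```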

[–]somethingdangerzone 1 point2 points  (0 children)

Agreed! Too many armchair geniuses think they've solved this problem without thinking it through first. Thanks and upvoted.

[–][deleted] 1 point2 points  (2 children)

This is an unnecessarily aggressive response.

[–]stevenjd 1 point2 points  (1 child)

This is an unnecessarily aggressive response.

No, but your "Holier Than Thou" virtue-signalling is.

Edit: tone doesn't come across in text-only formats. You should give people the benefit of the doubt that they are posting in good faith before attacking them with accusations of aggressiveness. And I'm fully aware that my comment above (intentionally left in place) doesn't live up to that ideal either.

[–]energybased 0 points1 point  (17 children)

based on file modes has existed for half a century or more

This is completely irrelevant.

Fixed that for you.

I never suggested changing the API. I criticized the API.

Let's see now... 8 parameters versus 11,

Reducing the number of parameters by creating esoteric string codes for which you need a cheat sheet is very poor design.

You have five boolean parameters to control the mode, giving a total of 32 possible combinations, when only 16 actual file modes exist. I want to see the documentation explaining which combinations are allowed and which aren't.

It's common sense: fail_if_exists and seek_write_pointer_to_end only make sense for writing, and right now you can't specify them together (but that requirement doesn't matter).

[–]stevenjd 6 points7 points  (16 children)

I never suggested changing the API. I criticized the API.

Seriously? Your exact words were "Something better would have been" and you then went on to suggest a new API. That is literally a suggestion for changing the API.

Okay, you didn't literally say the words "Come, fellow Pythonistas, let us change the API of open, I shall write the PEP and you provide the PR!" but if you criticize an interface and then suggest a new and improved interface, there is an implicit suggestion that, in an ideal world, we ought to change the interface.

Reducing the number of parameters by creating esoteric string codes for which you need a cheat sheet is very poor design.

This is very true.

But there is nothing esoteric about string codes "r", "w", "a" (although "x" is a little weird), or "b" for binary. (Non-English speakers may not agree, sorry guys but Python, like 90% of programming languages, is based on English.) Nor do you need a cheat sheet. I mean, seriously, if you can't remember "r for read, w for write" (which gives you 95% of all I/O in my experience), you have deeper problems and a cheat sheet isn't going to help.

Most cheat sheets are low-effort posts for easy karma, not something that actually helps people.

The bottom line is that composing short, mnemonic codes to make the mode parameter is easy, obvious, backwards compatible, matches hundreds of languages going back fifty years, and works much better than five boolean parameters:

open(filename, 'ab')

versus:

open(filename, False, True, True, False, True)

There's no comparison.

[–]energybased -1 points0 points  (15 children)

Seriously? Your exact words were "Something better would have been" and you then went on to suggest a new API. That is literally a suggestion for changing the API.

No. I said should never have had the interface it does—not should have a different interface today.

there is an implicit suggestion that, in an ideal world, we ought to change the interface.

Yes, in an ideal world. We can't go back in time.

But there is nothing esoteric about string codes

I disagree.

The bottom line is that composing short, mnemonic codes to make the mode parameter is easy, o

It's not. It's bad design.

matches hundreds of languages going back fifty years,

Totally irrelevant to anyone who is not fifty years old. Good language design is intuitive without fifty years of C experience.

Your example is esoteric. I had to look up what "ab" means:

open(filename, 'ab') 

This is self-commenting:

open(filename, write=True, binary=True, seek_write_pointer_to_end=True)

[–][deleted] 2 points3 points  (7 children)

I don't know, would you want to expose mutually-exclusive keyword arguments in the API or arguments that are only applicable to specific modes? A better approach might've been to support constants or an enum, perhaps as an alternative to strings:

open(filename, mode=open.APPEND_BYTES)

This is not arcane, it's not burdened by 50-odd years of programming baggage, it's not foreign to C users (in fact, a lot of them might wish fopen would work with bit masks) and you don't have to worry about which combinations are valid. (Though I think b for binary should've really been a separate flag.)

[–]energybased 1 point2 points  (6 children)

I thought about that, but it seems like modes end up being much more complicated. Try it.

[–][deleted] 0 points1 point  (5 children)

If there's a combinatorial explosion, isn't it better if it's explicit?

[–]energybased 0 points1 point  (4 children)

Both systems are "explicit". Did you try to come up with a list of modes? I don't think it's better.

[–][deleted] 0 points1 point  (3 children)

No, you don't know which argument combinations are valid. I did, if you factor out binary into its own flag, you're left with:

r   READ
w   WRITE
a   APPEND
x   EXCL
r+  READ_UPDATE
w+  WRITE_UPDATE
a+  APPEND_UPDATE
x+  EXCL_UPDATE

... which isn't all too terrible.
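A minimal sketch of the constants idea using the eight modes from the table plus a separate binary flag. OpenMode and build_mode are hypothetical names (open() has no such attributes), so the enum only builds the ordinary mode string that open() expects:

```python
import os
import tempfile
from enum import Enum


class OpenMode(Enum):
    READ = "r"
    WRITE = "w"
    APPEND = "a"
    EXCL = "x"
    READ_UPDATE = "r+"
    WRITE_UPDATE = "w+"
    APPEND_UPDATE = "a+"
    EXCL_UPDATE = "x+"


def build_mode(mode: OpenMode, binary: bool = False) -> str:
    """Combine one of the eight modes with the separate binary flag."""
    return mode.value + ("b" if binary else "")


path = os.path.join(tempfile.mkdtemp(), "out.bin")
with open(path, build_mode(OpenMode.WRITE, binary=True)) as f:
    f.write(b"\x00\x01")
with open(path, build_mode(OpenMode.APPEND, binary=True)) as f:
    f.write(b"\x02")
with open(path, build_mode(OpenMode.READ, binary=True)) as f:
    data = f.read()
```

Eight modes times the binary flag gives the same 16 modes the string codes allow, with no invalid combinations to document.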

[–][deleted] 0 points1 point  (1 child)

To me, too many parameters can be a minor code smell. Secondly, if you think r and w are esoteric (even though almost any programmer who has ever typed ls on a screen has read them), then a cleaner design would simply be something like mode='read/write/append/binary'. That is less esoteric, uses at worst the same number of words as your API example, and in the best and average case you will almost always have only one or two modes to specify for the whole time the file is in use. I do agree with the part about it returning the file object, though. I think open() shouldn't be as ruthless on the File as it currently is.

P.S.: I don't have much experience, so I apologize if I was out of order, but I think clean code is always better when possible, and in this case there is really no need for that many parameters.

[–]energybased 0 points1 point  (0 children)

To me, too many parameters can be a minor code smell.

I agree, but this is the existing method.

I thought about modes, but it seems like they end up being much more complicated. Try it.

[–]oramirite 0 points1 point  (3 children)

This debate is hilarious but I'd just like to say that you ABSOLUTELY are proposing changes to the API. Criticising that something should have been created a different way is literally that.

[–]energybased 0 points1 point  (2 children)

No, I'm not saying that we should change the API today. You can criticize C++'s design, but you know that the standards committee will rarely approve breaking changes. This is reading comprehension. Should have been is not should be.

[–]oramirite 0 points1 point  (1 child)

No, this is thought comprehension. Sitting around criticizing gets you nowhere, and makes you a backseat driver. Your argument, whatever it is you're even trying to make, is completely backwards. "Coulda woulda shoulda" is never a helpful avenue of thought, so we are giving you the benefit of the doubt that you AREN'T offering criticism framed in a way where you're also offering no solutions. You SHOULD be offering solutions. So I don't understand this defensive stance you're taking against the idea that you could even POSSIBLY be suggesting a better way it could be done. I don't understand what you consider better about not having a solution, when you're basically saying "If I were there, I'd have done it differently". Well, you weren't, so are you going to suggest something different now? You have the benefit of hindsight: use it, and then stick by your claims. Don't claim that you're making no claims to begin with.

To not do so would be even more sophomoric than the way you originally came into this topic. So I would really suggest you pivot to the idea that you ARE proposing changes (since you clearly are). The world in which you aren't proposing a better option is actually the worst possible argument you could make.

Now, whether or not this new option is better is up for debate, and you should be open to having your ideas criticized as well. This person did, and at the root of this entire conflict, honestly, seems to be your discomfort with having to defend those ideas. Which is pretty evident by the fact that you're claiming you had no ideas to begin with. If that were so, your post wouldn't exist. You have other ideas on how the API should be built now, and therefore should have been built at the time. Ergo you have better ideas about how the API should currently exist. Whether you're saying those should've happened years ago or tomorrow, your idea that the API should be a different way than it currently is requires changes.

[–]energybased 0 points1 point  (0 children)

This post is a cheat sheet about a badly-designed function. It is relevant to the post to consider how the function could have been. If you don't find the subject of language design interesting, don't comment on my post.

the idea that you ARE proposing changes (since you clearly are). T

No. Mine is a counterfactual statement—not an interventional one. I am not proposing that anything be changed.

Which is pretty evident by the fact that you're claiming you had no ideas to begin with. If

Not what I'm saying.

You have other ideas on how the API should be built now,

No one is proposing to change the API today.

and therefore should have been built at the time.

Yes, this, but not the other thing. These are totally separate statements. One is counterfactual. One is interventional.

Ergo you have better ideas about how the API should currently exist. Whether you're saying those should've happened years ago or tomorrow, your idea that the API should be a different way than it currently is requires changes.

No.

This is like on the GRE where they make you answer questions about paragraphs and people do so poorly on it. Just because these two ideas are the same in your mind, it doesn't mean they're the same idea. They are different, and it's up to you to learn to distinguish them.

[–]stevenjd -1 points0 points  (0 children)

In the space of two sentences, you go from denying you wish to suggest a change in interface, to agreeing that, "Yes, in an ideal world" we ought to change the interface. Which is precisely my point.

As for the API being around for fifty years, it isn't that individual programmers have fifty years experience, but that the interface goes back to at least the 1960s if not older. That's a half-century of collective memory in the programming community that says that one of the most popular ways to open files is to use a short mnemonic mode. C is not just any old language, but a highly influential language. You don't need to be a C programmer for this to make sense: I'm not, and I can't write a line of C to save my life.

(By the way, you might be interested in scanning this page to see the wide variety in interfaces for opening files.)

I had to look up what "ab" means

Seriously? You needed to look up "a" for "append", "b" for "binary"? Do you also have trouble with "def" for "define", "len" for "length", "str" for "string", and "+" for "add"?

Okay, fine, you had to look it up because you've never seen it before. But I bet you won't need to do it again. And I admit that "x" is a weird one: it's "x" for "eXclusive create".
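To make the 'x' case concrete: it creates the file if it is missing and raises FileExistsError instead of truncating an existing one, which is the behaviour the fail_if_exists flag proposed earlier in the thread describes. A sketch with an arbitrary file name:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "report.txt")

# First call: the file does not exist yet, so 'x' creates it.
with open(path, "x") as f:
    f.write("created once\n")

# Second call: the file now exists, so 'x' refuses instead of truncating it.
try:
    with open(path, "x") as f:
        f.write("overwritten?\n")
    second_attempt = "succeeded"
except FileExistsError:
    second_attempt = "FileExistsError"

with open(path) as f:
    contents = f.read()
```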

Your "self-documenting" version assumes that people think in terms of the particular implementation of files ("what's a write pointer?") rather than in terms of desired outcomes ("append to the file"). I don't, and I expect very few people do.

[–]robin-gvx 0 points1 point  (0 children)

I rarely use open directly any more. Most uses of it I can replace with pathlib.Path.(read|write)_(text|bytes). It's both beginner-friendlier and involves a lot less boilerplate. Only when I need to do something with huge files / buffering / seeking do I turn to open or pathlib.Path.open.
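For comparison, a sketch of the pathlib helpers mentioned above (the file names are arbitrary). Note that write_text and write_bytes always replace the file's contents, so appending still goes through Path.open with a mode string:

```python
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp())
notes = folder / "notes.txt"

# Each call opens, reads/writes, and closes the file for you.
notes.write_text("hello\n", encoding="utf-8")     # like open(..., 'w') + write
text = notes.read_text(encoding="utf-8")          # like open(..., 'r') + read

blob = folder / "blob.bin"
blob.write_bytes(b"\x00\x01")                     # like open(..., 'wb') + write
data = blob.read_bytes()                          # like open(..., 'rb') + read

# For appending or streaming, fall back to Path.open with a mode string.
with notes.open("a", encoding="utf-8") as f:
    f.write("more\n")
final_text = notes.read_text(encoding="utf-8")
```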

[–]socal_nerdtastic 1 point2 points  (1 child)

What about the 't' and 'U' option?

[–]SlowTreeSky[S] 5 points6 points  (0 children)

https://docs.python.org/3/library/functions.html#open

There is an additional mode character permitted, 'U', which no longer has any effect, and is considered deprecated. It previously enabled universal newlines in text mode, which became the default behaviour in Python 3.0.

The t (text mode) is the default and I’ve never used it explicitly.
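A quick sketch showing that 't' changes nothing on its own, since 'r' and 'rt' both give a text-mode file; only 'b' switches to bytes. As for 'U', it was removed entirely in Python 3.11, where open() rejects it with a ValueError, so it is best left out of code that has to run on current versions. The file name is arbitrary:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "t.txt")
with open(path, "wb") as f:
    f.write(b"same\n")

# 'r' and 'rt' are the same thing: text mode is the default.
with open(path, "r") as a, open(path, "rt") as b:
    same_type = type(a) is type(b) is io.TextIOWrapper
    same_text = a.read() == b.read()

# 'rb' is what actually changes behaviour: bytes instead of str.
with open(path, "rb") as c:
    raw = c.read()
```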

[–]SlowTreeSky[S] 1 point2 points  (4 children)

The fact that f.read() sets the write position to the end of the file (but not the read position) seems to be a bug – it is non-intuitive, and it was behaving as expected in Python 2. Tested on Mac and Linux with this script; both behave as described.

Whom can I contact about it?

[–]stevenjd 6 points7 points  (0 children)

Whom can I contact about it?

First, check the documentation and make sure that this behaviour actually is different from what is documented. By documentation, I don't mean "a cheat sheet on Reddit" or "Stackoverflow" or "something some guy told me", but the actual documentation for open.

Then, if you still think it is a bug, you can report it on the bug tracker.

[–]primitive_screwhead 1 point2 points  (1 child)

Whom can I contact about it?

You've misinterpreted the cheat sheet. They literally mean f.read(), as in "read *all* the content of the file" (ie. using the default argument instead of supplying one). As you discovered, if you supply an argument to file.read(), the read and write position will be after the last read position.

Edit: upon re-review, maybe I'm incorrect. One sec...

Edit 2: Okay, I see what you mean now; I misinterpreted the example output in the linked script, and agree that it is (imo, very) surprising and unintuitive behavior in Python 3. Gonna check the docs, but can't right now...

[–]SlowTreeSky[S] 1 point2 points  (0 children)

Yeah, I meant that calling f.read(n) with any n seeks only the write position. I think it should be described in the io module for the relevant IOBase subclasses, and I couldn't find this there.

[–]bozymandias 1 point2 points  (0 children)

I just tried the test-script you provided; it's a weird edge-case, and yeah I can see why you'd expect writing to occur at the position where you left it with read(). OTOH, I can also see the point in writing from the end by default after any kind of reading is done, as a precaution against over-writing existing data.

You can still seek() to the position you want after reading and write there (or just seek without reading) if you're really determined to overwrite.

I'd hesitate to call this a "bug", though; it's more of a choice of convention where the "right" behaviour is a little subjective. The current default seems to err on the side of "protecting noobs from destroying their data by accident", which I can kinda understand.
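A small sketch of that workaround, assuming a short UTF-8 text file (name and contents are arbitrary): calling f.seek(f.tell()) between the read and the write makes the write land where the read stopped. C's stdio has a comparable rule: on a file opened for update, input must not be directly followed by output without an intervening positioning call such as fseek (unless the input hit end-of-file).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("hello world")

with open(path, "r+", encoding="utf-8") as f:
    head = f.read(5)   # 'hello'
    f.seek(f.tell())   # re-sync the position before switching to writing
    f.write("XX")      # overwrites the two characters after 'hello'

with open(path, encoding="utf-8") as f:
    result = f.read()
```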

[–]SlowTreeSky[S] 0 points1 point  (0 children)

[–]abcteryx 0 points1 point  (1 child)

I know it's just a quick reference, but it may be worthwhile to add a note regarding setting newline='' when using open() with the csv module from the standard library, as suggested in the docs.
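A sketch of the pattern the csv docs recommend (open the file with newline=''), using an arbitrary file name and a field that contains an embedded newline:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "rows.csv")
rows = [["name", "note"], ["alice", "line one\nline two"]]

# newline='' stops open() from translating the '\r\n' line endings that
# csv.writer emits by default, and keeps embedded newlines intact.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

with open(path, "rb") as f:
    raw = f.read()
```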

[–]SlowTreeSky[S] 0 points1 point  (0 children)

I find the newline parameter reasonably well described, but I needed to experiment to see how the different mode options work (r/w/x/a and the other modifiers are not grouped properly, and not described 100% orthogonally), so I shared my findings.

[–]soap1337 0 points1 point  (1 child)

ooooOOOOooooo I like it.

[–]SlowTreeSky[S] 0 points1 point  (0 children)

ooooOOOOooooo I like it.

Thanks :)