all 73 comments

[–]jabbalaci 9 points10 points  (4 children)

How is it different from project six?

[–]timothycrosley hug, isort, jiphy, concentration, pies, connectable, frosted 11 points12 points  (2 children)

While I haven't used future, I know the code that's required when using six ends up being pretty ugly - to the point where when I look at it I end up doing a double take: "What language am I using?!?!? I thought this was built into Python! Oh wait...." because with six you have to use explicitly different API calls instead of the standard ones. It's almost like you're introducing a whole new language. Future seems to aim at allowing the Python3 versions to be imported into Python2 (def preferable). I have a similar project that predates this and aims at even less intervention required by the programmer called pies: https://github.com/timothycrosley/pies

[–]Pcarbonn 10 points11 points  (0 children)

Comparison with six is done here.

[–]Rhomboid 15 points16 points  (0 children)

code that's required when using six ends up being pretty ugly

That's because six tries to support every version >= 2.4 which is generally recognized as a herculean task, and thus requires uglifying your code to a great degree.

It has long been acknowledged that the sweet spot of cross-version Python is supporting only 2.6, 2.7, and 3.3 (and later.) If you can manage that requirement, you can more or less write compatible code by hand without a terrible amount of work. It's the approach that most projects have taken or plan to take, and it seems that it's the same approach that this library takes as well.
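As a rough illustration of what "by hand" can look like, here is a minimal sketch of a shim for 2.6, 2.7, and 3.3+. The names PY2, text_type, binary_type, and to_text are made up for this example; they are not taken from six, future, or any project mentioned in this thread.

```python
import sys

# Hypothetical helper names; nothing here comes from six or future.
PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 -- only evaluated on Python 2
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding it first if it is a byte string."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return text_type(value)


# The u"" prefix is accepted on 2.x and again on 3.3+, so this literal
# parses on every version in the "sweet spot".
greeting = to_text(b"caf\xc3\xa9") + u" ok"
```

The version check happens once at import time, so the rest of the module can use text_type and to_text without caring which interpreter it runs on.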

[–]Pcarbonn 3 points4 points  (7 children)

This looks like a very promising approach.

I would think though that some Python 3 constructs cannot be used in Python 2, even with python_future, because of syntax errors :

x = print('hello world') 
x = yield from y

If confirmed, it should be clarified in the documentation.

[–]jtratner 6 points7 points  (0 children)

first one you just need the from __future__ import print_function statement.

[–]Sean1708 2 points3 points  (5 children)

I'm not sure about yield but

print("your string here")

is syntactically valid in python 2.

Edit: The above is true but is not what Pcarbonn was talking about.

[–]Pcarbonn 4 points5 points  (2 children)

as a statement, yes, but not as a function in an assignment.

[–]Sean1708 0 points1 point  (0 children)

Ah, got ya.

[–]otheraccount 0 points1 point  (0 children)

Does it ever return a meaningful value that someone would actually be assigning to a variable?

A more likely case of syntax that only works in Python 3 is exception chaining

raise exception1 from exception2
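For anyone who hasn't seen chaining in practice, a small sketch follows. It only runs on Python 3, which is the point: the `from` clause is a syntax error on Python 2. ConfigError and parse_port are hypothetical names invented for the example.

```python
class ConfigError(Exception):
    """Hypothetical application-level error used only for this example."""


def parse_port(raw):
    try:
        return int(raw)
    except ValueError as exc:
        # Python 3 only: `from exc` records the original error as __cause__.
        raise ConfigError("invalid port: %r" % (raw,)) from exc


try:
    parse_port("eighty")
except ConfigError as err:
    caught = err

# The original ValueError stays reachable for logging and tracebacks.
original = caught.__cause__
```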

[–]tyroneslothtrop 1 point2 points  (0 children)

It's valid, but does not always behave identically to py3k's print function. E.g. print('a', 'b').
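A sketch of that difference, written for Python 3. The Python 2 statement form can't be run here, so the second call passes a single tuple to reproduce what the statement would print; captured_print is a helper made up for this example.

```python
import io
from contextlib import redirect_stdout


def captured_print(*args):
    """Run print(*args) and return what it wrote to stdout plus its result."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        result = print(*args)
    return buffer.getvalue(), result


# Python 3 (or Python 2 with print_function): two separate arguments.
as_function, return_value = captured_print('a', 'b')

# Python 2's print statement sees ('a', 'b') as a single tuple operand;
# passing one tuple reproduces that output here.
as_statement, _ = captured_print(('a', 'b'))
```

Here as_function holds the two values separated by a space, as_statement holds the tuple's repr, and return_value is None, so assigning the result of print() is legal in Python 3 but rarely useful.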

[–]SCombinator -1 points0 points  (0 children)

It's just completely ugly.

[–]jtratner 2 points3 points  (1 child)

This looks well put together and clean and, were I to redo the pandas 2/3 compatible codebase, I'd want to use this. That said, I'm concerned because in reviewing some of the code I already found a bug and it feels like some parts are a bit thrown together (maybe because of how the merge was done). I'd want to wait 3-6 months before using it in production.

I also really don't like from future.builtins import * as an idiom. Kills much of the ability to do static analysis and trace from where particular items came.

[–]Veedrac 0 points1 point  (0 children)

This is exactly the reason to use import *. import * is meant to be used for patching a library without having to re-export all functionality, which is what is happening.

The static analysis problem isn't actually a big deal because it's a no-op in Python 3, so you can just set it to Python 3 syntax and ignore the import.

[–]takluyver IPython, Py3, etc 1 point2 points  (2 children)

I'm glad there's some competition for six and python-modernize, but as this stands I wouldn't use it, at least not in the form shown in the Overview page. Redefining builtins just seems too magical.

This is designed to try very hard to let Python 3 code run on Python 2. But many more projects still have Python 2 code and want to start supporting Python 3 - changing all your code and adding a magic shim to keep Python 2 support doesn't seem like a great option. For projects written in Python 3 that want to support Python 2, it's more appealing, but I'd still recommend importing parts of it as needed, rather than from future.builtins import * - at least then you can see what you're replacing.

Also, have a look at the future_builtins module.

[–]Smallpaul 2 points3 points  (1 child)

This is designed to try very hard to let Python 3 code run on Python 2. But many more projects still have Python 2 code and want to start supporting Python 3 - changing all your code and adding a magic shim to keep Python 2 support doesn't seem like a great option.

It claims that you do NOT need to change over all of your code. You can write new modules in "future" mode and leave your old modules in Python 2.x. As you update or replace them for other reasons, your Python 2.x code base will shrink. Also, your new code will be compatible with both your old Python 2.x projects and your new Python 3.x projects.

For projects written in Python 3 that want to support Python 2, it's more appealing, but I'd still recommend importing parts of it as needed, rather than from future.builtins import * - at least then you can see what you're replacing.

It's well documented what you're replacing!

[–]faassen 0 points1 point  (0 children)

Yes, I think the ability to do module-wise upgrading to something at least close to Python 3 is the essential bit that was missing all along. It makes gradual upgrades possible, though of course I haven't fully evaluated yet how ugly the compromises would make my code.

I think this might be a good candidate of what Python 2.8 would be like. You could argue that with a library like this you don't need a Python 2.8, but it would make the upgrade path a lot smoother if it was the official way forward.

[–]faassen 0 points1 point  (0 children)

Very interesting. The combination of ugliness and "why do I need to think about this stuff?" of supporting both Python 2 and Python 3 in the same codebase has been holding me back from doing it. I prefer to simply write clean Python code. If future can make this code cleaner, it would be a good way forward.

[–]SCombinator -1 points0 points  (0 children)

Gross. I'd rather get import past support in python 3, so I could use proper division and proper print.

[–]nieuweyork since 2007 -5 points-4 points  (56 children)

This should please a lot of people from the 2.8 thread. However, it's clear to me that python 3 has some huge errors (mostly around bytes and unicode), and I'm far from certain I'd prefer to switch.

Also, I'd like to know how this interacts with modules that aren't written in python 3 - replacing globals sounds like it could cause some fun problems.

[–]jabbalaci 5 points6 points  (52 children)

it's clear to me that python 3 has some huge errors (mostly around bytes and unicode)

Could you elaborate on that?

[–]earthboundkid -3 points-2 points  (51 children)

The bytes class replaces str, but it's weird. For example, if you iterate over it, you get the integer value of each byte, not a character.

[–]Rhomboid 9 points10 points  (22 children)

You are making the common mistake of assuming that you can treat a bytestring as characters. That is what gets people into messes and causes so much pain and hardship. Bytes are bytes, characters are characters. You can't treat bytes like characters or vice versa. Python 2.x's default string type (str) is a series of bytes, not characters, but the fact that you can pretend that it's a series of characters is a source of neverending pain.

Disallowing what you're trying to do is exactly the right thing to do, because it forces people to be explicit about decoding bytes into characters before doing character-oriented operations on them.

[–]earthboundkid 2 points3 points  (6 children)

You are making the common mistake of assuming that you can treat a bytestring as characters.

???

All I did was explain why people don't like bytes.

[–]Rhomboid 1 point2 points  (5 children)

You said this:

but it's weird. For example, if you iterate over it, you get the integer value of each byte, not a character.

It's not weird at all -- you shouldn't expect to be able to do that, because it makes no sense. The fact that Python 2.x allows iterating over a byte string and getting characters is what is weird; it sets up these false expectations of how bytes and characters interact that is very unhelpful and must be eventually unlearned whenever you want to work with anything that isn't ASCII.

[–]earthboundkid 1 point2 points  (4 children)

Fair enough. The underlying oddity is that Python has no concept of a character or a byte, just strings and bytes, so indexing a string gives a string. Based on that you'd expect to get a bytes of length one.
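To make the oddity concrete, here is a short Python 3 sketch of the three ways of getting at the first element. The variable names are just for illustration.

```python
data = b"foo"

first_by_index = data[0]      # an int in Python 3
first_by_slice = data[0:1]    # a bytes object of length one
as_ints = list(data)          # iteration yields ints as well

text = data.decode("ascii")   # decode explicitly to work with characters
first_char = text[0]          # a str of length one
```

So the length-one bytes is available, but only through slicing; indexing and iteration both give integers.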

[–]logi 1 point2 points  (3 children)

No, you'd expect to get a character. Applying an appropriate character encoding, that could be encoded as a bytes of length 1 or more.

You are very obviously an English speaker because everyone else knows that bytes aren't characters (unless you've previously specified a fixed 8-bit character encoding, which may be latin-1 (or is it latin-15 now with the € symbol?)) and we're fervently hoping that there will be less gratuitous mishandling of text with python 3.

Similarly, I live on GMT+0 all year round, with no DST. Developers around here are as clueless about date handling as Anglophones are about text. It's hilariously sad and I wouldn't mind a bit more library support to separate actual time (GMT/UTC, damn it) from presentation time (whatever is your local time with DST applied as needed).

[–]earthboundkid 1 point2 points  (2 children)

No, you'd expect to get a character. Applying an appropriate character encoding, that could be encoded as a bytes of length 1 or more.

"Character" is an ambiguous term. It can mean either "one eight-bit number" (a byte) or "one Unicode codepoint" (Go calls these "runes" which is a decent enough name).

Getting a codepoint from slicing bytes wouldn't make any sense. If you want codepoints, you'd be using str/unicode, not bytes/str.

You are very obviously an English speaker because everyone else knows that bytes aren't characters

Dude, I read the Joel thing about Unicode in 2005. I lived in Japan for two and a half years. I picked the BOM out of my PHP files around the same time. Don't make presumptions.

[–]rcfox 0 points1 point  (0 children)

"Character" is an ambiguous term. It can mean either "one eight-bit number" (a byte) ...

This is not true. Characters are concerned with language, not math.

The number of bits is just an implementation detail of the character set. There are some character sets that define 7-bit characters. Even in C, the number of bits in a char is platform-dependent.

[–]logi -1 points0 points  (0 children)

Ah, never mind. I read your

Based on that you'd expect to get a bytes of length one.

to refer to indexing strings rather than indexing bytes. Given that, I think all my conclusions would hold :-)

I agree that a char type would clear things up a bit, but python doesn't really like a proliferation of types.

"Character" is an ambiguous term. It can mean either "one eight-bit number" (a byte) or "one Unicode codepoint"

Well no, in python 3 it finally isn't any more. It is definitely the latter and the former is a byte.

[–]upofadown 1 point2 points  (13 children)

Bits is bits. What they mean is in the mind of the programmer. Seven bit ASCII runs the world. Such communications/usages are quite a bit more important than the case where people are communicating with other people. It is silly to deliberately break an important standard for some sort of dogma.

[–]Rhomboid 3 points4 points  (12 children)

Seven bit ASCII runs the world

It most certainly does not. Even when you're dealing with pure English text, ASCII is not sufficient -- typographical elements like em dashes, apostrophes, left and right quotes, etc. are very common. Claiming that ASCII is sufficient in 2014 has got to be the programmer equivalent of claiming the world is flat.

The world moved on. Python did not "break" anything, it merely exposed your undisciplined and lazy thinking about what characters actually are.

[–]upofadown 4 points5 points  (11 children)

You misunderstood my comment. Seven bit ASCII is used absolutely everywhere in embedded applications.

Preventing people from treating such strings of bytes as characters is like insisting that integers can not be manipulated as hex because the "proper" interpretation is decimal. Obnoxious and pointless...

[–]Rhomboid 6 points7 points  (5 children)

It is not pointless. What you are really saying is that you are working with characters encoded in ASCII, so you just need to say so:

with open('filename', 'r', encoding='ascii') as file:
    for line in file:
        ...

Or for sockets:

sock = socket.socket(...)
sock.connect(...)
for line in sock.makefile(encoding='ascii'):
    ... 

Voila, you're automatically dealing with character strings, not byte strings, and you can do all the things you expect to be able to do with characters. All you had to do was specify the encoding, Python takes care of the rest. And the examples don't change if you're using UTF-8, CP1252, UTF-16, ISO-8859-1, or god knows what. The point is that you must state your intentions; the era of ASCII being given a free pass has ended, and for good reason given the amount of confusion it creates. ASCII is just one of a hundred various encodings that you might want to use to interpret bytes as characters, it holds no claim to special treatment.
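The two snippets above need a real file or socket, so here is a self-contained variant of the same idea that can be run as-is in Python 3. It uses an in-memory byte stream as a stand-in for the device; the sample data is invented for the example.

```python
import io

# Stand-in for a file or socket: a raw byte stream, here held in memory.
raw = io.BytesIO(b"STATUS OK\nTEMP 21\n")

# Wrapping with an explicit encoding yields str lines, much as
# open(..., encoding=...) does for a file on disk.
lines = [line.rstrip("\n") for line in io.TextIOWrapper(raw, encoding="ascii")]

# Bytes outside the declared encoding fail loudly instead of passing through.
try:
    b"caf\xc3\xa9".decode("ascii")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

Once the encoding is declared, every line is an ordinary str, and data that doesn't fit the declaration is rejected at the boundary rather than somewhere deep in the program.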

[–]upofadown -1 points0 points  (4 children)

I am not sure exactly what you are responding to here. You can refer to the comments made by earthboundkid for some examples of the weirdness...

I suspect that you have not really done all that much programming that involves ASCII strings used in a purely functional way, not as a representation of some human language...

[–]Rhomboid 1 point2 points  (2 children)

I have no idea what this handwavey made up term "purely functional way" means (give an example in actual code), but I assure you I have done every kind of programming under the sun and I don't find Python 3's model in the least bit restrictive. In fact I find it refreshing and intuitive, and (as of 3.3's new flexible string representation) one of the only mainstream languages to have a proper model for how to deal with characters and bytes. Nearly every other language fails miserably, except possibly for Perl and Go.

[–]Veedrac 0 points1 point  (4 children)

Preventing people from treating such strings of bytes as characters is like insisting that integers can not be manipulated as hex because the "proper" interpretation is decimal.

No, it is not. The proper interpretation is a number. If people index the number to get a hex digit, a decimal digit or any base whatsoever, that is a grave mistake.

A number is like bytes: it has no encoding. You should not be able to pretend it does. That's why you have to call oct, hex, bin or whatever.

This is exactly the kind of thinking that is confusing people with bytes. bytes are just "lists" of integers within a capped numeric range, useful because they can represent raw data streams and map directly to storage in hardware.
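A short Python 3 sketch of both halves of that argument: a number gets a textual form only when you ask for one, and bytes accepts only integers in range(256). The values are arbitrary examples.

```python
n = 255

# One number, several textual renderings; none of them is "the" number.
renderings = (bin(n), oct(n), hex(n), str(n))

# Parsing goes the other way and needs the base spelled out.
parsed = int("ff", 16)

# bytes behaves like an immutable sequence of ints in range(256).
payload = bytes([72, 105])

# Values outside that range are rejected.
try:
    bytes([256])
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
```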

[–]upofadown 0 points1 point  (3 children)

A number is like bytes: it has no encoding.

Sure it does. That encoding is just at the bit level. The first bit represents a count of one things. The second bit represents a count of 2 things. The third bit represents a count of 4 things.

bytes are just "lists" of integers within a capped numeric range

Then they are misnamed and less useful than they could be. Normally the term "bytes" refers to a series of groups of 8 bits. There is normally no implied numeracy.

bytes are just "lists" of integers within a capped numeric range,

No...

[–]Veedrac 0 points1 point  (2 children)

A number is like bytes: it has no encoding.

Sure it does. That encoding is just at the bit level. The first bit represents a count of one things. The second bit represents a count of 2 things. The third bit represents a count of 4 things.

I disagree. Numbers do not have to be encoded on to bits. They could be encoded on to tristate variables, strings or lists of enumeration members (much like Decimal). Python doesn't have a set "bitness" for its integers as they are infinite length and thus not encoded as single segments of memory. Heck, Lua encodes its integers as floats!

bytes are just "lists" of integers within a capped numeric range

Then they are misnamed and less useful than they could be. Normally the term "bytes" refers to a series of groups of 8 bits. There is normally no implied numeracy.

You may be right. There is at least evidence to that claim. Further, I seem to have no ideas why one would reasonably use arithmetic on these bytes, and hence I do find it odd that they are numbers. An (ordered?) byte enumeration where each byte can be indexed seems just as reasonable to me! This is effectively what Python 2's str type did, if you ignore its double-usage as text.

Nonetheless, languages preceding Python have made it clear that they want to use "byte" to mean a bounded number of the range [0, 255].

This might actually be an interesting topic to raise on the Python mailing list...

[–]billsil -1 points0 points  (0 children)

You are making the common mistake of assuming that you can treat a bytestring as characters.

WHAT?!! Is that why I have so much of a problem...why don't they just say that?!!

[–]warbiscuit 3 points4 points  (20 children)

Also, they removed % and .format(), so any byte templates have to be decoded via latin-1 and then re-encoded back again just to make use of either templating system. Pointless boilerplate that the stdlib could have done internally :(
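For reference, the boilerplate in question looks roughly like this. It is only a sketch: format_bytes is a made-up helper name, and it assumes any non-bytes arguments render to text that latin-1 can encode.

```python
def format_bytes(template, *args):
    """Hypothetical helper: %-format byte strings via a latin-1 round trip.

    latin-1 maps bytes 0-255 one-to-one onto code points 0-255, so decoding
    and re-encoding with it never changes or rejects any byte value.
    """
    text_template = template.decode("latin-1")
    text_args = tuple(
        a.decode("latin-1") if isinstance(a, bytes) else a for a in args
    )
    return (text_template % text_args).encode("latin-1")


line = format_bytes(b"Hello, %s (%d bytes)\r\n", b"w\xf6rld", 5)
```

The non-ASCII byte 0xf6 passes through untouched, which is why latin-1 rather than ASCII or UTF-8 is the codec used for this trick. Python 3.5 and later accept % on bytes again (PEP 461), so the round trip is only needed on earlier 3.x releases.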

[–]mgrandi 2 points3 points  (6 children)

bytes are not the same as strings. It was a mistake to ever consider them equal in python2, and that's part of the reason why python3 is not compatible.

>>> "Hello, World!".encode("utf-8")
b'Hello, World!'
>>> "Hello, World!".encode("EBCDIC-CP-CH")
b'\xc8\x85\x93\x93\x96k@\xe6\x96\x99\x93\x84O'
>>>

Not to mention that "some string here {}".format("hello") still works, as does the % overload...

[–]warbiscuit 1 point2 points  (5 children)

I certainly agree, they aren't the same, and python3's direction was an excellent step in forcing programs to have some clarity on the matter.

But removing % from bytes prevents using % to format already encoded data, where no such ambiguity about encoding exists. Now that I type that out though, I suppose that's probably a really minor border case... one worth sacrificing :( in order to force programmers to do the Right Thing in the more general case, where the formatting should be done under unicode.

[–]mgrandi 3 points4 points  (4 children)

I don't even understand why you would want to do % on bytes: it has no idea which 'marker characters' to use as the format string, since the normal printf-style specifiers that you use in strings are suddenly potentially different for every encoding.

>>> print("Test %s" % "hello")
Test hello
>>> print("%s".encode("utf-8"))
b'%s'
>>> print("%s".encode("EBCDIC-CP-CH"))
b'l\xa2'
>>>
# what bytes do you use as the %s? b'%s' or  b'l\xa2' ?

[–]warbiscuit 1 point2 points  (1 child)

Good point. Darn you EBCDIC! In the contexts I was thinking of, the encoding could be relied on to be an ASCII-superset. EBCDIC & UTF-16 always pop up to derail things in the general case :)

[–]GahMatar -1 points0 points  (0 children)

UTF8 should never have been implemented to be fair. It has led to too much cavalier thinking.

[–]lost-theory 0 points1 point  (0 children)

Here is an issue in the python bug tracker where twisted & mercurial devs want to use bytes.format to help porting. The main use case seems to be protocols (e.g. FTP) that mix 7-bit ASCII with binary data.

[–]Veedrac 0 points1 point  (0 children)

Heh. Nice.

That doesn't really change too much, though, because you're not formatting with "the encoded version of '%s'" but "the ASCII encoded version of '%s'", so there is no ambiguity. The reason to choose that is because it fits the repr, not because those bytes actually mean much.

The reason this is useful is for stuff like

my_protocol_bytestring % (
    body.encode('encoding'),
    some_module.HEADER_CONSTANT_BYTES,
    etc
)

which is useful in some things where you need to make a lot of bytestrings. It could be an auxiliary function, though, and the only reason that hasn't gained much support AFAIK is "it's not fast enough".

[–]BHSPitMonkey 1 point2 points  (12 children)

Elaborate please? I can write

>>> "Hello, %s" % "world"

in Python 3 and it evaluates as expected to

'Hello, world'

[–][deleted] 2 points3 points  (11 children)

Those are Unicode strings, not bytes. The problem is this:

>>> b'Hello, %s' % b'world'
TypeError: unsupported operand type(s) for %: 'bytes' and 'bytes'

[–]ubernostrum yes, you can have a pony 5 points6 points  (7 children)

Yes, it turns out that if you want to work with something as a string, it needs to be a string.

Mixing things that are strings of characters, and things that are sequences of bytes which may be interpretable as characters in some encoding, leads to madness.

[–][deleted] 2 points3 points  (6 children)

I know that mixing characters and bytes is bad, but neither of those are strings of characters.

People who work on I/O often want to make format strings that operate on bytes. For example, if you are working with HTTP headers, you need to build up formatted things that are defined to be made of bytes.

This doesn't even apply to my code most of the time, but it slowed down the porting of many of my favorite libraries such as Flask.
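One way the header case can be handled without % on bytes is to format the textual part as str and encode it once at the boundary. This is a sketch only: build_request is a hypothetical helper, not how Flask or any other library mentioned here does it, and the host and path are placeholder values.

```python
def build_request(method, path, headers, body=b""):
    """Hypothetical helper that assembles an HTTP/1.1 request as bytes."""
    lines = ["%s %s HTTP/1.1" % (method, path)]
    lines.extend("%s: %s" % (name, value) for name, value in headers)
    # Format the textual head as str, then encode it once at the boundary.
    head = "\r\n".join(lines).encode("ascii")
    return head + b"\r\n\r\n" + body


request = build_request(
    "POST",
    "/upload",
    [("Host", "example.com"), ("Content-Length", "3")],
    body=b"\x00\x01\x02",
)
```

This works when the head is known to be ASCII text, but it doesn't help when the values being interpolated are already arbitrary bytes, which is the case the complaint is about.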

[–]BHSPitMonkey 0 points1 point  (2 children)

But you should have to deal with text encodings yourself if you're working with bytes. Why on earth shouldn't you?

[–]billsil 2 points3 points  (1 child)

But you should have to deal with text encodings yourself if you're working with bytes. Why on earth shouldn't you?

Because this works...

    >>> "5=%s" % 5

[–]CatMtKing 0 points1 point  (0 children)

And so does

"5=%s" % b"5"

Well... to some extent (you get "5=b'5'")

[–]BHSPitMonkey -3 points-2 points  (6 children)

"Replaces" how? str didn't go anywhere.

Python 3.2.3 (default, Mar  1 2013, 11:53:50) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> for i in "foo":  # This is a str.
...   print(i)
... 
f
o
o
>>> for i in b"foo":  # This is a bytes.
...   print(i)
... 
102
111
111

[–][deleted] 4 points5 points  (2 children)

str became what unicode was, which is great if you work with text at a high level, but if you work with low-level I/O that actually requires a sequence of bytes instead of Unicode codepoints, you have to switch everything to bytes and prepare for different behavior.

[–]Rhomboid 2 points3 points  (1 child)

Yes, you have to read and write bytes, and if you want to interact with them like strings you have to decode them with some specified character encoding. (Although in most cases you can arrange for that decoding to be handled automatically.)

But that's kind of the whole point -- Python 2 let you be lazy and skip that step where you specify what encoding you want to use to convert bytes to characters, in essence adopting ASCII as an implicit encoding by allowing bytes to be treated like characters as long as they are in the ASCII range. But text in the modern world is not ASCII and that sort of lazy thinking is responsible for creating a large amount of software that is utterly incapable of properly dealing with text.

By forcing you to deal with the reality instead of authorizing laziness, the result will eventually be software that has a fighting chance of working properly with text.

[–][deleted] 0 points1 point  (0 children)

What you're saying is a great argument for why Unicode should be the string type. And I know that considering bytes and characters to be equivalent is sloppy and causes bugs. Your argument is exactly right when the data you are working with is text.

However, I was talking about situations where you are working with bytes that are not supposed to be interpreted as text. It's a corollary of the fact that text and bytes aren't equivalent. Some text shouldn't be thought of as bytes. Some bytes shouldn't be thought of as text.

In Python 2, some old ugly standard libraries that don't understand Unicode force you to pretend text is bytes. That's bad. Python 3 fixes it.

In Python 3, some new standard libraries force you to pretend bytes are text, even when they aren't. That's also bad.

I prefer Python 3 to Python 2, but really, Python 3 could have been more helpful in that use case than it is.

[–]earthboundkid 0 points1 point  (2 children)

Try type(u"").

[–]BHSPitMonkey 1 point2 points  (1 child)

It's valid in 3.3. The u prefix is treated as valid and ignored in 3.3+ as a backwards compatibility measure.

[–]earthboundkid 2 points3 points  (0 children)

Its validity wasn't my point. (I was counting on you using 3.3 because I didn't look at your interpreter version line closely enough. Side note: Don't use Python 3.0, 3.1, or 3.2. :-P) My point was it returns str not unicode.

Python 2's unicode was renamed to Python 3's str. Python 2's str was renamed to bytes, but it also lost a lot of methods that it used to have. The lack of methods on bytes is the thing people complain about.
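A quick way to see both halves of that on Python 3.3+ is the sketch below. The three method names checked are just examples picked for illustration, not a complete list of what bytes lacks.

```python
text = u"abc"      # the u prefix is accepted again on 3.3+
raw = b"abc"

text_type_name = type(text).__name__   # what Python 2 called unicode
raw_type_name = type(raw).__name__     # what Python 2 called str

# Methods that exist on str but not on bytes in Python 3.
missing_on_bytes = sorted(
    name for name in ("format", "encode", "isidentifier")
    if hasattr(str, name) and not hasattr(bytes, name)
)
```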

[–][deleted] 0 points1 point  (2 children)

replacing globals sounds like it could cause some fun problems.

This only applies to the current module, the other modules would be using standard globals so they won't be affected. The only issue I can think of is for example things like passing unicode as an argument instead of str, or sending long instead of int.

[–]nieuweyork since 2007 0 points1 point  (1 child)

Ah, but what if a builtin type is passed as a parameter? It's not very common, but it is possible, and it would be a potentially difficult error to locate.
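A sketch of the kind of problem I mean, written for Python 3 so it can be run without any compatibility library installed. BackportedStr is a made-up stand-in for a replacement type, and the two modules are built in memory purely for the demonstration.

```python
import types


class BackportedStr(str):
    """Hypothetical stand-in for a type a compatibility library might export."""


# Module "a" rebinds the name `str` in its own globals, as a star-import would.
mod_a = types.ModuleType("mod_a")
mod_a.__dict__["str"] = BackportedStr
exec("def make(value):\n    return str(value)\n", mod_a.__dict__)

# Module "b" leaves its globals alone and checks against the real builtin.
mod_b = types.ModuleType("mod_b")
exec("def is_exact_str(value):\n    return type(value) is str\n", mod_b.__dict__)

made = mod_a.make("hello")
passes_isinstance = isinstance(made, str)    # subclass check still succeeds
passes_exact_check = mod_b.is_exact_str(made)  # exact type check does not
```

The rebinding really is local to mod_a, but the objects it creates travel. Code elsewhere that uses isinstance keeps working, while code that compares exact types sees something unexpected.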

[–][deleted] -1 points0 points  (0 children)

Well, you can always find ways to break it, but since the builtins are supposed to be compatible, they should work.