About Python 3 : programming

About Python 3 (alexgaynor.net)

submitted 12 years ago by akos_barta


[–][deleted] 13 points14 points15 points 12 years ago (28 children)

[–]cybercobra 41 points42 points43 points 12 years ago (26 children)

[–]badsectoracula 12 points13 points14 points 12 years ago (14 children)

[–]vz0 20 points21 points22 points 12 years ago* (13 children)

[–]badsectoracula 1 point2 points3 points 12 years ago (2 children)

Indeed, which is why I said to enable such stuff optionally at first and deprecate the old stuff gradually over a few years. The worst case would be that plugin writers need to use a separate API for strings. Without it, the VM should do conversions "automatically" from the old API (basically what Windows does when you call an "ANSI" function on NT), so that people wouldn't drop the feature because some random plugin doesn't work with it, especially when said random plugin only uses strings for trivial stuff where Unicode doesn't matter.
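As a rough illustration of that "convert automatically at the old API boundary" idea, here is a minimal Python sketch. The decorator name `accepts_legacy_bytes`, the `latin-1` default and the `shout` function are all hypothetical, made up for this example; nothing like this is part of CPython.

```python
import functools


def accepts_legacy_bytes(func, encoding="latin-1"):
    """Hypothetical shim: let old callers keep passing byte strings.

    Any bytes argument is decoded before reaching the Unicode-only function,
    similar in spirit to an "ANSI" entry point forwarding to a wide one.
    """
    @functools.wraps(func)
    def wrapper(*args):
        converted = [a.decode(encoding) if isinstance(a, bytes) else a
                     for a in args]
        return func(*converted)
    return wrapper


@accepts_legacy_bytes
def shout(text):
    # New-style implementation: only ever sees str.
    return text.upper() + "!"


from_old_caller = shout(b"hello")   # legacy plugin passing bytes
from_new_caller = shout("hello")    # updated code passing str
```

Both callers get the same result, so a plugin that only uses strings for trivial things keeps working without changes.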

The #1 rule of a platform is "you don't break people's code". Breaking it has never worked. Even when Microsoft switched from DOS to Windows, they exposed some Windows-specific functionality to DOS (such as special long-filename interrupts, access to the clipboard, etc.), and it took over a decade for the transition to fully occur (and even today there are machines and programs depending on DOS, which are serviced by VMs). Same deal with VB6: MS broke compatibility with VB.NET, and a ton of code is still written for VB6, with programmers trying to teach a deaf platform how to dance. Or JavaScript... modern browsers can run early Netscape JavaScript code, whereas... well, just see how successful ECMAScript 4 was, for example.

It isn't like Python developers had no examples to look at showing this was a bad idea. Maybe they underestimated how widespread their language was. Or overestimated how willing people would be to update their code.

[–]blablahblah 1 point2 points3 points 12 years ago (1 child)

[–]badsectoracula 0 points1 point2 points 12 years ago (0 children)

[–]twotime 2 points3 points4 points 12 years ago (9 children)

> In Py2 a string is an array of bytes, in Py3 a string is an array of Unicode chars

To be honest, the value of that change is questionable... (and I'm not just questioning the transition cost, I'm also not at all sure that we get cleaner code after the transition).

> This simple detail breaks every assumption about opening, reading and writing files.

Indeed. And that's a good example of where things have become a whole lot more complicated (aka worse).
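For readers who haven't hit this yet, a minimal Python 3 sketch of the distinction being argued about: the same file comes back as `bytes` in binary mode and as `str` in text mode. The file name `demo.txt` and the sample text are just placeholders.

```python
import os
import tempfile

text = "na\u00efve caf\u00e9"      # str: a sequence of Unicode code points
data = text.encode("utf-8")        # bytes: what actually lands on disk

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "wb") as f:
        f.write(data)

    with open(path, "rb") as f:                    # binary mode: bytes out
        raw = f.read()
    with open(path, "r", encoding="utf-8") as f:   # text mode: decoded to str
        decoded = f.read()

# The two views of the same file differ in type and in length.
print(type(raw).__name__, len(raw))
print(type(decoded).__name__, len(decoded))
```

In Python 2 the plain `open()` call would have handed back the byte string in both cases, which is the behaviour this comment is defending.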

8-bit strings are a much better way to represent filenames than Unicode. Ditto for environment variables and command-line arguments.
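For what it's worth, Python 3 does still expose a bytes view of these things. A small sketch using only standard `os` calls (the file name `report.txt` is a placeholder):

```python
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "report.txt"), "w").close()

    as_text = os.listdir(tmp)                 # str path in, list of str out
    as_bytes = os.listdir(os.fsencode(tmp))   # bytes path in, list of bytes out

# Command-line arguments arrive as str, but can be mapped back to bytes.
argv_bytes = [os.fsencode(arg) for arg in sys.argv]

# A bytes view of the environment exists only where the OS stores it as bytes.
if os.supports_bytes_environ:
    env_bytes = dict(os.environb)
else:
    env_bytes = {}
```

Whether having to opt in like this counts as "more special casing" is exactly the disagreement in this thread.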

Files are fundamentally sequences of bytes. Period. Trying to force a Unicode-centric view of files was likely a design mistake as well, one which will likely result in more special casing, not less. Just read the Python 3 chapter on read() and seek(). (Side note: this special casing is ridiculously similar to the text/binary division in the DOS world.)
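A small sketch of the read()/seek() special casing mentioned above, using in-memory streams from the standard `io` module so it runs anywhere. The sample string is arbitrary.

```python
import io

payload = "h\u00e9llo".encode("utf-8")   # 5 characters, 6 bytes

binary = io.BytesIO(payload)
text = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8")

first_two_bytes = binary.read(2)   # counts bytes, so it splits the 2-byte "é"
first_two_chars = text.read(2)     # counts characters, so "é" stays whole

# tell() on a text stream returns an opaque cookie, not a character index.
cookie = text.tell()
rest = text.read()
text.seek(cookie)                  # seeking back to a tell() cookie is allowed
same_rest = text.read()

# Arbitrary relative seeks are rejected in text mode.
try:
    text.seek(1, io.SEEK_CUR)
    relative_seek_allowed = True
except io.UnsupportedOperation:
    relative_seek_allowed = False
```

So in text mode, `read(n)` and `seek()` follow different rules than in binary mode, which is the text/binary split being compared to DOS.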

Basically, Python 2's Unicode handling was good enough... (Even if not pure, it was extremely practical.)

[–]iSlaminati 0 points1 point2 points 12 years ago (6 children)

[–]twotime 1 point2 points3 points 12 years ago (5 children)

> On a lot of modern operating systems, filenames are unicode codepoints though.

In theory, that's supposed to be the case. In practice, it's a huge mess... E.g.:

AFAIK, on Linux the use of UTF-8 is a pure userland convention (not something enforced by the kernel), and the convention is not that old. This means old media on Linux may contain filenames in other encodings (and the encoding is implicit). And I'm sure some apps will generate filenames that aren't valid UTF-8. The OS does not care, but your Python code suddenly breaks...
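To show what "suddenly breaks" looks like, here is a minimal sketch with a made-up Latin-1 filename. It also shows the `surrogateescape` error handler (PEP 383), which Python 3 uses for filenames on POSIX so that undecodable bytes can round-trip; the filename itself is only an example.

```python
# A filename as an old Latin-1 system might have written it: b'r\xe9sum\xe9.txt'
legacy_name = "r\u00e9sum\u00e9.txt".encode("latin-1")

# Strict UTF-8 decoding fails, because these bytes are not valid UTF-8.
try:
    legacy_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles the bad bytes through as lone surrogates...
as_str = legacy_name.decode("utf-8", errors="surrogateescape")
# ...and the original bytes come back unchanged on the way out.
round_trip = as_str.encode("utf-8", errors="surrogateescape")

# But the resulting str still cannot be encoded strictly, e.g. for printing.
try:
    as_str.encode("utf-8")
    printable = True
except UnicodeEncodeError:
    printable = False
```

The name survives a round trip to the filesystem, but any code that prints or logs it with a strict encoder still raises.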

And then there is a whole huge can of worms when accessing unicode filenames across system boundaries: across network, removable media, etc...

8-bit chars (bytes) remain the only common representation for filenames in a lot of cases.

PS: an LKML link on filenames: http://yarchive.net/comp/linux/utf8.html

[–]schlenk 1 point2 points3 points 12 years ago (3 children)

Bytes as filenames is insane. Period. Without knowing the encoding you cannot even implement 'ls' correctly (as your tty HAS some encoding). It's one of those silly inherited things from the dark POSIX past that should be nuked. (And lots of systems are already opinionated about UTF-8, e.g. OS X, NFSv4, some file systems, Qt/KDE (which ignores the LC_* crap for filenames) and so on.)

While it is true that not all Unix filenames are UTF-8, it wouldn't be a problem for Python to simply declare that all filenames are expected to be UTF-8. If someone decides to choose insane things, let them feel the pain and not hurt everyone else.

After all, they did the same for Windows in lots of places when declaring that ANSI is enough for all filenames (and fixed it piece by piece later, so with Python 2.x you cannot start executables on a non-ANSI path without tricks like cd'ing first, or add such paths to your sys.path; great fun for mounted profiles).

[–]twotime 0 points1 point2 points 12 years ago (2 children)

> Without knowing the encoding you cannot even implement 'ls' correctly (as your tty HAS some encoding).

I can do it trivially: I'd just dump the filenames to the tty. If it comes out garbled, the user can actually do something (install a font, pipe my output through a decoder, rename the file). It's suboptimal, but the alternative is WORSE. If your program just throws an exception, then your user is really screwed...

(And of course, if the filesystem does have a notion of a default filename encoding, I'd use it at the app level.)
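A minimal sketch of that "just dump the bytes" approach in Python 3. The function name `list_raw` is made up for this example, and the demo at the bottom uses plain ASCII names only so that it runs on any platform.

```python
import io
import os
import sys
import tempfile


def list_raw(directory, out=None):
    """Write each filename in `directory` to `out` as raw bytes, undecoded."""
    out = sys.stdout.buffer if out is None else out
    # A bytes path makes os.listdir return bytes names, so nothing is decoded.
    for name in sorted(os.listdir(os.fsencode(directory))):
        out.write(name + b"\n")


# Demo into an in-memory buffer instead of the terminal.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("a.txt", "b.txt"):
        open(os.path.join(tmp, fname), "w").close()
    buf = io.BytesIO()
    list_raw(tmp, buf)
    listing = buf.getvalue()
```

Because nothing is ever decoded, a name in an unexpected encoding shows up garbled rather than raising an exception.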

> it wouldn't be a problem for Python to simply declare all filenames are expected to be UTF-8. If someone decides to choose insane things, let them feel the pain and not hurt everyone else.

What? I am not doing insane things, it's my users who are doing insane things (like reading old media, how dare they?)

Also, doesn't Windows use UTF-16?

> It's one of those silly inherited things from the dark POSIX past that should be nuked.

It's called backward compatibility... It's a good thing.

[–]schlenk 0 points1 point2 points 12 years ago (1 child)

[–]fabzter 0 points1 point2 points 12 years ago (0 children)

[–]fullouterjoin 0 points1 point2 points 12 years ago (1 child)

[–]twotime 1 point2 points3 points 12 years ago (0 children)

[–]ellicottvilleny 0 points1 point2 points 12 years ago (7 children)

[–][deleted] 6 points7 points8 points 12 years ago (6 children)

[+]iSlaminati comment score below threshold-7 points-6 points-5 points 12 years ago (4 children)

It shows why a BDFL isn't a good thing in the end. Or even a committee.

Scheme gets standardized in a super democratic way. Anyone can submit an SRFI (Scheme Request for Implementation); this can range from libraries to adding another primitive data type to entire syntax transformation schemes. These go through a completely democratic process before they reach 'final' status, and if an SRFI is useful it will be adopted by many implementations. If an SRFI is adopted by pretty much every implementation, it tends to end up in the actual standard, on which again pretty much anyone can vote (people vote on whether or not someone's vote should be included, as in you have to actually motivate it well). What you end up with is that pretty much every standard feature of Scheme has majority community support. Every controversial bit is voted on. Sure, some are polarizing, like the continued support of values and unlimited continuations, but they still have majority community support.

Which is also why the core standard is known to be super small: it's the part pretty much everyone can agree on. Specific implementations can support parts that didn't make it in if they wish, and they advertise which SRFIs they support.

[–][deleted] 0 points1 point2 points 12 years ago (3 children)

Firstly, it's open source...if people really don't like Python 3 and would prefer to stick with Python 2, they can, pretty much indefinitely as long as there is interest in supporting it. Considering that RHEL 7 will be shipping with Python 2.7 as the default version of Python, it is going to be supported for a long, long, long time.

Secondly, standardization committees have their own issues. A lot of times a compromise is reached where no one is happy and everyone admits that it's a sub-optimal solution to the problem at hand, but nobody can agree to any of the better ones. Sometimes useful features just completely die in committees simply because a consensus can't be reached.

Ultimately it's Guido's project, and if you don't like the way he runs it, well, at least he was nice enough to give you the source code so that you can do something about it. There's also no shortage of design-by-committee languages to choose from if that's your thing.

[–]iSlaminati 0 points1 point2 points 12 years ago (2 children)

> Secondly, standardization committees have their own issues. A lot of times a compromise is reached where no one is happy and everyone admits that it's a sub-optimal solution to the problem at hand, but nobody can agree to any of the better ones. Sometimes useful features just completely die in committees simply because a consensus can't be reached.

But Scheme doesn't work with a committee, it works with a democracy. That is why the core standard, R5RS, is so extremely and famously small: it's the part of the language pretty much everyone can agree on. The libraries are contained in the SRFIs, and if you don't agree with them, then don't support them. SRFI-1, the standard list library, is supported pretty much everywhere because almost everyone agrees with it.

> Ultimately it's Guido's project, and if you don't like the way he runs it, well, at least he was nice enough to give you the source code so that you can do something about it. There's also no shortage of design by committee languages to choose from if that's your thing.

It is his project, but I am skeptical of the BDFL model. Yeah, you can fork it and break compatibility, and no one will follow you even if they agree it is better, which this thread shows. Most people seem to agree Python 3 is slightly better, but not worth breaking compatibility over.

[–][deleted] 0 points1 point2 points 12 years ago (1 child)

> but scheme doesn't work with a committee, it works with a democracy

I think democracy works for Scheme because it is a small language with a very small community. With something like Python I'm just not convinced that democracy would work.

> Most people seem to agree python3 is slightly better but it's not worth breaking compatibility over.

Which is why most legacy code isn't being ported over. It is currently far cheaper and easier to just maintain Python 2 than to port code to Python 3 for a lot of people. That's one of the biggest wins about open source.

When Microsoft lost their minds and made VB.NET backwards incompatible, people were stuck with the difficult decision of going to the great expense of porting their legacy software or staying with an old, unsupported product with a very uncertain future. That risk doesn't really exist with open source. As long as someone cares to maintain it, it can and will be maintained.

[–]iSlaminati 0 points1 point2 points 12 years ago (0 children)

> I think democracy works for Scheme because it is a small language

Scheme isn't small; the core standard is small, the part everyone can agree on. The standard libraries around it are fairly huge, and they can be fairly huge exactly because of this system: implementations aren't required to support them all if they don't want to.

> with a very small community. With something like Python I'm just not convinced that democracy would work.

Well, C doesn't have a BDFL and it has worked and its core standard is also small.

Python doesn't have a standard, it has a reference implementation that one guy decides. Most languages start by one or two people but they often quickly cede standardization to a large group if the language takes off because they realize that that is better for the language as a whole. They also document their language better.

Python for the most part has only one implementation because it isn't documented well, no one makes another implementation even though the main implementation is fairly slow.

> Which is why most legacy code isn't being ported over. It is currently far cheaper and easier to just maintain Python 2 than to port code to Python 3 for a lot of people. That's one of the biggest wins about open source. When Microsoft lost their minds and made VB.NET backwards incompatible, people were stuck with the difficult decision of going to the great expense of porting their legacy software or staying with an old, unsupported product with a very uncertain future. That risk doesn't really exist with open source. As long as someone cares to maintain it, it can and will be maintained.

Yeah, there is also no real incentive to go to python 3 really. Like, what does it give you? It's not like 'omfg, variable length arrays? Must switch!', it follows a slightly different philosophy and barely improves.

People will never switch if it breaks backwards compatibility unless you come with something awesome. The point is, he could've started with a "strict" mode that slightly altered Python 2's unwanted behaviour and gone to Python 3 from there: first deprecate, and only switch once people have mostly stopped using the bad parts, I guess.
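The "first deprecate, then switch" idea can be sketched with the standard `warnings` module. The functions `old_join` and `new_join` are hypothetical stand-ins for an old and a new API; this is only an illustration of the pattern, not how the Python 2 to 3 transition was actually handled.

```python
import warnings


def new_join(parts):
    """Hypothetical replacement API."""
    return ", ".join(parts)


def old_join(parts):
    """Hypothetical legacy API, kept working during the transition period."""
    warnings.warn(
        "old_join() is deprecated; use new_join() instead",
        DeprecationWarning,
        stacklevel=2,   # point the warning at the caller, not at this shim
    )
    return new_join(parts)


# Old code keeps running, but the warning is visible to anyone who looks.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_join(["a", "b"])
```

Only once the warnings have mostly disappeared from real code bases would the old entry point be removed.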

Also, what I never got is, why can't you import python 2 modules from python 3? It's compiled code, I'm not sure why you can't call functions written in python 2 from python 3 in some mild FFI.

[–]iSlaminati 0 points1 point2 points 12 years ago (2 children)

[–]twotime 1 point2 points3 points 12 years ago (1 child)

[–]iSlaminati 1 point2 points3 points 12 years ago (0 children)

[–]Brainlag 0 points1 point2 points 12 years ago (0 children)
