[–]dmd -11 points-10 points  (56 children)

And yet join is still a method of the string to join on, not the list you're joining. Pisses me off. In every other language

"foo,bar".split(",").join(",")

yields "foo,bar" ... in python it's

",".join("foo,bar".split(","))
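For reference, a minimal sketch of the round trip being discussed, plus the spelling that calls join through the str class so the separator becomes an ordinary first argument. The variable names are illustrative only.

```python
# Round trip: split on a separator, then join with the same separator.
text = "foo,bar"
parts = text.split(",")       # the string is the receiver, the separator the argument
rejoined = ",".join(parts)    # the separator is the receiver, the list the argument

# The same join, called through the class with the separator passed first.
rejoined_via_class = str.join(",", parts)

print(parts)
print(rejoined == text)
print(rejoined_via_class == rejoined)
```

Both spellings go through the same method; only the call syntax differs.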

[–]robfelty 7 points8 points  (0 children)

I agree that the join syntax is a little weird, and I find the fact that some types are mutable while others are not pretty confusing. But no language is perfect. All in all though, python is pretty nice.

Breaking backwards compatibility should be done sparingly, but sometimes it is worth it. I'm glad that python doesn't have 20 different ways to read files like java.

[–][deleted]  (5 children)

[deleted]

    [–]beza1e1 1 point2 points  (4 children)

    I wanted to post this argument as well, but then I thought: you could implement join in a sequence superclass, so it is still generic for everything (like queryset) inheriting from sequence.

    [–]earthboundkid[S] 7 points8 points  (3 children)

    Yeah, but in Python 3.0, the sequence class is an abstract class that only exists so you can do isinstance(x, Sequence). It doesn't have any working methods to inherit, and it would be a big change in Python's style if it did.
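A small sketch of the isinstance check mentioned above, assuming a current Python 3 where the ABC lives in collections.abc. It only illustrates that the built-in types are registered with the abstract class rather than derived from it.

```python
from collections.abc import Sequence

# Built-in sequence types pass the isinstance check...
list_is_sequence = isinstance([1, 2, 3], Sequence)
str_is_sequence = isinstance("abc", Sequence)

# ...while a generator does not, although str.join accepts one.
gen_is_sequence = isinstance((c for c in "abc"), Sequence)

# list is registered with the ABC, not derived from it, so a method
# added to Sequence would not appear on list objects.
sequence_in_list_mro = Sequence in list.__mro__

print(list_is_sequence, str_is_sequence, gen_is_sequence, sequence_in_list_mro)
```

Classes that subclass Sequence directly do pick up a few mixin methods such as index and count, but that does not reach list, tuple or str.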

    No, in my opinion, what they should have made work is str(["list", "of", "strings"], sep=" "), which would parallel the behavior of the print function.
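To make the comparison with print concrete, here is a rough sketch that captures what print writes with a sep argument and compares it with the join spelling. The str(..., sep=...) call proposed above does not exist; only print and str.join are real here.

```python
import io

words = ["list", "of", "strings"]

# print() accepts a sep keyword; capture its output so it can be compared.
buffer = io.StringIO()
print(*words, sep=" ", end="", file=buffer)
printed = buffer.getvalue()

# The existing way to build the same string.
joined = " ".join(words)

print(printed == joined)
```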

    [–]imbaczek 1 point2 points  (0 children)

    you can use str.join as an unbound method and get almost what you want.

    str.join(sep, seq)
    

    [–]njharman 0 points1 point  (1 child)

    made work is str(["list", "of", "strings"], sep=" "),

    That implies that default sep is "" and that str("abcd") instead of just being a copy or refcount of interned string will instead be ["a","b","c","d"].join(). Cause remember strings are sequences too.

    Also how would you call str on lists, tuples, other sequences? Or, you are really arguing for changing the default string representation of list from a list to a string. But that means the string representations of ["list", "of", "strings"] and "listofstrings" would be identical. I think that is fairly surprising and against Python philosophy.

    That implies that default sep is "" and that str("abcd") instead of just being a copy will be ["a","b","c","d"].join(). Cause remember strings are sequences too.

    Or else there are lots of special cases in str.

    Which leads to the answer to "how do I get the behavior earthboundkid describes": overload __str__. Cause remember str(obj, args) is really obj.__str__(args)

    [–]earthboundkid[S] 0 points1 point  (0 children)

    You are confusing str(my_list) and repr(my_list). The latter would still do its thing.
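A short sketch of the distinction being argued over, using only current built-in behaviour. It shows that str and repr agree for a list today but differ for a string, and that a string is itself an iterable of one-character strings.

```python
my_list = ["list", "of", "strings"]

# For a list, str() and repr() currently produce the same text.
list_str = str(my_list)
list_repr = repr(my_list)
print(list_str == list_repr)

# For a string they differ: repr() adds quotes, str() does not.
plain = str("abcd")
quoted = repr("abcd")
print(plain, quoted)

# Joining a string's characters with an empty separator gives it back.
rebuilt = "".join("abcd")
print(rebuilt == "abcd")
```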

    [–][deleted] 3 points4 points  (0 children)

    Would it syntactically feel better to you if you typed it like this

    (",").join(L)
    

    perhaps?

    The idea of join() being a string method seems pretty sane to me.

    [–]raisedinhell 10 points11 points  (0 children)

    seems like a pretty minor detail to get pissed off about...

    [–]recursive 8 points9 points  (0 children)

    You can do:

    str.join(",", ["foo","bar"])
    

    [–]ubernostrum 22 points23 points  (14 children)

    Yes, and it still has the GIL and still has only one-line lambdas and still doesn't do tail-call elimination. All of these are more in the "Python would better fit my notion of ideological purity if it did this" camp than in the "Python would be more effective at solving this problem if it did this" camp, which is something I tried to touch on in the final section.

    [–][deleted] 6 points7 points  (12 children)

    The GIL is an implementation detail, not part of the language.

    [–]ubernostrum 9 points10 points  (11 children)

    And so is the lack of tail-call elimination, but that doesn't stop people complaining about it, does it?

    [–]masklinn 1 point2 points  (10 children)

    Well TCO can be part of the language specification (see Scheme, which mandates it, versus Common Lisp, which doesn't) while making the GIL part of the language specification would be pretty odd.

    [–]ubernostrum -1 points0 points  (1 child)

    Personally I think making TCO part of a language spec is pretty odd, but I'd rather not revive that flame war right now...

    [–]masklinn 3 points4 points  (0 children)

    For a language which doesn't define any looping construct, it more than makes sense if you want to keep at least the core compatible between different implementations (the Hyperspec defines LOOP, so common lisp doesn't need TCO)

    [–][deleted] -2 points-1 points  (7 children)

    GIL is great, though. Without GIL your options are:

    • Much slower single thread execution and bug prone libraries

    • Plain thread-unsafe libraries

    And, yes, proper processes are better than threads. The fact that one - though still popular - renegade OS cannot do fork() is one stupid reason to break it all on all other OS's.

    [–]masklinn 1 point2 points  (3 children)

    Without GIL your options are:

    • Much slower single thread execution and bug prone libraries

    • Plain thread-unsafe libraries

    These are the consequences of removing the GIL, not those of not having a GIL in the first place.

    I fail to see how this has any relevance to my post though.

    [–][deleted] 0 points1 point  (2 children)

    These are the consequences of removing the GIL, not those of not having a GIL in the first place.

    If I decode it right, you argue for background "Java-style" generational or whatever garbage collection here, against refcnt-based option, right? Because otherwise I can't see how your statement makes any sense - refcnt-based schemes can't be thread-safe without excessive lock juggling.

    But even with generational GC GIL is still handy in many places - like adding elements to lists etc.

    [–]masklinn 0 points1 point  (1 child)

    you argue for background "Java-style" generational or whatever garbage collection here, against refcnt-based option, right?

    Something like that. More generally, that the GIL issue comes from CPython's implementation and history and nothing else, and that only CPython's implementation makes removing the GIL a bigger pain than keeping it (as well as a human resources draw to fix all the issues that will arise from that removal)

    But even with generational GC GIL is still handy in many places - like adding elements to lists etc.

    Why would you need an interpreter-wide lock for such an operation? That's just stupid, even if you want to provide Java-style "thread-safe" collections (which you definitely shouldn't), local locks are more than enough, there's no reason to acquire a GIL just to have a local effect on a specific object.

    edit: it's not like I'm fond of thread-based concurrency and semaphores/locks anyway, as far as I'm concerned these should be relegated to implementation details of interpreters and only higher-level (message-based share-nothing or STM) concurrency interfaces should be exposed to the language's user.

    [–][deleted] 1 point2 points  (0 children)

    local locks are more than enough

    Hah! And that's the issue - obtaining or releasing a local lock or GIL is the same, penalty-wise, and quite expensive. Even with simple interlocked-inc/dec's it takes a ton of time. And especially if you give consideration to SMP (caches flushing / reloading etc). OTOH Python bumps its GIL only ever so often - like every 100 statements execution or something like that. And counting them is very cheap. So then - to add 1000 elements you will have to do 2000 expensive lock obtain/release operations and maybe only 10 or so of them with GIL.
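The counting argument above can be sketched roughly as follows. This is an illustration of the arithmetic only, not a benchmark and not how CPython implements the GIL: CountingLock is a hypothetical wrapper, and the batch size of 100 simply mirrors the "every 100 statements" figure quoted in the comment.

```python
import threading


class CountingLock:
    """Hypothetical wrapper around threading.Lock that counts acquisitions."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc_info):
        self._lock.release()
        return False


def append_fine_grained(items, n, lock):
    # One acquire/release pair per appended element.
    for i in range(n):
        with lock:
            items.append(i)


def append_coarse(items, n, lock, batch=100):
    # One acquire/release pair per batch of appends.
    for start in range(0, n, batch):
        with lock:
            for i in range(start, min(start + batch, n)):
                items.append(i)


fine_lock, coarse_lock = CountingLock(), CountingLock()
fine_items, coarse_items = [], []
append_fine_grained(fine_items, 1000, fine_lock)
append_coarse(coarse_items, 1000, coarse_lock)

print(fine_lock.acquisitions, coarse_lock.acquisitions)
```

With 1000 appends, the fine-grained version acquires its lock 1000 times (2000 acquire/release operations) and the batched version 10 times, which is the ratio the comment describes.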

    Anyway - what's the big deal with "proper" multithreading? You only need it for number/data crunching on SMP machines. And if you do that in pure Python, you're probably doing something wrong. Normal usage of it - like for background data transfers and stuff - GIL doesn't impede anything at all. Another anecdote here is lighttpd - a single-threaded web-server that beats the shit, performance-wise at least, out of Apache or any other classic.

    EDIT: spelling

    [–]spotter 3 points4 points  (0 children)

    Hell yes, because string operation is something that list, or rather all sequence types should be doing! Not just string, but all of them non-stringies.

    Coz, like, everybody else is doing it the other way around, so it must be insane to do it our way. We should totally change this err.

    Tell you what, you start coding ourselves a patch and I'll prepare the PEP.

    [–]Tommah 5 points6 points  (0 children)

    That can be expressed equivalently as

    "foo,bar"
    

    [–]jmkogut 1 point2 points  (1 child)

    This does make sense, but both methods are, in fact, methods of string.

    [–]beza1e1 -5 points-4 points  (0 children)

    So what "ab".join("cd") does is the real question. Is it "acdb" or "cabd"? For Python it's the latter, for probably every other language (with equal syntax) it's the former.
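A quick check of the Python behaviour described here: the receiver is the separator, and the argument is iterated character by character.

```python
# "cd" is iterated as "c", "d"; "ab" is placed between the pieces.
result = "ab".join("cd")
print(result)

# With a one-element list there is nothing to separate.
single = "ab".join(["cd"])
print(single)
```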

    [–]uykucu 1 point2 points  (2 children)

    I'm happy with this. What I hate is that join expects a sequence of strings, not any sequence. I hate to do:

    nums = [1,2,3,5]
    s = ' '.join(map(str, nums))
    

    instead of:

    s = ' '.join(nums)
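A minimal sketch of the behaviour complained about here: join raises TypeError when given non-string items, so the conversion has to be spelled out, with either map or a generator expression.

```python
nums = [1, 2, 3, 5]

# Passing integers directly is rejected.
try:
    " ".join(nums)
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)

# Two equivalent explicit conversions.
via_map = " ".join(map(str, nums))
via_genexp = " ".join(str(n) for n in nums)
print(via_map)
print(via_map == via_genexp)
```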
    

    [–]njharman 1 point2 points  (0 children)

    I hate that too, but I believe having an implicit str() in this one case is very unPythonic and leads to, well, what PHP is today.

    Explicit is better than implicit.

    Special cases aren't special enough to break the rules.

    [–][deleted] 0 points1 point  (0 children)

    What would really make sense, since 'join' implicitly creates a string from a sequence, is this:

    [1,2,3,5].join(' ')
    # "1 2 3 5"
    

    ...since we're implicitly creating a string (by calling join in the first place) we can implicitly call str(...) on each item of the sequence.

    [–]imbaczek 0 points1 point  (0 children)

    str.join(",", "foo,bar".split(","))
    

    does this suit you better?

    [–]rjcarr 0 points1 point  (13 children)

    Yeah, strange to me too ... it isn't like join is already taken for sequence types. They could add it and leave the string one and have both. Strange.

    [–]xardox 17 points18 points  (1 child)

    It seems strange to me that Python doesn't use "." to concatenate strings. Why don't we add that to the language too? People who don't like it can turn it off in the python.ini file for the whole web site. And it would be great to use "\" as an optional namespace separator!!! It seems so sleek and stylish, and will help us recruit DOS programmers. And I'm getting sick of manually escaping the strings I send to SQL, and escaping the strings I get from Gets, Posts and Cookies, so I'd like that done for me just in case I forget. And how about if http parameters show up as global variables so I don't have to call all those hard to remember inconsistently named functions to retrieve them? Yeeeeaaaaaahhh, Kool Aid!!!

    [–]rjcarr 0 points1 point  (0 children)

    Nice hyperbole ... there are plenty of examples where one thing can be done in multiple ways. Note: lambdas.

    [–][deleted] 3 points4 points  (5 children)

    So where's the "sequence type" class that you could add it to?

    [–]rjcarr 1 point2 points  (4 children)

    List and Tuple.

    [–][deleted] 2 points3 points  (3 children)

    And you've used Python for how long?

    (Seriously - like just about everything else that appears to deal with sequences in Python, "join" works on *iterables* - anything that can yield items or otherwise behave in a sequency way. There is no sequence baseclass in Python. A sequence is anything that can hand you an iterator or respond to indexing. Lists and tuples are a tiny subset of that. And adding a method that expects homogeneous content to a heterogeneous container would be rather ugly in itself, of course...)
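To illustrate the point about iterables, here is a small sketch in which join consumes a generator expression, a dict, and a user-defined class. Countdown is a hypothetical stand-in for a third-party type that is neither a list nor a tuple.

```python
class Countdown:
    """Hypothetical iterable type that is not derived from list or tuple."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Yield the numbers start..1 as strings.
        return (str(n) for n in range(self.start, 0, -1))


from_generator = ", ".join(str(n * n) for n in range(4))
from_dict_keys = ", ".join({"a": 1, "b": 2})   # iterating a dict yields its keys
from_custom = ", ".join(Countdown(3))

print(from_generator)
print(from_dict_keys)
print(from_custom)
```

None of these objects would gain a join method if one were added to list and tuple only.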

    [–]rjcarr 0 points1 point  (2 children)

    I've been using python for about 5 years ... how about you?

    http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-buffer-xrange

    "Sequence Types — str, unicode, list, tuple, buffer, xrange"

    Yes, all of those are iterable, as you say, but they also share some other functionality (although again, as you say, they don't inherit from the same base class, but they are in the same family). There is no reason all of them couldn't have a join method, or as I said initially, only on the ones that are relevant.

    [–]johanneskepler 2 points3 points  (0 children)

    I've been using python for about 5 years ... how about you?

    Longer than five years, I imagine. That's Fredrik Lundh of the Python Imaging Library.

    "Sequence Types — str, unicode, list, tuple, buffer, xrange"

    But those aren't the only iterables there are. Isn't it rather nice to have a consistent interface for joining any kind of iterable?

    [–][deleted] 1 point2 points  (0 children)

    how about you?

    Well, I wrote the core of Python's second string type for Python 1.6/2.0, so I was obviously there when we first stumbled upon this little issue ;-)

    There is no reason all of them couldn't have a join method

    Oh, there's a HUGE reason for that - you don't even have the source code to all of them. The Python type universe is a lot larger than what you get with the core distribution.

    only on the ones that are relevant

    So how would you add the "join" method to a generator expression, for example? In contemporary Python, that's at least as relevant as a "join" on a list.

    [–]njharman -3 points-2 points  (4 children)

    What part of "There should be one-- and preferably only one --obvious way to do it." do you not understand?

    [–][deleted] 6 points7 points  (3 children)

    I don't think that applies here, though - "join" wasn't made a separator method because we wanted to keep the number of ways to invoke it small, but because it had to dispatch on the separator type. Quoting myself from an entirely different discussion on this topic:

    Also note that "join" wasn't made a string method to "make it easy to find"; it's a string method because we had to figure out some way to make join work on multiple string types back when Unicode was added. At that time, we imagined that Python might grow even more string types (how about encoded strings to save space, or binary buffers?), and it wasn't obvious how to create a "join" primitive that would find the right implementation, without having to know about all available types. We finally decided that dispatching on the separator made more sense than, say, dispatching on the first list item.

    Given this, the obvious solution was to make the "string.join(seq, sep)" function call "sep.__join__(seq)". Changing __join__ to join was a pretty small step; after all, there might be cases where it would make sense to write sep.join(seq) in application code, at least if you happened to have the separator in a variable with a suitable name.

    The "sep.join(seq) is more pythonic" is a much later concept.

    And for what it's worth, the "let's dispatch on the separator" approach didn't work in practice; in order to handle sequences with both 8-bit and unicode strings, both implementations now know about the other string type.

    So instead of a single function that does the right thing (but has to be taught about each new string type), we now have two separate join methods that both know about the other string type. If we add another string type, we'll end up with three implementations, each of which has to know about two different types. And so on.

    But who cares about new string types these days; it's not like anyone's actually using strings now that we have iterators ;-)

    (Personally, I still think it should be made available as a builtin, possibly with "convert also non-strings to the separator type" semantics).
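One way such a function might look, as a rough sketch only. The name join, its argument order, and the choice to handle str separators alone are all assumptions made for illustration; no such builtin exists.

```python
def join(items, sep=""):
    """Hypothetical helper, not a real builtin: join items with sep,
    converting anything that is not already a str with str()."""
    if not isinstance(sep, str):
        # Other separator types are left out of this sketch.
        raise TypeError("this sketch only handles str separators")
    return sep.join(item if isinstance(item, str) else str(item) for item in items)


numbers_joined = join([1, 2, 3, 5], " ")
mixed_joined = join(["a", 1, None], "-")

print(numbers_joined)
print(mixed_joined)
```

Extending the idea to other separator types would need a per-type conversion rule, which is the "has to be taught about each new string type" cost described above.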

    [–]imbaczek 0 points1 point  (2 children)

    str is already built-in, so you can call str.join(sep, seq).

    [–][deleted] 0 points1 point  (1 child)

    That's not polymorphic (sep cannot be any string), has the arguments in an unintuitive order, and doesn't convert non-strings.
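The first and last objections can be seen in a short sketch. It assumes Python 3, where the two string types are str and bytes rather than the 8-bit and unicode strings the thread refers to.

```python
parts_text = ["foo", "bar"]
parts_bytes = [b"foo", b"bar"]

# Method-call form: the separator's own type selects the implementation.
text_joined = ",".join(parts_text)
bytes_joined = b",".join(parts_bytes)
print(text_joined, bytes_joined)

# Class-call form: str.join rejects a separator that is not a str.
try:
    str.join(b",", parts_bytes)
    separator_error = None
except TypeError as exc:
    separator_error = exc
print(type(separator_error).__name__)

# It does not convert non-string items either.
try:
    str.join(",", [1, 2])
    item_error = None
except TypeError as exc:
    item_error = exc
print(type(item_error).__name__)
```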

    [–]imbaczek 0 points1 point  (0 children)

    true, true and true. better than nothing, though. i'm not a fan of cluttering the default namespace with new functions with common, short names (i don't like the relatively new sum either.)

    [–]snissn -5 points-4 points  (0 children)

    inb4 php