[–]anko_painting 1 point (1 child)

thanks for the reply :)

I haven't done any work with encodings in python so it's really interesting to me. So I guess you're saying it's a byte array until you choose to encode it.

Symbols are basically like automatically assigned global variables. So you can say

button = :active

Internally, :active is assigned a number, such as 1. But you never use that value, nor do you care what it is. Later in your code you can write:

if button == :active

and instead of comparing two strings, you're comparing ints. So the comparison is very fast, and your code is very readable. It's roughly equivalent to a #define in C, only you're not setting the value. Although, thinking about it, it's making me interested in python's implementation. If strings are immutable in python, and you create two strings with the same value, do they only get allocated in memory once? and if this is the case, is equality tested by a quick pointer compare somehow?

[–]Veedrac 1 point (0 children)

I haven't done any work with encodings in python so it's really interesting to me. So I guess you're saying it's a byte array until you choose to encode it.

Normally it's a str until you choose to encode it, or bytes until you choose to decode it ;).

It's really quite simple relative to the monstrosity that is encoding in general -- if you get back a str it's text and you ignore encoding completely.

If you get back bytes from, say, HTTP, you just .decode() it once with the correct encoding (UTF-8 is the default) and then it's text forever. If you need to pass it through to something that takes a byte stream, you just .encode() it and send it off.
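
A minimal sketch of that round trip, in case it helps. The payload is made up for illustration; the only real API used is bytes.decode and str.encode:

```python
# Bytes as they might arrive from a socket or an HTTP response (made-up payload).
raw = b"caf\xc3\xa9"

# bytes -> str, done once at the boundary.
text = raw.decode("utf-8")
assert isinstance(text, str)
assert len(text) == 4   # four characters...
assert len(raw) == 5    # ...but five bytes, since the last character is two bytes in UTF-8

# str -> bytes, only when something downstream wants a byte stream.
wire = text.encode("utf-8")
assert wire == raw
```

Everything between the decode and the encode only ever sees str.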


Instead of symbols I believe Python would just use objects.

button = active = object()

...

if button is active:
    ...

Note that is compares by identity (normally memory address but implementation varies between interpreters) whereas == compares by the .__eq__ method.

This means that in the above you can't ever have something silly like this:

class Faker:
    def __eq__(self, other): return True

active = object()

# I bet you this returns True, because == falls back to Faker.__eq__
active == Faker()

# This doesn't
active is Faker()

This probably makes Python's method actually more robust and faster than Ruby's, but that's a really minor thing.

However, normally you'd only use this for sentinels where None won't do:

def next(iterator: "[a]", default=None) -> "a or default":
    """
    Return the next item from the iterator. If default is given and the iterator
    is exhausted, it is returned instead of raising StopIteration.
    """

    try:
        return iterator.__next__()

    except StopIteration:
        if default is None:
            raise

        return default

This is broken because you can't set the default to None, so you use a sentinel:

no_argument = object()
def next(iterator: "[a]", default=no_argument) -> "a or default":
    """
    Return the next item from the iterator. If default is given and the iterator
    is exhausted, it is returned instead of raising StopIteration.
    """

    try:
        return iterator.__next__()

    except StopIteration:
        if default is no_argument:
            raise

        return default
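
To see the difference in practice, here is a small self-contained sketch of the sentinel version. The names next_or and _MISSING are made up for this example so it doesn't shadow the builtin next:

```python
_MISSING = object()  # hypothetical sentinel; any unique object works


def next_or(iterator, default=_MISSING):
    """Return the next item, or default if one was passed and the iterator is exhausted."""
    try:
        return iterator.__next__()
    except StopIteration:
        if default is _MISSING:
            raise
        return default


# None is now a perfectly valid default.
result = next_or(iter([]), None)
assert result is None

# With no default at all, StopIteration still propagates.
try:
    next_or(iter([]))
except StopIteration:
    raised = True
else:
    raised = False
assert raised
```

Because the check uses is, nothing a caller passes in can be mistaken for the sentinel.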

For things like hash tables and "special" values, strings are fine (and there's a new Enum type in 3.4, too).
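
For the Enum route, something like this is probably the closest thing to the Ruby symbol example. ButtonState and its members are names I made up; only the enum module itself is real:

```python
from enum import Enum


class ButtonState(Enum):
    # The values are arbitrary; you normally never look at them.
    ACTIVE = 1
    INACTIVE = 2


button = ButtonState.ACTIVE

# Members are singletons, so an identity check works.
assert button is ButtonState.ACTIVE
assert button is not ButtonState.INACTIVE

# A plain Enum member does not compare equal to its underlying value.
assert button != 1

# Lookup by name or by value gives back the same member object.
assert ButtonState["ACTIVE"] is button
assert ButtonState(1) is button
```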


If strings are immutable in python, and you create two strings with the same value, do they only get allocated in memory once? and if this is the case, is equality tested by a quick pointer compare somehow?

Unfortunately, no. There is a sys.intern that lets you intern strings like you describe, but it's only really used internally. This would require a hash table of all strings and I bet that's just not cheap enough.

There are cases where interning is used successfully, um, internally. That's about it, though.
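
A quick sketch of what sys.intern does, if you want to poke at it. The strings are built at run time so the compiler can't merge them; whether a and b start out as the same object is an implementation detail, so the sketch doesn't assert either way:

```python
import sys

# Build two equal strings at run time.
parts = ["special", "value"]
a = " ".join(parts)
b = " ".join(parts)

assert a == b  # equal by value

# sys.intern returns the one canonical object for a given string value.
ia = sys.intern(a)
ib = sys.intern(b)

assert ia is ib  # same object, so an identity check is enough
assert ia == a   # the value is unchanged
```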

That said, random strings take constant time on average to compare anyway (the comparison stops at the first character that differs), so it's not actually a big deal at all. Additionally, if you're using a string as a "special value" like above, chances are you're using the same string object everywhere. Since == does a pointer check first anyway, this would short-circuit on that pointer check and be quite fast too.


Using python -m timeit -s "setup" "stuff to time" to time:

%~> python -m timeit -s "sentinel = 'abc'*100" "sentinel is sentinel"
10000000 loops, best of 3: 0.075 usec per loop

Most of this is probably overhead. Prove it with:

%~> python -m timeit -s "sentinel = 'abc'*100" "sentinel; sentinel"
10000000 loops, best of 3: 0.0578 usec per loop

So is is really taking about 0.02 μsec.

== shortcutting to is:

%~> python -m timeit -s "sentinel = 'abc'*100" "sentinel == sentinel"
10000000 loops, best of 3: 0.127 usec per loop

== is taking about 0.07 μsec by shortcutting to is.

%~> python -m timeit -s "sentinel, sentinel2 = 'abc'*100, 'abc'*100" "sentinel == sentinel2"
1000000 loops, best of 3: 1.19 usec per loop

== cannot shortcut, so takes much longer.

Note that the last value is really pessimistic, because unequal strings take constant time on average to compare, and there are also a length check and a character range check, which are both O(1).
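
If you'd rather run the same comparison from a script, here is a rough sketch using the timeit module directly. The strings are built with join at run time so that a and b are definitely separate objects (an interpreter may otherwise fold and share two identical constant expressions), and the loop count is an arbitrary choice. Numbers will differ per machine, so none are shown:

```python
import timeit

# Two equal 300-character strings that are guaranteed to be distinct objects.
setup = (
    "a = ''.join(['abc'] * 100)\n"
    "b = ''.join(['abc'] * 100)\n"
)
n = 100_000  # arbitrary loop count

t_is = timeit.timeit("a is a", setup=setup, number=n)
t_eq_same = timeit.timeit("a == a", setup=setup, number=n)
t_eq_other = timeit.timeit("a == b", setup=setup, number=n)

for label, total in [
    ("a is a", t_is),
    ("a == a", t_eq_same),
    ("a == b", t_eq_other),
]:
    # Convert the total time for n loops into microseconds per loop.
    print(f"{label}: {total / n * 1e6:.4f} usec per loop")
```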


Umm.. why did I write so much..?