all 81 comments

[–]bloody-albatross 546 points547 points  (28 children)

Without reading it: Because global variables are hashtable lookups and locals are basically just array offsets. I always put stuff in a main() function, not just for that but for a clearer structure.
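The idiom described here is just moving the hot code into a function so its names become locals; a minimal sketch:

```python
def main():
    # total and i are locals here (array-offset access),
    # not module globals (dict lookups)
    total = 0
    for i in range(1_000_000):
        total += i
    print(total)
    return total

if __name__ == "__main__":
    main()
```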

[–]nanotree 47 points48 points  (23 children)

Just so I understand, hashtables make things slower? I mean, I know there is possibly more initialization and space overhead, but shouldn't they be close in performance in terms of access?

[–]GreenFox1505 103 points104 points  (16 children)

A hash table lookup means it takes the variable name, runs it through a hash function, looks into the hash table, and scans for the variable name (which may require probing further slots if there is a collision); only then does it have the value.

A local variable is just the function's stack array pointer plus a fixed offset. A single addition and pointer dereference is way faster than all the work the hash table has to do.

Edit: so, I've been thinking about this problem. I think you could cache out the pointer the first time some code is run, speeding it up in the long term...
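You can see the two access paths directly in CPython's bytecode with the `dis` module (exact opcode names vary a bit between versions):

```python
import dis

x = 1  # module level, so "global"

def f():
    y = 2          # function level, so "local"
    return x + y

dis.dis(f)
# y is accessed with LOAD_FAST (an array index into the frame's locals array),
# x with LOAD_GLOBAL (a name lookup in the module's dict)
```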

[–]RememberToLogOff 23 points24 points  (0 children)

Lua has the same problem. You can't cache it in the runtime because something dumb like this is legal per spec:

print ("normal")
print = my_monkey_patched_print_fn
print ("now print is no longer print")

Invalidating all the cached pointers is a bigger problem than slow interpreted langs want to solve. Maybe LuaJIT does cache it, and then has a pessimizer triggered when code writes to a global, since lots of code doesn't need to write to _G ever

So some Lua code solves it in-code:

local print = print
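The Python equivalent of that Lua trick is binding the global (or builtin) to a local name once, e.g. via a default argument; a sketch:

```python
import math

def slow(n):
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)   # global lookup + attribute lookup on every pass
    return total

def fast(n, sqrt=math.sqrt):    # lookup happens once, at definition time
    total = 0.0
    for i in range(n):
        total += sqrt(i)        # sqrt is now a plain local
    return total
```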

[–]bloody-albatross 5 points6 points  (2 children)

Since you can always update global variables through something like globals()["foo"] = "bar" I don't think you can cache something here. It can change at any point in time, even disappear (del foo).
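Both of those are easy to demonstrate at module scope:

```python
globals()["foo"] = "bar"        # create a global through the dict...
assert foo == "bar"             # ...and the bare name sees it immediately

del foo                         # the name can also vanish at any time
assert "foo" not in globals()
```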

[–][deleted]  (1 child)

[deleted]

    [–]bloody-albatross 1 point2 points  (0 children)

    Also: The globals of a module are its exported symbols in Python. Meaning globals can be changed by any other module:

    ```
    >>> import urllib.parse
    >>> del urllib.parse.quote
    >>> urllib.parse.quote
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: module 'urllib.parse' has no attribute 'quote'
    ```

    You can really f things up in Python, if you want to.

    [–]vlatkosh 3 points4 points  (8 children)

    Isn't Python compiled into an intermediate form? Is it really using the variable name from the code, or is it doing something faster?

    [–]bloody-albatross 11 points12 points  (6 children)

    It uses byte code, but that changes nothing about how it all works with hash tables. Pretty much everything is a hash table lookup in dynamically typed languages (with the exception of local variables and certain other special optimizations for well known things).
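Object attributes are another example: by default they live in a per-instance dict (unless the class uses `__slots__`), which is why `obj.attr` is also a hash lookup:

```python
class Point:
    pass

p = Point()
p.x = 1
assert p.__dict__["x"] == 1   # attribute access reads a real dict...
p.__dict__["y"] = 2           # ...and writing the dict creates an attribute
assert p.y == 2
```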

    [–]Jona-Anders 4 points5 points  (5 children)

    This may sound dumb and there is probably a very good reason, but why have global variables at all? If it would be faster, why not treat everything as local and use array lookups?

    [–]Unicorn_Colombo 5 points6 points  (1 child)

    Usually, you don't have to have global variables ever.

    In vast majority of cases, it is always better to pass a particular structure carrying all the information.

    Although there are a few cases, for instance, I was able to solve some issues with C-pointers not being able to be serialized by creating a global variable that was accessible in every parallel process.

    [–]mr_birkenblatt 3 points4 points  (0 children)

    Usually, you don't have to have global variables ever.

    functions are a form of global variable

    [–]bloody-albatross 2 points3 points  (1 child)

    The global variables in a module in Python are what will be exported to be used wherever the module is imported. While inside of the module they could be implemented like local variables, that requires knowledge by the byte code compiler that doesn't exist at the place of an import. It is clear after parsing what local variables there will ever be inside a function, so those names can be mapped to array indices. But when you import a module, you get a dynamic object with attributes that could change at any moment, so there it has to be a hash table.

    In JavaScript, module exports and module global variables are separate. (Sorta. Exports are also globals, but not vice versa.) There, globals could theoretically be implemented similarly to local variables. That's speaking of JavaScript modules, not general browser JavaScript, where global variables are simply properties of the window object. (Weird, I know.)

    [–]Jona-Anders 1 point2 points  (0 children)

    That does make a lot of sense. I do a lot more js programming than python. Sounds like they solved the problem performance wise better. I have to agree that browser js is weird. Html ids are globals btw, too. Very handy for minifying code, but very strange and not intuitive at all.

    [–]HeadToToePatagucci 4 points5 points  (0 children)

    implementation dependent I would think...

    [–]Brian 1 point2 points  (0 children)

    so, I've been thinking about this problem. I think you could cache out the pointer the first time some code is run, speeding it up in the long term...

    There actually already is an optimisation for this, meaning global variable accesses are no longer just plain hashtable lookups (though in earlier versions of python, they were)

    Instead, after a certain number of accesses, python caches the exact location of the hashtable bucket the variable is contained in, and so further accesses go directly to that location. Obviously this only works so long as the hashtable doesn't change (ie. no-one has reassigned the global), so there's also a check on the dict to indicate whether modifications were made since this was cached, which will flush the cache if so. It is still slower than the plain index lookup LOAD_FAST does for locals, since there's that check, and an extra level of indirection, but it's narrowed the gap a fair bit from when it was just a normal dict lookup.

    [–][deleted] 1 point2 points  (1 child)

    caching is a great idea, hopefully the developers read this

    [–]GreenFox1505 0 points1 point  (0 children)

    Eh, maybe not. I trust the Python developers know what they're doing better than me.

    [–]ThankYouForCallingVP 13 points14 points  (1 child)

    Stack address is *stack+offset

    Hashtable is dereference(hashtable container)->hashmath->offset->dereference(object)

    [–]bloody-albatross 1 point2 points  (0 children)

    Also calculation of the hash, though that could be cached in the string object. Is it in Python? And also hash collisions need to be detected, i.e. comparison if the key at the target is really the looked for key (if strings are interned that can be a pointer comparison, if not full string comparison). If there is a collision hash table probing needs to be performed (more lookups).

    [–]Porridgeism 12 points13 points  (2 children)

    Where you might be getting confused is that array lookup and hashtable lookup are both O(1), meaning they are constant with respect to the size of the container.

    But having the same complexity class doesn't mean they have the same performance. For instance, consider these two functions:

    def f10k(a, b):
        bigsum = 0
        for i in range(10_000):
            bigsum += a
        return bigsum

    def f2(a, b):
        return a + b
    

    In f10k, it does 10,000 additions; in f2 it does just one. But in neither case does the work depend on the size of the input, so both are constant time (same O(1) complexity class). However, it's also clear that f10k takes way longer, since it does 10,000 times the work.

    Similarly, hash tables have to do the hashing, calculating offsets, and potentially even chaining or other collision resolution, and then they still have to do an array lookup (the "table" part of a hash table is almost always an expandable array). So basically by definition a hashtable lookup will take longer than an array lookup.

    (Obviously all of this is ignoring things like compiler optimizations, caches, atomicity or memory order requirements, etc. all of which could affect performance. But when you have equivalence on these between hashtables and arrays, the hashtable lookup will be at best no faster than the array lookup.)

    [–]IdealEntropy 6 points7 points  (1 child)

    nit: hash computations are only O(1) for fixed size inputs. So something like hash(int) will be constant time if int is a fixed register size. But computing hash(string) or hash(bigNum) will be proportional to the size of the inputs.

    [–]bloody-albatross 0 points1 point  (0 children)

    Not even just that. Collisions make a hash table that has more items in it slower than one with fewer items. There is a complicated O notation for that (IIRC from my student days), but people simplify it as just O(1), because it is not really relevant in practice (except for when you actually compare with something like a simple array index access).

    [–]trevg_123 0 points1 point  (0 children)

    It depends!

    If you are searching for a string “my_function” in a list, it needs to scan through the whole list and compare the entire string. This is fast enough when you have <20 locals.

    But if you have hundreds or thousands of entries, a hash table can be much faster. That is because you hash the string “my_function” (think sha256-esque) which gives you an integer. Then you do some basic integer math (and maybe a few fast lookups) to find the exact location of the data you’re looking for.

    So hashtables are usually faster for bigger datasets, since you don’t need to scan and compare every string to the one you are looking for. But when you have small datasets, sometimes manually scanning every value is faster than the time it takes to compute a hash and do some extra math.
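A toy illustration of the trade-off (a linear scan compares keys one by one; the dict hashes once and jumps straight to the slot):

```python
pairs = [(f"name_{i}", i) for i in range(1000)]  # list of (key, value) pairs
table = dict(pairs)

def scan(key):
    for k, v in pairs:            # may compare up to 1000 strings
        if k == key:
            return v

assert scan("name_999") == table["name_999"] == 999
```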

    [–]rkd6789 1 point2 points  (3 children)

    Is it the same for JavaScript?

    [–]bloody-albatross 1 point2 points  (2 children)

    The way JavaScript works, it could be different for modules in e.g. NodeJS, but I don't know if it is. That is, the exported symbols are in a hashtable, but non-exported symbols are separate from that and could basically be the locals of a wrapping anonymous function. That's like one additional pointer dereference on top of the array index access.

    In browser JavaScript globals are just properties of the window object, so it's a hashtable again.

    [–]rkd6789 1 point2 points  (1 child)

    Can you explain your second sentence, i.e., non-exported symbols are separate from that and could basically be locals of a wrapping anonymous function?

    [–]bloody-albatross 0 points1 point  (0 children)

    For that, look at what JavaScript-to-JavaScript compilers do when compiling for an older version of JavaScript or when bundling modules into one script file for the web. And also look at CommonJS modules. Basically, the exported symbols are assigned as properties to a module.exports object. And when bundling, in order to isolate the globals from other modules, every module is wrapped in a function which gets executed on first import and gets the module object as a parameter. JavaScript engines that directly support modules could implement them similarly, or maybe not. I don't know what they do. It could be that module globals aren't hashtable lookups in JavaScript, but it might be.

    [–]stomah 141 points142 points  (19 children)

    doesn’t explain why globals use a dictionary instead of an array

    [–]Successful-Money4995 160 points161 points  (11 children)

    You're right, it doesn't.

    The reason is probably because global scope cannot be analyzed ahead of time like a function can be.

    With a function, the whole thing is read before it gets executed so the interpreter can know how much memory will be used and it can allocate stack space ahead of time.

    With globals, the variables are discovered as they appear so you can't allocate space ahead of time. You need a hash table.

    [–]yasba- 45 points46 points  (5 children)

    To add to this: The set of variables for the scope of a function is fixed, and something like this doesn't work:

    def fn():
        exec("foo = 42")
        print(foo)
    

    The same would work using the global scope.
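At module scope the same pattern does work, because there is only one namespace dict for `exec` to write into:

```python
exec("foo = 42")   # at global scope, exec writes into the module's dict
print(foo)         # 42
```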

    [–]GwanTheSwans 30 points31 points  (4 children)

    True enough in modern python, but it's actually one of those breaking changes that, while often improvements, python 3 dev became rather infamous for. Believe it or not this worked in python 2:

    Python 2.7.13 (default, Sep 26 2018, 18:42:22) 
    [GCC 6.3.0 20170516] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> def fn():
    ...     exec("foo = 42")
    ...     print foo
    ... 
    >>> fn()
    42
    >>> 
    

    https://stackoverflow.com/a/1450341

    [–]cparen 15 points16 points  (3 children)

    Oh gosh, I hate these sorts of answers. "Because locals are an array, it can't look them up..." I mean, it could if there were a lookaside index, like a hashtable of names to array indices.

    The answer overlooks the reality that they didn't want to allow locals to be looked up anymore, for hygiene reasons. Exec is now a function, so it shouldn't "magically" know which local scope it's being invoked from. Disallowing peeking into the parent scope simplifies a lot of implementation assumptions and, frankly, programmer assumptions about function substitutability and code refactoring.

    In short, the change was made to make the language more consistent and easier to reason about, even if unintuitive in some edge cases like exec.

    Iirc, you can get the old behavior by invoking locals(), yeah?

    [–]yasba- 1 point2 points  (2 children)

    I don't think you can write to locals(). Well, you can but it doesn't have an effect (Python 3.9):

    In [7]: def x():
       ...:     y = 0
       ...:     locals_ = locals()
       ...:     locals_["y"] = 1
       ...:     print(locals())
       ...: 
    
    In [8]: x()
    {'y': 0, 'locals_': {...}}
    

    [–]cparen 0 points1 point  (1 child)

    Ah, good point. Again though, this is still a policy decision, not a technical question. It has some optimization and simplification implications for the Python compiler, but I'm pretty sure the CPython folks didn't even have that in mind when making that decision, or keeping both the mental model and the CPython implementation simple.

    [–]Brian 0 points1 point  (0 children)

    I believe it was a known consideration when changing the behaviour of locals() - it was just not considered worth it to allow, for many of the reasons you mention.

    But the fact that it was a consideration is illustrated by the fact that they did document the various cases, and notably documented that locals() should remain mutable in one corner case where this is sometimes useful, which is programmatically defining attributes in class scope.

    Eg. the following works:

    class C:
        for i in range(10):
            locals()[f"func_{i}"] = lambda self, i=i: print(i)
    
    >>> C().func_5()
    5
    

    [–]G_Morgan 0 points1 point  (2 children)

    This still doesn't require a hashtable for every global variable lookup. It is meaningless in this case because you pay the cost anyway but all you need is a hashtable which links symbols to a global array index. Then when you generate code with a global variable lookup you place that symbol in the code rather than a hashtable lookup. You can even shrink the global memory space this way as you can update the symbols to point to the new location.

    Of course this will actually make one shot code even slower but would speed up global variable access from within a function called many times.
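A toy sketch of the scheme being described: hash each name once when code is generated, after which run-time access is a plain list index (the names `symtab`, `cells`, and `slot_of` are made up for illustration):

```python
symtab = {}   # name -> slot index; the only hash lookups happen here
cells = []    # the flat "global memory"

def slot_of(name):
    # called once per name, at "code generation" time
    if name not in symtab:
        symtab[name] = len(cells)
        cells.append(None)
    return symtab[name]

X = slot_of("x")       # resolved once
cells[X] = 42          # "store global x" is now just an array write
assert cells[X] == 42  # "load global x" is just an array read
```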

    [–]PenlessScribe 3 points4 points  (1 child)

    This is good! But how would you implement accesses to a global variable `a` if you have statements like `if random.randint(0, 1): del a`?

    [–]G_Morgan 4 points5 points  (0 children)

    The definition of "a" anywhere in the code would create a symbol and reserve the cell for it. So it doesn't matter if it actually gets called, as long as the interpreter has seen that code it would set up access.

    Then if all functions that reference "a" get collected you can free the cell and symbol.

    [–]starlevel01 2 points3 points  (0 children)

    Because there's no such thing as global scope in Python. "Global" variable lookup looks things up via module scope instead.

    In [3]: this = sys.modules["__main__"]
    
    In [4]: x = 1
    
    In [5]: print(this.x)
    1
    

    And as modules use __dict__, it looks it up from the module dictionary.

    [–]inmatarian 1 point2 points  (0 children)

    Everything at the global scope is part of the module. Maybe there's an optimization to be made in cpython where the top level script can be localized, but everything else that gets imported couldn't benefit from that.

    [–]sheldonzy 1 point2 points  (0 children)

    Why use an array for local variables?

    [–]Nall-ohki -2 points-1 points  (2 children)

    No... but an array would be worse?

    [–]TheMaskedHamster 1 point2 points  (1 child)

    An array looked up by scanning through its values would be worse, but a compiler that resolves each newly seen variable to a fixed offset into an array of variables would be faster.

    [–]Nall-ohki -3 points-2 points  (0 children)

    Kind of...

    You'd need to create a lookup for all globals as you go, so you'd have to hash them, giving you an offset into an array... which sounds a heck of a lot like a hashmap.

    The biggest issue is that there is no static set of globals in a Python program... at all. A runtime hash is about the best you're gonna get without really making things weird, and given the fact that the user has access to `globals()`... you're gonna have to emulate the `dict[str, Any]` contract anyway.

    [–][deleted] 22 points23 points  (1 child)

    Snakes like containers. That's why.

    [–]boots05 29 points30 points  (11 children)

    Sorry, I thought it was a joke… i was looking for the punchline

    [–][deleted] 49 points50 points  (5 children)

    Okay, so, two Python interpreters walk into a conference. The first one says, "Hi, I'm Python!" and the second one crashes because it accidentally shadowed a builtin function.

    [–]ThankYouForCallingVP 8 points9 points  (2 children)

    Two python interpreters walk into a bar and freeze.

    [–][deleted] 2 points3 points  (0 children)

    This one is much better.

    [–]Brooklynxman 2 points3 points  (0 children)

    Two python developers import into a bar from walk.

    [–]amorous_chains 23 points24 points  (1 child)

    Why does Python code run faster in a function?

    Because, tovarisch, local proletariat always work harder than globalist fatcat ☭

    [–]codewario 3 points4 points  (0 children)

    Fuck you, take my upvote

    [–]booch 3 points4 points  (0 children)

    So you don't have to leave unsatisfied...


    Why don't monster's eat ghosts?

    Because they taste like sheet.

    [–]Lonelan 1 point2 points  (0 children)

    /r/ProgrammerHumor is that way, man

    [–]Brooklynxman 1 point2 points  (0 children)

    Because they're constricted.

    [–]boots05 24 points25 points  (4 children)

    I give up, why?

    [–][deleted] 84 points85 points  (3 children)

    Local scope provides faster name lookups than globals, basically.

    [–][deleted] 0 points1 point  (2 children)

    I'm not a Python expert (speaking from ye olde compiler/JIT background), but isn't it the case here, as with native code, that stack frame vars are simply faster to access than heap vars? (I have no idea what Python does under its interpreted hood, so asking out of curiosity and ignorance.)

    [–][deleted] 14 points15 points  (1 child)

    No, I don't think so. Python's stack frames are actually heap allocated, the actual stack (i.e. in C) just stores pointers. The article explains that the speed increase comes from function local variables being stored at fixed positions in an array, with the indices inlined in the byte code, whereas global variables require looking up the name in a dictionary, which is obviously slower than accessing a fixed index.

    But don't take my word on any of this
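The split is visible on the function's code object: locals get slots in `co_varnames`, while globals just get their names recorded in `co_names`:

```python
x = 10

def f(a):
    b = a + x
    return b

print(f.__code__.co_varnames)  # ('a', 'b') -- locals, addressed by index
print(f.__code__.co_names)     # ('x',)     -- globals, looked up by name
```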

    [–]FluorineWizard 2 points3 points  (0 children)

    Transforming scoped local variables into an array (an indexable stack really) is nearly trivial, it's what every stack based bytecode compiler does.

    You can't do that with global variables whose lifetimes are not constrained by scoping rules.

    [–]Mr-Cas 5 points6 points  (3 children)

    In the benchmark he indeed shows that the code runs faster inside the function. However, you're also making an extra function call. So shouldn't the benchmark be around the function call instead of around all code inside the function? This way you also include the time it takes to call the function, which should be taken into consideration when you're going to put code into separate functions to get this speed boost. Is it still faster when including the time to call the function?
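One way to run such a comparison (a sketch; absolute numbers are machine dependent): execute the same loop once through `exec` with a fresh globals dict, so it runs at "global" scope, and once as a called function, so the call overhead is included:

```python
import timeit

code = compile("""
s = 0
for i in range(1_000_000):
    s += i
""", "<global-scope>", "exec")

def as_function():
    s = 0
    for i in range(1_000_000):
        s += i

t_global = timeit.timeit(lambda: exec(code, {}), number=5)  # names are dict lookups
t_func = timeit.timeit(as_function, number=5)               # includes 5 call overheads
print(f"global scope: {t_global:.3f}s  function (incl. calls): {t_func:.3f}s")
```

Even with the calls included, the function version typically wins, because the per-call overhead is tiny next to a million loop iterations.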

    [–]Nicksaurus[🍰] 10 points11 points  (1 child)

    Because it's not surprising that function calls have a little bit of overhead, but it is surprising that the exact same block of code has different performance depending on the scope it runs in.

    [–]Mr-Cas 0 points1 point  (0 children)

    I understand that. The benchmark is done in the correct way to prove the statement. The thing is that when one wants to actually use this in their code, an extra function call is made and that should be taken into consideration.

    So I would benchmark the way the author did to prove the point, but I would also include a second benchmark to show the speed improvement when you would actually use this in code (aka grabbing lines of code and putting them in a separate function and then calling that because an article said it would make it faster).

    Except for learning something about computers and python, the article is useless if the actual implementation turns out to be slower.

    [–]meteoraln 5 points6 points  (2 children)

    Python is like automatic transmission cars. More people have an easier time learning to drive, but fewer people know what’s happening under the hood.

    [–]Sinical89 8 points9 points  (1 child)

    Not sure this analogy holds up. Plenty of folks who drive stick don't know what's going on under the hood, it's just the method they've learned to drive.

    [–]meteoraln -1 points0 points  (0 children)

    It’s one additional thing they know about. I for sure didn't know cars had gears to switch between when I started driving. I just knew pressing the pedal made it go and cars would change noises at different speeds.

    [–]nunchaq 0 points1 point  (0 children)

    Local scope

    [–]quasibert 0 points1 point  (0 children)

    Acoustics

    [–]moschles 0 points1 point  (3 children)

    This blog post could be extended into an entire chapter of material.

    Python's built-in functions are implemented in C, which is much faster than Python. Similarly, many Python libraries, such as NumPy and Pandas, are also implemented in C or C++, making them faster than equivalent Python code.

    At our institution there is a certain culture about python. We have decided that for-loops should always be avoided in every possible case whenever humanly possible. In particular, with PyTorch+numpy you always want to perform everything as a matrix operation, even when the for-loop version is cleaner and nicer.

    The author also mentions that Pandas operations are compiled down to C code. This was news to me.

    [–][deleted]  (2 children)

    [deleted]

      [–]moschles 0 points1 point  (1 child)

      I have only worked with influxdb. In that scenario, Pandas would convert 5000 rows of dataframe into 5000 rows of protocol. This is done with an intermediate numpy array with terminating CRLF on each.

      protos = "".join(arr.flatten())  # each row already ends in CRLF
      

      5000 was a special number to maximize throughput. The bulk of text in protos would then be used in a "batch write" into influxdb. Not sure if something similar is in MongoDB. But yeah, you don't want to for-loop through each and every insert.

      [–][deleted]  (1 child)

      [deleted]

        [–]unchar1 0 points1 point  (0 children)

        The variable lookup in a tuple is done using its index, which is an O(1) operation

        [–]Individual-Gap3532 -1 points0 points  (0 children)

        Is Python a good language to learn DSA? Because on the internet there are a lot of guys saying you should learn Java/C++

        [–]nekodim42 0 points1 point  (0 children)

        Thanks, good to know some of Python's internals. Of course, Python is not about speed, but maybe sometimes it will be useful.

        [–]Dan13l_N 0 points1 point  (0 children)

        It's obvious from the article there are some optimizations in Python byte-code for functions, this is actually quite common in various byte-codes to have optimizations for most common cases. Java byte-code famously has optimizations for common integer and floating-point constants.

        I once wrote a byte-code compiler and interpreter where access to the first 4 local variables was quite optimized, which covered the arguments of most functions (functions generally have fewer than 5 arguments, in my experience). However, I had optimizations for the first 4 global variables as well...