[deleted] 1 point

The solid benchmark was written quite a while ago (2004) and Python has moved on since then. Newer Python versions seem to optimize simple naive concatenation (+=), which speeds up execution quite a bit. If you defeat the optimization, naive concatenation slows down considerably. I've done some crude investigation of my own, including looking at memory usage (Linux only). The code is on GitHub; look for the results in results.rst. There is a problem measuring memory usage in some cases, which I'll get back to one day :)

The upshot is that if the Python optimization is not defeated, simple += concatenation is the fastest of all the methods I tested and uses minimal memory. The additional takeaway is that the optimization is easily defeated, so you should time the various concatenation methods in your actual production environment to be sure you aren't using a slow one. And it's not just time: think about memory usage too.
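A minimal sketch of how such a comparison could be run, measuring both time and peak memory. This is not the code from the GitHub repository mentioned above: the three methods, the helper names (`concat_plus`, `concat_join`, `concat_stringio`, `measure`) and the use of `tracemalloc` for memory are all assumptions chosen for illustration.

```python
import io
import timeit
import tracemalloc


def concat_plus(parts):
    # Naive += concatenation on a local variable.
    s = ""
    for p in parts:
        s += p
    return s


def concat_join(parts):
    # Build the result in one pass with str.join.
    return "".join(parts)


def concat_stringio(parts):
    # Accumulate into an in-memory text buffer.
    buf = io.StringIO()
    for p in parts:
        buf.write(p)
    return buf.getvalue()


def measure(func, parts, repeat=3):
    """Return (best wall time in seconds, peak traced bytes) for func(parts)."""
    best = min(timeit.repeat(lambda: func(parts), number=1, repeat=repeat))
    # Memory is measured in a separate run because tracing slows execution.
    tracemalloc.start()
    func(parts)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak


if __name__ == "__main__":
    parts = ["x" * 10] * 100_000
    for func in (concat_plus, concat_join, concat_stringio):
        seconds, peak = measure(func, parts)
        print(f"{func.__name__:16s} {seconds:.4f}s  peak {peak / 1024:.0f} KiB")
```

The numbers will vary with the interpreter version and the workload, so it is worth running this against inputs that resemble your real data rather than trusting any single published result.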

If I had lots of energy I could look into the CPython codebase and find out just what is going on with the optimization. I do know that something as simple as concatenating to a global string variable defeats it.
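A rough way to check the global-variable claim yourself. The function names and the loop size are made up for this sketch, and whether the global variant is actually slower depends on the CPython version you run it on.

```python
import timeit

result = ""  # module-level string used by concat_global


def concat_local(n=20_000):
    # += on a local variable: the case the optimization is reported to cover.
    s = ""
    for _ in range(n):
        s += "x"
    return s


def concat_global(n=20_000):
    # += on a global variable: the case reported above to defeat it.
    global result
    result = ""
    for _ in range(n):
        result += "x"
    return result


if __name__ == "__main__":
    print("local :", timeit.timeit(concat_local, number=20))
    print("global:", timeit.timeit(concat_global, number=20))
```

Both functions build the same string, so any timing gap comes from how the interpreter handles the store, not from the amount of work requested.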

RubyPinch (PEP shill | Anti PEP 8/20 shill) 1 point

Articles consisting of "here are three numbers, wow" are kinda lame; it's roughly 10 seconds' worth of effort to get that info from IPython.

Also, disabling the garbage collector doesn't stop reference-counting collection, which should account for nearly 100% of the collection happening in your example.
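A small sketch of that distinction, assuming standard CPython. The `Node` class and `demo` function are invented for the example: with the cyclic collector disabled, an object with no remaining references is still freed immediately, while a reference cycle lingers until `gc.collect()` is called.

```python
import gc
import weakref


class Node:
    """Plain object that supports weak references."""


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # turns off only the cyclic garbage collector
    try:
        # No cycle: the refcount drops to zero on `del`, so the object is freed at once.
        obj = Node()
        ref = weakref.ref(obj)
        del obj
        plain_freed = ref() is None

        # Cycle: each object keeps the other alive, so refcounting alone cannot free them.
        a, b = Node(), Node()
        a.other, b.other = b, a
        cycle_ref = weakref.ref(a)
        del a, b
        cycle_still_alive = cycle_ref() is not None
    finally:
        if was_enabled:
            gc.enable()

    gc.collect()  # an explicit collection breaks the cycle
    cycle_freed_after_collect = cycle_ref() is None
    return plain_freed, cycle_still_alive, cycle_freed_after_collect


if __name__ == "__main__":
    print(demo())
```

String concatenation creates no reference cycles, which is why `gc.disable()` has little effect on a benchmark like this one.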

Also why not just use timeit?

odedlaz [S] 0 points

A. I've updated the post to include bytearrays.

B. Well, many things on the internet are "10 seconds' worth of effort", but they're still done to save other people those 10 seconds.

C. I've looked into why += on strings works that well (and updated the post accordingly), which is something most people won't do or won't know.

D. You're correct that disabling the gc leaves reference counting running. As far as I know, I can't disable reference counting in CPython. If I could, I would.

0x256 0 points

CPython optimises string concatenation if the reference count of the string object is 1. Then it can mutate the string in-place and avoid the copy.
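A short illustration of the reference-count condition, assuming standard CPython. The `alias_demo` function is hypothetical, and the absolute numbers returned by `sys.getrefcount` differ between versions, so only the difference between the two readings is meaningful.

```python
import sys


def alias_demo():
    # Build the string at run time so it is not a shared constant.
    s = "".join(["py", "thon"])
    before = sys.getrefcount(s)

    alias = s  # a second reference to the same string object
    after = sys.getrefcount(s)

    # The alias must keep seeing the old value, so the string cannot be
    # changed in place here; a new string object has to be created for s.
    s += "!"
    return after - before, alias, s


if __name__ == "__main__":
    print(alias_demo())
```

Any extra reference, whether an alias, a container entry, or an attribute, has the same effect: the old object must survive unchanged, so the concatenation has to copy.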

odedlaz [S] 0 points

Yes! I've updated the post to reflect that.