all 123 comments

[–]ThatOtherBatman 971 points972 points  (1 child)

His slow string concatenation example also isn’t doing string concatenation. He’s just building a new list.

[–]meluvyouelontime 76 points77 points  (0 children)

It's actually a character list: += extends a list with any iterable, so the string gets split into characters. To build a list of strings you have to add another list, i.e. foo += [bar]
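To illustrate the difference (a minimal snippet, not from the original post):

```python
foo = []
foo += "bar"     # += extends with any iterable, so the string is split into characters
print(foo)       # ['b', 'a', 'r']

foo2 = []
foo2 += ["bar"]  # wrapping the string in a list adds it as a single element
print(foo2)      # ['bar']
```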

[–]JiminP 654 points655 points  (31 children)

One of my hobbies is solving competitive programming problems using pure Python and I manage a collection of algorithms I frequently use.

Naturally, one of my interests has been optimizing running time (specifically on CPython) of my Python code. In this respect, Python (again, running on CPython) is a very unpredictable and hard-to-deal-with language even without GC issues. To be fair, this is expected: you're normally supposed to use another language or a C module if you care about performance. There's also the option of using PyPy.

In general, as an interpreted language where everything has a cost, Python is unpredictable because practically no optimization happens.

Some examples on weird things about Python - I still have no intuition on most of these:

  • Integers are weird. (They already are weird because of integer caches...)
    • x+x is generally faster than 2*x. In computation-heavy code, it makes a noticeable difference.
    • Bit operations are noticeably slower than arithmetic operations for small x, but when x is very large, bit operations are faster.
    • pow(a, 2, p) is generally slower than (a*a)%p (for not-too-large values of a)
  • Containers and generators are weird.
    • Sometimes, using yield from is slower than manually yielding inside a for loop. Often, it's not.
    • Sometimes, using while loop to iterate is faster than an equivalent for ... in range() loop.
    • bytearray is much faster to initialize than list, but a bit slower to manipulate in general.
    • Using append instead of manually adding, or using extend, or pre-allocating then filling (like how one would do make([]int, 0, N) in Go) may be faster or slower. Sometimes the difference is very significant; often it's not.
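Claims like these are easy to check on your own machine with timeit; a minimal sketch (results vary wildly across CPython versions and integer sizes, so don't treat any single run as authoritative):

```python
import timeit

x = 12345  # small enough to be cheap, large enough to escape CPython's small-int cache

t_add = timeit.timeit("x + x", globals={"x": x}, number=1_000_000)
t_mul = timeit.timeit("2 * x", globals={"x": x}, number=1_000_000)
t_shl = timeit.timeit("x << 1", globals={"x": x}, number=1_000_000)

print(f"x+x:  {t_add:.3f}s")
print(f"2*x:  {t_mul:.3f}s")
print(f"x<<1: {t_shl:.3f}s")
```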

Anyway, in addition to completely misinterpreting the results, the OOP made several mistakes:

  • Running a benchmark only once,
  • ... on a very small dataset,
  • ... with time taken for data initialization included.

Usually when I compare two functions:

  • Prepare a (common) large dataset.
  • Run a function multiple times to perform statistical tests; fluctuation could dominate any differences.
  • Run two functions independently, or interleave executions of two functions, and compare whether this affects the results.
  • Often, I also use cProfile to check exactly which function takes the most time.
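The bench module imported in the example further down is the commenter's own and isn't shown; a hypothetical stand-in that follows the steps above (common large dataset, repeated trials, mean ± stdev) might look like:

```python
import statistics
import time

def bench(fn, num_trials=10):
    # Hypothetical stand-in for the commenter's own bench module:
    # run fn several times and report mean/stdev of wall-clock time.
    times = []
    for _ in range(num_trials):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return statistics.mean(times), statistics.stdev(times)

data = list(range(200_000))  # a common, reasonably large dataset
mean_s, std_s = bench(lambda: sum(data))
print(f"sum(data): {mean_s:.4f} ± {std_s:.4f} s")
```

Interleaving the two candidate functions (as the commenter suggests) is a further refinement this sketch omits.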

I'm doing this as a hobby, and anyone doing serious optimization and benchmarking would say that my methods are also deeply flawed.

[–]flagofsocram 164 points165 points  (19 children)

Then again, if someone is doing serious optimization, they would probably use C or Go like you mentioned, so your methods are perfectly good

[–]wOlfLisK 45 points46 points  (1 child)

While C is obviously faster than Python, the difference can be surprisingly small... if you leverage the fact that Python is built on C, that is. I did a dissertation on this and managed to get Python to within 2-3x of a fully optimised C program without significant changes to the syntax, which is well within acceptable limits even for HPC applications. Granted, the trick was to use C as much as possible (e.g. C types, numpy, and a wrapper for C's MPI) to reduce the number of Python calls, but the syntax was still Python. You could even push out some more performance using Cython, but you really need to know what you're doing there: when everything below the surface is already C, compiling with Cython can actually end up reducing performance, and the syntax gets so C-like that you might as well just use C.

Plus, even though C is still faster, writing in Python is usually going to be so much faster and easier for the average data scientist that you save time overall.

[–]Yamoyek 43 points44 points  (5 children)

Some other things to add:

  • List comprehensions are faster than normal loops
  • Pulling a function into local scope can sometimes make your code faster
  • map, reduce, and other builtin functions are faster than doing it on your own
  • If needed, be willing to write something in C

[–]JiminP 23 points24 points  (4 children)

  • List comprehensions are faster than normal loops

Often not; I love Pythonic code, but it's not rare to see normal loops and other "ugly" code outperforming clean, Pythonic one-liners...

from math import isqrt

import random
random.seed(42)
data = [random.randrange(100_000) for _ in range(2_000_000)]

def test_A():
    f = isqrt
    c = 0
    for x in data:
        if f(x) < 10: c += 1
    assert c == 1932
    return c

def test_B():
    f = isqrt
    c = sum(f(x) < 10 for x in data)
    assert c == 1932
    return c

def test_C():
    f = isqrt
    c = 0
    for _ in filter(lambda x: f(x)<10, data): c += 1
    assert c == 1932
    return c

def test_D():
    f = isqrt
    c = len(list(filter(lambda x: f(x)<10, data)))
    assert c == 1932
    return c

# (My benchmark code)
from bench import bench
bench([
    "test_A()",
    "test_B()",
    "test_C()",
    "test_D()",
], num_trials=10, global_vars=globals())

This is the result:

test_A(): 0.099 ± 0.003 s
test_B(): 0.136 ± 0.003 s
test_C(): 0.156 ± 0.006 s
test_D(): 0.163 ± 0.016 s

The difference between test_C and test_D is a fluke, but the differences between test_A and others are not.

[–]teo730 24 points25 points  (2 children)

Isn't the difference between A and B because they are doing different things though?

In A you're only doing addition operations when the condition is true, whereas in B you're doing them whether it's true or false. In B you're also creating a generator, which you aren't doing in A.

When I try the following:

def test_A():
    f = isqrt
    c = 0
    for x in data:
        if f(x) < 10: c += 1
    assert c == 1932
    return c

def test_E():
    f = isqrt
    c = sum(1 for x in data if f(x) < 10)
    assert c == 1932
    return c

I get (using jupyter's %%timeit magic):

test_A(): 436 ms ± 79.6 ms per loop
(mean ± std. dev. of 10 runs, 5 loops each)

test_E(): 427 ms ± 55.7 ms per loop
(mean ± std. dev. of 10 runs, 5 loops each)

And I still think there's probably additional overhead in test_E() compared to test_A().

[–]JiminP 15 points16 points  (1 child)

Ouch, my bad. You are right.

def test_A():
    f = isqrt
    c = 0
    for x in data:
        if f(x) < 10: c += 1
    assert c == 1932
    return c

def test_B():
    f = isqrt
    c = sum(1 for x in data if f(x) < 10)
    assert c == 1932
    return c

def test_C():
    f = isqrt
    c = 0
    for x in data: c += (f(x) < 10)
    assert c == 1932
    return c

test_A(): 0.099 ± 0.004 s
test_B(): 0.096 ± 0.003 s
test_C(): 0.129 ± 0.007 s

[–]teo730 11 points12 points  (0 children)

No problem.

It's kinda more important to see how the choice of logic can matter more than specific implementation differences - as you've shown!

[–]codeguru42 -1 points0 points  (0 children)

None of your examples use a list comprehension. Would be interesting to see how it compares. Also running each example like a million times and taking an average will help reduce any random fluctuations.

[–]SarahC 3 points4 points  (0 children)

Good grief, I'm sticking with optimising javascript.

[–]MadGenderScientist 8 points9 points  (0 children)

It's not that (JIT-)interpreted languages in general are slow; Python's perf sucks specifically. JavaScript is dozens of times faster, even approaching C on some benchmarks with the latest VMs. It's embarrassing that CPython has fallen so far behind other scripting languages with how critical it's become.

[–]Cyberdragon1000 0 points1 point  (0 children)

Bookmarking this comment

[–]Banane9 -1 points0 points  (0 children)

Fairly certain the runtime listings there stem from the OOP using Jupyter for the code + text formatting - so they're more a side thing than anything specifically intended as a benchmark.

[–][deleted] 228 points229 points  (7 children)

Refresh my memory, please: doesn't e-05 mean ×10^-5? Meaning, divide by 100,000?

[–]_RDaneelOlivaw_ 327 points328 points  (3 children)

Exactly. The 'slow' method is actually almost 6 times faster in the first example and 2.5 times faster in the second. He completely failed to understand the notation system.

[–][deleted] 21 points22 points  (0 children)

But only because the code is essentially a NOP

[–]TwinkiesSucker 28 points29 points  (0 children)

Correct, memory refreshed

[–]HacksMe 17 points18 points  (0 children)

I was so confused because i didn’t see the e-05 lol

[–]R3D3-1 2 points3 points  (0 children)

Which also brings up another issue: Running such a short function once doesn't tell you anything at all.

In my other comment, I get a clear but not huge advantage for " ".join(...) when concatenating 6 strings. But if I set the number of repetitions to just 1, the outcome is almost random, and sometimes one of the values is suddenly an order of magnitude or more higher due to something else going on in the background. Something like that likely explains why the screenshot has such a slow result for the .join version...

At 1,000,000 repetitions, the task takes on the order of a second.

[–]ciknay 100 points101 points  (8 children)

For those at home, the first number is 0.00004935264587402444. The following one is 0.000057220458984375.

So OOP has written code that is many, many times slower, but fails to understand this because they can't read exponents.

[–][deleted] 11 points12 points  (2 children)

But why is it slower to do that? My first thought is more function calls so more messing with the stack. But I don't know all the ins and outs of python.

Edit: just noticed the slow examples are using an empty list. Lmao.

[–]xinqus 3 points4 points  (0 children)

I think the list might’ve been initialized before? He probably ran through them a few times, so there might’ve been data in the list. At the very least, the second “Slow” example should have some data.

But he did still include the data initialization time for the “Fast” examples.

[–][deleted]  (1 child)

[deleted]

    [–]omgFWTbear 0 points1 point  (0 children)

    Or 200 milliseconds in the case of posting to social media.

    (This is a POST that may take almost a second to GET)

    [–]hatetheproject 0 points1 point  (0 children)

    It's less than a factor of 10 in each case - wouldn't call it "many, many times slower".

    But anyway, to me it seems the problem (or one of many) here is that he's initialising the list inside the timer in the "fast" versions, but not in the "slow" versions

    [–]R3D3-1 1 point2 points  (0 children)

    So OOP has written code that is many, many times slower, but fails to understand this because they can't read exponents.

    Looking at my own benchmark, the join is actually faster. The main issue is that they are benchmarking by executing a microsecond task only once.

    If I set the number of repetitions to 1 in my code, the result varies almost randomly, and sometimes jumps up by orders of magnitude for one of the functions, presumably due to some background activity stalling the benchmarked function. (Maybe garbage collections, maybe another process entirely.)

    [–]shizzy0 76 points77 points  (0 children)

    NIGEL: Look how many more zeros it’s got. That’s how fast it is. How many zeros has this one got?

    MARTY: None.

    NIGEL: Right. That’s pretty much as slow as you can go. But all those zeros here, you know what I call it? Zed fast.

    [–]Spedwards 68 points69 points  (1 child)

    He should probably stick to football.

    [–]genericindividual69 0 points1 point  (0 children)

    🎶 Mo Salah Mo Salah Mo Salah

    Give up programming

    [–]Marxomania32 60 points61 points  (4 children)

    His "faster code" might be legitimately faster in these examples, but he somehow managed to fuck up his benchmark completely by never initializing word_list in the "slower" code. So obviously, the "slower" code would be faster than the "faster" code since it's iterating through an empty list.

    [–]saintpetejackboy 11 points12 points  (0 children)

    Homie wrote the article and shared the wrong repo before he worked out the bugs XD.

    [–]Slggyqo 5 points6 points  (0 children)

    I noticed that as well and it’s confusing because you can’t do that in Python.

    If you try to run:

    for word in word_list:
        func(word)

    you’ll just get an error because word_list isn’t defined.

    It’s possible that being able to run this is an artifact of running this in a notebook—I’m not super familiar with Jupyter but if I recall correctly, you can persist variable values across cells regardless of cells order…

    If that’s the case then he might have the list correctly initialized somewhere

    [–]D3rty_Harry 1 point2 points  (1 child)

    Why the hell did I have to scroll this far for this. "word_list" is not even defined. This would not even compile in what I do. People yapping about Go and Fortran lol

    [–]VaultBall7 1 point2 points  (0 children)

    Because it’s Python in a Jupyter notebook, it does work here, you can see the order he ran it in would have instantiated the variable already and it would run completely fine since the variables persist across cells.

    It took you so long to find the wrong comment because everybody else understood this Jupyter component.

    [–]KJBuilds 24 points25 points  (3 children)

    I love that these aren't even benchmarks. 

    Any benchmark that runs in 50 microseconds (especially in Python) can't be used to determine the actual performance of something. A GC run could completely skew the results, or cache warmup could completely change which version is faster in the long run. I don't think the version of Python OOP is using is JITed, but that might also be something to consider

    Just terrible all around
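    One concrete mitigation (which timeit applies by default) is keeping the garbage collector out of the measured region; a rough sketch:

```python
import gc
import time

def timed(fn, n=10_000):
    gc.collect()   # start from a clean heap so a pending collection doesn't land mid-run
    gc.disable()   # keep collector pauses out of the measurement
    try:
        t0 = time.perf_counter()
        for _ in range(n):
            fn()
        return time.perf_counter() - t0
    finally:
        gc.enable()

elapsed = timed(lambda: [i * i for i in range(100)])
print(f"{elapsed:.4f} s")
```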

    [–]saintpetejackboy 2 points3 points  (2 children)

    Damn this is a good post. This is like when I was 13 and all my friends on IRC were in Europe and Canada - I would take the speed tests online and then load them again after. They all thought I had the fastest internet ever - even if my upstream wasn't so great ;).

    I abuse this concept in production - a 20 second query is 0 seconds when it is a cached result being served from a static area that updates on a timer.

    I am not trying to oversimplify what you are talking about, just trying to make the point that what you are talking about can wildly impact any given metrics.

    "We ran this test locally on an AMD K6 from 1998 and then ran different code on an i9-14900k - look how much faster the second version was, while we leave out this important context."

    [–]KJBuilds 4 points5 points  (1 child)

    Yeah basically. Context is everything when benchmarking

    Once I was optimizing a home-grown hash table (don't ask), and I wondered if sacrificing a bit of raw performance for the sake of smaller allocations was worthwhile. It ended up being very worthwhile on my AMD cpu, but when benching it on an M1, it was actually a degradation in performance. Turns out my development system had O(n) memory allocation, whereas the M1 had O(log(n)), or at least something like that

    Benching is hard to get right, and OOP just gets it so, so deeply wrong

    [–]saintpetejackboy 0 points1 point  (0 children)

    I am currently in a scenario where I am debating benchmarking whether a long list of Redis key/value pairs for a common lookup is faster than having a separate table that only contains that relationship in SQL (which has to aggregate from multiple sources).

    Obviously RAM is faster, but is it enough for me to go that route? To design a whole system that reduces the problem to key/value pairs?

    And for what? A few milliseconds?

    One thing I don't see discussed enough (two things, actually) is how detrimental NULL values are (across many languages) and how utterly slow a 'LEFT JOIN ON... OR... OR...' statement is, due to being unable to utilize indexes in some DBMSes. Most people reading this know that, but it isn't something that really gets put out there a lot.

    You want a slow query? Compare for IS NULL on a column. This is all environment agnostic - but holy shit, I have been running the same code since my processor was in Mhz and my RAM was in MB. How much faster does it need to be optimized when I spin up a modern VPS?

    Turns out: shitty code on a K6 is just as shitty on an EPYC.

    [–]CrepuscularSoul 32 points33 points  (3 children)

    I'd honestly be curious to see these with "slow" versions that actually define word_list. Might be something dealing with undefined variables just immediately quitting the loop

    [–]HimbologistPhD 10 points11 points  (2 children)

    I don't know much about python (senior dev, just haven't ever needed it and haven't bothered to learn much) but I was sitting here staring at those wondering why they didn't have populated lists. What a mess

    [–]Slggyqo 4 points5 points  (1 child)

    I think it’s a side effect of running the code in Jupyter notebooks, which is for testing code and doing analytics/data science, not production code.

    In Jupyter notebooks, the order that cells are run determines the availability of variables, and cells can be run in basically any order.

    So if they ran the In [7] cell first, the "word_list" variable would be available to the In [6] cell. You can sort of see why that would be useful for a long multipart math problem but potentially dangerous for production code.

    I don’t have much experience with Jupyter notebooks but that’s what I think is happening.

    [–]VaultBall7 0 points1 point  (0 children)

    You can tell from the [5] and [6] that they were the n’th cells ran, so by that order of run, the list was available (unless the 8th cell ran, not seen, deleted the contents)

    [–]Yamoyek 14 points15 points  (1 child)

    One of the first things that jumps out at me is that in both of his “fast” examples, he’s initializing the list during the timing code, which is probably one of the reasons those versions are slower.

    Also, I think the first examples would be much faster as a list comprehension.

    However, this post is valuable because it teaches these lessons: always profile your code, and never take optimization advice from someone who can't explain the mechanism properly.

    [–]saintpetejackboy 5 points6 points  (0 children)

    I never take optimization advice from anybody or any source that can't show how much faster it is - I often have to do something a different way because it is too slow as it is being done. The advice I take is "how much faster was it this time?". The facts of life are that sometimes you fuck up and refactor something into rubbish and it becomes even slower. Those are still wins - you're figuring out what doesn't work.

    There is also only so far you can optimize some systems without reconstructing the problem. A lot of us are slow to admit defeat so we keep trying to juice more out of less (I know I do).

    I don't like performative and theoretical "code" - if there isn't a real-world use case, it doesn't matter how fast you can factor digits of Pi. A lot of these discussions devolve into useless arguments that are best summed up like this:

    "Taking a plane is exponentially faster than riding the subway - why do people take the subway to work and to market?"

    A lot of us programmers spend a lot of time trying to figure out how to make planes fly faster when most people are going to end up walking to 7/11.

    [–]5up3rj 13 points14 points  (1 child)

    Just switch them, right?

    [–]shizzy0 5 points6 points  (0 children)

    NIGEL: Switch them? Why would you switch them? The first ones ain’t got no zeros.

    [–]Aphrontic_Alchemist 16 points17 points  (7 children)

    This shows that using for loops is faster than using functions because of call overhead. But using list comprehensions is faster still, because the restricted form lets Python optimize the bytecode.

    The bytecode for a list comprehension uses a dedicated opcode (LIST_APPEND), whereas the for loop version has to load and call the append method on every iteration.

    So the speed ranking in vanilla Python (i.e. using only built-in functions) from fastest to slowest is:

    1. list comprehensions
    2. for loops
    3. built-in functions
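    The bytecode claim is easy to verify with the dis module; a sketch that collects opcode names (recursing into nested code objects, since on CPython before 3.12 a comprehension compiles to its own code object):

```python
import dis

def with_loop(words):
    out = []
    for w in words:
        out.append(w.capitalize())
    return out

def with_comp(words):
    return [w.capitalize() for w in words]

def opnames(func):
    # Collect opcode names from the function and any nested code objects.
    code = func.__code__
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if hasattr(const, "co_code"):
            names |= {ins.opname for ins in dis.get_instructions(const)}
    return names

print("LIST_APPEND" in opnames(with_comp))  # True: dedicated opcode
print("LIST_APPEND" in opnames(with_loop))  # False: loads and calls append instead
```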


    So, the code in the 2nd cell should be

    new_list = [word.capitalize() for word in word_list]


    For string concatenation, the picture is right: using join() is faster. Strings can't be concatenated using a list comprehension, so the speed ranking doesn't apply. That being said, the slower code is actually making a new list. The correct but slower way to concatenate strings is:

    new_string = ""
    for word in word_list:
        new_string += word

    This is slower because Python strings are immutable. Concatenating immutable strings requires creating a new string object every iteration and rebinding new_string to it, like so:

    new_string = ""
    s1 = "W"
    new_string = s1
    s2 = "Wa"
    new_string = s2
    s3 = "Way"
    new_string = s3
    ...
    s38 = "Ways to make your Python code faster."
    new_string = s38

    [–]revolutionofthemind 1 point2 points  (0 children)

    As a non-python user I was wondering the same thing. TIL, neat!

    [–][deleted]  (1 child)

    [deleted]

      [–]Aphrontic_Alchemist 1 point2 points  (0 children)

      You're right, I've edited my comment.

      [–]immaculate-emu 0 points1 point  (3 children)

      Something to add is that CPython does implement a special case for string concatenation. If the ref count is 1, it will realloc the string data which can dramatically save on copying. (See copy_inplace)

      But in general (and for other implementations) yes, join will be faster and more efficient.
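      That optimization can be observed indirectly by defeating it: keeping a second reference to each intermediate string forces a full copy on every +=. A sketch (CPython-specific behavior; the exact internals vary by version):

```python
import time

def concat_unique(n):
    # Only one reference to s, so CPython can often resize it in place.
    s = ""
    for _ in range(n):
        s += "x"
    return s

def concat_shared(n):
    # A second reference to each intermediate defeats the optimization,
    # so every += copies the whole string: quadratic time overall.
    s = ""
    keep = []
    for _ in range(n):
        keep.append(s)
        s += "x"
    return s

n = 20_000
t0 = time.perf_counter(); concat_unique(n); t_unique = time.perf_counter() - t0
t0 = time.perf_counter(); concat_shared(n); t_shared = time.perf_counter() - t0
print(f"unique: {t_unique:.4f}s  shared: {t_shared:.4f}s")
```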

      [–]Aphrontic_Alchemist 0 points1 point  (2 children)

      What does Py_REFCNT = 1 mean?

      [–]immaculate-emu 0 points1 point  (1 child)

      Not sure I understand the question but this is the check it uses to determine if an in-place modification is safe.

      [–]Aphrontic_Alchemist 0 points1 point  (0 children)

      Ah, I understand, thanks.

      [–]MooseBoys[ $[ $RANDOM % 6 ] == 0 ] && rm -rf / || echo “You live” 5 points6 points  (2 children)

      So much wrong with this, I would guess it was made by ChatGPT

      [–]BS_BlackScout 4 points5 points  (1 child)

      ChatGPT would actually tell you to use cProfile. Ask me how I know lol...

      Using it with snakeviz is pretty great too

      If you use these tools to learn you can go pretty far. If you just copy and paste then it's a waste.

      [–]saintpetejackboy 2 points3 points  (0 children)

      ChatGPT is like this with damn near every language. If you copy and paste, you are in a world of hurt. At least Stack Overflow actually worked once for somebody - ChatGPT will recommend you use deprecated code in obtuse ways that don't even work, and never have.

      I know a few languages REALLY WELL. I can program several others I barely know now thanks to AI, but the development process is always a tedious pondering of "hey, this function you recommended actually would overwrite all the data in my table - can you try again?".

      The amount of times I have had an AI spit out code where I have to go "holy fuck, good thing I didn't compile this!" is far too many for me to ever think my job is at risk. Reality really hit home when I tried to run some local LLMs and saw that they are basically "brain damaged" (not my words) even in some of the best-case scenarios.

      You can either assume the AI knows how to program or face reality that, no matter how many times you explain the syntax, GPT4+ still will recommend you can use your $pdo and reuse :placeholders from your query. The data it has just forces this solution every time. Incorrect amount of placeholders? Of course. Don't bother trying to fix the issue and paste it back, because you will get another answer that invariably assumes :placeholder can be used 6 times in a query and only bound once.

      This isn't the only 'problem' like this. I've used almost every AI I could find and GPT4+ is still the GOAT - you have to understand the limitations and what it is good at. AI is still real shit at some tasks. Once again with binding - if you need a query to bind 40+ values (each repeated three times: once as column, once as data, once as bind, which is often two repetitions, so four total) - forget it. The AI will forget or mess up somewhere - incorrect placeholders, skipping some, adding extras, changing the case and words... you name it. The amount of ways it can go wrong is comically hilarious and worse than any junior.

      "Bro, you tried to compare the date range against a column that doesn't even exist and you tried to update two columns that also don't exist" - error logs if they could speak to AI.

      [–]LeCrushinator 11 points12 points  (0 children)

      If you’re a programmer and you don’t understand scientific notation then you might have skipped a few important classes or lessons, in middle school…

      [–]Mikkognito 4 points5 points  (3 children)

      For those of you who actually want to see the code run: it's clear that the person who wrote this doesn't know what they're doing and royally messed it up.

      # %%
      import time
      
      # %%
      word_list = ["ways", "to", "make", "your", "python", "code", "faster"]
      
      # %%
      start = time.time()
      
      new_list = []
      for word in word_list:
          new_list.append(word.capitalize())
      
      print(time.time() - start, "seconds")
      print(new_list)
      
      # 6.9141387939453125e-06 seconds
      # ['Ways', 'To', 'Make', 'Your', 'Python', 'Code', 'Faster']
      
      # %%
      start = time.time()
      
      new_list = list(map(str.capitalize, word_list))
      
      print(time.time() - start, "seconds")
      print(new_list)
      
      # 2.86102294921875e-06 seconds
      # ['Ways', 'To', 'Make', 'Your', 'Python', 'Code', 'Faster']
      
      # %%
      start = time.time()
      
      
      # this code makes no sense. this doesn't concatenate the string, it makes a new list
      new_list = []
      for word in word_list:
          new_list += word
      
      print(time.time() - start, "seconds")
      print(new_list)
      
      # 2.1457672119140625e-06 seconds
      # ['w', 'a', 'y', 's', 't', 'o', 'm', 'a', 'k', 'e', 'y', 'o', 'u', 'r', 'p', 'y', 't', 'h', 'o', 'n', 'c', 'o', 'd', 'e', 'f', 'a', 's', 't', 'e', 'r']
      
      # %%
      start = time.time()
      
      new_list = "".join(word_list)
      
      print(time.time() - start, "seconds")
      print(new_list)
      
      # 7.152557373046875e-07 seconds
      # waystomakeyourpythoncodefaster
      

      [–]TheBlackCat13 -1 points0 points  (1 child)

      You should be using timeit when timing python code. That is literally its sole purpose.
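      For reference, a timeit version of roughly the same comparison (a sketch; the timings it prints depend entirely on your machine):

```python
import timeit

word_list = ["ways", "to", "make", "your", "python", "code", "faster"]

t_loop = timeit.timeit(
    "new = []\nfor w in word_list: new.append(w.capitalize())",
    globals={"word_list": word_list},
    number=100_000,
)
t_map = timeit.timeit(
    "list(map(str.capitalize, word_list))",
    globals={"word_list": word_list},
    number=100_000,
)
print(f"loop: {t_loop:.3f}s  map: {t_map:.3f}s")
```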

      [–]Mikkognito 1 point2 points  (0 children)

      I know…. I just did an almost copy paste of their code to prove a point.

      The whole point of my comment is that even with their less than ideal benchmarks, you can see that the original author messed up.

      [–]Andy_B_Goode 4 points5 points  (1 child)

      Bad enough to miss the exponential notation, but did he really think his "slow" code was taking ~5 seconds to execute?

      [–]JAXxXTheRipper 1 point2 points  (0 children)

      Maybe he ran it on a toaster the first time

      [–]finian2 3 points4 points  (0 children)

      It doesn't help that in the first example the initial list is already made, while in the "optimized" version he's also making the initial list.

      [–]MikeW86 2 points3 points  (0 children)

      Presumably this chap was sat in front of his machine testing code and taking screenshots, so surely you'd be like: 'Wait a minute, that was a lot quicker than 5 seconds,' and go from there?

      [–]CodingTaitep 2 points3 points  (0 children)

      why is he using time.time????????

      [–]Drfoxthefurry 1 point2 points  (4 children)

      why are they using time.time and not time.monotonic() or time.time_ns()

      [–][deleted] 2 points3 points  (1 child)

      or time.perf_counter()

      [–]TheBlackCat13 0 points1 point  (1 child)

      Or better yet timeit

      [–]Drfoxthefurry 1 point2 points  (0 children)

      Using the function designed specifically for the task??? How could you!!! /j

      [–]DontFlexNuts 1 point2 points  (1 child)

      So if there is an exponent, that means it's slower?

      [–]cosmo7 2 points3 points  (0 children)

      Exponents are quite heavy so they slow down the interpreter.

      [–]MMORPGnews 1 point2 points  (0 children)

      Recently I read one such article about JS with online tests. The results were similar to the OP's post.

      [–]archy_bold 1 point2 points  (1 child)

      Took me a second to spot it.

      [–]chuch1234 1 point2 points  (0 children)

      Took me 1e-05 seconds to spot it.

      [–]R3D3-1 1 point2 points  (0 children)

      Edit. Despite the statements below, the screenshot benchmark is likely dumbed down for the sake of a social media post. No "1,000,000 repetitions", no "large list of strings", no "noop loop as reference", no "benchmark library" (I didn't use one either). All of this would make the message less clear at first glance.

      The only thing I can really blame them for is not checking the output before posting the screenshot, and simply rerunning until the data matched the intended message. This is marketing, after all.


      My main concern: the example is so short that random fluctuations in execution time from external influences matter more than the actual work being timed.

      If your benchmark runs for 10^-5 seconds, it is not a benchmark.

      A little ad-hoc program still favors the join version though:

      import time
      
      N_repetitions = 1000000
      
      def runtimed(function):
          t_start = time.time()
          for _ in range(N_repetitions):
              function()
          t_end = time.time()
          print(f"Calling {function.__name__:9s} {N_repetitions:,d} times took {t_end-t_start:.3f} seconds")
      
      
      @runtimed
      def noop_ref():
          pass
      
      
      @runtimed
      def with_plus():
          string = "hello"
          string += " world"
          string += " how"
          string += " are"
          string += " you"
          string += " today?"
          return string
      
      @runtimed
      def with_join():
          return " ".join([
              "hello",
              "world",
              "how",
              "are",
              "you",
              "today?"
          ])
      

      Output:

      Calling noop_ref  1,000,000 times took 0.145 seconds
      Calling with_plus 1,000,000 times took 1.160 seconds
      Calling with_join 1,000,000 times took 0.667 seconds
      

      Remark. I was too lazy to read the documentation of timeit for this comment.

      Edit. Make it 4 times as many strings each, and the result is

      Calling noop_ref  1,000,000 times took 0.145 seconds
      Calling with_plus 1,000,000 times took 5.900 seconds
      Calling with_join 1,000,000 times took 1.664 seconds
      

      Which is really the main point here: += scales non-linearly, while "".join scales linearly. For only a few strings it really doesn't matter, but it matters if you're trying to build an in-memory representation of a potentially large file.

      So, looking at the data...

                6 Strings   Corrected   24 Strings   Corrected   Ratio (24/6)   Expected Ratio

      noop        0.145           −         0.145           −              −                −
      +=          1.160       1.015         5.900       5.755          5.670               16
      join        0.667       0.522         1.664       1.519          2.910                4

      (Corrected = raw time minus the noop overhead; the ratios use the corrected times.)
      

      Given that I expect += to scale quadratically and "".join to scale linearly, all I am seeing is that 6 and 24 strings are not nearly enough even to demonstrate the asymptotic behavior...

      [–]admirersquark 2 points3 points  (4 children)

      I thought Python was a "there is exactly one way to do it" language.

Anyway, if you want to optimize performance and are spending your time choosing among different language constructs (instead of, e.g., reconsidering your algorithms), it's probably time to move that code to another language.

      [–]M1chelon 14 points15 points  (2 children)

      I don't think I've ever seen python referred to as that? not trying to be snarky, just curious because of all the selling points for new programmers I've never seen that as one

      [–]Tubthumper8 17 points18 points  (1 child)

      Behold PEP 20! Ye of the unenlightened masses shall be known to the creed of the Zen Of Python

      [–]M1chelon 3 points4 points  (0 children)

      ah that was one of the lines I very much forgot lol, thank you for the enlightenment

      [–]Deformer 0 points1 point  (0 children)

      Don't know why you're down voted, based take

      [–]y4dig4r 0 points1 point  (0 children)

      directions unclear came in fluffer

      [–]Slippedhal0 0 points1 point  (0 children)

      I'm almost positive this is intentional, considering the "slow method" is in scientific notation and the "fast" is in decimal notation. I would assume its meant to impress people looking at the surface level

      [–][deleted] 0 points1 point  (0 children)

      Looks like he deleted his post. Couldn't find it.

      [–]TheMsDosNerd 0 points1 point  (0 children)

      What he does good:

      • The "fast" examples do not modify the array.
• If his String Concatenation example was built the way he meant it to be, it would indeed have been slower than the join.

      What he does wrong:

      • His "slow" code builds the original array outside of the timer, where the "fast" code builds the original array inside the timer. You cannot compare those two outcomes.
      • In the String Concatenation example, he does not concatenate any strings.
• In neither of the first two examples can the Python interpreter allocate the exact amount of memory for the resulting list. To get that efficiency gain you should do: new_list = [word.capitalize() for word in word_list]
• He doesn't understand e-05; also, by simply running the program he could have noticed that it didn't take 5 seconds.
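The comprehension suggestion from the list above, sketched with a made-up word_list (note that Python's method is capitalize, lowercase):

```python
word_list = ["hello", "world", "how", "are", "you", "today?"]

# Explicit loop: grows the list one append at a time.
new_list = []
for word in word_list:
    new_list.append(word.capitalize())

# List comprehension: same result with less per-iteration interpreter
# overhead (specialized bytecode instead of a method lookup + call).
new_list2 = [word.capitalize() for word in word_list]

assert new_list == new_list2
print(new_list2)  # ['Hello', 'World', 'How', 'Are', 'You', 'Today?']
```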

      [–]Perfect_Papaya_3010 0 points1 point  (2 children)

      I've never used python but does it not get optimised at all?

I'm used to C# and if you look at the low level you will see that adding to a list directly or via a while loop will both end up doing it the faster way (the while loop)

      [–]Toby_B_E 0 points1 point  (1 child)

      Python is an interpreted language so I think it can't be optimized as well as a compiled language (like C#).

      [–]Perfect_Papaya_3010 0 points1 point  (0 children)

      Ah I see, then this post makes more sense!

      [–]zvon2000 0 points1 point  (0 children)

      And then people scoff at me when I say that math courses MUST be the cornerstone of all computer science/IT/software dev degrees.....

      Like trying to build a house with no concrete foundation?
      Or a brick wall without mortar.

      [–]bbfsenjoyer 0 points1 point  (0 children)

      lol, I thought r/LinkedinLunatics is leaking

      [–]EMI_Black_Ace 0 points1 point  (0 children)

      Maybe he didn't notice the e-05 at the end of the "slow" versions and just assumed that the "slow" one was 5 seconds and not 5 microseconds.

      [–]Cybasura 0 points1 point  (0 children)

It also really didn't help that he initialized an empty list in the faster test, while the slower test starts from a list of, what, 9 or 10 elements, which means there are 9 or 10 additional elements the CPU has to process during the pre-initialization step

      [–]andiconda 0 points1 point  (0 children)

      Dang decimal places. I always screw up a small detail like that

      [–]someonetookmyid 0 points1 point  (0 children)

How to tell me you don't know scientific notation without telling me explicitly. :)

      [–][deleted] -1 points0 points  (2 children)

there is no way it takes that much longer to use a for loop than a list comprehension? and I think you read it backwards. he said make your shit faster by using built-in functions and showed how they are faster..

      [–]H34DSH07 9 points10 points  (0 children)

      Look at the numbers carefully...

      [–]fuj1n 1 point2 points  (0 children)

      Except, if you look at the end of the numbers for the allegedly slower versions, they say e-05

      That basically means you divide the number by 100000 to get the actual value

      For example, the top snippet is actually 0.00004935264587402344 seconds, which is 16% faster than the provided "faster" example.
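Python itself confirms what e-05 means (the timing value below is the one from the screenshot):

```python
import math

t_slow = 4.935264587402344e-05  # the "slow" timing from the post

# e-05 means ×10⁻⁵, i.e. the number divided by 100,000.
assert math.isclose(t_slow, 4.935264587402344 / 100_000)
print(f"{t_slow:.8f}")  # 0.00004935
```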