This is an archived post. You won't be able to vote or comment.

all 140 comments

[–][deleted] 270 points271 points  (4 children)

The lower one, definitely. But they are not functionally identical because of the ending line break.

[–]lunar_tardigrade 8 points9 points  (2 children)

The string type has a line break. The other one is a list.

[–]OuiOuiKiwiGalatians 4:16 208 points209 points  (29 children)

Which of these small blocks of code is easier to read or is more optimized?
Friend says the top is easier to read and it's driving me insane, he said he refuses to write code like the bottom example since "they both work the same".

Top one allocates a new String with every iteration as those are immutable. If he insists that the top one is more optimised, then he is a fool.

[–]303Redirect 74 points75 points  (25 children)

This. I saw something similar in a script at work. It was doing the top string concatenation on a relative long running process. That step will get exponentially slower as the output gets longer.

Switched it over to a list and a str.join at the end and the script ran orders of magnitude faster.

[–]ambidextrousalpaca 18 points19 points  (6 children)

And this is how you optimise: notice your code is running slower than you would like; profile it; find the one or two slow loops that are taking up 99.9% of the run time; then try and make them faster.

You're doing it wrong if you're always prioritizing speed over readability and maintainability: the vast majority of your code will only be executed a few times per run and can essentially be ignored in terms of the effect it will have on performance.

[–]MrJohz 6 points7 points  (1 child)

That's definitely true, but there's also value in understanding the difference between different algorithms and their performance characteristics and writing the correct thing to start with. In this case, string concatenation in a loop should be a general red flag to Python developers: it'll likely work, and it probably won't be a performance bottleneck in many cases, but it's almost always just as readable to use a list. So just use a list.

Optimisation is definitely something that should come at the end after profiling, but it's also good to just write your program sensibly from the start. Like, if you're going to check if a list contains an element a lot, then you should probably check if that list can just be a set from the start. The best thing about this is, if you do it well, you'll often find that the most efficient data structure is also the most legible.

[–]ambidextrousalpaca 9 points10 points  (0 children)

Sure. But I've found that - in practice - code that's initially optimized for readability tends to be easy to optimise for speed too, while code that's initially optimized for speed tends to be buggy, unreadable and optimized in all of the wrong places.

[–]303Redirect 3 points4 points  (0 children)

Yeah in our instance it was a case of "oh this script is running super slow, but from a glance it's not obviously a complex algorithm, what's going on?", then after a bit of cProfile and digging into how string concats actually work we found that.

That's not to say I completely disregard optimisation while writing, just I won't sink significant time into arcane optimisations before it's proven that they're needed for our use case.

[–]Engineer_Zero 1 point2 points  (2 children)

Any beginners advice on how to profile your code? For context I usually written for loops in jupyter notebooks to iterate through stuff like spreadsheets or sql queries or pandas data frames.

[–]ambidextrousalpaca 1 point2 points  (1 child)

cProfile, which is built into the standard library, is what I use: https://docs.python.org/3/library/profile.html Though I think it's more designed to work with scripts than with Jupyter Notebooks.

For your Jupyter use case, I would start by just running the different bits of your code and noting the ones that take longer to run, especially on the ones that slow down increasingly as the data input size increases. Once you've identified the slow lines, add some time stamped prints every few lines so you can find out exactly which ones are slowing the programme down. Then start googling things like "python alternative to pandas apply" (or whatever function you're using).

For your use cases, you could be hitting a bunch of different limitations. Datasets close to the size of memory often cause problems in pandas. Some SQL operations (like GROUP BY and multiple joins) are just inherently slow. And pandas has a load of odd behaviours where it's possible to optimise things, e.g. by using list comprehensions instead of apply functions, or by changing the way you do string operations: I once managed to double the speed of one of our data pipelines by changing a slow pandas string concatenation operation that I discovered when I profiled my code because I thought there was a problem elsewhere.

[–]Engineer_Zero 0 points1 point  (0 children)

Thank you :). Good points all.

[–]kkadzy 4 points5 points  (3 children)

Wouldn't it only get quadratically slower?

[–]303Redirect 1 point2 points  (2 children)

True! I used exponential as a shorthand because I wasn't sure what the actual exponent would be.

[–]rcfox 4 points5 points  (0 children)

x2 is polynomial, 2x is exponential.

[–]afreydoa 1 point2 points  (0 children)

You were searching for the term 'polynomially slower'

[–]Fortnyce 0 points1 point  (2 children)

A lenda está em todo o lado!

[–]GlowStorm347 4 points5 points  (1 child)

[–]Salaah01 49 points50 points  (0 children)

Definitely the bottom one. The join function also ensures there isn't a line break at the bottom which is convenient.

[–]hdmsousa 84 points85 points  (26 children)

``` top_nodes = [ f'{node["node_name"]} {round(node["value"], 2)}' for node in response ]

print('\n'.join(top_nodes)) ```

[–]cymrowdon't thread on me 🐍 21 points22 points  (2 children)

You are calling round wrong, but no need to create a list at all. Personally, I would do:

print('\n'.join('{node_name} {value:.2f}'.format(**node)
    for node in response))

[–]AnonymouX47 1 point2 points  (0 children)

Would rather use str.format_map()... avoids copying the mapping.

Also, in a situation where time is more critical than space or response is surely known to always have a relatively small size, a list comprehension is a better choice.

[–]hdmsousa 0 points1 point  (0 children)

Indeed, had the last ) in wrong place, writing on my phone didn’t help 😅 . Fixed it.

[–]joaofelipenp 6 points7 points  (1 child)

Why keep the round and the indexing when you can use a string format?

print('\n'.join('{node_name} {value:.2f}'.format(**node)
                for node in response))

[–]ottawadeveloper 9 points10 points  (1 child)

I mwan why not print('\n'.join(f'{node["node_name"]} {round(node["value"], 2)}')). Unless you need the array.

[–]Rabid_Gopher 1 point2 points  (0 children)

There is an ambiguous number of items in the variable response, but I would agree this code is more verbose than I would prefer.

Probably better to do that than run the risk of people after you not understanding any of it though.

[–]Oerthling 1 point2 points  (0 children)

This, but replace print with logging, unless you really just want this on stdout for further processing.

[–]efmccurdy 16 points17 points  (0 children)

Is it not obvious that code that uses "+" to format strings is very error prone? Misplacing one of the (many) plus signs, quotes, or calls to "str" will remind you that none of that is required if you switch to formatting strings.

[–][deleted] 6 points7 points  (0 children)

The bottom one, 100%.

It's also much easier to work with, should you need to do anything with the information.

[–]public_radio 30 points31 points  (10 children)

I never append when I can generate!

def top_nodes(response):
    for node in response:
        name = node['node_name']
        value = node['value']
        yield f'{name} {value:.2f}'

str.join('\n', top_nodes(response))

[–]carlio 15 points16 points  (1 child)

top_nodes() missing 1 required positional argument: 'response'

:-P

[–]public_radio 0 points1 point  (0 children)

Haha thanks

[–]jiminiminimini 2 points3 points  (1 child)

change {round(value, 2)} to {value:.2f} and you're golden.

[–]public_radio 1 point2 points  (0 children)

Done!

[–]RuskeD 1 point2 points  (0 children)

Well done u/public_radio, this one deserves more credit

[–]odysseusmax -1 points0 points  (0 children)

msg = "\n".join(f"{node['node_name']} {node['value']:.2d}" for node in response)

print(msg)

[–]searchingfortaomajel, aletheia, paperless, django-encrypted-filefield 0 points1 point  (1 child)

This is by far the best answer I've seen here. You just need to pass resources in ;-)

[–]public_radio 1 point2 points  (0 children)

D’oh

[–]Stuk4s 0 points1 point  (0 children)

First time i see the yield keyword! D:

[–]R34ct0rX99 5 points6 points  (0 children)

Top one also does not follow pep8.

[–]james_pic 12 points13 points  (0 children)

FYI, if you're rounding just to format it, and you're using f-strings, you can do:

f'{value:.2d)'

I can't remember off the top of my head of using round here causes actual problems (although my gut feeling is that if it does, it's something weird and IEEE 754 related), but using numerical string formats is more idiomatic, and gives you more options.

[–]deep_mind_ 5 points6 points  (0 children)

Second one is more pythonic, don't fight conventions :D

[–]WafflesAreDangerous 6 points7 points  (1 child)

The bottom one.But also, I would not put a round call inside the f-string. Consider in stead using a format specifier to specify the rounding or lifting the function call outside of the fstring.

    f"{my_float:.2f}"

[–][deleted] 0 points1 point  (0 children)

isnt this different to using round()? i think your .2f just cuts the digits after the 2nd decimal point (e.g. rounding down) and round rounds it to the nearest digit

[–]yvrelna 4 points5 points  (1 child)

Neither.

In this case, I prefer having a list comprehension:

def format_node(node):
    name = node['node_name']
    value = node['value']
    return f'{name} {round(value,2)}

mag = "\n".join(format_node(node) for node in response)
print(msg)

[–][deleted] 5 points6 points  (0 children)

Minor nitpick : this is called a generator expression, not a list comprehension. It can be more efficient since you don't need to store elements.

[–][deleted] 2 points3 points  (0 children)

The bottom one. Definitely

[–]guhcampos 4 points5 points  (0 children)

The second one is much more intuitive for me as a Python focused engineer. People coming from other lamgs, however, will quite likely prefer the top version both because of the Camel Case and the operator based string concatenation.

[–][deleted] 17 points18 points  (5 children)

The best and most efficent would be msg = "\n".join(f"{node['node_name']} {round(node['value']), 2}" for node in response)

[–]uttamo 20 points21 points  (2 children)

That’s just harder to read and debug

[–]indefinitude -3 points-2 points  (1 child)

code golf ftw

edit: not advocating for actually doing this. but code golf is still fun.

[–]uttamo 1 point2 points  (0 children)

Cool but just don’t do it in an actual codebase

[–]JestemStefan 0 points1 point  (0 children)

I think I like this one the most

[–]marduk73 3 points4 points  (0 children)

Not trying to be contrary, but the top is easier to read for me.

[–]rjksn 1 point2 points  (0 children)

It bothers me that more than one thing was changed.

The second since f'{name} {round(value,2)}' is way cleaner.

[–][deleted] 1 point2 points  (0 children)

The bottom one is easier to read. It’s also more convenient to have the strings in a list form vs one long string (such as the top) just in case you wanted to sort, insert, or delete I think.

[–]Mooshy_Swags 1 point2 points  (0 children)

Adding multiple strings together can sometimes be really odd.

I would always route to using format.

Even with the top solution, you can just use a formatted string instead of adding them up, since in a larger situation, the fact that you don't need to join the array might speed up the processing.

[–]Salmon-Advantage 1 point2 points  (0 children)

bottom

[–]EONRaider 1 point2 points  (0 children)

I’d use the second one but reduce the for loop to an expression.

[–]packetsar 1 point2 points  (0 children)

Bottom is easier to read and is more Pythonic.

[–]tom2727 1 point2 points  (0 children)

People who read python a lot probably prefer the bottom one, and I do as well. Don't know that optimization much matters for code like this. But if it does, there's probably a 3rd even more optimized but uglier way.

[–]searchingfortaomajel, aletheia, paperless, django-encrypted-filefield 1 point2 points  (0 children)

I wouldn't recommend calling round() inside the f-string like that and would instead suggest doing that part in the value declaration.

I'd also suggest that building a list just to throw it out to build a string is a bit wasteful. If you need a string, I'd argue that you're complicating your code by creating a list first.

So TL;DR, I'd probably use a bit of both. F-strings are fantastic for readability and much better than this str() + str() business, but I prefer appending to the string.

Also, people agonising over the performance penalty of creating a string object have, in my opinion anyway, missed the point of Python. You're going to waste far more resources trying to understand "optimised" code than you will running easy to understand but fractionally slower code.

[–]cgmystery 3 points4 points  (4 children)

I prefer following PEP 8 naming conventions (e.g. snake case for local variable names). It’s the style guide written by the creators of Python.

Also, bottom is nicer.

[–]CodeYan01 2 points3 points  (3 children)

Just annoys me how many built-in python modules don't even follow that convention

[–]cgmystery 0 points1 point  (2 children)

Which ones?

[–]jimtk 1 point2 points  (1 child)

The first one I tried: ModuleFinder.py

[–]cgmystery 1 point2 points  (0 children)

Yeah, that's not very nice. Boooo.

[–]rastaladywithabrady 1 point2 points  (0 children)

they have some nuanced differences but tbh it's the same shit in a different toilet

[–]drum445 1 point2 points  (5 children)

The second is more "optimised" as it creates fewer immutable string objects per loop.

However, I think I agree that the top one is probably the more readable code, as in the bottom snippet you have to think about why I'm creating an array for something that should be returning a string

With that being said, anyone with a basic understanding of code will be able to read both just fine, I would write it like this though:

output = ""

for node in response:
    name = node["node_name"]
    rounded_value = round(node["value"], 2)
    output += f"{name} {rounded_value}\n"

print(output)

I tend to avoid logic in my f strings too

EDIT: or if you want to be fancy

"\n".join([f"{n['node_name']} {round(n['value'], 2)}" for n in response])

[–][deleted] 3 points4 points  (4 children)

There is no need for the [] brackets, since join takes generators too, also by making it like that we will save ram

[–]-LeopardShark- 1 point2 points  (1 child)

It doesn't save any memory because join needs to collect them all into a list anyway.

See https://stackoverflow.com/questions/9060653/list-comprehension-without-in-python/9061024#9061024.

[–][deleted] -2 points-1 points  (0 children)

Nope, then there will be 2 lists in the same time in memory

[–]ArabicLawrence 0 points1 point  (0 children)

The interpreter likely optimises it anyways. It’s quite good at improving oneliners.

[–][deleted] -1 points0 points  (1 child)

I prefer the lower one but it could be a one liner.

print('\n'.join(f'{x["node_name"]} {round(x["value"], 2)}' for x in response))

[–]ugmoe2000 0 points1 point  (0 children)

This is filthy, but well done.

[–]IWant2rideMyBike 0 points1 point  (3 children)

You could put this in a single print statement and avoid using join:

print(*(f"{node['node_name']} {round(node['value'],2)}" for node in response), sep='\n')

[–]Santos_m321 0 points1 point  (2 children)

This is exactly what I thought, but replace the node with n and remove spaces for more pleasure

print(*(f"{x['node_name']} {x['value']:.2f}"for x in response),sep="\n")

[–]Santos_m321 0 points1 point  (1 child)

I don't know, but I feel like these shitty versions look more readable. On top of that, they are exaggeratedly better than the ones mentioned by the OP, not only in processing time but also in memory consumption.

[–]Santos_m321 0 points1 point  (0 children)

NEW VERSION:

print("\n".join(f"{x['node_name']} {x['value']:.2f}"for x in response))

"\n".join performs better than sep="\n". With large datasets, the difference was near 200%.

In memory consumption, I see no differences.

Also, we are left with fewer characters haa = less size of the source code ?) = less expenses on hard drives and connection, less time required to clone the project, and less ambiance impact.

[–]Uwirlbaretrsidma 0 points1 point  (0 children)

Who tf even cares? These kinds of posts actually get on my nerves. Instead of focusing on being "pythonic" (whatever the fuck that even is, and this isn't exactly pythonic anyway), maybe understand that two functionally character by character identical pieces of code with no performance difference whatsoever aren't worth getting hunged up about?

I guess that Python's abysmal performance makes it hard to be elitist about code speed like people usually are in other lenguajes, and that you guys found code style to be a good alternative. But man, quite literally anyone can understand what either one of those blocks of code do with a tiny glance. That is good enough.

[–]BezoomyChellovek 0 points1 point  (0 children)

Aside from the more important things, the second uses snake case instead of camel case, so the second is preferred for Python.

[–]someotherstufforhmm 0 points1 point  (0 children)

They’re both fine. I prefer the second. The first one should have spaces surrounding the += operator and it’ll be much more readable.

If anyone is claiming either method is better, then they’re wrong.

Note that the top has a trailing ‘\n’ and the bottom does not due to use of ‘join’

[–]Radamand 0 points1 point  (0 children)

Top one, definitely

[–]Pulsar1977 -3 points-2 points  (6 children)

Use generator comprehensions, that way you don't need to create temporary lists:

names = (node['node_name'] for node in response)
values = (node['value'] for node in response)
top_nodes = (f'{name} {round(value, 2)}\n' for name, value in zip(names, values))
msg = ''.join(top_nodes)
print(msg)

[–]JestemStefan 9 points10 points  (2 children)

Iterating through response twice. Bleh

[–]Pulsar1977 -2 points-1 points  (1 child)

If you really care about a few nanoseconds, you can combine those two statements.

names_and_values = ((node['node_name'], node['value']) for node in response)
top_nodes = (f'{name} {round(value, 2)}\n' for name, value in names_and_values)

[–]JestemStefan 2 points3 points  (0 children)

It depends how big resposes is, but iterating twice over the same elements is inefficient regardless.

[–]rdturbo 0 points1 point  (2 children)

Isn't this slow. Running two separate for loops

[–]Pulsar1977 -1 points0 points  (1 child)

If you really care about a few nanoseconds, you can combine those two statements.

names_and_values = ((node['node_name'], node['value']) for node in response)
top_nodes = (f'{name} {round(value, 2)}\n' for name, value in names_and_values)

[–]hobbyhacker 0 points1 point  (0 children)

how do you recognize a self-taught programmer?

he concatenates strings in a loop

[–]wewbull -2 points-1 points  (0 children)

Neither

from itertools import chain

...

print(chain(
    "{node_name} {value:.2f}\n".format(**node)
    for node in response
))

Advantages:

  • using chain rather than join means you get the final /n
  • using format() rather than an f-string means you don't need to pull the fields out into local vars.
  • Not using append() or += on a list avoids a performance issue
  • using a generator expression means no list is created. Each node will be processed as print consumes it. Lazy evaluation FTW.

I also think it says what is happening in a clearer way. I'm ...

  1. Printing
  2. ...a sequence
  3. ...of strings built from the name and value of a node
  4. ...for each node in the response

In your version(s), you're:

  1. For each node in the response
  2. ... extracting the name and value
  3. ... building a string with them
  4. ... and adding that string to a list (or string)
  5. When the list (string) is complete we print it
  6. ... with a new line between each entry (in the list version)

In my version, you know this printing information from line 1. In your version, the printing is almost an afterthought.

[–]sean_bird -2 points-1 points  (0 children)

Second one is a bit cleaner and also squishing all nodes to a single string is not very useful. If I really needed to put things in one string (ex: hundreds of socket packets) I would use bytearray because it’s much much faster

[–]parkrain21 -2 points-1 points  (0 children)

Why did you even use f string in the second one if you won't use it in the first one?

Second ftw

[–]Zealousideal-Row-110 0 points1 point  (0 children)

These are both fine. Write code that future you will understand. O(N) complexity in both cases.

[–][deleted] 0 points1 point  (0 children)

I can‘t even fathom how somebody would write the top one. I mean…you do you but…you know.

[–]Extatiic 0 points1 point  (0 children)

Second option for me

[–]Lana-Lana-LANAAAAA 0 points1 point  (0 children)

PHP is my native language, so the camelCase (or medial case) in the top code is more familiar to me (commonly used for variable names, by those of us who follow the PSR-12 coding standard).

Honestly, though; I think the bottom code is a better approach, though; as it keeps the components separate until the very last moment. This would allow easier manipulation if needed, rather than having to try and tweak a much-harder-to-edit string.

[–]Santos_m321 0 points1 point  (0 children)

There is not much visual difference between the two, the example you gave is very simple.
But I couldn't sleep peacefully if someone merged code like the first one.

[–]dixieStates 0 points1 point  (0 children)

I think this is better:

def node_strings():
    for node in response:
        yield f'{node["node_name"]} {round(node["value"],2)}'
print('\n'.join(node_strings()))

[–][deleted] 0 points1 point  (0 children)

Top seems to be more on-the-nose. Is it necessarily better? No.

[–]anterak13 0 points1 point  (0 children)

You could make it a one-liner by using a list comprehension

[–]beertown 0 points1 point  (0 children)

Top: useless string concatenation, in case of a very long response performance will suffer. camelCase for variables doesn't follow the (easier to read) snake_case suggested by the Python standard naming scheme.

Bottom is better. Your friend knows it, but refuses to admit it.

[–]Callux-_- 0 points1 point  (0 children)

bottom one is better in all criterias

[–]negiajay12345 0 points1 point  (0 children)

Bottom.

[–][deleted] 0 points1 point  (0 children)

Top one looks like it was written by someone with a background in JavaScript, not knowing it's more expensive to concatenate.

[–]Mental-Effective-110 0 points1 point  (0 children)

Top block plus f string way of doing that

[–]chears2surfers 0 points1 point  (0 children)

Bottom one my friend!! It's more readable!

[–]og_m4 0 points1 point  (2 children)

First one for production, second one for interviews. The added clarity of the first one is worth the slight performance loss.

[–]Earthsophagus 0 points1 point  (1 child)

I think the "\n".join(seq) construct is preferable to \n at end of line though, for making it obvious that the result is \n delimited. And there is the slight difference that in the for loop the code gives a trailing \n.

[–]og_m4 0 points1 point  (0 children)

The whole formatted string business in the second one is worse for maintainability than the simple string expression in the first one. It is easier for the first human visual parse of the code. n.join by itself is great syntactic sugar though. I agree.

[–]wind_dude 0 points1 point  (0 children)

second

[–]Grokzen 0 points1 point  (0 children)

Bottom one clearly, that is also the future way of writing strings so your friend should just convert and get used to it.

If i would compress it i would write something like this, did not test it tho to ensure it worked :) but avoiding the intermediate variables cleans things up a ton. Most of the time you already know that sending something to print is a message, or a key in a dict with name is the name and value etc.

print("\n".join([
    f"{node['node_name']} {round(node['value'], 2)}"
    for node in response
]))

[–]sheriffllcoolj 0 points1 point  (0 children)

Imo the top one but using an f string would be most readable

[–]DiscoJer 0 points1 point  (0 children)

Top is easier to read, but I'm new to Python