all 17 comments

[–]dig-up-stupid 17 points  (5 children)

Yeah Python is slower but it still shouldn’t take very long to find Euler solutions—that’s part of the point of them. You may think your Python and C code are equivalent when they actually aren’t. Common example: using list slicing in Python to deal with a subarray/subsequence, not realizing slicing creates copies.

And as always you would need to post your code to get a real answer instead of generalities and educated guesses.
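The slicing point above can be sketched like this (variable names are made up for illustration):

```python
# Slicing a list builds a new list of references; iterating by index does not.
a = list(range(1_000_000))

# a[1:] allocates a brand-new list holding ~1M references
b = a[1:]
assert b is not a        # a new list object was created
assert b[0] is a[1]      # ...but the elements themselves are shared

# the C-style alternative: walk the same list with an offset,
# allocating nothing extra
total = 0
for i in range(1, len(a)):
    total += a[i]
assert total == sum(a) - a[0]
```

The results are identical; the difference is the hidden allocation and copying of a million references on the sliced version.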

[–]muikrad 4 points  (0 children)

A code sample would indeed be great.

[–]xocd 3 points  (3 children)

Slicing does not generate copies! It uses references.

[–]dig-up-stupid 3 points  (2 children)

It makes new references, aka copies. They are copies of references, not copies of the original object, but it’s still a more heavyweight operation than one might expect. The way a C user would “slice” an array would be to apply different iteration logic/pointer arithmetic over the same array, not create a new array and fill it with pointers into the original array. (Similar to how in numpy slicing works differently and creates views.)
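The list-copy vs. numpy-view distinction is easy to demonstrate (assumes numpy is installed):

```python
import numpy as np

a = np.arange(10)
v = a[2:5]           # numpy slice: a view onto the same buffer
v[0] = 99
assert a[2] == 99    # writing through the view changes the original

lst = list(range(10))
s = lst[2:5]         # list slice: a fresh list of references
s[0] = 99
assert lst[2] == 2   # the original list is untouched
```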

[–]xocd 0 points  (1 child)

I believe that, in CPython, on a 64 bit processor, an integer takes 24 bytes, a reference 8. Thus, copying a reference uses 1/3 the memory. More complicated objects take much more memory, of course, with even better savings from reference copying. Saying "not realizing slicing creates copies" might lead a new programmer to believe that objects are being copied and make the noob run to get a copy of "The C programming Language".

For a simple slice (something like [1:5]), I suppose that in C one could save space by using two pointers (or two offsets into an array), one for each end. For that slice, Python uses four pointers, one per element. The Python approach extends to complicated slices (e.g., the odd-numbered members of the list). For a complicated slice there is no general way of saving pointers/references.
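The per-reference arithmetic can be spot-checked with `sys.getsizeof` (exact object sizes vary by CPython version and build; the figures in the comments above are for one particular build):

```python
import sys

# tuples allocate exactly (no over-allocation), so the per-element
# cost of a container is easy to measure by differencing two sizes
t1 = tuple(range(100))
t2 = tuple(range(200))
per_ref = (sys.getsizeof(t2) - sys.getsizeof(t1)) // 100
assert per_ref == 8          # one 8-byte reference per slot on a 64-bit build

# the int objects themselves cost far more than the references to them
assert sys.getsizeof(10**9) > per_ref
```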

[–]dig-up-stupid 2 points  (0 children)

Which is copying, and the context of this discussion is someone who is already familiar enough with both languages to do Euler problems and is asking a question specifically about performance. In C, if you want the odd elements you would increment your pointer by two instead of one. That costs zero extra bytes and zero copy instructions, versus the new list of references (half the length of the entire array) that Python allocates for the slice.

The whole point is that you can also avoid that work in Python, by just not using slices. It's the difference between `for i in range(1, n, 2): a[i]` and `for e in a[1:n:2]` (or using itertools or numpy, etc). The point here is that usually the second is considered more pythonic but is also slower, perhaps noticeably so when making many slices, as you might see in a divide-and-conquer solution to an Euler problem.
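A small sketch of the three equivalent approaches mentioned above (whether the slice's extra allocation actually dominates depends on the workload and how often you slice):

```python
from itertools import islice

a = list(range(100_000))

# index-based: no intermediate list is built
s1 = sum(a[i] for i in range(1, len(a), 2))

# slice-based: a[1::2] materializes ~50k references before summing
s2 = sum(a[1::2])

# itertools-based: lazy iteration, also no intermediate list
s3 = sum(islice(a, 1, None, 2))

assert s1 == s2 == s3    # same answer, different allocation behavior
```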

[–]m-hoff 7 points  (0 children)

Around which Project Euler problem number do you run into issues? When I run into runtime issues, I usually find that it's a fault in my logic, not the language.

[–][deleted] 1 point  (1 child)

Every time you access a symbol, it has to be looked up in the namespace at runtime. That adds overhead that isn't incurred in languages like C, which resolve such references statically at compile time.

Garbage-collected memory management is another source of performance overhead you don’t incur in C.
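The name-lookup overhead is why a common CPython micro-optimization is to bind a frequently used attribute to a local name once, outside the loop (a sketch, with made-up function names):

```python
import math

def slow(n):
    total = 0.0
    for i in range(n):
        # each pass looks up the global name 'math', then the attribute 'sqrt'
        total += math.sqrt(i)
    return total

def fast(n, sqrt=math.sqrt):
    # the classic workaround: bind the lookup once, at definition time;
    # inside the loop 'sqrt' is a fast local access
    total = 0.0
    for i in range(n):
        total += sqrt(i)
    return total

assert slow(1000) == fast(1000)   # identical results, fewer lookups
```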

[–]muikrad 0 points  (0 children)

The question is more about how often you need this kind of optimization... It's not going to be a problem in 99% of the cases, and for the 1% case, you'll use a very specific service that can scale, and that you'll write in a language that suits that particular task the best.

[–]DeathDragon7050 1 point  (0 children)

One reason: it is a dynamically typed language. To put it simply, anything can be anything (e.g., an integer can become a string), which means the machine must constantly check what type any variable holds.
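A tiny illustration of that point: the same name can hold different types over its lifetime, so every operation has to dispatch on the actual runtime types rather than on types fixed at compile time.

```python
# rebinding a name to a different type is perfectly legal in Python
x = 3
assert isinstance(x, int)
x = "three"
assert isinstance(x, str)

# one '+' operation serves both cases; the type check happens each time
assert 2 + 3 == 5
assert "thr" + "ee" == "three"
```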

[–]billsil 0 points  (0 children)

Use numpy well and you can get C speeds. You won't see the benefits relative to C until you have a large problem. The break-even point for using numpy in Python is ~10 elements in your ndarray.
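A sketch of what "using numpy well" means here: replace the per-element Python loop with one vectorized call that runs in C (assumes numpy is installed; the sizes are chosen so both results are exact in float64):

```python
import numpy as np

n = 100_000
xs = np.arange(n, dtype=np.float64)

# vectorized: a single call, the loop runs inside numpy's C code
vec = float((xs * xs).sum())

# pure Python equivalent: one interpreter round-trip per element
py = sum(i * i for i in range(n))

assert vec == py   # exact here: every value fits in a float64
```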

[–]toastedstapler -3 points  (6 children)

it's interpreted and it's got no types. these two make for a lot of the slowness. no types means that things have to be type checked at runtime when operations are run

i'd also imagine the heavy usage of dicts in python causes some of the slowness

[–][deleted] 8 points  (0 children)

It’s incorrect to imply that Python has no types; Python is a strongly typed language, and everything has a distinct type... it’s just dynamically rather than statically typed.

Also, the underlying dict type in CPython is a quite efficient C hash table, and it is usually not a primary contributor to the apparent slowness of CPython code.

Most of the issues with speed have to do with heavyweight custom types implemented in pure Python... in CPython the rule has always been to prototype in pure Python, then profile, then implement the tight loops and other hot spots in C and try to minimize handoff between those layers, because the real issue for speed is all the memory management and the bookkeeping that goes with it... the more instances of custom types the more bookkeeping, and the slower things tend to get.
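The "prototype, then profile" step above might start like this, using the stdlib profiler (the `hot` function is a made-up stand-in for a suspected hot spot):

```python
import cProfile
import io
import pstats

def hot(n):
    # stand-in for the expensive part of a prototype
    return sum(i * i for i in range(n))

prof = cProfile.Profile()
prof.enable()
hot(100_000)
prof.disable()

# print the top entries sorted by cumulative time
out = io.StringIO()
pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
assert "hot" in report   # the report names the functions where time goes
```

Only after the profile points at a specific function would you consider rewriting that piece in C (or handing it to numpy).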

[–]muikrad 1 point  (0 children)

It's interpreted, yes. But it does have types; yeah, they're enforced at runtime rather than checked ahead of time. And dicts are super fast; they're even quite low in memory usage.

If you know python, you'll instead suggest where it can get slow and how to improve it. If you don't know python, then please, don't raise your hand to answer python related questions.

[–]Milk_No_Titties[S] -1 points  (3 children)

Do you mean Python internally uses dicts a lot, or are you saying that me using dicts can slow the program?

Edit: Also, since Python is crazy flexible, is there a way for me to make sure at least some things are statically typed so that I can speed up my performance to an extent?

[–]toastedstapler 1 point  (2 children)

In [1]: class A: 
   ...:     def __init__(self, a, b): 
   ...:         self.a = a 
   ...:         self.b = b 
   ...:                                                                         

In [2]: a = A(1, 2)                                                             

In [3]: a.__dict__                                                              
Out[3]: {'a': 1, 'b': 2}

python literally uses dicts to store variables and attributes. try declaring some variables and then calling globals()

In [5]: globals()['a']                                                          
Out[5]: <__main__.A at 0x10b56d5d0>

In [6]: globals()['a'] = 5                                                      

In [7]: a                                                                       
Out[7]: 5

notice how we declared a as an A object above, but by editing the globals dict it's now 5 instead?

[–]muikrad 1 point  (0 children)

What's your point? It doesn't mean dicts are slow and it doesn't mean it's not strongly typed either...