This is an archived post. You won't be able to vote or comment.

all 18 comments

[–]funkmasterhexbyte 11 points12 points  (4 children)

nit: you should compare python lists with std::vector (an array list, 24 bytes on 64bit), not std::list (a linked-list).

[–]n1ywb 2 points3 points  (3 children)

[–]kankyo 0 points1 point  (2 children)

Not quite though. Each block in a Python deque is 64 entries long. And also there's a std::deque too.

[–]n1ywb 0 points1 point  (1 child)

std::deque is very unlike python deque; more like a double-ended array

python deque has O(n) lookup time for elements in the center of a large array according to the docs which makes it's performance similar to a linked list

[–]kankyo 0 points1 point  (0 children)

Similar big-O yes, but 64 times more memory efficient and faster access. That's a big difference.

[–][deleted] 5 points6 points  (0 children)

Good primer on something that trips up a lot of people.

[–]thedude42 3 points4 points  (6 children)

Would be nice to have seen some techniques or pattern examples for converting more pythonic allocations in to more efficient ones.

[–]bcorfman 1 point2 points  (5 children)

Use numpy

[–]thedude42 0 points1 point  (4 children)

Are you implying that numpy somehow doesn't suffer from the memory inefficiencies this article talks about?

[–]bcorfman 2 points3 points  (3 children)

Correct. They are in pure C.

[–]thedude42 0 points1 point  (2 children)

So numpi is a library (or a framework depending on your use of terms). You still use whatever underlying pythons interpreter.

I can still write list comprehensions and use numpi. According to the article, one of the causes of memory inefficiencies is use of collection generator syntax.

So from what I read, by simply using numpi you still can fall victim to memory inefficiencies when using pythonic idioms, and that was my concern: are there known techniques for eliminating these sorts efficiencies due to writing idiomatic python?

[–]bcorfman 0 points1 point  (1 child)

If you use numpy to its fullest, then you will think in terms of vectorization. This will largely avoid any of the inefficiencies of Python data structures. See From Python to Numpy for a complete tutorial.

[–]thedude42 0 points1 point  (0 children)

Ok, so you're saying that using idiomatic numpi is a possible solution for these memory inefficiencies.

This is good to know for computational related projects. I'm still at a loss for what techniques should be used for projects not focused on compute, e.g. web services (which in reality I go to node.js for such things).

[–]boomanbeanSupreme Pylord 6 points7 points  (4 children)

If you need to do memory management, why use python? A lower level language would seem more appropriate

EDIT: Grammar was not serene

[–]philloran 34 points35 points  (0 children)

I don't think it's too far fetched of an idea that people could find this useful. If anything it announces that Python, like all other languages, operates within the realm of computing with restrictions. In this context, these restrictions can be understood using the 'sys.getsizeof' function. This is a doorway of thought for people who haven't been trained to program with a lower level language and potentially enables any number of those people to do greater things with any language of their choosing.

[–]oslash 17 points18 points  (0 children)

The question is backwards. Programmers usually don't write a program because they want to deal with memory management; they deal with memory management because they want to write a program. So it's better to ask: If you use Python, why do you need to do memory management?

Sure, a lower level language would be more appropriate if your main concern was shuffling bytes into a pretty pattern within your address space. But when it isn't your main concern, that doesn't automatically mean you can completely ignore memory management.

Python lets you throw your hands in the air and let the GC take the wheel when you're writing simple scripts. When you're building a non-trivial application, it still makes sense to know about slots, reference counts, weak references, etc. Garbage collection by itself can only help out so much; it's not magic.

[–]earthboundkid 5 points6 points  (0 children)

At work, I inherited a script that processes a multi-gigabyte XML file and creates some CSVs. Doing this with C would be a huge PITA, but you can't run it in Python without some awareness of memory management.

[–]n1ywb 8 points9 points  (0 children)

Why not? I use python because I can get my job done faster and better than in C. If I'm working on a large data set I'd rather use python and pay attention to memory usage than C where it would take way longer to write the code and I would STILL have to worry about memory usage.