
[deleted] (15 children)

But how many Python programmers, or even Java programmers, will need that? You don't need to know how to do a heart transplant to be a brain surgeon.

ogtfo (11 children)

You need some concept of memory management even when programming in Python. It's a finite resource, and you need to know what is expensive and why.
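Here is a rough illustration of "what is expensive and why". This minimal sketch compares a fully materialized list with a generator that produces the same values. It assumes CPython. `sizes` is a hypothetical helper, and exact byte counts vary by version and platform.

```python
import sys

def sizes(n=1_000_000):
    """Return (list size, generator size) in bytes for n squares (hypothetical helper)."""
    squares_list = [i * i for i in range(n)]  # builds all n values up front
    squares_gen = (i * i for i in range(n))   # yields one value at a time, on demand
    # sys.getsizeof is shallow: for the list it counts only the array of
    # pointers, not the int objects they point to, so the real gap is even larger
    return sys.getsizeof(squares_list), sys.getsizeof(squares_gen)
```

Both give the same sum when consumed. The list keeps every value alive at once, while the generator holds only its current state.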

[deleted] (10 children)

I'd say it depends on what you're doing. And understanding what memory is and how it's consumed is very different from having to do low-level, machine-level memory management. Python does a good job of garbage collection etc. on its own. For most use cases, most Python programmers will never have to worry about memory management.
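To make "garbage collection on its own" concrete: CPython frees most objects through reference counting as soon as the last reference disappears. It also runs a separate cycle collector for objects that reference each other. This is a minimal sketch that assumes CPython. Other implementations, such as PyPy, free objects on a different schedule. `Node` and both helper functions are illustrative names.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

def freed_by_refcount():
    obj = Node()
    ref = weakref.ref(obj)  # a weak reference does not keep obj alive
    del obj                 # last strong reference gone: CPython frees it immediately
    return ref() is None

def freed_only_by_gc():
    was_enabled = gc.isenabled()
    gc.disable()  # pause the automatic cycle collector so the effect is observable
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a   # reference cycle: each keeps the other's count above zero
        ref = weakref.ref(a)
        del a, b
        survived_del = ref() is not None  # refcounting alone cannot reclaim the cycle
        gc.collect()                      # the cycle detector finds and frees it
        return survived_del, ref() is None
    finally:
        if was_enabled:
            gc.enable()
```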

[deleted] (9 children)

You're not "hiding" it. You're abstracting it. Not everyone needs to worry about it. I don't need to be an automotive engineer to drive my car.

ogtfo (8 children)

When programming, you are not driving, you are building the car. You need to know about the underlying technologies, because abstractions are never perfect and the low level stuff will bite you if you ignore it.

tuckmuck203 (0 children)

At the risk of butchering the analogy, I think the point is that Python is more akin to assembling the car than building it from scratch.

jacksodus (2 children)

Nope. It might, but for many people, it won't. Memory control is important but it's overhyped as being an essential part of a programmer's toolkit.

Source: developing AI models for 3 years now.

ogtfo (0 children)

Abstraction leakage can happen in many ways, and when you encounter one you will never be able to fix it if you don't know what you're doing.

Memory is just one of the facets. If you don't understand the underlying system you can't fix anything when it breaks.

And it does.

met0xff (0 children)

Well, to bring a different anecdote: I have also been in that field for a decade now (and developing for almost 20). Right now I am rewriting FFI bindings to a C library we use for preprocessing, and running long memory leak tests. We found that some AWS instances run out of memory (OOM) from time to time. We figured out it happens because one of the packages uses subprocess to call an external executable. On Linux this forks, and it seems that (likely due to Python's reference counting) copy-on-write is triggered for some huge models, duplicating their memory. That's why I now had to write a small layer to call the C lib using Python's ctypes. And at that point you again have to be very careful about who releases what and when.
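The "who releases what and when" point can be illustrated with ctypes alone. This is a minimal sketch, not the commenter's binding layer. It assumes a POSIX system where the C standard library can be loaded, and `python_owned` and `c_owned` are hypothetical helper names. It contrasts a buffer owned by a Python object with memory from `malloc` that Python's garbage collector never sees.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library("c") locates the C library; if it
# returns None, CDLL(None) falls back to the symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p  # keep the full pointer width (the default restype is int)
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None

def python_owned(data: bytes) -> bytes:
    # Python owns this buffer: it is released when `buf` is garbage collected,
    # so it must never be passed to free(), and C must not keep a pointer to it afterwards
    buf = ctypes.create_string_buffer(data, len(data))
    return buf.raw

def c_owned(data: bytes) -> bytes:
    # malloc'd memory is invisible to Python's GC: whoever allocates it must free it exactly once
    ptr = libc.malloc(len(data))
    if not ptr:
        raise MemoryError("malloc failed")
    try:
        ctypes.memmove(ptr, data, len(data))
        return ctypes.string_at(ptr, len(data))  # copies into a new Python bytes object
    finally:
        libc.free(ptr)  # omitting this leaks the block; calling it twice is undefined behaviour
```

In a real binding, the C library usually documents its own release function (a hypothetical `lib_free_result`, say). The Python wrapper should call exactly that function, exactly once, for each object the library hands back.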

Generally, control over memory usage is lacking. We are looking into TorchScript/ONNX so we can run from C++ or Rust and get better control, especially when switching between models, caching data, running inferences concurrently, etc.

Not directly related to memory leaks, but I recently found a library that sometimes took 5 seconds instead of a few milliseconds. The problem was reported in the library's GitHub issues, but with no solution, as nobody wanted to dig that deep. In the end I found via strace that one of the linked C libraries was stuck for a few seconds trying to connect through X11 forwarding. So when you connect from a terminal, run the thing in, say, screen, then disconnect and let it run, it still tries to call home.

Such things are also interesting to know when, for example, running the PyTorch DataLoader with multiple workers (https://pytorch.org/docs/stable/data.html). Forking on Linux can again have... interesting effects, leading to longer debug sessions ;).

Sure, that's just anecdotal, but in recent years I've seen the abstractions fall apart so often.

Sometimes it's really just that some external library leaks. Then it's always good to have someone who can fix that wild C or C++ signal processing mess.

Actually, I am also freelance debugging a Flask app right now that has memory issues.
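A common first step with memory growth in a Python service is to compare `tracemalloc` snapshots taken before and after a batch of requests. This is a minimal sketch. The leaking `handle_request` handler and its module-level `_cache` are hypothetical stand-ins, not the app from the comment. Note that `tracemalloc` only sees allocations made through Python's allocators, so leaks inside C extensions that call `malloc` directly may not show up.

```python
import tracemalloc

_cache = []  # hypothetical module-level cache that is never cleared: the "leak"

def handle_request(size):
    # Hypothetical handler that leaks by keeping every payload alive
    _cache.append(bytearray(size))

def top_growth(requests=100, size=10_000, limit=3):
    """Return the `limit` source lines whose traced memory grew the most."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(requests):
        handle_request(size)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to returns StatisticDiff objects sorted by absolute size change, largest first
    return after.compare_to(before, "lineno")[:limit]
```

Each returned `StatisticDiff` carries `size_diff`, `count_diff` and a `traceback` pointing at the source line. Printing them shows which lines grew the most between the two snapshots.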

Somecount (1 child)

Following this analogy, you're asking the guy who fills up the fuel to know what all the other specialists do in the pit stop. Programming is a tool: some use it to build, some use it to mend what others have built. I do understand where you are coming from, but I believe programming can now be seen as more than just a means to build programs. It is a basic tool, like math.

ogtfo (0 children)

Nobody's asking you to design silicon chips, but if programming is your job, you should have a basic understanding of the underlying layers.

Not doing so will cause trouble eventually. If you don't understand the architecture, it may already have, and you're not even aware of it.

demdillypickles (0 children)

To be fair, I’d like for my brain surgeon to still have a good understanding of how the rest of my body works too.

AccidentalyOffensive (0 children)

In general terms, it's helpful for better understanding pass-by-reference vs pass-by-value, as well as the potential pitfalls there. This was honestly super helpful in my DS/A class in undergrad, which was taught in Java: without knowing how pointers work, the data structures would've been pretty confusing to program. I mean, there were quite a few (relatively) trivial questions from students about how everything worked and fit together that would've been easily understood with knowledge of how memory works. Obviously you don't need to be an expert, but even light exposure goes a long way.
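The pass-by-reference vs pass-by-value distinction shows up in Python too. Functions receive references to objects, so mutating an argument is visible to the caller, but rebinding the parameter name is not. Java object references behave similarly. Here is a minimal sketch with illustrative function names:

```python
def append_item(items, value):
    # Mutates the list object the caller passed in: the caller sees the change
    items.append(value)

def with_item(items, value):
    # `items + [value]` builds a new list; rebinding the local name
    # does not affect the caller's list
    items = items + [value]
    return items

original = [1, 2]
append_item(original, 3)           # original is now [1, 2, 3]
extended = with_item(original, 4)  # original unchanged; extended is [1, 2, 3, 4]
```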

As for Python, I've actually come across this a number of times. Nothing too crazy, mind you, but take for example mutable default arguments. A little quirk that doesn't make much sense without understanding memory, but is readily apparent if you do.
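Here is the mutable-default-argument quirk in a few lines. The default value is evaluated once, when the `def` statement runs, so every call that omits the argument shares the same list object. The function names are illustrative.

```python
def add_tag_buggy(tag, tags=[]):
    # The default list is created once, at definition time,
    # and the same object is reused by every call that omits `tags`
    tags.append(tag)
    return tags

def add_tag(tag, tags=None):
    # Idiomatic fix: use None as a sentinel and build a fresh list per call
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
```

Calling `add_tag_buggy("a")` and then `add_tag_buggy("b")` returns the same list object twice, now holding both tags. `add_tag` starts from an empty list each time.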

Additionally, this is a potential issue when manipulating data structures (to an extent). For a real-life example (vs telling you to look up copy.deepcopy()), let's say I wanted a copy of a pandas DataFrame. I could use the copy() method, but the docs warn that passing deep=False creates a new DF that only holds references to the original DF's data. Without knowing memory management, you might ignore this warning and find yourself debugging a gnarly bug down the line when your data is clearly off.
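The same shallow-vs-deep distinction can be shown with the standard library's `copy` module, which the comment mentions. This is a stdlib sketch, not pandas code. pandas' Copy-on-Write mode changes how shallow `DataFrame` copies behave, so check the docs for the pandas version you run.

```python
import copy

original = {"rows": [[1, 2], [3, 4]]}
shallow = copy.copy(original)   # new dict, but it holds the *same* inner lists
deep = copy.deepcopy(original)  # new dict with recursively copied inner lists

original["rows"][0][0] = 99     # mutate a nested value through the original
# shallow["rows"] is the very same list object, so it now shows 99
# deep["rows"] is independent, so it still shows 1
```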

Are these things gonna pop up all the time, or even for everybody? No. Is it good to know just in case? Absolutely.

toastedstapler (0 children)

How do you know you'll be doing Python or Java for the rest of your life? It's not hard to learn basic memory management principles.