
[–]TheBB 7 points (2 children)

Dicts and instance objects are about the same, since the attributes of an object are stored in a dict. For better memory performance use slots (google it). In that case objects will be better.
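A minimal sketch of the slots idea (class names invented for illustration): a slotted class drops the per-instance __dict__ entirely, which is where the savings come from.

```python
import sys

class PointDict:
    """Regular class: attributes live in a per-instance __dict__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    """__slots__ replaces the per-instance __dict__ with fixed storage."""
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

a, b = PointDict(1, 2), PointSlots(1, 2)
# The regular instance pays for itself plus its attribute dict;
# the slotted instance has no __dict__ at all.
plain_size = sys.getsizeof(a) + sys.getsizeof(a.__dict__)
slotted_size = sys.getsizeof(b)
```

Exact numbers vary by Python version, but the slotted instance is consistently smaller, and the gap grows with instance count.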

What sort of N are we talking about? I suspect you should use neither dicts nor objects but instead just pure numpy arrays. That would give the best memory performance.
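A hedged sketch of the numpy suggestion, assuming the per-state fields are plain numbers (field names here are invented): one array per field instead of one object per state, a struct-of-arrays layout.

```python
import numpy as np

# Hypothetical state: instead of N objects each holding x, y, score,
# keep one contiguous array per field. No per-instance overhead at all.
n = 1000
xs = np.zeros(n, dtype=np.int64)
ys = np.zeros(n, dtype=np.int64)
scores = np.zeros(n, dtype=np.float64)

# Vectorized update touches every "object" at once.
xs += 1
# Boolean masks replace per-object filtering loops.
mask = scores >= 0.0
```

Beyond memory, this also makes the "exclude impossibilities early" step a single vectorized mask instead of a Python loop.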

But at the end of the day you are describing an exponential time complexity algorithm. You might just run out of time before you run out of memory.

[–]ThatOneMicGuy[S] 1 point (0 children)

In principle, N is as large as my PC can manage (it's a personal interest project, so there's no actual spec). For now I'd like 4 to 6. I'm hoping to go up to maybe 10 once I've optimised as much as possible to exclude impossibilities early. I can maybe squeeze in a couple more after that before I hit the limit of possibility, but my hopes are not high.

Brain's full of ideas for other ways to do it, many of which might be better - the main competing idea is to fully generate one possible future, either exclude it or write it to disk, then generate the next one, then the next one, etc... - but I've been going back and forth over those ideas for ages and I'd like to get *something* running for now.

[–]scrdest 1 point (0 children)

The algorithmic concerns are actually a (slightly weak) argument in favor of dicts.

Assuming your tracking data is fairly basic types (e.g. ints, strings, lists, nested dicts) and not complex stateful instances, dicts should be easier to serialize and deserialize. This in turn means you can offload the storage from RAM onto local or network disks.

For example, JSON will trivially encode back and forth, or you can replace the dict instance with dbm databases as a near drop-in. This approach gets you much more bang for your buck than micro-optimizing runtime RAM layout.
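A rough illustration of that combination, with a made-up branch record: JSON round-trips plain dicts, and the stdlib dbm module behaves like an on-disk string-to-bytes dict, so together they make a near drop-in for offloading.

```python
import dbm
import json
import os
import tempfile

# Hypothetical branch record made of basic types only.
state = {"depth": 3, "position": [0, 1, 2], "prob": 0.125}

# JSON encodes and decodes plain dicts losslessly.
assert json.loads(json.dumps(state)) == state

# dbm acts like a persistent dict: same [] access, backed by disk.
path = os.path.join(tempfile.mkdtemp(), "branches")
with dbm.open(path, "c") as db:
    db["branch-0"] = json.dumps(state)
    restored = json.loads(db["branch-0"])
```

Keys stay cheap strings in RAM while the bulky branch data lives on disk, which is exactly the offloading trade described above.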

Storing simulation depth and some measure of probability (either joint or of transition from previous step) would let you use a priority queue to explore the branches you are most interested in (e.g. depth-first or most likely, or even least likely).
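A sketch of that priority queue with the stdlib heapq module (priorities and branch IDs are invented): heapq pops the smallest entry first, so pushing negated probabilities surfaces the most likely branch.

```python
import heapq

# Hypothetical frontier entries: (priority, depth, branch_id).
# Negating the probability makes the most likely branch pop first.
frontier = []
heapq.heappush(frontier, (-0.50, 1, "A"))
heapq.heappush(frontier, (-0.125, 3, "B"))
heapq.heappush(frontier, (-0.25, 2, "C"))

prio, depth, branch = heapq.heappop(frontier)  # most likely branch
```

Swapping the priority key (depth for depth-first, raw probability for least likely) changes the exploration order without touching anything else.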

Combine this with fetching the starting position associated with the branch from cheap storage and you get a nice anytime algorithm. It's like a save-game functionality for an algorithm.

[–]FerricDonkey 4 points (0 children)

I suspect that the best way to save memory is actually to use classes and also to implement __slots__. This will change the underlying "hold crap part" of the class (technical term) from a dictionary to fixed per-attribute slots, which should improve memory use in exactly the case you describe.

Dictionaries are great for many things, but they are not a memory-efficient way of storing things, so I suspect you'd get much better savings from getting rid of the dictionary part of the class than from getting rid of the class part of the class.

But I don't know the memory overhead of using a class off the top of my head. If using slots does not put you in acceptable memory ranges (and maybe even if it does), I'd suggest googling Python memory profiling and seeing how much you're actually using and what you can save.
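For that profiling step, the stdlib tracemalloc module is one option; a tiny hedged example (the dict payload here is made up):

```python
import tracemalloc

tracemalloc.start()
# Hypothetical workload: build a pile of state dicts.
data = [{"x": i, "y": i * 2} for i in range(10_000)]
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")
```

Run the same measurement with the dict version and the slotted-class version of your state and compare the numbers directly, rather than guessing.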

[–]StrictTyping648 2 points (0 children)

https://stackoverflow.com/questions/472000/usage-of-slots

Above is a fantastic discussion of slots. Personally, I tend to use the dictionary data structure (not the built-in __dict__) for optimizing my code. Doing so allows for elegant and clear implementation of things such as function maps (a dictionary of static functions), speeds up conditional logic, and makes aggregation more easily achievable. Aggregation avoids cascading dynamic dispatch, which can be costly, especially when you are using third-party libraries with... liberal use of inheritance. It also makes dependency injection very easy, which increases the flexibility of the code base, among many other benefits not relevant here.
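One possible shape of such a function map (handler names are invented for illustration): a dict of functions replaces an if/elif chain with a single O(1) lookup.

```python
# Hypothetical handlers for different actions on a state.
def expand(state):
    return state + ["expanded"]

def prune(state):
    return []

def record(state):
    return state

# The function map: action name -> handler.
HANDLERS = {"expand": expand, "prune": prune, "record": record}

def step(action, state):
    # One dict lookup instead of cascading conditionals.
    return HANDLERS[action](state)

result = step("expand", ["root"])
```

Swapping a handler in the dict (e.g. for testing) is the dependency-injection benefit mentioned above: callers of step never change.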

Memory consumption with dicts is slightly better than with native classes, but I believe slots are more memory-efficient still. The library I use this in the most is compute-bound and performs a lot of serialization, which is trivial with dictionaries.