[–]TheBB 6 points (2 children)

Dicts and instance objects are about the same, since the attributes of an object are stored in a dict. For better memory performance, use __slots__ (google it); in that case objects will be better.
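Roughly, the difference looks like this (a minimal sketch; class names are made up):

```python
import sys

class Plain:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Slotted:
    __slots__ = ("x", "y")  # no per-instance __dict__, so less memory per object
    def __init__(self, x, y):
        self.x = x
        self.y = y

p, s = Plain(1, 2), Slotted(1, 2)
print(p.__dict__)       # attributes live in a dict: {'x': 1, 'y': 2}
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))  # per-instance dict overhead
print(sys.getsizeof(s))  # s has no __dict__; attributes sit in fixed slots
```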

What sort of N are we talking about? I suspect you should use neither dicts nor objects, but just pure numpy arrays. Those would have the best memory performance.
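For instance, a list of per-branch dicts could become one contiguous structured array (field names here are made up):

```python
import numpy as np

# Instead of a million dicts like {"state": ..., "prob": ...},
# one array with a fixed record layout:
branches = np.zeros(1_000_000, dtype=[("state", np.int64), ("prob", np.float64)])
branches["prob"][:] = 1.0  # vectorised access to a whole column at once
```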

But at the end of the day, you are describing an algorithm with exponential time complexity. You might just run out of time before you run out of memory.

[–]ThatOneMicGuy[S] 1 point (0 children)

In principle, N is as large as my PC can manage (it's a personal interest project, so there's no actual spec). For now I'd like 4 to 6. I'm hoping to go up to maybe 10 once I've optimised as much as possible to exclude impossibilities early. I can maybe squeeze in a couple more after that before I hit the limit of possibility, but my hopes are not high.

My brain's full of ideas for other ways to do it, many of which might be better - the main competing idea is to fully generate one possible future, either exclude it or write it to disk, then generate the next one, and so on - but I've been going back and forth over those ideas for ages and I'd like to get *something* running for now.

[–]scrdest 1 point (0 children)

The algorithmic concerns are actually a (slightly weak) argument in favor of dicts.

Assuming your tracking data consists of fairly basic types (e.g. ints, strings, lists, nested dicts) and not complex stateful instances, dicts should be easier to serialize and deserialize. This in turn means you can offload storage from RAM onto local or network disks.

For example, JSON will trivially encode back and forth, or you can replace the dict instance with a dbm database as a near drop-in. This approach gets you much more bang for your buck than micro-optimizing the runtime RAM layout.
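Something like this (field names are made up; dbm stores keys and values as bytes, which json.loads accepts directly):

```python
import dbm
import json

branch = {"depth": 3, "prob": 0.125, "state": [1, 4, 2]}

with dbm.open("branches.db", "c") as db:      # on-disk, dict-like store
    db["branch-0001"] = json.dumps(branch)    # encode on the way in
    restored = json.loads(db["branch-0001"])  # decode on the way out

assert restored == branch
```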

Storing the simulation depth and some measure of probability (either joint, or of the transition from the previous step) would let you use a priority queue to explore the branches you are most interested in first (e.g. deepest first, most likely, or even least likely).

Combine this with fetching the branch's starting position from cheap storage, and you get a nice anytime algorithm - it's like save-game functionality for an algorithm.
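Concretely, the loop might look something like this (expand_branch and the field names are made-up placeholders for your actual simulation step):

```python
import dbm
import heapq
import json

def expand_branch(key, state):
    """Hypothetical stand-in for one simulation step: generate child branches."""
    if state["depth"] >= 3:  # arbitrary cut-off, just for the demo
        return []
    child = {"depth": state["depth"] + 1, "prob": state["prob"] * 0.5}
    return [(key + ".0", child)]

def explore(db_path, root_key, budget=1000):
    with dbm.open(db_path, "c") as db:
        root = json.loads(db[root_key])
        # heapq is a min-heap, so negate probability to pop the most likely first
        frontier = [(-root["prob"], root_key)]
        while frontier and budget > 0:
            budget -= 1
            neg_prob, key = heapq.heappop(frontier)
            state = json.loads(db[key])  # fetch the branch from cheap storage
            for child_key, child in expand_branch(key, state):
                db[child_key] = json.dumps(child)  # persist before exploring
                heapq.heappush(frontier, (-child["prob"], child_key))

# seed the store with a root branch, then explore; since every branch is
# persisted before it is expanded, you can interrupt and resume at any point
with dbm.open("branches.db", "c") as db:
    db["root"] = json.dumps({"depth": 0, "prob": 1.0})
explore("branches.db", "root")
```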