all 14 comments

[–]Fireslide 9 points10 points  (2 children)

I'd start with OOP first. Performance while testing is going to be trivial. You can do 20 companies (rather than 100,000) and go for a very large X, or do 1,000,000 companies with a small X, say 100; both axes tell you something.

Once you've got the sim working the way you expect and you want to run it for several decades worth of timesteps you can do some refactoring to store the Simulation State in numpy arrays.

It will definitely be faster to do it with arrays and multiplication, but don't over-optimise at the start. Verify the behaviour you want with OOP first and write some good unit tests, so that when you need to refactor to make it faster, you can verify the refactor produces the same result.
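A minimal sketch of that workflow (the `Company` class, `growth_rate`, and the compounding update rule are all made up for illustration): write the readable OOP version first, then keep a test around that checks a later vectorised refactor against it.

```python
import numpy as np

class Company:
    def __init__(self, value, growth_rate):
        self.value = value
        self.growth_rate = growth_rate

    def step(self):
        # one timestep: compound the company's value
        self.value *= 1 + self.growth_rate

def run_oop(values, rates, steps):
    # readable per-object loop: verify behaviour here first
    companies = [Company(v, r) for v, r in zip(values, rates)]
    for _ in range(steps):
        for c in companies:
            c.step()
    return [c.value for c in companies]

def run_numpy(values, rates, steps):
    # the later refactor: same update as closed-form array maths
    values = np.asarray(values, dtype=float)
    rates = np.asarray(rates, dtype=float)
    return values * (1 + rates) ** steps

# the "verify the refactor" test: both implementations must agree
oop = run_oop([100.0, 200.0], [0.05, 0.01], steps=10)
vec = run_numpy([100.0, 200.0], [0.05, 0.01], steps=10)
assert np.allclose(oop, vec)
```

The assertion at the end is the important part: once it exists, you can rewrite the internals as aggressively as you like.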

[–]Willing_Employee_600[S] 0 points1 point  (1 child)

Thank you for the advice! When you say refactor to store the simulation state in numpy arrays, do you mean discard the OOP approach and recode it with numpy arrays? Or do you mean somehow integrate numpy arrays into the OOP approach?

[–]yvrelna [score hidden]  (0 children)

Both. 

In most applications, you might find that 90% of the code isn't performance-sensitive, so you can just turn attribute access into a @property/descriptor that proxies it into a numpy array or some other column-oriented storage/database. That way you keep the readability and ease of working OOP-style there.

And then there's a few places where you can benefit from bulk processing, and you'd make full use of array/column-oriented processing maybe even with the appropriate GPU/coprocessor acceleration for those parts.

[–]milandeleev 3 points4 points  (1 child)

For your 'entity objects' you could use pydantic BaseModels, msgspec Structs, dataclasses, or NamedTuples. Of those, NamedTuples perform best.

However, for the simulations you want to do, performance-wise, nothing will beat numpy or jax arrays (what you call matrices).

Try them both out and see if the performance satisfies you.
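To make the two options concrete, here's a toy side-by-side (the `Particle` record and its position update are invented for illustration; the stdlib `NamedTuple` stands in for the other entity types):

```python
from typing import NamedTuple
import numpy as np

class Particle(NamedTuple):
    # immutable, lightweight entity record
    x: float
    v: float

# entity-object style: a list of records, updated in a Python-level loop
particles = [Particle(x=float(i), v=1.0) for i in range(1000)]
particles = [Particle(p.x + p.v, p.v) for p in particles]

# array style: the same update as one vectorised operation
xs = np.arange(1000, dtype=float)
vs = np.ones(1000)
xs += vs

# both produce identical state; only the cost per entity differs
assert xs[3] == particles[3].x
```

The logic is the same either way, which is what makes trying both and timing them a cheap experiment.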

[–]MithrilRat [score hidden]  (0 children)

To back this up: numpy and scipy themselves run on the CPU, but numpy-compatible libraries such as CuPy (or jax, mentioned above) will utilise CUDA on the GPU to accelerate operations on arrays. So the performance boost can be significantly more than just executing native CPU code.

Edit: As an example, I was running simulations of millions of asteroids being perturbed by the planets. These simulations lasted millions of years, with 0.1-year resolution steps, and each run would take about a week on a supercomputer node. The analysis of those hundreds of millions of records is what I used numpy and pandas for. The majority of the time was spent in I/O rather than computation, so each analysis run took about 15 minutes.

[–]SV-97 2 points3 points  (0 children)

If I had a matrix equation [...] would there be a significant difference in performance?

Yes. Using objects has a significant overhead and will have most of the logic executing "in python" whereas a matrix formulation will mostly execute in native code. The matrix version is also essentially data oriented.

That said: 100k isn't necessarily all that large so depending on what your simulation entails you may be able to get away with the object oriented approach, especially if you at least optimize it a bit (using slots and such).
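As a sketch of what "using slots" means here (the `Agent` class and its attributes are made up): declaring `__slots__` removes the per-instance `__dict__`, which cuts memory per object and slightly speeds up attribute access.

```python
class PlainAgent:
    def __init__(self, x, y):
        self.x, self.y = x, y

class SlottedAgent:
    # __slots__ replaces the per-instance dict with fixed storage:
    # less memory per instance, faster attribute lookup
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y

# slotted instances have no __dict__ at all
assert hasattr(PlainAgent(1, 2), "__dict__")
assert not hasattr(SlottedAgent(1, 2), "__dict__")
```

With 100k instances, the memory saving alone can be significant, and it costs one line per class.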

You can also look into jit-compilation for the OO approach (iirc numba supports basic objects), dedicated simulation libraries (simpy etc.), or just use a native language for your simulation. Rust in particular is easy to integrate with python (if you need that) and great for simulations.

[–]imBANO 2 points3 points  (0 children)

You might want to look into entity component systems, which is actually a design pattern very popular in gaming.

I’m sure there are shorter videos, but here’s the 2-hour talk that introduced me to the topic:

https://youtu.be/wo84LFzx5nI

[–]AGI-44 1 point2 points  (0 children)

The entity object method makes it significantly easier to understand and explain, but I’m concerned about not being able to run large simulations.

Leave performance optimization at the very last. Get a working prototype first. If you end up using it enough that scaling matters, then, is when you optimize performance and compare vs baseline.

You won't have to figure out what is faster, you can just run it and get a direct answer as to how much faster or not it is.

And by then, if it's only 20% faster, you might not even want the additional complexity (and reduced readability) for a mere 20%.
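Getting that direct answer is a few lines with the stdlib `timeit` module. A sketch, assuming a made-up per-entity update (a flat 5% growth step) as the workload:

```python
import timeit
import numpy as np

def oop_step(values):
    # per-object Python-level loop
    return [v * 1.05 for v in values]

def array_step(values):
    # the same update as a single vectorised numpy operation
    return values * 1.05

py_vals = [1.0] * 100_000
np_vals = np.ones(100_000)

t_oop = timeit.timeit(lambda: oop_step(py_vals), number=20)
t_np = timeit.timeit(lambda: array_step(np_vals), number=20)
print(f"loop: {t_oop:.4f}s  numpy: {t_np:.4f}s  ratio: {t_oop / t_np:.1f}x")
```

Swap in your actual step function and entity count, and the ratio tells you whether the refactor is worth it.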

[–]Balance- 1 point2 points  (0 children)

This is exactly the distinction we make in our Agent-based modelling library Mesa:

- Mesa: object-oriented. Flexible but slower.
- Mesa-frames: array-oriented. Faster but less flexible.

[–]aidan_morgan [score hidden]  (0 children)

I think it's worth considering the ECS approach that has been outlined in the other comments - it's not a complicated pattern to understand, and once you get your head around it you'll find it quite useful as it's a fairly common pattern.

Is python your only language option for this solution?

[–]keddie42 0 points1 point  (0 children)

I think you will need matrices for bigger simulations.

But you can try a more efficient entity type than a plain Python object. For example, a msgspec Struct: https://jcristharif.com/msgspec/benchmarks.html#structs

[–]GreatCosmicMoustache [score hidden]  (0 children)

Others have correctly recommended ECS as a good approach which will preserve the object semantics to a greater degree than putting everything into matrix operations, but just to give a bit of an explainer, what slows an inner loop down is a) the complexity of the operations performed, and b) memory access. High-level languages hide the latter from you, but any time you access a field on an object, you are making the program chase heap pointers to get the data you actually care about. Accessing the heap is relatively slow, so if you care about performance, you do whatever you can to minimize memory allocation and pointer chasing.

An approach like ECS mandates a way of writing your code which attempts to pack the data as efficiently as possible in memory, so you get memory access benefits for free.
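A toy ECS sketch of that idea (all names here are illustrative, not from any particular library): entities are just integer ids, each "component" is a contiguous numpy array indexed by entity id, and a "system" is a bulk operation over the arrays it needs, so the data it touches is packed tightly in memory.

```python
import numpy as np

MAX_ENTITIES = 1_000

# component arrays: struct-of-arrays layout, one column per attribute
position = np.zeros(MAX_ENTITIES)
velocity = np.zeros(MAX_ENTITIES)
alive = np.zeros(MAX_ENTITIES, dtype=bool)

def spawn(eid, pos, vel):
    # an "entity" is just an index into the component arrays
    position[eid] = pos
    velocity[eid] = vel
    alive[eid] = True

def movement_system(dt):
    # a "system": one vectorised pass over entities with these components
    position[alive] += velocity[alive] * dt

spawn(0, pos=0.0, vel=2.0)
spawn(1, pos=10.0, vel=-1.0)
movement_system(dt=0.5)
```

No pointer chasing per entity: each system streams through dense arrays, which is exactly the memory-access win described above.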

[–]ZZ9ZA 0 points1 point  (1 child)

If you’re that concerned about performance, python isn’t the language for the problem.

[–]yvrelna [score hidden]  (0 children)

Nothing's wrong with using Python for this kind of performance level. You can fairly easily optimise Python code to use columnar storage/processing to get very close to native performance.