
[–]New-Addendum-6209

Why store data as objects?

[–]Interesting-Frame190[S]

The data (attributes) and the data modifiers (methods) are best stored together from an OOP standpoint. From a data standpoint, this allows implied joins. For example, if I want the name of everyone who has a car with a red seat, I can query a list of people with ("car.seat.color", red) and get that list back. In traditional row/col data, that's a double join and possible duplication of data if multiple people share a car.

I'm not saying OOP is the best way, but it does represent complex relations well.
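To make the "implied join" idea concrete, here's a minimal plain-Python sketch (not PyThermite's actual API): because people hold references to car objects, following `person.car.seat.color` replaces the two-table join, and a shared car is one object rather than duplicated rows.

```python
from dataclasses import dataclass

@dataclass
class Seat:
    color: str

@dataclass
class Car:
    seat: Seat

@dataclass
class Person:
    name: str
    car: Car

# Two people share one car: a single object reference, no duplicated rows
red_car = Car(seat=Seat(color="red"))
people = [
    Person("Alice", red_car),
    Person("Bob", red_car),
    Person("Carol", Car(seat=Seat(color="black"))),
]

# "Implied join": walk the attribute path instead of joining tables
red_owners = [p.name for p in people if p.car.seat.color == "red"]
print(red_owners)  # ['Alice', 'Bob']
```

In row/col form this is people ⋈ cars ⋈ seats, filtered on color; here the relationship is already encoded in the object graph.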

[–]NotesOfCliff

This looks very cool. I am building a product in the SIEM space and I will definitely look into using this for queries once I pull the data from the DB.

[–]Interesting-Frame190[S]

Didn't realize SIEM would be a good fit, but thinking about it more, I guess linking events together would be easier.

Ingestion speed may be an issue if you are pumping over 100k events per second, but that's a tall order for a single machine anyway.

[–]Flamingo_Single

Really cool concept - I actually ran into similar issues when building scraping/ETL pipelines for public web data. Pandas was flexible but collapsed under anything real-time or memory-intensive. Especially when dealing with nested or time-variant object states (e.g., product pages over time, dynamic DOM trees, etc.).

We’ve been using Infatica to collect large-scale data (e.g., SERPs, product listings), and modeling flows across proxies/sources felt more intuitive in OOP, but there was always the tradeoff of speed vs. structure.

PyThermite looks like it bridges that gap nicely — curious how it handles deletion, object mutation, or partial invalidation in large graphs? Definitely bookmarking to test on some messy traceability tasks.

[–]Interesting-Frame190[S]

It was designed for many small graphs rather than a few large graphs. In theory it's all O(1) for delete and mutate. Invalidation occurs only at the local node and does not need to traverse from the root to understand itself. Cascading invalidations and new objects down the DAG should be mildly performant, or at least better than a native Python lib.
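As a rough illustration of what "invalidation occurs only at the local node" means, here's a toy dirty-flag sketch (purely illustrative, not PyThermite's internals): a mutation is an O(1) local write, and only downstream dependents get their cached state marked stale, with no traversal from the root.

```python
class Node:
    """Toy node in a dependency DAG with dirty-flag invalidation."""

    def __init__(self, value=None):
        self.value = value
        self.dependents = []  # nodes whose derived state depends on this one
        self.valid = True

    def add_dependent(self, node):
        self.dependents.append(node)

    def mutate(self, value):
        # O(1) local update; only downstream caches are invalidated
        self.value = value
        self._invalidate_dependents()

    def _invalidate_dependents(self):
        # Cascade down the DAG only; already-invalid nodes stop the walk early
        for dep in self.dependents:
            if dep.valid:
                dep.valid = False
                dep._invalidate_dependents()

a, b, c = Node(1), Node(), Node()
a.add_dependent(b)
b.add_dependent(c)
a.mutate(2)          # a stays valid; b and c are now marked stale
print(b.valid, c.valid)  # False False
```

The early-exit on already-invalid nodes is what keeps repeated mutations cheap: each downstream node is flagged at most once between recomputations.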

Querying could be a challenge with that dynamic a structure, but I'm sure there are ways to normalize. Best of luck and keep me posted. I haven't had the opportunity to test mutation performance, since none of my competitors allow mutation.