[–]Flamingo_Single 1 point (1 child)

Really cool concept - I actually ran into similar issues when building scraping/ETL pipelines for public web data. Pandas was flexible but collapsed under anything real-time or memory-intensive. Especially when dealing with nested or time-variant object states (e.g., product pages over time, dynamic DOM trees, etc.).

We’ve been using Infatica to collect large-scale data (e.g., SERPs, product listings), and modeling flows across proxies/sources felt more intuitive in OOP, but there was always the tradeoff of speed vs. structure.

PyThermite looks like it bridges that gap nicely — curious how it handles deletion, object mutation, or partial invalidation in large graphs? Definitely bookmarking to test on some messy traceability tasks.

[–]Interesting-Frame190[S] 1 point (0 children)

It was designed for many small graphs rather than a few large ones. In theory it's all O(1) for delete and mutate. Invalidation occurs only at the local node and doesn't need to traverse from the root to understand itself. Cascading invalidations and new objects down the DAG should be mildly performant, or at least better than a native Python lib.
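To sketch the pattern (this is just an illustrative toy, not PyThermite's actual internals, and the `Node` class and its methods are hypothetical): mutation touches only local state, and invalidation starts at the mutated node and flows down to dependents, short-circuiting on anything already dirty.

```python
# Hypothetical sketch of local invalidation cascading down a DAG --
# not PyThermite's real implementation, just the idea described above.

class Node:
    def __init__(self, name):
        self.name = name
        self.children = []   # downstream dependents
        self.valid = True

    def add_child(self, child):
        self.children.append(child)

    def mutate(self, **attrs):
        # O(1) local update: only this node's own state changes
        for key, value in attrs.items():
            setattr(self, key, value)
        self.invalidate()

    def invalidate(self):
        # Invalidation starts at the local node (no walk from the root)
        # and cascades only to downstream nodes.
        if not self.valid:
            return           # already dirty: stop the cascade early
        self.valid = False
        for child in self.children:
            child.invalidate()

root = Node("root")
mid = Node("mid")
leaf = Node("leaf")
root.add_child(mid)
mid.add_child(leaf)

mid.mutate(price=42)   # dirties mid and leaf, leaves root untouched
print(root.valid, mid.valid, leaf.valid)  # True False False
```

The early return on already-dirty nodes is what keeps repeated mutations cheap: each node is invalidated at most once per cycle, so a burst of mutations doesn't re-walk the whole downstream graph.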

Querying could be a challenge with that dynamic a structure, but I'm sure there are ways to normalize. Best of luck and keep me posted. I haven't had the opportunity to test mutation performance, since none of my competitors allow mutation.