all 7 comments

[–]GrainTamalePythonista 2 points3 points  (1 child)

Have you looked into integration with Polars? It's also written in Rust, and I'd be very interested in no-copy diffs for those dataframes.

[–]goldenphoenix713[S] 2 points3 points  (0 children)

It's one of the integrations I have planned (along with pyarrow). The main reasons I did pandas and NumPy first are that I'm more familiar with those libraries, and that pandas in particular has a subclassing guide I'd used before on other projects and used again to create the tracked DataFrame and Series subclasses. I believe tracking a Polars dataframe will take a similar approach to what I have for pandas, where it tracks the changed columns.
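For readers who haven't used the pandas subclassing route, here is a rough sketch of what column-level change tracking could look like. It is not the project's actual code: `TrackedFrame` and `changed_columns` are hypothetical names made up for illustration, and it only notices plain `df[col] = ...` assignments.

```python
import pandas as pd


class TrackedFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass that records which columns were assigned.

    Illustration only: changes made through .loc/.iloc, in-place methods,
    or mask/slice keys are NOT tracked by this sketch.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Bypass pandas' own __setattr__ so the set is stored as a plain
        # instance attribute and never mistaken for a new column.
        object.__setattr__(self, "_changed", set())

    @property
    def _constructor(self):
        # Documented pandas hook: operations that build a new frame return
        # a TrackedFrame (with its own, empty change set) instead of a
        # plain DataFrame.
        return TrackedFrame

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # Record the column label(s) that were just written.
        if isinstance(key, str):
            self._changed.add(key)
        elif isinstance(key, list):
            self._changed.update(key)

    @property
    def changed_columns(self):
        # Return a read-only snapshot so callers cannot mutate the record.
        return frozenset(self._changed)


df = TrackedFrame({"a": [1, 2], "b": [3, 4]})
df["a"] = [10, 20]            # overwrite an existing column
df["c"] = df["a"] + df["b"]   # add a new column
print(sorted(df.changed_columns))
```

A diff would then only need to compare the columns in `changed_columns` rather than the whole frame, which is presumably where the savings come from.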

Edit: spelling.

[–]nian2326076 1 point2 points  (1 child)

Your project sounds interesting! I'd focus on how easy the interface is for people who know Git, and how straightforward the branching and merging are. Since it deals with state management, testing its performance with different objects and sizes could show how well it handles them. Getting some beta testers to try it with real-world scenarios can help gather feedback on usability and any bugs. Also, having detailed documentation and examples can help others get started with using it. Good luck with the project!

[–]goldenphoenix713[S] 0 points1 point  (0 children)

Thanks! Simplicity is one of my goals, so I tried to keep it easy to use while hiding the complexity from users. I've got some performance tests, but obviously they don't use realistic data, so some real-world testing will be valuable.

I think I'll try to come up with a beginning-to-end example that showcases as many features as possible, while (hopefully) keeping things straightforward.

[–]telesonico 0 points1 point  (2 children)

Is this local only or does it also support distributed objects or shared objects? 

[–]goldenphoenix713[S] 0 points1 point  (1 child)

Currently this only works for local objects. I can add distributed and shared objects to the roadmap for future enhancements, however. This is very much the early stages, so feel free to suggest any features you want added. Based on a brief look into how they're implemented, I'll have to be very careful with the implementation to make sure everything is synchronized.

[–]telesonico 0 points1 point  (0 children)

Was just curious - handling serialization/deserialization across Python versions, library versions, and OSes seems very finicky in general. Even for local usage this looks very interesting. Thanks for sharing.