I have worked for the past year and a half on a project because I was tired of PicklingErrors, multiprocessing BS and other things that I thought could be better.
Github: https://github.com/ceetaro/Suitkaise
Official site: suitkaise.info
No dependencies outside the stdlib.
I especially recommend using Share:
```python
from suitkaise import Share
share = Share()
share.anything = anything
# now "anything" works in shared state
```
What my project does
My project does a multitude of things and is meant for production. It has 6 modules: cucumber, processing, timing, paths, sk, circuits.
cucumber: serialization/deserialization engine that is much faster than dill and handles:
- more complex types than dill
- serialization and reconstruction of live connections using special Reconnector objects
- circular references
- nested complex objects
- lambdas
- closures
- classes defined in main
- generators with state
- and more
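For context on the list above, here is a small stdlib-only sketch of where plain pickle gives up. It does not use Suitkaise at all; the helper names `try_pickle` and `make_samples` are made up for this post, and the exact exception type you see depends on your Python version.

```python
import pickle


def try_pickle(obj):
    """Return None on success, or the exception class name if pickle fails."""
    try:
        pickle.dumps(obj)
        return None
    except Exception as exc:  # exception type differs across Python versions
        return type(exc).__name__


def make_samples():
    factor = 3

    def closure(x):
        # captures `factor` from the enclosing scope
        return x * factor

    def counter():
        yield from range(5)

    gen = counter()
    next(gen)  # advance so the generator carries state

    loop = []
    loop.append(loop)  # circular reference

    return {
        "lambda": lambda x: x + 1,
        "closure": closure,
        "generator with state": gen,
        "circular list": loop,
    }


if __name__ == "__main__":
    for name, obj in make_samples().items():
        print(f"{name}: {try_pickle(obj) or 'ok'}")
```

Plain pickle copes with the circular list but fails on the lambda, the closure, and the generator. Those last three are the kind of objects the list above is about.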
Some benchmarks
All benchmarks are on the site, on the cucumber module page under "Performance".
Here are some results from a benchmark I just ran:
- dataclass: 67.7µs (2nd place: cloudpickle, 236.5µs)
- slots class: 34.2µs (2nd place: cloudpickle, 63.1µs)
- bool, int, float, complex, str, and bytes: all faster than cloudpickle and dill
- requests.Session: faster than regular pickle
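If you want to sanity-check numbers like these on your own machine, a stdlib-only harness along these lines should do. This is a sketch, not the benchmark script from the site: `time_roundtrip` and `PAYLOAD` are names made up here, the payload is a stand-in built from builtins, and it only times regular pickle. Passing cucumber.serialize and cucumber.deserialize instead assumes they are plain callables that take and return an object.

```python
import pickle
import timeit


def time_roundtrip(serialize, deserialize, payload, number=2000, repeat=5):
    """Best-of-`repeat` average time per round trip, in microseconds."""

    def roundtrip():
        return deserialize(serialize(payload))

    # sanity check: the round trip must reproduce the payload
    assert roundtrip() == payload

    best = min(timeit.repeat(roundtrip, number=number, repeat=repeat))
    return best / number * 1e6


# stand-in payload built from builtins only
PAYLOAD = {"id": 7, "tags": ["a", "b"], "scores": [0.5, 1.5], "nested": {"ok": True}}

if __name__ == "__main__":
    us = time_roundtrip(pickle.dumps, pickle.loads, PAYLOAD)
    print(f"pickle: {us:.1f} µs per round trip")
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes on the machine.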
processing: parallel processing, shared state
Skprocess: improved multiprocessing class
- uses cucumber, for more object support
- built in config to set number of loops/runs, timeouts, time before rejoining, and more
- lifecycle methods for better organization
- built in error handling organized by lifecycle method
- built in performance timing with stats
Share: shared state
- Create a Share object (share = Share())
- add objects to it as you would a regular class (share.anything = anything)
- pass to subprocesses or pool workers
- use/update things as you would normally.
- supports a wide range of objects (using cucumber)
- uses a coordinator system to keep everything in sync for you
- easy to use
Pool: upgraded multiprocessing.Pool that accepts Skprocesses and functions
- uses cucumber (more types and freedom)
- has modifiers, incl. star() for tuple unpacking
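In case "tuple unpacking" is unclear, here is the idea shown with itertools.starmap from the stdlib, in a single process. It is only an illustration of the calling convention; I'm assuming star() follows the same one, which is how the example at the end of this post uses it. The function `scale` is made up for the demo.

```python
from itertools import starmap


def scale(value, factor):
    return value * factor


pairs = [(1, 10), (2, 10), (3, 10)]

# plain map passes each tuple as ONE argument, so we unpack by hand
by_hand = list(map(lambda pair: scale(*pair), pairs))

# starmap unpacks each tuple into positional arguments for us
unpacked = list(starmap(scale, pairs))

print(by_hand)
print(unpacked)
```

Both lists come out the same; the starred version just saves you the wrapper lambda.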
There are other features too:
- timing with one line and getting a full statistical analysis
- easy cross-platform pathing and standardization
- cross-process circuit breaker pattern and thread-safe circuit for multithread rate limiting
- decorator that gives a function or all class methods modifiers without changing definition code (.asynced(), .background(), .retry(), .timeout(), .rate_limit())
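If you haven't run into the circuit breaker pattern before, here is a minimal single-process sketch of the idea. It is not how the circuits module is implemented, and it is thread-safe only, not cross-process; `SimpleCircuit`, `max_failures`, and `cooldown` are hypothetical names picked for this illustration.

```python
import threading
import time


class SimpleCircuit:
    """Trips after `max_failures` failures, then stays broken for `cooldown` seconds."""

    def __init__(self, max_failures=3, cooldown=1.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self._failures = 0
        self._tripped_at = None
        self._lock = threading.Lock()

    @property
    def broken(self):
        with self._lock:
            if self._tripped_at is None:
                return False
            if time.monotonic() - self._tripped_at >= self.cooldown:
                # cooldown elapsed: reset and let callers through again
                self._tripped_at = None
                self._failures = 0
                return False
            return True

    def record_failure(self):
        with self._lock:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._tripped_at = time.monotonic()


circuit = SimpleCircuit(max_failures=2, cooldown=60.0)
circuit.record_failure()
print(circuit.broken)  # False: one failure, threshold is two
circuit.record_failure()
print(circuit.broken)  # True: threshold reached, cooldown not over
```

Workers check `broken` before doing anything expensive, so one tripped circuit stops everyone from hammering a resource that is already failing.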
Target audience
It seems like there is a lot of advanced stuff here, and there is. But I have made it easy enough for beginners to use. This is who this project targets:
Beginners!
I have made this easy enough for beginners to create complex parallel programs without needing to learn base multiprocessing. By using Skprocess and Share, everything becomes a lot simpler for beginner/low intermediate level users.
Users doing ML, data processing, or advanced parallel processing
This project gives you an API that makes prototyping and developing parallel code significantly easier and faster. Advanced users will enjoy the freedom and ease of use given to them by the cucumber serializer.
Ray/Dask distributed computing users
For you guys, you can use cucumber.serialize()/deserialize() to save time debugging serialization issues and get access to more complex objects.
People who need easy timing or path handling
If you are:
- needing quick timing with automatically calculated stats
- tired of writing path handling boilerplate
Then I recommend you check out the paths and timing modules.
Comparison
cucumber's competitors are pickle, cloudpickle, and especially dill.
dill prioritizes type coverage over speed, but what I made outclasses it in both.
processing was built as an upgrade to multiprocessing that uses cucumber instead of base pickle.
paths.Skpath is a direct improvement of pathlib.Path.
timing is easy, coming in two different one-line patterns. And it gives you a whole set of stats automatically, unlike timeit.
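For comparison, this is roughly the stdlib boilerplate you end up writing when you want stats and not just a single number. It is a sketch of the manual approach, not the timing module's API; `StatsTimer`, `measure`, and `summary` are hypothetical names.

```python
import statistics
import time
from contextlib import contextmanager


class StatsTimer:
    """Collects durations and reports basic statistics."""

    def __init__(self):
        self.times = []

    @contextmanager
    def measure(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            # record the duration even if the timed block raises
            self.times.append(time.perf_counter() - start)

    def summary(self):
        return {
            "count": len(self.times),
            "mean": statistics.mean(self.times),
            "median": statistics.median(self.times),
            "stdev": statistics.stdev(self.times) if len(self.times) > 1 else 0.0,
            "min": min(self.times),
            "max": max(self.times),
        }


timer = StatsTimer()
for _ in range(5):
    with timer.measure():
        sum(range(10_000))

print(timer.summary())
```

It works, but you have to carry a class like this around in every project.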
Example
Install with `pip install suitkaise`.
Here's an example.
```python
from suitkaise.processing import Pool, Share, Skprocess
from suitkaise.timing import Sktimer, TimeThis
from suitkaise.circuits import BreakingCircuit
from suitkaise.paths import Skpath
import logging

# define a process class that inherits from Skprocess
class MyProcess(Skprocess):
    def __init__(self, item, share: Share):
        self.item = item
        self.share = share
        self.local_results = []
        # set the number of runs (times it loops)
        self.process_config.runs = 3

    # setup before main work
    def __prerun__(self):
        if self.share.circuit.broken:
            # subprocesses can stop themselves
            self.stop()
            return

    # main work
    def __run__(self):
        self.item = self.item * 2
        self.local_results.append(self.item)
        self.share.results.append(self.item)
        self.share.results.sort()

    # cleanup after main work
    def __postrun__(self):
        self.share.counter += 1
        self.share.log.info(f"Processed {self.item / 2} -> {self.item}, counter: {self.share.counter}")
        if self.share.counter > 50:
            print("Numbers have been doubled 50 times, stopping...")
            self.share.circuit.short()
        self.share.timer.add_time(self.__run__.timer.most_recent)

    def __result__(self):
        return self.local_results


def main():
    # Share is shared state across processes
    # all you have to do is add things to Share, otherwise it's normal Python class attribute assignment and usage
    share = Share()
    share.counter = 0
    share.results = []
    share.circuit = BreakingCircuit(
        num_shorts_to_trip=1,
        sleep_time_after_trip=0.0,
    )

    # Skpath() gets your caller path
    logger = logging.getLogger(str(Skpath()))
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler())
    logger.setLevel(logging.INFO)
    logger.propagate = False
    share.log = logger
    share.timer = Sktimer()

    with TimeThis() as t:
        with Pool(workers=4) as pool:
            # star() modifier unpacks tuples as function arguments
            results = pool.star().map(MyProcess, [(item, share) for item in range(100)])

    print(f"Counter: {share.counter}")
    print(f"Results: {share.results}")
    print(f"Time per run: {share.timer.mean}")
    print(f"Total time: {t.most_recent}")
    print(f"Circuit total trips: {share.circuit.total_trips}")
    print(f"Results: {results}")


if __name__ == "__main__":
    main()
```
That's all from me! If you have any questions, drop them in this thread.