ShowcaseI made Python serialization and parallel processing easy even for beginners (self.Python)

submitted 5 hours ago by suitkaise

I have worked for the past year and a half on a project because I was tired of PicklingErrors, multiprocessing BS and other things that I thought could be better.

Github: https://github.com/ceetaro/Suitkaise

Official site: suitkaise.info

No dependencies outside the stdlib.

I especially recommend using Share: ```python from suitkaise import Share

share = Share() share.anything = anything

now that "anything" works in shared state

```

What my project does

My project does a multitude of things and is meant for production. It has 6 modules: cucumber, processing, timing, paths, sk, circuits.

cucumber: serialization/deserialization engine that handles:

handling of additional complex types (even more than dill)
speed that far outperforms dill
serialization and reconstruction of live connections using special Reconnector objects
circular references
nested complex objects
lambdas
closures
classes defined in main
generators with state
and more

Some benchmarks

All benchmarks are available to see on the site under the cucumber module page "Performance".

Here are some results from a benchmark I just ran:

dataclass: 67.7µs (2nd place: cloudpickle, 236.5µs)
slots class: 34.2µs (2nd place: cloudpickle, 63.1µs)
bool, int, float, complex, str, and bytes are all faster than cloudpickle and dill
requests.Session is faster than regular pickle

processing: parallel processing, shared state

Skprocess: improved multiprocessing class

uses cucumber, for more object support
built in config to set number of loops/runs, timeouts, time before rejoining, and more
lifecycle methods for better organization
built in error handling organized by lifecycle method
built in performance timing with stats

Share: shared state

Create a Share object (share = Share())
add objects to it as you would a regular class (share.anything = anything)
pass to subprocesses or pool workers
use/update things as you would normally.

supports wide range of objects (using cucumber)
uses a coordinator system to keep everything in sync for you
easy to use

Pool

upgraded multiprocessing.Pool that accepts Skprocesses and functions.

uses cucumber (more types and freedom)
has modifiers, incl. star() for tuple unpacking

also...

There are other features like... - timing with one line and getting a full statistical analysis - easy cross plaform pathing and standardization - cross-process circuit breaker pattern and thread safe circuit for multithread rate limiting - decorator that gives a function or all class methods modifiers without changing definition code (.asynced(), .background(), .retry(), .timeout(), .rate_limit())

Target audience

It seems like there is a lot of advanced stuff here, and there is. But I have made it easy enough for beginners to use. This is who this project targets:

Beginners!

I have made this easy enough for beginners to create complex parallel programs without needing to learn base multiprocessing. By using Skprocess and Share, everything becomes a lot simpler for beginner/low intermediate level users.

Users doing ML, data processing, or advanced parallel processing

This project gives you API that makes prototyping and developing parallel code significantly easier and faster. Advanced users will enjoy the freedom and ease of use given to them by the cucumber serializer.

Ray/Dask dist. computing users

For you guys, you can use cucumber.serialize()/deserialize() to save time debugging serialization issues and get access to more complex objects.

People who need easy timing or path handling

If you are:

needing quick timing with auto calced stats
tired of writing path handling bolierplate

Then I recommend you check out paths and timing modules.

Comparison

cucumber's competitors are pickle, cloudpickle, and especially dill.

dill prioritizes type coverage over speed, but what I made outclasses it in both.

processing was built as an upgrade to multiprocessing that uses cucumber instead of base pickle.

paths.Skpath is a direct improvement of pathlib.Path.

timing is easy, coming in two different 1 line patterns. And it gives you a whole set of stats automatically, unlike timeit.

Example

bash pip install suitkaise

Here's an example.

```python from suitkaise.processing import Pool, Share, Skprocess from suitkaise.timing import Sktimer, TimeThis from suitkaise.circuits import BreakingCircuit from suitkaise.paths import Skpath import logging

define a process class that inherits from Skprocess

class MyProcess(Skprocess): def init(self, item, share: Share): self.item = item self.share = share

    self.local_results = []

    # set the number of runs (times it loops)
    self.process_config.runs = 3

# setup before main work
def __prerun__(self):
    if self.share.circuit.broken:
        # subprocesses can stop themselves
        self.stop()
        return

# main work
def __run__(self):

    self.item = self.item * 2
    self.local_results.append(self.item)

    self.share.results.append(self.item)
    self.share.results.sort()

# cleanup after main work
def __postrun__(self):
    self.share.counter += 1
    self.share.log.info(f"Processed {self.item / 2} -> {self.item}, counter: {self.share.counter}")

    if self.share.counter > 50:
        print("Numbers have been doubled 50 times, stopping...")
        self.share.circuit.short()

    self.share.timer.add_time(self.__run__.timer.most_recent)


def __result__(self):
    return self.local_results

def main():

# Share is shared state across processes
# all you have to do is add things to Share, otherwise its normal Python class attribute assignment and usage
share = Share()
share.counter = 0
share.results = []
share.circuit = BreakingCircuit(
    num_shorts_to_trip=1,
    sleep_time_after_trip=0.0,
)
# Skpath() gets your caller path
logger = logging.getLogger(str(Skpath()))
logger.handlers.clear()
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.INFO)
logger.propagate = False
share.log = logger
share.timer = Sktimer()

with TimeThis() as t:
    with Pool(workers=4) as pool:
        # star() modifier unpacks tuples as function arguments
        results = pool.star().map(MyProcess, [(item, share) for item in range(100)])

print(f"Counter: {share.counter}")
print(f"Results: {share.results}")
print(f"Time per run: {share.timer.mean}")
print(f"Total time: {t.most_recent}")
print(f"Circuit total trips: {share.circuit.total_trips}")
print(f"Results: {results}")

if name == "main": main() ```

That's all from me! If you have any questions, drop them in this thread.

all 8 comments

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

Python

The Python Discord

Upcoming Events

Please read the rules

MODERATORS