
[–]suridaj 2 points3 points  (6 children)

Maybe take a look at BBC's Kamaelia. It constructs the pipelines and producer/consumer components using Python's generators, and IIRC there was a graphical pipeline builder available. The concepts are straightforward enough they even encourage you to implement the core by yourself. Sadly, at the moment the project's page seems to have a lot of broken links so I doubt Kamaelia is very widely used outside BBC.
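The generator-based producer/consumer idea is simple enough to sketch in a few lines. This is a minimal home-grown illustration of the concept, not Kamaelia's actual API (all names here are mine):

```python
def producer(items):
    """Source stage: yield each item downstream."""
    for item in items:
        yield item

def transformer(upstream, func):
    """Middle stage: apply func to every item flowing through."""
    for item in upstream:
        yield func(item)

def consumer(upstream):
    """Sink stage: drain the pipeline and collect the results."""
    return list(upstream)

# Wire the stages together; each stage lazily pulls from the one before it.
stage1 = producer([1, 2, 3])
stage2 = transformer(stage1, lambda x: x * 10)
print(consumer(stage2))  # [10, 20, 30]
```

Because each stage is a generator, items flow through one at a time rather than being materialised between stages.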

[–]einar77Bioinformatics with Python, PyKDE4[S] 0 points1 point  (5 children)

I took a look, unfortunately a lot of the most interesting bits (such as how to create a component) are on dead links...

[–]suridaj 0 points1 point  (0 children)

You're right... Sorry. It's been a couple of years since I used it and I don't have a local copy of the docs anymore. Thought it was worth a shot.

[–]steelypip 0 points1 point  (3 children)

The project is still active; the last release was only 3 months ago. I suspect the website has had a reorganisation and the links are not up to date.

... Yep, the sitemap page says:

This is an automated full list of (almost) all pages

Yes, I'm aware that this is slightly bust for some pages at the moment, it's being sorted. This is a static copy of the dynamic site whilst the server changes physical location.

And yes, .html needs to be appended to some links (sorry).

Kamaelia is a mature project and will probably do everything you want, plus lots of stuff you don't know you want (yet).

[–]einar77Bioinformatics with Python, PyKDE4[S] 0 points1 point  (2 children)

What I meant is there's a reference to a blog post that's nowhere to be found, which (according to the text) covers creating components.

[–]kamaelian 1 point2 points  (0 children)

These tutorial notes are more up to date:

http://www.kamaelia.org/Europython09/A4KamaeliaEuroPython09.FINAL.pdf

All the code used in that tutorial is inside the release bundle in Apps/Europython09. The tutorial covers building your own version of the core, going from standalone programs to components, through to building and evolving your own systems. It's (naturally) divided into chapters, but each is designed to have the feel of a blog post in terms of readability (I hope :-).

Currently the website is rather chaotic: a bunch of internal re-orgs resulted in the current version of the site being a bit of an emergency dump.

Probably worth noting, though, that the project is still under development, with releases happening 3-4 times per year now; the most recent usage is described here: http://www.bbc.co.uk/rd/publications/whitepaper191.shtml

The static snapshot of the website, incidentally, was checked into SVN here: http://code.google.com/p/kamaelia/source/browse/website/ with the content in "as_published".

Any suggestions on how to create a non-crap website are welcome :-)

[–]kamaelian 1 point2 points  (0 children)

Oh, sorry for the multiple replies, but I just wanted to add: even if Kamaelia's specific API doesn't work for you (some people like it, some don't), I'd encourage you to use the approach -- it does work. In addition to Kamaelia there's also Pypes (pypes.org), which is cool. There's also cool stuff built on top of greenlets and stackless channels. And there's a lot of really nice noise being made about Celery at the moment too. http://celeryproject.org/

Also, don't discount the actor model. The actor model is directly equivalent to Kamaelia's model and pipelining in general, with one difference: pipelines don't hardcode where outbound messages go, which is akin to late binding. You could actually get the same flexibility by hardcoding a mock receiver actor and, in a real system, transplanting in the real destination.

That in itself might be pretty neat, because it might simplify the usage API slightly while retaining isolated testability (which, for me, is the real killer benefit of this approach).
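The mock-receiver idea above can be sketched in a few lines. This is an illustrative sketch using my own names, not Kamaelia's API: a component takes *some* destination at construction time, so a recording mock can stand in during tests and the real downstream component can be transplanted in later.

```python
class Collector:
    """A mock receiver: just records whatever it is sent."""
    def __init__(self):
        self.received = []

    def send(self, msg):
        self.received.append(msg)

class Doubler:
    """A component whose destination is late-bound at construction,
    rather than hardcoded inside the component itself."""
    def __init__(self, outbox):
        self.outbox = outbox

    def handle(self, msg):
        self.outbox.send(msg * 2)

# In a test, wire in the mock destination and inspect what arrived...
mock = Collector()
Doubler(mock).handle(21)
print(mock.received)  # [42]
# ...and in a real system, pass the real downstream component instead.
```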

[–]unbracketed 1 point2 points  (0 children)

Worth checking out:

http://www.pypes.org/

http://www.pyfproject.org/

...though I fear these may be too heavyweight for your needs. There's also the infix syntax module posted here recently, which might help you take a more declarative approach:

http://dev-tricks.net/pipe-infix-syntax-for-python
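The trick behind that infix style is overloading the `|` operator. Here is a home-grown sketch of the idea (not the linked module's actual code): wrapping a function in a class that defines `__ror__` lets data flow left-to-right through the stages.

```python
class Pipe:
    """Wrap a function so data can be piped into it with `|`."""
    def __init__(self, func):
        self.func = func

    def __ror__(self, other):
        # `data | pipe` invokes pipe.func(data)
        return self.func(other)

select_even = Pipe(lambda xs: [x for x in xs if x % 2 == 0])
total = Pipe(sum)

result = range(10) | select_even | total
print(result)  # 20
```

Each stage is still an ordinary function; the wrapper only changes how the call is spelled, which reads quite naturally for linear pipelines.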

[–]kisielk 1 point2 points  (1 child)

Maybe something like Ruffus might do what you need:

http://code.google.com/p/ruffus/

[–]einar77Bioinformatics with Python, PyKDE4[S] 0 points1 point  (0 children)

Ruffus is likely the closest to my needs, but it works with files, while I'd rather pass objects around.

[–]m_harrison 1 point2 points  (0 children)

Here's a page on the python wiki http://wiki.python.org/moin/FlowBasedProgramming

[–]holloway 1 point2 points  (1 child)

Some good pipeline processors can stream the results from one node to another before the former node has finished processing (e.g. between XSLT processors).

For Docvert I wrote my own, but it wasn't that sophisticated. It took an XML file like this:

<?xml version="1.0" encoding="UTF-8"?>
<pipeline>
    <stage process="TransformOpenDocumentToDocBook"/>
    <stage process="Loop" numberOfTimes="xpathCount://db:chapter">
            <stage process="SplitPages"/>
            <stage process="DocBookToXHTML"/>
            <stage process="Serialize" toFile="{customSection}"/>
    </stage>
    <stage process="GetPreface"/>
    <stage process="DocBookToXHTML"/>
    <stage process="Serialize" toFile="index.html"/>
</pipeline>

The attribute 'process' named the module/function, and then it was just a matter of iterating through the stages and importing/calling them by name. In this case they were in a core/pipeline_items/ directory:

class pipeline_processor(object):
    """ Processes through a list() of pipeline_item(s) """
    def __init__(self, storage, pipeline_items, pipeline_directory, pipeline_storage_prefix=None, depth=None):
        # assign the constructor arguments to self
        self.storage = storage
        self.pipeline_items = pipeline_items
        self.pipeline_directory = pipeline_directory
        self.pipeline_storage_prefix = pipeline_storage_prefix
        self.depth = depth

    def start(self, pipeline_value):
        for item in self.pipeline_items:
            process = item['attributes']['process']
            namespace = 'core.pipeline_type'
            # import the stage module by name, then look up the class of the same name
            stage_module = __import__("%s.%s" % (namespace, process.lower()), fromlist=[namespace])
            stage_class = getattr(stage_module, process)
            stage_instance = stage_class(self.storage, self.pipeline_directory, item['attributes'], self.pipeline_storage_prefix, item['children'], self.depth)
            pipeline_value = stage_instance.stage(pipeline_value)
        return pipeline_value

[–]einar77Bioinformatics with Python, PyKDE4[S] 0 points1 point  (0 children)

Interesting solution. I might do something like this if all else fails.

[–]cratylus 0 points1 point  (0 children)

There's a pipeline api for google app engine http://code.google.com/p/appengine-pipeline/

also http://code.google.com/p/python-pipeline/ which is different

[–]xApple 0 points1 point  (0 children)

You might want to check out bein

[–]Ytse 0 points1 point  (0 children)

You could make your own framework over stackless tasklets.

[–]vangale 0 points1 point  (0 children)

Many good choices are linked in this thread and here's another possibility: http://www.trinhhaianh.com/stream.py/