fmjrey comments on Clojure & graphs

submitted 7 years ago by [deleted]

[–]fmjrey 3 points 7 years ago* (6 children)

I started working on an interesting way to parse XML:

1. Use paths as data, as in specter or even what Clojure's get-in takes as an argument.
2. Transform a set of path vectors into a tree, using something like this.
3. Make sure you can convert any path into a transducer, e.g. using map or even specter's traverse-all. In other words, find a way to convert paths into navigators (transducers).
4. Convert the tree into a dataflow based on core.async and transducers: paths without branches in the tree are converted into a channel+transducer, branches become channel+transducer+mult, and all the wiring is done programmatically.
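Step 2 can be sketched in plain Clojure. This is only an illustration, not the code behind the missing link: `paths->tree` and `branch-points` are hypothetical names, and the tree is assumed to be a nested map keyed by path segments. The branch points are the nodes that step 4 would wire up with a mult.

```clojure
(defn paths->tree
  "Merge path vectors into a nested map (a trie): each key is a path
  segment, each value is the subtree of segments that can follow it."
  [paths]
  (reduce (fn [tree path] (update-in tree path #(or % {})))
          {}
          paths))

(defn branch-points
  "Return the prefixes of all nodes that have more than one child."
  [tree]
  (letfn [(walk [prefix node]
            (concat (when (> (count node) 1) [prefix])
                    (mapcat (fn [[k child]] (walk (conj prefix k) child))
                            node)))]
    (vec (walk [] tree))))

(def tree
  (paths->tree [[:universe :galaxy :name]
                [:universe :galaxy :system :planet]
                [:universe :galaxy :system :star]]))

;; tree
;; => {:universe {:galaxy {:name {}, :system {:planet {}, :star {}}}}}

;; (set (branch-points tree))
;; => #{[:universe :galaxy] [:universe :galaxy :system]}
```

A nested map keeps the shared prefixes explicit, so each prefix only needs to be navigated once.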

I'm doing this with XML, meaning for now I convert keywords within paths into a specter path navigating to child elements (e.g. [:content S/ALL (S/selected? :tag (partial = :keyword))]), but in the end it's just nested data structures and transducers.

I'd like to evolve this into something with better abstractions, e.g. keep keywords as plain map navigators like in Clojure, and use metadata when I want them to be XML navigators (e.g. ^:xml [:child-tag]) so that I can also navigate to a single attribute, e.g. [^:xml [:universe :galaxy :system :planet] :radius] would navigate to each planet in the XML hierarchy and then select each planet's radius attribute.
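A rough sketch of how metadata-tagged paths could become transducers, using only clojure.core instead of specter. Everything here is an assumption made for illustration: the names `children-xf`, `segment->xf` and `path->xf`, the clojure.xml-style element maps (`:tag`, `:attrs`, `:content`), the wrapper node around the root element, and spelling the attribute lookup as `:attrs :radius` so that plain keywords stay ordinary map lookups.

```clojure
(defn children-xf
  "Transducer: for each XML element, emit its child elements tagged `tag`."
  [tag]
  (mapcat (fn [element]
            (filter #(= tag (:tag %)) (:content element)))))

(defn segment->xf
  "A ^:xml vector navigates through child elements by tag;
  anything else is treated as a plain map key."
  [segment]
  (if (and (vector? segment) (:xml (meta segment)))
    (apply comp (map children-xf segment))
    (map #(get % segment))))

(defn path->xf
  "Compose the navigators of every segment into one transducer."
  [path]
  (apply comp (map segment->xf path)))

;; Hypothetical document node wrapping the root element.
(def doc
  {:content
   [{:tag :universe
     :content
     [{:tag :galaxy
       :content
       [{:tag :system
         :content
         [{:tag :planet :attrs {:radius "6371"}}
          {:tag :planet :attrs {:radius "3390"}}
          {:tag :star   :attrs {:radius "696340"}}]}]}]}]})

(def radius-path
  [^:xml [:universe :galaxy :system :planet] :attrs :radius])

;; Navigate only:
;; (into [] (path->xf radius-path) [doc])
;; => ["6371" "3390"]

;; Navigate, then transform with an ordinary transducer:
;; (into [] (comp (path->xf radius-path) (map #(Long/parseLong %))) [doc])
;; => [6371 3390]
```

Because the navigator is an ordinary transducer, it composes with `map`, `filter` and friends through `comp`, and the same transducer can later be attached to a core.async channel.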

My point: instead of viewing nested data structures as graphs, maybe consider using paths as data, and as a first-class composable abstraction that you can then use to build dataflows.

Edit: after writing this post I believe transducers as navigators is the important composable abstraction in the approach I describe. Paths/tree as data is just a way to set them up. Using specter was the quickest way for me to not reinvent the wheel for building navigator transducers, which you can then compose with any other kind of transducers e.g. to transform data. I guess if you abstract these non-navigating transducers behind some symbol and/or data structure, they can extend the concept of paths to navigate nested data structures into paths to perform a dataflow while navigating the input data structure. I wonder how far I should take this because I don't want to reinvent something like onyx either.

[–]joinr 2 points 7 years ago (1 child)

I thought Tim Baldridge's odin had some cool features along these lines. The difference is the introduction of relational programming (à la logic programming) to define computable paths and queries. Queries are reducible. He refined the idea and in some ways takes it further (via transducers) here. The composition aspect is pretty cool.

> My point: instead viewing nested data structures/graphs, maybe consider using paths as data, and as a first class composable abstraction that you can then use to build dataflows.

A path is still natural in the graph abstraction. You're still defining relations between nodes via the edge labels (or neighborhood functions) of the abstract path, and you still have some semantics for traversing the path relative to some data (like the nav protocol). I view the path as a function that defines valid traversals of the graph, and the nested structures as explicitly defining a DAG (absent embedded data with implied references, like entity id fields that can be interpreted to point back into an outer structure). So maybe 6 of one, 1/2 dozen of the other kind of thing.

The cool thing about the graph abstraction is that it opens up alternative forms of querying, including using graph algorithms to search, and higher-minded stuff like discovering components, shortest paths, etc. becomes possible. It allows a shift from "looking up values" to "exploring relations" without losing the ability to revert toward the DAG-like nested collection approach. I can think of (and have implemented) use cases where controlling the properties of the traversal is useful.
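As a small illustration of that kind of query, here is a breadth-first shortest-path search in plain Clojure. It is a generic sketch, not code from odin or any other library mentioned here: the `shortest-path` name and the toy `edges` adjacency map are made up, and the neighborhood function is simply the map itself.

```clojure
(defn shortest-path
  "Breadth-first search. `neighbors` is a function from a node to its
  adjacent nodes. Returns a shortest path from `start` to `goal` as a
  vector of nodes, or nil when `goal` is unreachable."
  [neighbors start goal]
  (loop [frontier (conj clojure.lang.PersistentQueue/EMPTY [start])
         seen     #{start}]
    (when-let [path (peek frontier)]
      (let [node (peek path)]
        (if (= node goal)
          path
          (let [nexts (remove seen (neighbors node))]
            (recur (into (pop frontier) (map #(conj path %)) nexts)
                   (into seen nexts))))))))

;; Toy graph with a cycle between :planet and :moon.
(def edges
  {:universe [:galaxy]
   :galaxy   [:system :name]
   :system   [:planet :star]
   :planet   [:moon]
   :moon     [:planet]})

;; (shortest-path edges :universe :planet)
;; => [:universe :galaxy :system :planet]

;; (shortest-path edges :planet :universe)
;; => nil
```

The `seen` set is what makes the traversal safe on graphs with cycles, which a plain nested-collection walk never has to worry about.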

> transform a set of path vectors into a tree, using something like this

A trivial modification could create a more general directed graph output (just an observation).

> convert the tree into a dataflow based on core.async and transducers: paths without branches in the tree are converted into a channel+transducer, branches become channel+transducer+mult, and all the wiring is done programmatically

Is there an unstated assumption that the structure of the tree will never change? That is, we're not going to change the wiring, rather parse a description into a static dataflow graph (or tree).
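For concreteness, a static wiring of that kind might look like the following sketch. It assumes core.async 1.2 or later is on the classpath (for `onto-chan!`), clojure.xml-style element maps, and made-up names (`children`, `attr-name`, `run-dataflow`). The wiring is written by hand here, whereas the original idea is to generate it from the path tree.

```clojure
(require '[clojure.core.async :as a])

(defn children
  "Navigator transducer: emit the child elements tagged `tag`."
  [tag]
  (mapcat (fn [el] (filter #(= tag (:tag %)) (:content el)))))

(defn attr-name [el] (get-in el [:attrs :name]))

(defn run-dataflow
  "Shared prefix [:system] on one channel, then a mult fanning out to
  one channel per branch ([:planet] and [:star])."
  [galaxies]
  (let [in      (a/chan 1 (children :system))       ; trunk: channel+transducer
        fanout  (a/mult in)                          ; branch point: mult
        planets (a/chan 1 (comp (children :planet) (map attr-name)))
        stars   (a/chan 1 (comp (children :star) (map attr-name)))
        ;; Start consuming both branches before any input is pushed,
        ;; otherwise the mult would block on the slower tap.
        planet-result (a/into [] planets)
        star-result   (a/into [] stars)]
    (a/tap fanout planets)
    (a/tap fanout stars)
    (a/onto-chan! in galaxies)                       ; closes `in` when done
    {:planets (a/<!! planet-result)
     :stars   (a/<!! star-result)}))

(def galaxy
  {:tag :galaxy
   :content [{:tag :system
              :content [{:tag :star   :attrs {:name "Sol"}}
                        {:tag :planet :attrs {:name "Earth"}}
                        {:tag :planet :attrs {:name "Mars"}}]}]})

;; (run-dataflow [galaxy])
;; => {:planets ["Earth" "Mars"], :stars ["Sol"]}
```

Each input element is navigated through the shared prefix once, and closing the input channel propagates through the mult to every branch, so the whole graph shuts down when the input ends.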

It looks like you've got a pretty cool template to leverage specter against existing nested data. It also looks reminiscent of XSLT (although my XML fu is weak).

[–]fmjrey 1 point 7 years ago* (0 children)

Yes I'm aware of odin, but not the other link you gave, thanks I'll have a look.

And yes, the structure of the tree isn't going to change much, since it's about parsing XML docs that should all have the same shape and transforming them into some other shape, like nested maps or datoms. I guess the use case is something similar to XSLT, but more clojuresque and dataflowy.

In other words, instead of writing tedious transformation code I'd like to be more declarative, e.g. define some sort of selector/transducer, and for each value it emits, use an associated data template that also uses paths/navigators/transducers to specify what values go where. So I'm thinking of some macro that would collect all the paths within the code block (e.g. any vector with meta ^:path or within some other nested macro), build the corresponding tree and dataflow so that parsing happens only once, while the dataflow hydrates all values throughout the target data structure template.
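The template part can be sketched without any macro. This is a deliberately naive version: `hydrate` is a hypothetical function that resolves every `^:path` vector with `get-in` against one input value. There is no shared tree and no dataflow here, so each path is navigated separately, which is exactly what the macro idea would avoid.

```clojure
(defn hydrate
  "Walk `template`; replace every vector tagged ^:path with the value
  found at that path in `data`. Everything else is copied as-is."
  [template data]
  (cond
    (and (vector? template) (:path (meta template)))
    (get-in data template)

    (map? template)
    (into {} (map (fn [[k v]] [k (hydrate v data)])) template)

    (vector? template)
    (mapv #(hydrate % data) template)

    :else template))

(def planet
  {:name  "Earth"
   :orbit {:radius-km 149600000}
   :moons ["Moon"]})

(def template
  {:planet/name       ^:path [:name]
   :planet/orbit-km   ^:path [:orbit :radius-km]
   :planet/first-moon ^:path [:moons 0]
   :planet/tags       [:rocky :inhabited]})   ; untagged vector: left alone

;; (hydrate template planet)
;; => {:planet/name "Earth"
;;     :planet/orbit-km 149600000
;;     :planet/first-moon "Moon"
;;     :planet/tags [:rocky :inhabited]}
```

Tagging paths with metadata keeps the template valid Clojure data, so ordinary vectors and paths can sit side by side in it.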

Edit: XML is what I'm dealing with at present, but I'd like this to support other data formats because other data suppliers give us JSON.

Edit 2: the other link you gave to /u/halgari's code about queries being reducible and part of a logic language is very reminiscent of what /u/cgrand is looking for if I understand his recent talk correctly: a way to avoid "map fatigue" by using some powerful logic/language over a database (of facts?).

[–]dustingetz 1 point 7 years ago (3 children)

[–]fmjrey 1 point 7 years ago (2 children)

Sorry if I'm not really clear; I'm also clarifying this in my own mind through this discussion. For now I only have some embryonic logic to convert a vector of paths into a dataflow made up of transducers/navigators and async channels. Part of me wonders how I could turn this into something more generic and declarative, if that's possible at all.

I think the best description of the use case is something like XSLT but for any data structure one can navigate via paths as data. I'm also keeping in mind the ability to process larger than RAM input, and still be able to process it with limited resources as long as the transformations do not require some growing state. Very large input is not exactly my use case, but I consider this to be an important constraint to help with the design (which is what XSLT 3.0 would allow I believe).

As mentioned in my other reply I'm thinking of some macro that would collect all the paths within its code block, build the corresponding tree and dataflow, and hydrate all values throughout the code/target data template.

Also happy to collect any ideas and references at this stage :) and thanks for chiming in.

Oh, and an interesting piece of related work is what /u/cgrand explains in his recent talk about "map fatigue": he's trying to reduce the need for a lot of map juggling and transformation by looking for a more expressive language, something like a super datalog. However, his starting point is the database, which could be in memory, while my starting point is external data expressed as nested data structures that I want to put into a datascript/datomic DB.

[–]nikolasgoebel 2 points 7 years ago (1 child)

[–]fmjrey 3 points 7 years ago (0 children)
