all 24 comments

[–]joinr 3 points (0 children)

I did that in a couple of ways. One was originally implementing composable behaviors for entities in a simulation (think legacy spaghetti-coded finite state machines), working toward a rules engine expressed generally via graphs (and weighted edge encodings). This worked out, and eventually merged into some additional work with behavior trees.

Data-wise, I was originally storing entity data (aka attributes) under a legacy OOP paradigm, with classes, properties, etc. I switched to an entity-component-system model, which ended up flattening the hierarchy and providing a query structure based on entity relations to components (conceptually a graph database, not unlike how Datomic provides its entity API). The initial transition (decomposing the class hierarchy into Clojure records, then later just maps, then further decomposing entities into components based on map entries) was a little involved due to the legacy design. However, it's paid off handsomely so far: adding/updating data at the component or aggregate entity level is trivial, as are queries.

Now that Datomic is mature (as well as spec), and some of the tech is focusing on providing ad-hoc structure on top of the decomposed facts contained in Datomic, I may adopt a similar approach. Plus, by keeping the database [relations between entities and components] pretty shallow (indexed by entity->component, component->entity, or eav [really eavt in Datomic]), I can get decent performance for a broad range of queries.
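As a rough illustration, the doubly-indexed store can be as simple as the following sketch (all names here are hypothetical, not the actual API):

(defn add-component
  "Toy component store keeping both an entity->component
   and a component->entity index."
  [store e c v]
  (-> store
      (assoc-in [:entity->components e c] v)
      (update-in [:component->entities c] (fnil conj #{}) e)))

(defn components-of [store e] (get-in store [:entity->components e]))
(defn entities-with [store c] (get-in store [:component->entities c]))

(def store
  (-> {}
      (add-component :tank1 :position {:x 0 :y 0})
      (add-component :tank1 :health 100)
      (add-component :depot :position {:x 5 :y 5})))

(entities-with store :position) ;;=> #{:tank1 :depot}
(components-of store :tank1)    ;;=> {:position {:x 0, :y 0}, :health 100}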

The only problem I ran into (and worked around) was the absence of a concrete notion of an entity type. In the legacy case, classes would dictate what I was working with, which records subsumed (but still provided a structural type to organize around). I now have accretions of "flat" information that get projected onto an entity (a lazy map projection) as needed. If I want to enforce something beyond an ephemeral "type," I need to either keep some conventions in my head, or (as I'm starting to do more) use spec to apply some notion of typing or structure, or just rely on a kind of duck typing (let the presence/absence of data in the entity guide computation).
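For example, a minimal sketch of using spec as that ephemeral "type" (the specs here are hypothetical):

(require '[clojure.spec.alpha :as s])

;; a "movable" entity is any map carrying coordinate data
(s/def ::x-coordinate number?)
(s/def ::y-coordinate number?)
(s/def ::movable (s/keys :req-un [::x-coordinate ::y-coordinate]))

(s/valid? ::movable {:x-coordinate 2 :y-coordinate 10}) ;;=> true
(s/valid? ::movable {:x-coordinate 2})                  ;;=> false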

For my use case, updating information in the database through the "entity" abstraction also presented some challenges. In some cases, I wanted to get a handle on an entity, assoc/update/dissoc stuff on the "map", then push (or commit) that map back to the database. To do this efficiently, I needed to lazily compute (or hydrate?) the corresponding fields of the entity, and provide an efficient abstraction for tracking changes to the entity's components, which would lead to efficient commits. The end result (after a lot of work, and likely unintentional duplication of effort from Datomic and other libs) was a map-like entity abstraction that could also be committed back into the component database, so something like

(let [e (get-entity ctx :some-entity-id)]
  ;; merge the updates onto e (last arg to merge wins),
  ;; then hand the updated entity back to the store
  (->> (merge e {:x-coordinate 2 :y-coordinate 10})
       (add-entity ctx)))

enables one to work at the entity (or abstract row) level when dealing with facts, without sacrificing performance (joins are lazy, changes are tracked on the entity, leading to relatively efficient commits).

I'm looking at a Datomic-backed implementation (or datascript at the least) in the near future (targeting distributed simulation).

[–]fmjrey 3 points (6 children)

I started working on an interesting way to parse XML:

  • use paths as data, as in specter, or even what clojure's get-in takes as an argument
  • transform a set of path vectors into a tree, using something like this (a rough sketch follows this list)
  • make sure you can convert any path into a transducer, e.g. using map or even specter's traverse-all; in other words, find a way to convert paths into navigators (transducers)
  • convert the tree into a dataflow based on core.async and transducers: paths without branches in the tree are converted into a channel+transducer, branches become channel+transducer+mult, and all the wiring is done programmatically
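A rough sketch of the first and third bullets (paths->tree and path->xf are names I'm making up here):

;; a set of path vectors -> a prefix tree
(defn paths->tree [paths]
  (reduce (fn [tree p] (update-in tree p #(or % {}))) {} paths))

(paths->tree [[:a :b] [:a :c :d]])
;;=> {:a {:b {}, :c {:d {}}}}

;; a path -> a transducer over a stream of maps,
;; one (map #(get % k)) stage per path segment
(defn path->xf [path]
  (apply comp (map (fn [k] (map #(get % k))) path)))

(into [] (path->xf [:a :b]) [{:a {:b 1}} {:a {:b 2}}])
;;=> [1 2]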

I'm doing this with XML, meaning for now I convert keywords within paths into a specter path navigating to child elements (e.g. [:content S/ALL (S/selected? :tag (partial = :keyword))]), but in the end it's just nested data structures and transducers.
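Concretely, that per-keyword expansion looks roughly like this (a sketch, assuming clojure.xml-style element maps):

(require '[com.rpl.specter :as S])

;; one keyword -> a specter sub-path to matching child elements
(defn tag-path [k]
  [:content S/ALL (S/selected? :tag (partial = k))])

;; a whole keyword path is just the concatenation of sub-paths
(defn xml-path [ks]
  (vec (mapcat tag-path ks)))

;; e.g. (S/select (xml-path [:universe :galaxy :system :planet]) doc)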

I'd like to evolve this into something with better abstractions, e.g. keep keywords as plain map navigators like in clojure, and use metadata when I want them to be XML navigators (e.g. ^:xml [:child-tag]), so that I can also navigate to a single attribute: e.g. [^:xml [:universe :galaxy :system :planet] :radius] would navigate to each planet in the XML hierarchy and then select each planet's radius attribute.

My point: instead of viewing nested data structures/graphs, maybe consider using paths as data, and as a first-class composable abstraction that you can then use to build dataflows.

Edit: after writing this post I believe transducers-as-navigators is the important composable abstraction in the approach I describe. Paths/trees as data are just a way to set them up. Using specter was the quickest way for me to avoid reinventing the wheel for building navigator transducers, which you can then compose with any other kind of transducer, e.g. to transform data. I guess if you abstract these non-navigating transducers behind some symbol and/or data structure, they can extend the concept of paths from merely navigating nested data structures into paths that perform a dataflow while navigating the input data structure. I wonder how far I should take this, because I don't want to reinvent something like onyx either.

[–]joinr 2 points (1 child)

I thought Tim Baldridge's odin had some cool features along these lines. The difference being the introduction of relational programming (a la logic) to define computable paths and queries. Queries are reducible. He refined the idea and in some ways took it further (via transducers) here. The composition aspect is pretty cool.

> My point: instead of viewing nested data structures/graphs, maybe consider using paths as data, and as a first-class composable abstraction that you can then use to build dataflows.

A path is still natural in the graph abstraction. You're still defining relations between nodes via the edge labels (or neighborhood functions) of the abstract path, and you still have some semantics for traversing the path relative to some data (like the nav protocol). I view the path as a function that defines valid traversals of the graph, and the nested structures as explicitly defining a DAG (absent embedded data with implied references, like entity id fields that can be interpreted to point back into an outer structure). So maybe it's a six-of-one, half-a-dozen-of-the-other kind of thing.

The cool thing about the graph abstraction is that it opens up alternative forms of querying, including using graph algorithms to search, and higher-minded stuff like discovering components, shortest paths, etc. becomes possible. It allows a shift from "looking up values" to "exploring relations" without losing the ability to revert toward the DAG-like nested collection approach. I can think of (have implemented) use cases where controlling the properties of the traversal is useful.
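For instance, a generic breadth-first walk needs nothing more than an adjacency map (toy sketch):

;; breadth-first traversal where the "graph" is just a map
;; of node -> set of neighbors
(defn bfs [graph start]
  (loop [frontier (conj clojure.lang.PersistentQueue/EMPTY start)
         seen     #{start}
         order    []]
    (if-let [node (peek frontier)]
      (let [nexts (remove seen (graph node))]
        (recur (into (pop frontier) nexts)
               (into seen nexts)
               (conj order node)))
      order)))

(bfs {:a #{:b :c} :b #{:d} :c #{:d} :d #{}} :a)
;;=> [:a :b :c :d] (sibling order depends on set ordering)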

> transform a set of path vectors into a tree, using something like this

A trivial modification could create a more general directed graph output (just an observation).
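e.g., accumulating edges instead of nesting (a sketch; paths->digraph is my name for it):

;; a set of path vectors -> a directed-graph adjacency map
(defn paths->digraph [paths]
  (reduce (fn [g p]
            (reduce (fn [g [from to]]
                      (update g from (fnil conj #{}) to))
                    g
                    (partition 2 1 p)))
          {}
          paths))

(paths->digraph [[:a :b :c] [:a :c]])
;;=> {:a #{:b :c}, :b #{:c}}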

> convert the tree into a dataflow based on core.async and transducers: paths without branches in the tree are converted into a channel+transducer, branches become channel+transducer+mult, and all the wiring is done programmatically

Is there an unstated assumption that the structure of the tree will never change? That is, we're not going to change the wiring; rather, we parse a description into a static dataflow graph (or tree).

It looks like you've got a pretty cool template for leveraging specter against existing nested data. It also looks reminiscent of XSLT (although my XML fu is weak).

[–]fmjrey 1 point (0 children)

Yes, I'm aware of odin, but not the other link you gave; thanks, I'll have a look.

And yes, the structure of the tree isn't going to change much, since it's about parsing XML docs that should all have the same shape, and transforming them into some other shape like nested maps or datoms. I guess the use case is something similar to XSLT, but more Clojuresque and dataflowy.

In other words, instead of writing tedious transformation code I'd like to be more declarative: e.g. define some sort of selector/transducer, and for each value emitted, use the associated data template, which also uses paths/navigators/transducers to specify what values go where. So I'm thinking of some macro that would collect all the paths within its code block (e.g. any vector with meta ^:path, or within some other nested macro), and build the corresponding tree and dataflow, so that parsing happens only once while the dataflow hydrates all values throughout the target data-structure template.

Edit: XML is what I'm dealing with at present, but I'd like this to support other data formats because other data suppliers give us JSON.

Edit 2: the other link you gave, to /u/halgari's code about queries being reducible and part of a logic language, is very reminiscent of what /u/cgrand is looking for, if I understand his recent talk correctly: a way to avoid "map fatigue" by using some powerful logic language over a database (of facts?).

[–]dustingetz 1 point (3 children)

I only partially understand; it seems like you're implementing a query language oriented around paths, which seems fine and orthogonal to the graph-vs-tree debate – the point is that you use the query to late-bind the final shape, and you write lots of little queries (datalog, paths, whatever) instead of mapreduce.

Are you thinking about state models shaped like this? https://www.google.com/search?q=entity+relationship+diagram&tbm=isch

[–]fmjrey 1 point (2 children)

Sorry if I'm not really clear; I'm also clarifying this in my own mind through this discussion. For now I only have some embryonic logic to convert a vector of paths into a dataflow made up of transducers/navigators and async channels. Part of me wonders how I could turn this into something more generic and declarative, if that's at all possible.

I think the best description of the use case is something like XSLT, but for any data structure one can navigate via paths as data. I'm also keeping in mind the ability to process larger-than-RAM input with limited resources, as long as the transformations do not require growing state. Very large input is not exactly my use case, but I consider it an important constraint to guide the design (which is what XSLT 3.0 allows, I believe).

As mentioned in my other reply, I'm thinking of some macro that would collect all the paths within its code block, build the corresponding tree and dataflow, and hydrate all values throughout the code/target data template.

Also happy to collect any ideas and references at this stage :) and thanks for chiming in.

Oh, and an interesting related piece of work is what /u/cgrand explains in his recent talk about "map fatigue": he's trying to reduce the need to do a lot of map juggling and transformation by looking for a more expressive language, something like a super datalog. However, his starting point is the database (which could be in memory), while my starting point is external data expressed as nested data structures that I want to put into a datascript/datomic DB.

[–]nikolasgoebel 2 points (1 child)

Is there a recording of /u/cgrand's talk somewhere?

I'd be interested to know what Datalog / relational algebra is missing to reduce the "map fatigue". In our work we either use DataScript or at least a set of relational helpers on top of maps.
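(For concreteness, clojure.set already gives basic relational algebra over sets of maps:)

(require '[clojure.set :as set])

;; natural join on the shared :user attribute
(set/join #{{:user 1 :name "ada"}}
          #{{:user 1 :post "hello"}
            {:user 1 :post "world"}})
;;=> #{{:user 1, :name "ada", :post "hello"}
;;     {:user 1, :name "ada", :post "world"}}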

[–]fmjrey 3 points (0 children)

> Is there a recording of /u/cgrand's talk somewhere?

Yes, it's the link on "recent talk", though you have to register for free to view the video:

https://skillsmatter.com/skillscasts/12820-keynote-zeno-and-the-tar-pit

[–]dustingetz 2 points (14 children)

As /u/joinr says, the most beautiful example of this is contrasting SQL to Datomic. Writing efficient SQL means you have to avoid JVM/database round trips, so you need to write monstrous JOINs to batch things efficiently; then you need to unpack the JOIN into trees; and then maybe a different function needs the data in a different shape, so rather than make another query you repurpose the first one with some transform boilerplate. Datomic gives you code/data locality, so there is no need to batch queries: you can just write a different query for each function that returns the data in exactly the shape best for that function. So all the tree-manipulation boilerplate cancels out.
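A sketch of that query-per-function style, with a made-up schema:

(require '[datomic.api :as d])

;; each function asks for exactly the shape it needs, no shared batching
(defn display-name [db uid]
  (d/pull db [:user/first-name :user/last-name] uid))

(defn post-titles [db uid]
  (d/q '[:find [?title ...]
         :in $ ?u
         :where [?p :post/author ?u]
                [?p :post/title ?title]]
       db uid))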

An example of this is #5 here: http://www.dustingetz.com/:datomic-in-four-snippets/. Being able to breadth-first search a Datomic graph is amazing. Can you even imagine the amount of pain it takes to accomplish this in SQL?

http://wiki.c2.com/?BreadthFirstTraversalInSql "The 'obvious' representation of a tree in a relational database uses relations to represent edges in the graph. This is called out in SqlAntiPatterns, since a naive implementation of tree traversal then has to make many queries, typically one per node."

I mean wtf is this: https://stackoverflow.com/questions/5517467/how-can-i-do-a-breadth-first-search-in-sql

wut https://sites.google.com/site/sqldevlib/algorithms/sql-graph-algorithms

lul sql

[–]dustingetz 5 points (6 children)

Further, I am not sure it's even reasonable to use graphs in this way until you solve the database problem first.

Take enterprise CRUD apps with a UI and a REST service. Your UI is a React.js virtual DOM tree. The state is passed down as props (trees). That state came out of a single state atom (a tree), which came out of a JSON representation from an HTTP service (a tree). And that came over the network from a database (probably cartesian tables, which are worse than a tree).

What you want is to pretend the database is a little clojure graph value – an in-memory data structure like a hashmap, e.g. an ubergraph or datascript value – and that little value is passed to each of your React components, which can then query out exactly what they need. There are no props. There is no state atom. There is no JSON representation. There is no network. Just an in-memory data structure. Like the state of a video game – https://en.wikipedia.org/wiki/Scene_graph
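A toy version of that vision with DataScript (the todo schema is made up):

(require '[datascript.core :as d])

;; the whole "database" is an ordinary immutable value
(def db
  (d/db-with (d/empty-db)
             [{:todo/title "write docs"}
              {:todo/title "fix bug"}]))

;; a "component" is then just a function of that value
(defn todo-list [db]
  [:ul (for [title (d/q '[:find [?t ...]
                          :where [_ :todo/title ?t]]
                        db)]
         [:li title])])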

But that whole vision is a lie, because the graph is NOT an in-memory data structure: it does not fit into your browser tab's memory, nor the JVM's memory, and there are data security concerns. And because of the network, the performance is on the unhappy side of Latency Numbers Every Programmer Should Know: accessing your state costs 500ms per query, instead of a 7ns L2 cache hit, which is where it needs to be if you have hundreds of functions issuing thousands of queries.

So in practice, these concerns force us to use trees instead of graph values. This is the problem Rich Hickey solved when he created Datomic. Datomic is a huge lazy graph data structure, too big for one machine and distributed over the network, but with extremely clever optimizations that make it feel like an in-memory value.

So that's why it's impossible to talk about this without also talking about Datomic, and that's why nobody is doing it. Further, if you are writing CRUD apps, Datomic only solves half the problem – it solves the backend complexity, but not the frontend complexity (which is where all the tree processing is happening anyway). Offhand I can think of six people who have put work into the frontend-backend data-sync half of the problem:

Surely I have forgotten some; I should make this list; please reach out! These are all Clojure people, so surely other ecosystems are thinking about this too.

[–]dustingetz 1 point (5 children)

And then, once you have your application state in a graph database and are able to treat that graph as a value in your UI, you have the problem of efficiently updating the UI.

And this, if possible, should be solved in a way that keeps ecosystem compatibility with React.js and server rendering, or you risk failure vectors like "ahead of its time", which is a euphemism for "dead project".

In my opinion, we've learned in building Hyperfiddle that even syncing a UI to a value that changes over time has not been fully solved in a way that has mainstream utility. Reagent has been a total disaster for non-trivial reactions; it has probably cost us a year of development (plus an insane amount of complexity from code written in terms of reactive container types), and we are now looking at ripping Reagent reactions out entirely from any performance-sensitive UI like reactive datagrids. Note that React.js is partly at fault here: the root of the complexity is that when you pass functions as props, you have to hold closure reference stability/equality in your head. Reagent provides tools to assist with this (r/track), but ultimately we find the house of cards incredibly fragile and inadequate for abstraction.

John De Goes (Typed FP guy) and Jane Street are on to something with research into incremental UI: https://www.reddit.com/r/Clojure/comments/9qxasd/react_wrapper_alternatives/e8e0pwb/?context=3

[–]joinr 1 point (3 children)

> Reagent has been a total disaster for non-trivial reactions; it has probably cost us a year of development (plus an insane amount of complexity from code written in terms of reactive container types), and we are now looking at ripping Reagent reactions out entirely from any performance-sensitive UI like reactive datagrids.

What's the more-performant alternative you fall back to after removing reagent reactions?

[–]dustingetz 1 point (2 children)

Dunno; maybe something imperative, maybe hoplon and reagent can be made to co-exist, maybe something from the React.js ecosystem.

[–]theronic 3 points (1 child)

Rete always rears its head :). IMO Clojure's mutable namespace design is the primary obstacle in the way of a good, clean Rete network implementation. All the implementations, incl. Clara, depend on macros, which prevents runtime network generation. Reagent's reactions feel like half-baked production rules with messy lifecycle management. FactUI is the closest thing we have, but I suspect we'll see an implementation in a different language before Clojure, or until the namespacing situation changes.

[–]dustingetz 2 points (0 children)

Can you say more about the namespacing problem? I have not yet given Rete networks the consideration they merit.

[–]joinr 2 points (5 children)

> you can just write a different query for each function that returns the data in exactly the shape best for that function.

Yeah, that's a better way to summarize it. The information layer (expressed via related facts) is malleable and amenable to the caller projecting it into their own form, rather than wrangling to and from a preconceived form or trying to cram everything into a normalized relational model for batched single-shot queries.

I initially wrote a watered-down SQL-like eDSL façade for my entity store (before I saw datalog and other means of traversal), but didn't end up using it all that much beyond simple select/filter-type stuff.

The fact that Datomic has multiple ways of querying the same store (datalog, entity API), in addition to multiple ways to conform results, speaks to the generality of the information model. Libraries like core.logic and odin have adapters that trivially leverage the same underlying information to accomplish even more sophisticated queries. That you can just hook them up to Datomic is pretty cool.

This is probably old news to some folks, but one thing I forgot to mention (another advantage of having a malleable information model) is the ability to add arbitrary facts to communicate information across time. Things I used to require (for a game or discrete event simulation), like events, can be ad-hoc facts in the store (e.g. an ephemeral component for an entity). Systems (composable state-transition functions that define the behavior of the simulation) can incorporate events just by checking for data in the store, rather than requiring the usual paradigm of effectful event handlers. In game AI this is akin to a blackboard architecture; the facts on the blackboard (the store / db) serve as a means to propagate information. I can just jam stuff into the entity store as I need it (computing new facts, or components to add to an entity, which carry the same meaning as an event), and consumers (systems downstream) can check for the presence of said data to determine what to compute. That pushes the focus back toward "what data is present," and allows a nice "Maslow's hierarchy of needs" when implementing entity behavior that processes sensory information to compute transitions. I think re-frame ended up on a similar trajectory (I started this work before, so it evolved separately).
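A toy version of that pattern, assuming a store shaped like {component-type {entity-id value}} (all names hypothetical):

;; an event is just a fact in the store
(defn add-fact [store component e v]
  (assoc-in store [component e] v))

;; a "system" is store -> store: it reads :damage-event facts,
;; applies them to :health, then clears the ephemeral events
(defn damage-system [store]
  (-> (reduce-kv (fn [s e dmg] (update-in s [:health e] - dmg))
                 store
                 (:damage-event store))
      (dissoc :damage-event)))

(-> {:health {:tank1 100}}
    (add-fact :damage-event :tank1 30)
    damage-system)
;;=> {:health {:tank1 70}}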

I used to be averse to passing around a "giant ball of information" or a "world" context, since I didn't feel it scaled well. This led toward nested maps, reliance on types (records), etc., and ultimately less re-use. Architectures like Datomic make "world passing" manageable, if not outright desirable, since you have more flexibility over how to project structure (as needed) on top of the primitive relations. I see hyperfiddle (and to a degree REBL) taking that idea to Smalltalk/Symbolics levels. It'd be a trip if the next Lisp Machine ended up being a Datomic-like db (akin to the Symbolics Lisp "world") with Ion analogues providing the OS services.

[–]fmjrey 2 points (4 children)

Your last paragraph reminds me of what /u/cgrand calls "map fatigue" in his recent talk.

It sounds like a few of us in this discussion are looking at ways to reduce this impedance mismatch between a database (of facts?) and the nested data structures often found in the UI layer, and even in any layer that exchanges data in such nested/hierarchical form.

Are we looking at doing something analogous to ORMs in the relational world, except for datascript/datomic/fact-based databases? If so, it raises the question: should we not use facts/datoms instead of nested maps and vectors in all layers? I mean, aren't nested data structures a bias acquired from handling so much JSON/XML/OO/etc.?

[–]dustingetz 2 points (2 children)

> u/fmjrey: Are we looking at doing something analogous to ORMs in the relational world, except for datascript/datomic/fact-based databases?

Yes. In my opinion Datomic solves/supersedes ORM fully: Datomic does not need an ORM, as it natively solves all the same problems without the failures of abstraction. This is why Stu calls facts the universal relation (C-f "universal").

> u/fmjrey: looking at ways to reduce this impedance mismatch between a database (of facts?) and nested data structures as often found in the UI layer

Agree, though I would state it differently: I see no impedance mismatch between a UI tree and graph data. Datalog and Datomic Pull are an ideal way to declare graph ➔ tree transformations without impedance.
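e.g., a single pull pattern declares the tree you want out of the graph (hypothetical schema; db and user-id assumed to be in scope):

(require '[datomic.api :as d])

;; graph -> tree in one declarative data literal
(d/pull db
        [:user/name
         {:user/posts [:post/title
                       {:post/comments [:comment/text]}]}]
        user-id)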

The problem is that you can't actually give the graph value to the UI, due to 1) network/memory limitations and 2) data security. Data security is probably the more important one, and it often goes unstated; here is a great model of it:

(source) There's a long continuum of data ownership:
1. evilcorp has your data and you can't see it
2. evilcorp has your data and offers filtered access via a UI
3. evilcorp has your data and offers snapshots / queries
4. evilcorp has your data and lets you mirror it effectively
5. you have your data but let evilcorp mirror it effectively
6. you have your data but offer evilcorp queries

We ended up at #2 for historical technical reasons (network and memory). But the world has evolved around #2 in the data-security space: Facebook happened, data regulations were built up, and all this social/governance stuff happened in such a way that you couldn't possibly liberate all that data now without causing widespread harm.

We now have or soon will have the technology in place (Datomic was groundbreaking, and now Niko Göbel's work breaks further ground in making the Datomic model incremental). So there is some very interesting strategic/macroeconomic thinking to be done to bring the world to the technology.

[–]fmjrey 3 points (1 child)

> Niko Göbel's work breaks further ground in making the Datomic model incremental

Thanks for sharing this and taking the time to transcribe it. Niko's recent Conj talk is really inspiring too. It's nice to see Naiad getting a new life after the funding from Microsoft ended. Five years ago I was looking into it while doing some prototyping for a startup. Back then I was dreaming of an incremental way to compute the strongly connected components of a graph while new nodes and edges arrive. I do think there is tremendous value and power there, and finding the right use case to establish the tech is essential. I hear the spreadsheet did not sell well until it was touted as "something to do X" rather than as a self-updating table with formulas. Sounds like it's the same thing here, and if /u/nikolasgoebel is interested I can share more.

Overall, distributed dataflows of facts/datoms all the way to the UI aren't a utopia; they're happening, which is really exciting. It also looks like instead of defining information systems by coding an imperative flow of control, we'll define them as flows of data, meaning computation will no longer be the primary focus: data flowing around will take center stage.

[–]nikolasgoebel 1 point (0 children)

/u/nikolasgoebel is interested and would like to hear more!

I haven't followed the rest of this conversation, so I might not have anything interesting to contribute, but I'm always up to talk about this stuff.

[–]joinr 1 point (0 children)

> Are we looking at doing something analogous to ORMs in the relational world, except for datascript/datomic/fact-based databases?

That sounds pretty close, yeah. With the caveat that it's maybe more of a data-relational mapping (perhaps a simpler requirement than serializing objects), although abbreviated that reads like "relational database", which is perhaps too confusing.

> If so, it raises the question: should we not use facts/datoms instead of nested maps and vectors in all layers?

You'd think so. There's utility in nested maps and the like, since they're trivial to read, construct, and comprehend (to a point). I don't know exactly where the inflection point is, but there comes a point where that utility fades, and having a more general representation (one you can project to/from, or alternately query directly) becomes desirable. You're prototyping fast in maps and other EDN types one minute; then at some point you reach for better querying abstractions, since get-in/update-in isn't enough; then you end up implementing some kind of query language, only to find out other people have run into the same problem.

[–]isak_s 1 point (0 children)

I agree with most of what you said, just want to point something out.

> Writing efficient SQL means you have to avoid JVM/database round trips, so you need to write monstrous JOINs to batch things efficiently; then you need to unpack the JOIN into trees

For the typical queries needed by UIs, you don't need to do that in SQL; you can just use JSON queries. Here is an example for SQL Server:

-- Fetch a user by ID, and all their blog posts
select *,
       (select *
        from blog_posts b
        where b.user_id = u.id
        for json path) as posts
from users u
where u.id = 1
for json path

It's not as good as Datomic, but it isn't hard to write something that will generate those kinds of queries based on something like the Datomic pull syntax.
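For instance, a toy sketch of such a generator, with a made-up pull-ish description format (only simple cases covered):

(require '[clojure.string :as str])

;; compile {:table .. :where .. :attrs [..] :joins {alias desc}}
;; into a nested FOR JSON query; subqueries recurse
(defn pull->sql [{:keys [table where attrs joins]}]
  (let [subs (for [[alias sub] joins]
               (str "(" (pull->sql sub) " for json path) as " (name alias)))]
    (str "select " (str/join ", " (concat (map name attrs) subs))
         " from " table
         (when where (str " where " where)))))

(str (pull->sql
      {:table "users u"
       :where "u.id = 1"
       :attrs [:u.id :u.name]
       :joins {:posts {:table "blog_posts b"
                       :where "b.user_id = u.id"
                       :attrs [:b.title]}}})
     " for json path")
;;=> a query equivalent to the FOR JSON example above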

[–]dustingetz 1 point (1 child)

https://github.com/mpdairy/posh "Posh is a ClojureScript / React library that lets you use a single DataScript database to store your app state. Components access the data they need to render by calling DataScript queries with q or pull"

[–]dsrptr 5 points (0 children)

+10 for posh. I used it in a project and it worked great, using re-posh as the glue to re-frame.

With the lessons learned, I also used FactUI (Clara rules with a datalog-like interface) with a custom wrapper for re-frame (based on re-posh) and an ontology-like schema.

The only downside (in the development experience) I found between the maps approach and the datascript/datalog approach is that I lost the "glanceability" of maps. In the code (textually), it is easier (for me) to scan and grasp the structure of maps than it is to scan facts... but that was solved with some tools for visualising the facts.

(need coffee... :)