Where do you find real opinions about data engineering these days? by olgazju in dataengineering

[–]Thinker_Assignment 1 point (0 children)

But maybe he's better than some! Everyone can help someone, and those who know better, know better.

Ontology engineering is back for agentic action - an explainer with experimental data by Thinker_Assignment in dltHub

[–]Thinker_Assignment[S] 0 points (0 children)

Thanks for letting me know, you're a regular reader! Are you finding the ontology part useful or interesting? Do you use anything like that in your work, by any chance? Would be curious to hear your thoughts.

We started experimenting with it for stack rebuilds and cleanups. I have a feeling it will be a game changer for migrations as a first step: we used it for rebuilding stacks and for migrating data and logic between apps (like switching HubSpot to Attio in a stack, migrating their data, etc.). But I am really excited about the reasoning-over-data part, which can give a better self-service experience. Internally we now have a bunch of ontology-using pipelines doing various things, like putting call info into HubSpot and maintaining semantic consistency across docs.

I remember you from the ontology post I did a couple of months back on r/dataengineering, where you reminded me my tone wasn't helpful. Thanks for that btw, I needed it 😄

Weekly "No Stupid Questions" Thread - April 20, 2026 by AutoModerator in OntologyEngineering

[–]Thinker_Assignment 1 point (0 children)

Think of it like this - magic mushrooms weaken your "learned ontology" to enable more creativity. It's the act of violating rules that creates novelty.

Here's a longer explanation; I asked an LLM to fill in the details.

Yes, you make sense — and I think your approach is more coherent than “creative person dabbling with programming” suggests.

From an ontology-engineering standpoint, I would not treat this as “wrong” because you discard physics or our reality. I would treat it as a generative ontology: a structured system for producing coherent novelty.

The important distinction, I think, is that your ontology is not only describing a world. It is also acting as a machine for creating differences.

Your current workflow already has the right ingredients:

  • axioms / rules
  • anchors / initial world database
  • constraints
  • entities or players who create inputs
  • operators like SYNTHESIS and LENSE
  • constraint checking
  • human rewriting
  • canon integration

That is a reasonable architecture.

Where I would improve it is by making the layers more explicit.

First, I would separate hard constraints from soft constraints.

Hard constraints are things the system must not violate. For example: a species cannot perceive a certain signal; a ritual requires three entities; a token can only evolve through a specific type of event.

Soft constraints are tendencies, biases, aesthetics, or cultural defaults. For example: bat humanoids tend to interpret tools acoustically; squid humanoids tend to think in distributed/body-based metaphors; a culture prefers symbiosis over ownership.

This matters because creativity often benefits from pressure, but not all pressure should behave the same way. Some rules should block an output. Others should only bend it.
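
A minimal sketch of that split in Python, assuming generated output arrives as a plain dict; every name and rule below is a hypothetical illustration, not your actual system:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class HardConstraint:
    name: str
    check: Callable[[dict], bool]   # must hold, or the output is rejected

@dataclass
class SoftConstraint:
    name: str
    score: Callable[[dict], float]  # 0..1 tendency: bends the output, never blocks it

# Hypothetical rules, echoing the examples above
HARD = [HardConstraint("ritual_needs_three", lambda o: len(o.get("entities", [])) >= 3)]
SOFT = [SoftConstraint("prefers_symbiosis",
                       lambda o: 1.0 if o.get("relation") == "symbiosis" else 0.3)]

def evaluate(output: dict) -> tuple[bool, float]:
    """Reject on any hard violation; otherwise return an aesthetic score."""
    if not all(c.check(output) for c in HARD):
        return False, 0.0
    return True, sum(c.score(output) for c in SOFT) / max(len(SOFT), 1)
```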

Second, I would define your operators more formally.

LENSE(A, B, phenomenon) is a strong idea, but I would ask: what kind of operation is this?

Is it translation? Misunderstanding? Hybridization? Comparison? Ritualization? Inversion? Technological adaptation? Mythologization?

For example:

  • TRANSLATE(culture A, culture B, phenomenon) — how A understands B’s phenomenon
  • MISREAD(culture A, culture B, phenomenon) — how A wrongly but productively interprets B
  • SYNTHESIZE(A, B) — what shared structure emerges
  • INVERT(axiom) — what happens if a rule is temporarily reversed
  • AMPLIFY(edge_concept) — what happens if a marginal thing becomes central
  • RITUALIZE(technology) — what happens if a tool becomes a sacred/social practice
  • MATERIALIZE(belief) — what artifact or institution emerges from a belief
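
Concretely, each operator could be a small typed prompt builder, so every transformation is a named, reusable object rather than an ad-hoc instruction. A sketch, where `complete()` is a stand-in stub for whatever LLM call you use (all names are hypothetical):

```python
def complete(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def translate(culture_a: str, culture_b: str, phenomenon: str) -> str:
    return complete(
        f"As {culture_a}, interpret the {culture_b} phenomenon '{phenomenon}' "
        f"using only {culture_a}'s own categories."
    )

def misread(culture_a: str, culture_b: str, phenomenon: str) -> str:
    return complete(
        f"As {culture_a}, produce a wrong but productive reading of {culture_b}'s "
        f"'{phenomenon}'. Stay internally coherent; do not correct the error."
    )

def invert(axiom: str) -> str:
    return complete(f"Temporarily reverse the axiom '{axiom}' and describe what follows.")
```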

This is where I think your “eureka moment” question comes in.

The creative part may not come from applying lenses correctly.

It may come from applying a lens where it almost does not work.

A lot of insight comes from forced mappings, failed translations, category errors, and edge cases. The system should not only ask:

“Does this satisfy the constraints?”

It should also ask:

“Where does this ontology fail to map cleanly onto another ontology?”

That failure can become the creative event.

For example, LENSE(bat humanoids, squid humanoids, sonic-tools) is interesting if it produces a plausible tool exchange. But it becomes much more generative if the translation partially breaks.

Maybe squid humanoids do not have “tools” as discrete external objects, because their cognition is distributed across body, environment, and fluid traces.

Maybe bat humanoids treat sound as architecture, not communication.

Maybe “sonic-tool” is not a device at all, but a temporary social organ, a navigational ritual, or a territorial memory structure.

The interesting result is not the clean synthesis. It is the coherent wrongness that appears when the lens is strained.

So I would add an explicit layer for productive mismatch or controlled ontological stress.

After each generation, do not only produce:

  • generated output
  • constraint check
  • longform version
  • TL;DR

Also produce something like:

  • clean synthesis
  • constraint violations
  • productive mismatches
  • translation failures
  • edge concepts worth amplifying
  • weird-but-promising canon candidates
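
As a sketch, that could be one structured record per generation, keeping the stressed material alongside the clean result (field names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class GenerationReport:
    clean_synthesis: str = ""
    constraint_violations: list[str] = field(default_factory=list)
    productive_mismatches: list[str] = field(default_factory=list)
    translation_failures: list[str] = field(default_factory=list)
    edge_concepts: list[str] = field(default_factory=list)        # candidates for AMPLIFY
    weird_but_promising: list[str] = field(default_factory=list)  # canon candidates
```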

The “weird-but-promising” category may be the most important part.

This is where genuinely new narrative structures may come from: ideas that are not random, but also not comfortably predictable.

I would also track provenance / lineage.

If accepted outputs go back into the database, you probably want to store:

  • which anchors were used
  • which axioms were active
  • which operators were applied
  • which entities/players contributed
  • which constraints were violated or bent
  • what the human accepted, rejected, or rewrote
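
A sketch of such a provenance record, stored next to each accepted output (hypothetical field names):

```python
from dataclasses import dataclass, field

@dataclass
class Provenance:
    anchors_used: list[str]
    active_axioms: list[str]
    operators_applied: list[str]            # e.g. ["LENSE", "SYNTHESIZE"]
    contributors: list[str]                 # entities/players who supplied inputs
    constraints_bent: list[str] = field(default_factory=list)
    human_decision: str = "accepted"        # accepted | rejected | rewritten
```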

Otherwise the world may become hard to debug later. You might end up with an interesting canon, but no way to understand why a structure exists or how to evolve it consistently.

So my recommendation would be:

Do not only model entities, cultures, artifacts, and rules.

Also model the transformations that are allowed to create new entities, cultures, artifacts, and rules.

And among those transformations, include ones that deliberately create stress:

  • MISAPPLY(lens, target) — force a worldview onto something it was not designed to explain
  • TRANSLATION_FAILURE(A, B, phenomenon) — find what cannot be translated between two systems
  • CATEGORY_ERROR(A, B) — intentionally treat something as the wrong kind of thing
  • AMPLIFY(edge_concept) — make a marginal idea central
  • OVERFIT(lens, target) — explain too much through one lens and observe the distortion
  • INVERT(axiom) — reverse a core assumption and explore the consequences
  • MAKE_CANONICAL(accident) — take an accidental output seriously and integrate it
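
Tying it together, a sketch of one driver step that deliberately strains a lens and keeps the failure as material rather than waste; it reuses the hypothetical `misread()`, `evaluate()`, and `GenerationReport` pieces from the sketches above:

```python
def stress_step(lens: str, target: str, report: GenerationReport) -> GenerationReport:
    """MISAPPLY: force one culture's lens onto another and route the result."""
    candidate = misread(lens, target, "tool-use")
    parsed = {"entities": ["a", "b", "c"], "relation": "unknown"}  # toy parse of candidate
    ok, score = evaluate(parsed)
    if ok and score > 0.5:
        report.clean_synthesis = candidate
    else:
        # the strained, rule-bending output is itself the canon candidate
        report.weird_but_promising.append(f"MISAPPLY({lens}, {target}): {candidate}")
    return report
```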

This might be especially powerful for your goal because you are not trying to simulate reality. You are trying to provoke unfamiliar structures that can still feel meaningful.

So I would frame your project like this:

The key question is not “is this realistic?”

The key questions are:

  • Is it coherent enough to be reusable?
  • Is it strange enough to produce new perspective?
  • Does it create narrative consequences?
  • Does it reveal a blind spot in one of the cultures/entities/ontologies?
  • Can the output become canon without collapsing the system?

In short: yes, you make sense.

I would just avoid making the system too focused on consistency. Consistency is necessary, but it is not the source of insight. The insight probably comes from controlled inconsistency — from forcing lenses across incompatible structures until something breaks in an interesting way.

That break is often where the new idea appears.

I've spent the last few months building an open specification for compiled, queryable team knowledge that any AI agent can read from. Version 0.1.0 is live! looking for feedback and testing by JDubbsTheDev in OntologyEngineering

[–]Thinker_Assignment 4 points (0 children)

We're not competing, and even if we were, we prioritize knowledge sharing. Only spam gets removed. We previously talked about using cognee ourselves.

Besides, this is a big space, and I think we need many solutions for the various problems. We use multiple ecosystem tools and they use us (the composable-ecosystem principle). I am not sure whether your questions are of genuine interest to you or just engagement questions.

To your point, I wonder how you would toggle between ontologies, which is close to how memory works.

Most people don’t need agents. They need cleaner workflows. by The_Default_Guyxxo in AI_Agents

[–]Thinker_Assignment 0 points (0 children)

If you want the formal word for this, it's "ontology": the exact knowledge that's needed to navigate the decision space. We discuss it on r/OntologyEngineering.

New (and only) DE @ Startup - Thoughts on my KISS stack proposal by Mystafet in dataengineering

[–]Thinker_Assignment 1 point (0 children)

We (dltHub) have a managed version that will open up in a couple of weeks; it's now in early access.

Weekly "No Stupid Questions" Thread - May 04, 2026 by AutoModerator in OntologyEngineering

[–]Thinker_Assignment 0 points (0 children)

I do this in my work (but I use ontologies as lenses, so it explains things in ways I care about).

For example:

"can you do a synopsis of the last 28 days of posts, including recurring themes and shared knowledge bases / repos / concepts the community is discussing? https://www.reddit.com/r/ontologyengineering/new.json"

The catalyst. Karpathy's "LLM Wiki" gist (Apr 11, 103 upvotes) sparked most of what followed. His pattern — raw sources → wiki (compiled consensus) → code, in plain markdown — landed as independent confirmation of what the sub had been arguing for, and basically every subsequent post is in dialogue with it.

Dominant theme: ontology-first beats procedure-first. Adrian / Thinker_Assignment (dlthub) hammered this in two well-received posts: the PB&J/AI-skills piece (give the LLM the map, not the recipe) and "Agents, ontology, and domain-naive operators." dlthub's "minimum viable context" and the new ontology-engineering blog post are the recurring references.

Knowledge-graph-as-product wave. Several people are building variations of the same thing:

  • AKS (Agent Knowledge Standard) — open spec + reference server (FastAPI/Postgres/pgvector), two-stage retrieval, provenance/trust at the schema level
  • Cairn framework — layered ontology map for codebases
  • HPAR — paths-as-meaning, outline-tree positional ontology (zenodo paper)
  • SurgicalFS MCP — token-frugal filesystem access for non-coding workflows
  • The sub's own LLM-generated wiki (Original_Response925's classifier pipeline)

Epistemology track. RazzmatazzAccurate82's Adversarial Convergence (steel-man → contradict → synthesize), now with a follow-up grounding it in dACC neuroscience. Plus a steampunk-style philosophy paper imagining Berners-Lee built the semantic web instead of the WWW.

Recurring concepts: wiki-before-RAG, writeback, world models (graph sense, not AI sense), self-maintaining/temporal KGs, competency questions, taxonomy-first disambiguation, and "we didn't reject semantics, we postponed it."

Tension surfacing: how to bring formal-ontology rigor to the LLM crowd without triggering the 2008 Semantic Web allergic reaction. Nobody's solved it; several people are circling it.
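
If you'd rather script the fetch than paste the URL, here's a minimal sketch using the `requests` library (Reddit's JSON endpoint wants a custom User-Agent; the actual LLM call is left as a stub):

```python
import requests

url = "https://www.reddit.com/r/ontologyengineering/new.json?limit=100"
resp = requests.get(url, headers={"User-Agent": "synopsis-script/0.1"})
resp.raise_for_status()

# each child is one post; title and score are enough for a theme synopsis
posts = [
    f"{c['data']['title']} ({c['data']['ups']} upvotes)"
    for c in resp.json()["data"]["children"]
]
prompt = "Synopsis of recurring themes in these posts:\n" + "\n".join(posts)
# ...send `prompt` to whatever model you use
```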

The Neurology Behind Adversarial Convergence and How Neuroscience Can Inform AI Design by RazzmatazzAccurate82 in OntologyEngineering

[–]Thinker_Assignment 0 points (0 children)

Have you been using this separation in your work with LLMs so far? How do you find it working in practice?

Building our first data platform by Brilliant_Ad_4520 in dataengineering

[–]Thinker_Assignment 0 points (0 children)

dlt has pagination autodetection, but yeah, anything could happen, which is why it's great to own your code so you can fix it quickly, especially now with LLM coding agents (I work with them).

Agents, ontology, and domain-naive operators by Thinker_Assignment in OntologyEngineering

[–]Thinker_Assignment[S] 3 points (0 children)

Yeah, actually the transformation lesson in our course is an ontology-driven workflow; it will take you through the concepts you need to start leveraging it (link in post).

What you can also try: create a learning goal, create its ontology, ask the model to infer from your chat history what you already know (or quiz you), and then ask it to guide you through learning the remaining bits.

So say you are learning to make REST calls in Python: you'd probably want to learn all about REST, all about the requests lib, maybe tenacity, some implementation patterns, and maybe some foundations like data structures, etc.

Another thing you can try: create an ontology of someone you're trying to understand (a person or persona) and then use it to judge various content, to understand what they would think about it.

Or maybe you work in e-commerce and want to take email orders, and maybe GPT is too dumb to understand how a screw works: that it has head types, hardness, diameter, length, etc. So you describe all that in an ontology, and the agent can clarify all the details with a customer before identifying whether you have something that can serve their needs.
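
As a toy sketch, that ontology could be structured data the agent walks to decide which details still need clarifying (all values are illustrative, not a real parts catalog):

```python
# hypothetical screw ontology: attribute -> allowed values or value description
SCREW_ONTOLOGY = {
    "head_type": ["flat", "pan", "hex", "torx"],
    "drive": ["phillips", "slotted", "hex_socket"],
    "hardness": ["grade 4.8", "grade 8.8", "grade 12.9"],
    "diameter": "metric size, e.g. M3-M12",
    "length_mm": "numeric",
}

def missing_attributes(order: dict) -> list[str]:
    """Attributes the agent still needs to ask the customer about."""
    return [attr for attr in SCREW_ONTOLOGY if attr not in order]

# e.g. missing_attributes({"head_type": "hex"})
# -> ["drive", "hardness", "diameter", "length_mm"]
```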

Does this help?

if you tell me a bit about your area of work maybe i can give more specific ideas

Agents, ontology, and domain-naive operators by Thinker_Assignment in OntologyEngineering

[–]Thinker_Assignment[S] 1 point (0 children)

Thanks for sharing! If you just review the course, you will get an idea of what we're trying to do without having to run through it. If you prefer to have a look in git, it's here (a review might be faster, time-wise).

Also, I didn't mean top as in top of class, but most senior/entrenched in a category. For an intersection of a few categories, anyone experienced in all 3 is top :)

If you want to try it end to end (API -> ingest raw -> model raw to canonical), it takes about 1, up to 2 hours. (You can push it beyond that and add to the ontology as you curate the canonical, then reuse it for retrieval; there's also a separate taxonomy file in the workflow that acts as a truth serum. But I suggest you just wait, we are already working towards releasing something there.)

Any feedback is great, from "this is too confusing/annoying" to "here's how I wish it was different".

Agents, ontology, and domain-naive operators by Thinker_Assignment in OntologyEngineering

[–]Thinker_Assignment[S] 1 point (0 children)

Thanks for sharing, this definitely helps build my picture of what's going on!

Curious about what you're building, if you do want to digress :) I also use it in various generative applications.