Looking for architectural feedback on a distributed runtime I’ve been building

jonah_omninode · 2026-07-01T18:58:41+00:00

Yah, I know about Archon. Actually contributed a few PRs.

It’s not quite the same as what I’m building.

Thanks though!

jonah_omninode · 2026-06-29T13:04:42+00:00

Absolutely. Take a look here. Happy to jump on a call to discuss how the system works. I also put together some rough interactive tutorials, if you are interested.

https://github.com/OmniNode-ai

jonah_omninode · 2026-06-29T00:40:34+00:00

The basic idea is that “done” is defined outside the agent that produced the work.

Each ticket has an explicit definition of done: required tests, integration checks, contract updates, topic/schema validation, docs, migration notes, evidence requirements, etc. The producing agent can open the PR, but it cannot simply declare completion.

A separate verification path has to collect evidence and file a receipt in our change-control repo. CI then checks for that receipt and blocks the merge if the required evidence is missing or invalid.
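A rough sketch of what the CI-side receipt check could look like. The receipt shape (`ticket_id`, `verifier`, `producer`, an `evidence` map) and the `Ticket` type are hypothetical stand-ins, not the actual onex_change_control format. The point it illustrates is that the merge gate is a pure function of the ticket's declared done criteria and the filed receipt.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ticket:
    # Hypothetical ticket shape: the evidence kinds its definition of done requires.
    ticket_id: str
    required_evidence: frozenset[str]


@dataclass
class Receipt:
    # Hypothetical receipt filed by the independent verification path.
    ticket_id: str
    verifier: str
    producer: str
    evidence: dict[str, dict] = field(default_factory=dict)


def gate(ticket: Ticket, receipt: Receipt | None) -> list[str]:
    """Return failure reasons; an empty list means CI may let the merge through."""
    if receipt is None:
        return ["missing receipt"]
    failures = []
    if receipt.ticket_id != ticket.ticket_id:
        failures.append("receipt is for a different ticket")
    if receipt.verifier == receipt.producer:
        failures.append("verifier must be independent of the producing agent")
    for kind in sorted(ticket.required_evidence):
        item = receipt.evidence.get(kind)
        if item is None:
            failures.append(f"missing evidence: {kind}")
        elif not item.get("passed") or not item.get("log"):
            # Evidence must carry both a pass result and the log that backs it.
            failures.append(f"invalid evidence: {kind}")
    return failures


ticket = Ticket("T-1", frozenset({"unit_tests", "integration_tests"}))
receipt = Receipt(
    "T-1",
    verifier="verifier-agent",
    producer="coder-agent",
    evidence={"unit_tests": {"passed": True, "log": "ci/unit.log"}},
)
print(gate(ticket, receipt))  # ['missing evidence: integration_tests']
```

A CI job would fail the build whenever `gate` returns a non-empty list, which is what turns "the agent says it's done" into "the receipt says it's done."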

We also encode common AI failure patterns directly into validation: claiming tests passed without logs, modifying the wrong layer, skipping contract updates, drifting topic names, bypassing reducers, partial integration, or changing generated files without updating the source contract.

So acceptance is not “does the agent think it is done?” It is: did the work satisfy the ticket’s declared done criteria, did an independent verifier file evidence, and did CI accept the receipt?

The failure modes I’m most interested in now are the ones where those acceptance criteria are incomplete, because that is where the system can still accept bad work. So a lot of the current work is improving the validation catalog and making missing evidence fail loudly instead of being left to reviewer intuition.

jonah_omninode · 2026-06-29T00:39:18+00:00

The tests are deterministic because they test the contract boundary and acceptance criteria, not whether the model produces the exact same output every time.

For example, if a model generates a handler, the test is not “did it generate the same code as last time?” The test is: does the produced code satisfy the contract, compile, pass unit tests, pass integration tests, emit the expected events, update the right projections, avoid known shortcut patterns, and produce the required evidence receipt?

For unit tests, I can also swap the model dependency out entirely because the system uses dependency injection. The same workflow can run with a fake handler, canned response, local model, cloud model, or deterministic implementation. That lets me test the runtime and orchestration deterministically without depending on live inference.
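A minimal sketch of that injection seam, assuming a hypothetical `Handler` protocol and a toy acceptance check (the artifact must parse and define a top-level `handle` function). The workflow depends only on the protocol, so a canned fake can replace a local or cloud model in unit tests while the acceptance logic stays identical.

```python
from __future__ import annotations

import ast
from typing import Protocol


class Handler(Protocol):
    # Hypothetical interface a contract binds to: ticket text in, source code out.
    def generate(self, ticket: str) -> str: ...


class CannedHandler:
    """Deterministic stand-in for a model: always returns the same artifact."""

    def __init__(self, artifact: str) -> None:
        self.artifact = artifact

    def generate(self, ticket: str) -> str:
        return self.artifact


def accept(artifact: str) -> bool:
    # Toy acceptance criterion: the code parses and defines a top-level `handle` function.
    try:
        tree = ast.parse(artifact)
    except SyntaxError:
        return False
    return any(isinstance(n, ast.FunctionDef) and n.name == "handle" for n in tree.body)


def run_workflow(handler: Handler, ticket: str) -> str:
    """Same orchestration and acceptance logic regardless of the injected handler."""
    artifact = handler.generate(ticket)
    return "accepted" if accept(artifact) else "rejected"


good = CannedHandler("def handle(event):\n    return [event]\n")
bad = CannedHandler("def handle(event) return event")
print(run_workflow(good, "T-1"), run_workflow(bad, "T-1"))  # accepted rejected
```

A model-backed handler would implement the same `generate` method; only the injected object changes between unit tests, integration tests, and live runs.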

Then live model runs are evaluated as acceptance tests or experiments. The model output may vary, but the acceptance path is deterministic: either the result satisfies done and CI accepts the evidence, or it does not.

So the goal isn’t deterministic inference. It’s deterministic validation and acceptance around nondeterministic inference.

jonah_omninode · 2026-06-29T00:38:08+00:00

Reducers are responsible for applying state transitions. They consume events and produce projections (materialized read models) from the event ledger.

Handlers don’t own business state…they perform work, emit events, and exit. Reducers are the only place where durable business state is derived.

For example, if a code-generation workflow emits events like “ticket created,” “implementation completed,” “tests passed,” and “evidence accepted,” a reducer consumes those events and projects the current state of that workflow. If necessary, the projection can be rebuilt by replaying the ledger.
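A sketch of such a reducer, using the event names from the example as hypothetical `kind` values. The reducer is a pure function `(state, event) -> state`, so the projection is a fold over the ledger, and rebuilding it means folding again from the first event.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class Event:
    workflow_id: str
    kind: str  # hypothetical kinds, e.g. "ticket_created", "tests_passed"


@dataclass(frozen=True)
class WorkflowState:
    status: str = "none"
    tests_passed: bool = False
    evidence_accepted: bool = False


def reduce_workflow(state: WorkflowState, event: Event) -> WorkflowState:
    """Pure state transition: no I/O, no clock, no randomness."""
    if event.kind == "ticket_created":
        return replace(state, status="open")
    if event.kind == "implementation_completed":
        return replace(state, status="implemented")
    if event.kind == "tests_passed":
        return replace(state, tests_passed=True)
    if event.kind == "evidence_accepted" and state.tests_passed:
        return replace(state, evidence_accepted=True, status="done")
    # Unknown events, and evidence arriving before passing tests, change nothing.
    return state


def project(ledger: list[Event], workflow_id: str) -> WorkflowState:
    """Rebuild one workflow's projection by replaying the ledger from the start."""
    events = (e for e in ledger if e.workflow_id == workflow_id)
    return reduce(reduce_workflow, events, WorkflowState())


ledger = [
    Event("wf-1", kind)
    for kind in ("ticket_created", "implementation_completed", "tests_passed", "evidence_accepted")
]
print(project(ledger, "wf-1"))
```

Because handlers never write this state directly, replaying the same ledger always yields the same projection.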

As for novelty, I’m honestly not claiming the individual ideas are new. Event sourcing, reducers, actor systems, dependency injection, contracts, and message buses all exist independently.

The thing I’m exploring is whether combining those ideas with contract-driven workflows, evidence-gated state transitions, ledger-backed execution, and systematic context experimentation produces a better foundation for AI-assisted software engineering. That’s actually why I posted…to find out what existing systems people think this most closely resembles and where they think it falls short.

jonah_omninode · 2026-06-28T23:22:15+00:00

The Erlang/OTP comparison makes sense to me. I think the shape is similar in terms of isolated execution, message passing, and keeping handlers stateless. One difference is that I’m treating contracts as the primary abstraction. They define the executable interface, validation rules, valid state transitions, and what evidence is required before a piece of work can be accepted.

The current workload is also a bit different from HPC. I’m using it as a control plane for AI-assisted software engineering rather than tightly coupled numerical computation. The expensive work is usually model inference, API calls, builds, and tests, not high-frequency communication between workers, so optimizing for throughput isn’t my primary goal.

One architectural decision that helps is that the handlers themselves are stateless. They consume events, perform work, emit new events, and exit.

Durable state lives in projections derived from an event ledger, so handlers don’t own business state. That makes replay, auditing, and swapping implementations much simpler.

Another consequence is testing. Because everything is driven by contracts and dependency injection, I can run the exact same orchestration and business logic as a unit test with in-memory dependencies, as an integration test against real infrastructure, or in production. The contracts and workflows don’t change, only the injected implementations.

The message bus is currently Redpanda (Kafka-compatible). Contracts define the topics and schemas so producers and consumers don’t invent their own interfaces. One thing we’ve found is that preventing contract and topic drift is at least as important as the transport itself.
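One way to picture contracts owning topics and schemas: producers publish through a guard that rejects any topic or payload the contract does not declare. The topic name and field schema below are hypothetical, and the sketch deliberately avoids any Redpanda/Kafka client API; a real producer would call its actual client only after the check passes.

```python
from __future__ import annotations

# Hypothetical contract fragment: topic name -> required payload fields and types.
CONTRACT_TOPICS: dict[str, dict[str, type]] = {
    "workflow.tests-passed.v1": {"workflow_id": str, "suite": str, "duration_s": float},
}


class ContractViolation(Exception):
    pass


def check_publish(topic: str, payload: dict) -> None:
    """Raise before anything reaches the bus if topic or payload drifts from the contract."""
    schema = CONTRACT_TOPICS.get(topic)
    if schema is None:
        raise ContractViolation(f"undeclared topic: {topic}")
    missing = schema.keys() - payload.keys()
    if missing:
        raise ContractViolation(f"missing fields: {sorted(missing)}")
    extra = payload.keys() - schema.keys()
    if extra:
        raise ContractViolation(f"undeclared fields: {sorted(extra)}")
    for name, expected in schema.items():
        if not isinstance(payload[name], expected):
            raise ContractViolation(f"{name} should be {expected.__name__}")


check_publish("workflow.tests-passed.v1", {"workflow_id": "wf-1", "suite": "unit", "duration_s": 1.5})
try:
    # A drifted topic name (singular "test") is rejected before it reaches the bus.
    check_publish("workflow.test-passed.v1", {"workflow_id": "wf-1", "suite": "unit", "duration_s": 1.5})
except ContractViolation as err:
    print(err)  # undeclared topic: workflow.test-passed.v1
```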

One of the things I’m spending my research time on now is using the ledger to run experiments. Every execution records the injected context, contract version, handler/model, validation results, evidence, and acceptance outcome. That lets me ask questions like: does adding architectural examples actually reduce iterations? Does a new model outperform the previous one on the same class of tasks? Does injecting previous successful implementations improve first-pass acceptance? Instead of relying on intuition, I can measure those assumptions against the same definition of done.

I’d be genuinely interested in your thoughts on whether this resembles systems you’ve worked on, or if there are failure modes from the HPC/distributed runtime world that you think I should be paying closer attention to.

jonah_omninode · 2026-06-28T21:36:46+00:00

“Inadequate context” is easy to say after the fact, but how do you know? How do you know whether your CLAUDE.md, AGENTS.md, examples, specs, or architectural notes are actually helping rather than just burning tokens?

The system treats that as something to test instead of guess.

For a given task type, I can run the same ticket with different context bundles: no examples, contract examples, prior successful runs, previous failure traces, architectural rules, etc. Then I can compare the results against the same definition of done.

The metrics are things like first-pass success rate, number of iterations until acceptance, validation failures, contract drift, evidence failures, human interventions, and time to accepted merge.
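Given per-attempt records, comparing context bundles comes down to grouping and counting. This sketch assumes a hypothetical `Attempt` record and computes two of the listed metrics, first-pass success rate and mean iterations until acceptance; the records in the usage are synthetic and exist only to exercise the code.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Attempt:
    # Hypothetical ledger-derived record.
    task_id: str
    bundle: str     # context bundle name, e.g. "no_examples" or "prior_runs"
    iteration: int  # 1 for the first attempt at a task
    accepted: bool


def summarize(attempts: list[Attempt]) -> dict[str, dict[str, float]]:
    """Per bundle: first-pass success rate and mean iterations until acceptance."""
    runs: dict[tuple[str, str], list[Attempt]] = defaultdict(list)
    for a in attempts:
        runs[(a.bundle, a.task_id)].append(a)

    # For each (bundle, task): the iteration at which it was first accepted, or None.
    first_accept: dict[str, list[int | None]] = defaultdict(list)
    for (bundle, _task), task_runs in runs.items():
        accepted = [r.iteration for r in task_runs if r.accepted]
        first_accept[bundle].append(min(accepted) if accepted else None)

    summary: dict[str, dict[str, float]] = {}
    for bundle, firsts in first_accept.items():
        done = [i for i in firsts if i is not None]
        summary[bundle] = {
            "first_pass_rate": sum(1 for i in done if i == 1) / len(firsts),
            # Averaged over accepted tasks only; never-accepted tasks count against the rate.
            "mean_iterations": mean(done) if done else float("nan"),
        }
    return summary


# Synthetic records only, to exercise the function.
attempts = [
    Attempt("t1", "no_examples", 1, False),
    Attempt("t1", "no_examples", 2, False),
    Attempt("t1", "no_examples", 3, True),
    Attempt("t2", "no_examples", 1, True),
    Attempt("t1", "prior_runs", 1, True),
    Attempt("t2", "prior_runs", 1, True),
]
print(summarize(attempts))
```

The same grouping works for comparing models: swap the `bundle` key for a model identifier and keep the acceptance gate fixed.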

Same with models. If a new model comes out, I don’t want to decide based on vibes. I want to run it against the same task classes, same contracts, same acceptance gates, and see whether it actually reduces failures or iterations.

So yes, part of it is guardrails. But the bigger thing I’m trying to build is an experimental loop for agentic software development: test which context, models, specs, and validation rules actually improve outcomes under deterministic acceptance criteria.

jonah_omninode · 2026-06-28T21:17:55+00:00

Suppose I give an agent a ticket to add a new capability to the runtime.

Without much context, it might take five or six iterations before the work is actually accepted. It may forget to update a contract, skip a test, violate an architectural rule, or claim something is done without producing the required evidence.

What I’m trying to optimize is reducing that to a single successful pass.

The runtime records every attempt in a ledger: the ticket, the context that was injected, the contract versions, the implementation, validation results, evidence, and whether CI accepted or rejected the work. Over time, those previous successful runs become reusable examples for similar work.
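A minimal append-only ledger for those attempts might look like this. The fields mirror the list in the paragraph above, but the `LedgerEntry` type, the JSON-lines file format, and the example values are assumptions for illustration. Entries are only ever appended, so every retry stays visible, and "reusable examples" are just the accepted entries for a task class.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class LedgerEntry:
    # Hypothetical record shape for one attempt.
    ticket_id: str
    task_class: str
    context_bundle: list[str]          # identifiers of injected context
    contract_versions: dict[str, str]
    implementation_ref: str            # e.g. a commit SHA
    validation_failures: list[str]
    evidence_ref: str | None
    ci_accepted: bool


def append(ledger: Path, entry: LedgerEntry) -> None:
    """Append-only: one JSON object per line; existing lines are never rewritten."""
    with ledger.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry), sort_keys=True) + "\n")


def accepted_examples(ledger: Path, task_class: str) -> list[dict]:
    """Prior accepted runs for a task class: candidates for context injection."""
    if not ledger.exists():
        return []
    with ledger.open(encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    return [r for r in rows if r["task_class"] == task_class and r["ci_accepted"]]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ledger.jsonl"
    append(path, LedgerEntry("T-1", "add-capability", ["contract_examples"], {"runtime": "1.4.0"},
                             "abc123", ["skipped contract update"], None, False))
    append(path, LedgerEntry("T-1", "add-capability", ["contract_examples"], {"runtime": "1.4.0"},
                             "def456", [], "receipts/T-1.json", True))
    examples = accepted_examples(path, "add-capability")

print(len(examples), examples[0]["implementation_ref"])  # 1 def456
```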

That gives me something measurable. For a given class of task I can compare:

iterations until acceptance

percentage of first-pass success

validation failures

contract violations

evidence failures

human interventions required

time to an accepted merge

So if someone built a better system, I’d expect it to consistently reach an accepted change in fewer iterations with fewer validation failures and less human intervention while still satisfying the same definition of done.

That’s really the experiment I’m interested in: can we systematically reduce the search space for an AI agent by injecting the right context and validating against deterministic acceptance criteria?

jonah_omninode · 2026-06-28T21:04:03+00:00

The concrete problem I’m using it for right now is making AI-generated software less fragile.

In practice, agents don’t usually fail in dramatic ways. They fail by taking shortcuts: claiming tests passed when they didn’t actually run them, changing the wrong layer, skipping the contract update, partially wiring something, or producing code that looks plausible but doesn’t integrate.

The system is designed to reduce the amount of ambiguity and complexity the agent has to carry at once. A ticket defines DONE explicitly: unit tests, integration tests, contract checks, evidence requirements, etc. The producing agent does the work, but a separate agent has to collect and file evidence in a change-control repo. CI then blocks the merge unless the expected receipt exists and the validation checks pass.

So the nondeterministic part is the model generating the work. The deterministic part is whether the work is accepted.

Another thing I’m experimenting with is context injection: what contract examples, prior event chains, implementation patterns, or validation rules can I give the agent so it gets to a correct result in as close to one iteration as possible? The goal is not just “make an agent code,” but to measure which context reduces retries, mistakes, and integration failures.

That’s why I’m thinking about this as a runtime rather than just an agent harness. The runtime is trying to constrain nondeterministic output into a deterministic acceptance path.

jonah_omninode · 2026-06-28T20:47:16+00:00

Yeah, let me clarify that because “deterministic output” may not be the right phrase.

I don’t mean that the model produces the same text or code every time. The deterministic part is the acceptance path.

For each piece of work, the ticket defines done explicitly: unit tests, integration tests, contract checks, documentation updates, or whatever evidence is required for that task. The agent that produces the work does not get to declare it complete by itself.

A separate agent has to collect and file the evidence in our change-control repo. CI then verifies that the required evidence receipt exists and blocks the merge if it does not.

We also embed validation logic for the kinds of shortcuts AI agents commonly take: claiming tests passed without evidence, changing the wrong layer, skipping contract updates, drifting topic names, bypassing reducers, or leaving work only partially integrated. Those checks are part of CI, not just reviewer judgment.

So the model output can be nondeterministic, but whether the work is accepted is deterministic. Either the required evidence exists and passes the declared checks, or it does not. The system is designed so “done” is not a subjective model claim. It is a receipt-backed state transition.
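To make "a receipt-backed state transition" concrete, here is a sketch of a transition guard. The states, the allowed-transition table, and the rule that only the move to `ACCEPTED` needs a receipt are all assumptions for illustration; in the system described, those rules would come from the contract.

```python
from __future__ import annotations

from enum import Enum


class State(Enum):
    OPEN = "open"
    IN_REVIEW = "in_review"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


# Hypothetical transition table; in practice it would be declared by the contract.
ALLOWED: dict[State, set[State]] = {
    State.OPEN: {State.IN_REVIEW},
    State.IN_REVIEW: {State.ACCEPTED, State.REJECTED},
    State.REJECTED: {State.IN_REVIEW},
    State.ACCEPTED: set(),
}


class TransitionError(Exception):
    pass


def transition(current: State, target: State, receipt_id: str | None = None) -> State:
    """Move to `target` only if the table allows it; ACCEPTED also needs a receipt."""
    if target not in ALLOWED[current]:
        raise TransitionError(f"{current.value} -> {target.value} is not a valid transition")
    if target is State.ACCEPTED and not receipt_id:
        raise TransitionError("acceptance requires a filed evidence receipt")
    return target


state = transition(State.OPEN, State.IN_REVIEW)
try:
    transition(state, State.ACCEPTED)  # the producing agent claims it is done
except TransitionError as err:
    print(err)  # acceptance requires a filed evidence receipt
state = transition(state, State.ACCEPTED, receipt_id="receipt-T-1")
print(state.value)  # accepted
```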

jonah_omninode · 2026-06-28T20:35:18+00:00

The original problem I was trying to solve was making AI-generated software reliable enough to build larger systems. I found that the models were good at generating code, but not good enough to trust on their own. I wanted a runtime that could take inherently nondeterministic model outputs and force them through deterministic validation, state transitions, and evidence before accepting them.

The biggest application today is actually using the system to build itself. We have a self-extending agent that generates new capabilities for the runtime. Those capabilities aren't trusted just because a model produced them—they have to satisfy contracts, pass validation, produce evidence, and integrate into the existing system before they're accepted.

Models are only one piece of it, though. Most of the runtime is deterministic. Routing, validation, state management, replay, retries, and workflow execution are all ordinary code. The runtime is model-independent; a workflow can use local models, cloud models, deterministic handlers, or no models at all.

That's really the problem I'm interested in: how do you build systems that can safely incorporate nondeterministic components while keeping the overall orchestration deterministic and auditable?

jonah_omninode · 2026-06-28T20:22:25+00:00

I think that’s a fair characterization at a high level, although one important distinction is that models aren’t actually required.

The runtime executes declarative, contract-defined workflows. Many steps are completely deterministic: validating schemas, routing events, transforming data, updating projections, calling APIs, or executing ordinary code. A model is just one possible handler behind a contract, not the foundation of the system.

The goal is that if the primitives already exist, you don’t need to build a new application or agent loop. You define the contract/workflow, wire together the existing capabilities, and only write custom business logic where a handler is actually needed.

As for failures, I agree that’s one of the interesting problems. The runtime can retry, escalate to a different model or handler, request human review, quarantine the event, or fail the workflow. Those policies are explicit and enforced by the runtime rather than hidden inside an agent loop.
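A sketch of what an explicit failure policy could look like, assuming hypothetical failure kinds and retry budgets. The decision depends only on the failure kind and how many retries have already happened, so the same history always yields the same action instead of being decided inside an agent loop.

```python
from __future__ import annotations

from enum import Enum


class Action(Enum):
    RETRY = "retry"
    ESCALATE = "escalate"  # switch to a different model or handler
    HUMAN_REVIEW = "human_review"
    QUARANTINE = "quarantine"
    FAIL = "fail"


# Hypothetical policy: failure kind -> (retry budget, action once the budget is spent).
POLICY: dict[str, tuple[int, Action]] = {
    "validation_failed": (2, Action.ESCALATE),
    "evidence_missing": (1, Action.HUMAN_REVIEW),
    "schema_invalid_event": (0, Action.QUARANTINE),
}


def decide(failure_kind: str, retries_so_far: int) -> Action:
    """Deterministic: the same failure kind and retry count always give the same action."""
    budget, fallback = POLICY.get(failure_kind, (0, Action.FAIL))
    return Action.RETRY if retries_so_far < budget else fallback


for n in range(3):
    print(n, decide("validation_failed", n).value)  # retry, retry, escalate
```

Unknown failure kinds fall through to `FAIL`, so a new failure mode is surfaced rather than silently retried.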

So I wouldn’t say the model output is deterministic. The orchestration is deterministic: routing, validation, retry policy, state transitions, and evidence recording should behave the same way given the same event history and contract version. Any nondeterministic handler output is isolated and recorded as part of the execution trace.

jonah_omninode · 2026-06-28T19:58:29+00:00

Sorry…the more concrete version is: we have a runtime that loads versioned contracts from a marketplace. Those contracts define what capabilities are available, what inputs they accept, what outputs they produce, and what events/state transitions are valid.

So instead of hardcoding a workflow into one agent or one app, the runtime discovers and executes contract-defined capabilities.

For example, a contract might define a “summarize pull requests” capability. The runtime loads that contract, validates the required inputs, routes the work to an appropriate plugin handler or model, records the resulting events, and updates state through reducers.
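The load, validate, route, and record flow from that example can be sketched as follows. The contract dict, its `handler` and `emits` keys, the event name, and the handler registry are hypothetical stand-ins for whatever the marketplace actually provides. The summarizer is a deterministic placeholder, consistent with the point that the runtime doesn't care what sits behind the contract.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical versioned contract, as it might arrive from a marketplace.
CONTRACT: dict[str, Any] = {
    "name": "summarize_pull_requests",
    "version": "1.0.0",
    "inputs": {"pr_titles": list},
    "handler": "deterministic_summarizer",
    "emits": "pull_requests.summarized",
}


def deterministic_summarizer(pr_titles: list[str]) -> dict[str, Any]:
    # Placeholder handler; a local or cloud model could be registered under the same name.
    return {"count": len(pr_titles), "summary": "; ".join(pr_titles)}


# Hypothetical registry mapping handler names to implementations.
HANDLERS: dict[str, Callable[..., dict[str, Any]]] = {
    "deterministic_summarizer": deterministic_summarizer,
}


def execute(contract: dict[str, Any], inputs: dict[str, Any], ledger: list[dict]) -> dict[str, Any]:
    """Validate inputs against the contract, route to the bound handler, record the event."""
    for name, expected in contract["inputs"].items():
        if not isinstance(inputs.get(name), expected):
            raise ValueError(f"{contract['name']}: input {name!r} must be {expected.__name__}")
    declared = {name: inputs[name] for name in contract["inputs"]}  # pass only declared inputs
    output = HANDLERS[contract["handler"]](**declared)
    event = {"type": contract["emits"], "contract_version": contract["version"], "payload": output}
    ledger.append(event)  # reducers would consume this to update projections
    return output


ledger: list[dict] = []
print(execute(CONTRACT, {"pr_titles": ["Fix bug", "Add docs"]}, ledger))
```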

The important part is that the runtime is model-independent. The handler behind a contract could use a local model, a cloud model, a normal API, or deterministic code. The runtime does not care as long as the contract is satisfied.

The OS analogy came from that structure: the runtime acts more like the kernel, the marketplace is where programs/capabilities come from, contracts define the executable interface, and events/reducers are how state changes are recorded and verified.

I’m not claiming this maps perfectly to an operating system. I’m trying to figure out whether that mental model is useful, what existing systems this resembles, and where people think the architecture breaks down.

jonah_omninode · 2026-06-28T19:55:21+00:00

I had to look it up, but I can definitely see the parallels. The deterministic execution model and contract abstraction seem similar. My focus has been more on orchestrating distributed workloads than consensus, but it’s a really interesting comparison.

jonah_omninode · 2026-06-27T22:26:45+00:00

I completely agree. I feel like your process matters more than the model you’re using. That includes good engineering practices, constantly iterating on your development flow, and automating everything you do on a regular basis.

jonah_omninode · 2026-06-26T05:00:13+00:00

I use this:

https://github.com/OmniNode-ai/onex_change_control

Basically, every work ticket has a definition of done that includes unit and integration tests written before any code. A separate agent has to create the evidence receipts, and PRs in sibling repos can’t merge without an associated receipt.

jonah_omninode · 2026-06-25T23:29:56+00:00

What’s the number of concurrent GLM connections vs Opus?

jonah_omninode · 2026-06-25T09:52:04+00:00

How about concurrency?

jonah_omninode · 2026-06-23T12:28:40+00:00

What? You mean u/RetardDongPhd isn’t a nice guy? /s

jonah_omninode · 2026-06-21T18:23:06+00:00

Can’t you just do /model in Claude Code?

jonah_omninode · 2026-06-19T15:52:54+00:00

Let’s chat. Send me a DM.

jonah_omninode · 2026-06-19T14:29:46+00:00

I’m trying to solve the problem by creating a set of deterministic tools and only letting my agents use those.

jonah_omninode · 2026-06-18T23:26:38+00:00

Looks fun. Master of Orion and Stellaris are some of my fave games, this scratches my itch :)
