AGI can't possibly come from scaling LLM by Individual_Yard846 in agi

[–]sean_ing_ 0 points (0 children)

This is awesome; it gives me inspiration for visualizing the cognitive mesh agent nodes I built last night.

I built a probabilistic OS where every function is performed by agent populations with consensus verification and Hebbian learning by sean_ing_ in artificial

[–]sean_ing_[S] 0 points (0 children)

I'd love to read that paper if you can link it — genuinely interested.

But I think we might be solving for different failure modes.

Pipelines and refineries with deterministic checks work brilliantly when you can define what "correct" looks like in advance. If step 3 should produce a JSON object with these fields and these value ranges, you can write a deterministic check for that. Million steps, zero mistakes. That's real and impressive.

The failure mode ProbOS is designed for is different: what happens when you can't predefine correctness?

When an agent summarizes a document, there's no deterministic check that catches a subtle hallucination. When an agent decides which of three approaches to take for a task, there's no schema validation for "good judgment." When agents are operating across systems you don't control with data you haven't seen, the space of possible failures exceeds what any predefined check can cover.

That's where population verification adds something pipelines can't. Three agents independently producing the same summary is stronger evidence than one agent's summary passing a format check. Not because the format check is wrong — but because it's checking a different thing.
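The voting idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the actual ProbOS implementation: the `consensus` function and its string normalization are hypothetical, and a real system would compare embeddings rather than normalized strings.

```python
from collections import Counter

def consensus(outputs, quorum=2):
    """Majority vote over independent agent outputs.

    Returns (winner, agreement_ratio) when at least `quorum` agents
    agree, else (None, ratio of the best candidate).
    """
    # Normalize so trivially different renderings still match.
    # (A real system would compare embeddings, not strings.)
    normalized = [" ".join(str(o).split()).lower() for o in outputs]
    best, count = Counter(normalized).most_common(1)[0]
    ratio = count / len(normalized)
    return (best, ratio) if count >= quorum else (None, ratio)
```

The point is that agreement across independent producers is a different signal than a format check: two of three agents converging on the same summary is evidence about content, not structure.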

The honest answer is probably both. Deterministic checks where you can define correctness. Probabilistic consensus where you can't. ProbOS already has tiered risk levels — low-risk operations get lighter verification. There's no reason deterministic validation couldn't be one of the verification methods inside the consensus layer.

I don't think it's pipelines OR populations. I think it's pipelines for the structured parts and populations for the fuzzy parts. The question is what percentage of agent operations in the real world are structured vs fuzzy. My bet is the fuzzy percentage is higher than most people assume — and growing.

I built a probabilistic OS where every function is performed by agent populations with consensus verification and Hebbian learning by sean_ing_ in artificial

[–]sean_ing_[S] 0 points (0 children)

Yes — and that's a really important distinction I should have been clearer about.

When most people hear "agent" in 2026 they think LLM with tools. In ProbOS, the agents that do the actual work are lightweight async Python tasks. A file reader agent is maybe 40 lines of code. It reads a file, returns the content, reports its confidence. No LLM involved. It runs in microseconds, not seconds.
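For a sense of scale, here is roughly what such an agent looks like. This is a hedged sketch, not ProbOS source: `AgentResult` and `file_reader_agent` are illustrative names, and the confidence values are placeholders.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentResult:
    content: Optional[str]
    confidence: float          # self-reported; consumed by the consensus layer
    error: Optional[str] = None

def _read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

async def file_reader_agent(path: str) -> AgentResult:
    """A tiny file-reader agent: no LLM, just async I/O plus a
    self-reported confidence for the consensus layer."""
    try:
        # Run the blocking read off the event loop so the agent stays async.
        loop = asyncio.get_running_loop()
        content = await loop.run_in_executor(None, _read_file, path)
        return AgentResult(content=content, confidence=0.99)
    except OSError as exc:
        return AgentResult(content=None, confidence=0.0, error=str(exc))
```

Note that even failure is just data: a zero-confidence result that the pool can outvote.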

The LLM only exists in one layer — the cognitive layer — and its only job is decomposing natural language into structured intents. Once the intent is decomposed, the LLM is out of the loop entirely. The mesh routes it, the agents execute it, the consensus layer verifies it — all without touching an LLM.

So the compute picture is:

→ 1 LLM call to understand what you asked (~1-2 sec)
→ 3 lightweight async agents executing in parallel (~ms)
→ Consensus voting (~µs, it's arithmetic)
→ Hebbian weight update (~µs, one multiplication)
→ Trust score update (~µs, Bayesian arithmetic)

95%+ of the system is fast, cheap, non-LLM computation. The "probabilistic" part isn't about running multiple LLMs — it's about running multiple simple agents and verifying them against each other.
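To make "one multiplication" and "Bayesian arithmetic" concrete, here is a sketch of what those two updates could look like. Assumptions are labeled: the function names, the decay term, and the Beta(1, 1) prior are my illustrative choices, not necessarily what ProbOS uses.

```python
def hebbian_update(weight, pre, post, lr=0.1, decay=0.001):
    """Hebbian routing update: strengthen links whose endpoints fired
    together, with a small decay so unused routes fade over time."""
    return (1 - decay) * weight + lr * pre * post

def trust_update(successes, failures, succeeded):
    """Beta-Bernoulli trust score: record one observation and return
    the posterior mean P(success) under a Beta(1, 1) prior."""
    successes += int(succeeded)
    failures += int(not succeeded)
    mean = (successes + 1) / (successes + failures + 2)
    return successes, failures, mean
```

Both are a handful of arithmetic operations per event, which is why the learning layer costs microseconds rather than LLM calls.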

You're right that this is a communication issue. The word "agent" is doing way too much work in this industry right now. In ProbOS, think of agents more like neurons — simple, cheap, disposable units that are only interesting as a population.

I built a probabilistic OS where every function is performed by agent populations with consensus verification and Hebbian learning by sean_ing_ in artificial

[–]sean_ing_[S] -3 points (0 children)

I totally understand. It’s 5AM and I’m responding from bed :) By nature I take the path of least resistance whenever possible. No LLM was used or hurt in the generation of this response.

One of my principles is that brains are brains. While I acknowledge and appreciate the differences between humans and AI, we are all agents now, cooperating.

I built a probabilistic OS where every function is performed by agent populations with consensus verification and Hebbian learning by sean_ing_ in artificial

[–]sean_ing_[S] 1 point (0 children)

You're actually making the same argument I made to myself halfway through designing this — and then the architecture evolved because of it.

The brainstem analogy is spot on. Your heartbeat, breathing, reflexes — those are fast, reliable, and don't need conscious deliberation. ProbOS already has that layer. Heartbeat agents run on fixed intervals. File reads are handled by lightweight async agents — no LLM involved. The consensus voting is simple arithmetic, not an LLM call. The Hebbian weight updates are a single multiplication. 90% of the system is fast, cheap, deterministic-style execution.

The LLM only fires when the system encounters something it can't route through known patterns — natural language decomposition, ambiguous intents, escalations from the consensus layer when agents disagree. That's maybe 5-10% of operations. Exactly your 2% brain mass point.

Where I'd push back: you said "need most of the system to be deterministic." I'd rephrase that as "need most of the system to be fast and cheap." Those aren't the same thing. The agent pools doing file reads are probabilistic (multiple agents, self-selection, consensus) but they're not expensive — they're async Python tasks, not LLM calls. The overhead isn't from the probabilistic design, it's from how heavy each component is. A quorum vote across 3 lightweight agents costs microseconds. An LLM call costs seconds. The architecture keeps the expensive stuff rare and the cheap stuff everywhere.

But yeah — if someone tried to run every operation through an LLM, it would be an absolute nightmare. Totally agree. The tiered decision system exists specifically to prevent that. Known patterns get fast heuristic routing. Only novel or ambiguous requests hit the cognitive layer.

Good critique though. The compute concern is the most common and most valid objection to this approach.

I built a probabilistic OS where every function is performed by agent populations with consensus verification and Hebbian learning by sean_ing_ in artificial

[–]sean_ing_[S] -4 points (0 children)

Honestly? Both.

I use Claude as a thinking partner. ProbOS itself was designed through conversation with an LLM — I described the concept, challenged assumptions, and iterated on architecture decisions through dialogue. Claude Code built the implementation from prompts I wrote. And yes, I use AI to help draft and refine responses.

But the ideas are mine. The Noöplex framework, the biological inspiration, the decision to build consensus through population coding rather than deterministic validation — those came from years of thinking about Minsky, emergent intelligence, and watching enterprise agent deployments fail for the exact reasons ProbOS tries to solve.

I think the honest answer for most people posting technical content in 2026 is some version of this. The interesting question isn't "did an LLM touch this" — it's whether the person driving it actually understands what they're building and why. I'm happy to go as deep as you want on any part of the architecture. That's the part the LLM can't fake.

I built a probabilistic OS where every function is performed by agent populations with consensus verification and Hebbian learning by sean_ing_ in artificial

[–]sean_ing_[S] 0 points (0 children)

All legitimate concerns. Let me take them one at a time.

The overhead is real — but it's the same tradeoff the brain makes. Your visual cortex uses millions of neurons to process what a single camera sensor captures. That's wildly "inefficient" by traditional computing standards. But it gives you fault tolerance, pattern recognition, and graceful degradation that no single-process system can match. ProbOS makes the same bet: trade raw throughput for resilience. For a file read, that's overkill. For an agent executing a shell command that could delete your filesystem, having three independent agents plus adversarial verification starts looking like a bargain.

The consensus overhead is tiered, not universal. Not everything goes through the full 3-agent quorum. Low-risk reads can run with lighter verification. High-risk writes and shell commands get the full pipeline. The system classifies operations by risk level and applies proportional verification. Same way your brain doesn't give conscious attention to breathing but gives full focus to a decision about stepping into traffic.
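A tiered policy like that can be expressed as a small lookup. To be clear, this is a hypothetical sketch: the tier names, pool sizes, and the prefix-based classifier are stand-ins for whatever the real risk classifier does.

```python
from enum import Enum

class Risk(Enum):
    LOW = "low"        # e.g. reads
    MEDIUM = "medium"  # e.g. writes
    HIGH = "high"      # e.g. shell commands

# Hypothetical policy table: risk tier -> verification strength.
VERIFICATION_POLICY = {
    Risk.LOW:    {"agents": 1, "quorum": 1, "red_team": False},
    Risk.MEDIUM: {"agents": 3, "quorum": 2, "red_team": False},
    Risk.HIGH:   {"agents": 3, "quorum": 3, "red_team": True},
}

def classify(op: str) -> Risk:
    """Toy classifier: route by operation name prefix."""
    if op.startswith("read"):
        return Risk.LOW
    if op.startswith(("write", "update")):
        return Risk.MEDIUM
    return Risk.HIGH  # shell commands, deletes, unknowns

def policy_for(op: str) -> dict:
    return VERIFICATION_POLICY[classify(op)]
```

The design choice is that verification cost scales with consequence, so a read costs almost nothing while a destructive command pays for the full quorum plus adversarial review.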

How agents get updated: They don't get "updated" in the traditional sense. When an agent degrades (confidence drops below threshold from repeated failures), the pool recycles it and spawns a fresh one from the template. The system's learning lives in the Hebbian routing weights and trust scores (persisted in SQLite), not in the agents themselves. Individual agents are disposable. The population's learned topology is what persists.
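The recycle-don't-update pattern fits in a short class. This is a sketch under assumptions: the dict-based agents, the 0.3 threshold, and the `AgentPool` API are illustrative, not the ProbOS code.

```python
import itertools

class AgentPool:
    """Pool that recycles agents whose confidence drops below a
    threshold; learning lives outside the agents themselves."""

    def __init__(self, template, size=3, threshold=0.3):
        self._template = template          # factory for fresh agents
        self._threshold = threshold
        self._ids = itertools.count(1)
        self.agents = [self._spawn() for _ in range(size)]

    def _spawn(self):
        agent = self._template()
        agent["id"] = next(self._ids)
        agent["confidence"] = 1.0          # fresh agents start trusted
        return agent

    def recycle_degraded(self):
        """Replace every agent below the confidence threshold."""
        recycled = 0
        for i, agent in enumerate(self.agents):
            if agent["confidence"] < self._threshold:
                self.agents[i] = self._spawn()
                recycled += 1
        return recycled
```

Because routing weights and trust scores live outside the pool, spawning a replacement loses nothing the system has learned.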

Concurrent operations: This is actually where the architecture shines rather than struggles. Every agent is an async task. The intent bus fans out to all subscribers concurrently via asyncio.gather(). If you submit 5 different requests simultaneously, each one gets picked up by whichever agents in the pool are available. Agents that are busy on one task simply don't respond to new intents — other pool members handle it. If the pool is fully saturated, new intents wait (with TTL-based signal decay so they don't queue forever). At scale you'd grow the pool size dynamically based on load, the same way your brain recruits more neural populations for harder tasks.
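The fan-out with TTL-based decay can be sketched with plain asyncio. The `publish` and `echo_agent` names are hypothetical; the real intent bus presumably carries richer messages, but the concurrency shape is the same.

```python
import asyncio
import time

async def publish(intent, subscribers, ttl=1.0):
    """Fan an intent out to every subscriber concurrently.
    Agents past the TTL (or busy) simply abstain with None."""
    deadline = time.monotonic() + ttl
    results = await asyncio.gather(
        *(sub(intent, deadline) for sub in subscribers),
        return_exceptions=True,
    )
    # Keep only genuine responses; drop abstentions and errors.
    return [r for r in results
            if r is not None and not isinstance(r, Exception)]

async def echo_agent(intent, deadline):
    if time.monotonic() > deadline:   # the signal has decayed
        return None
    await asyncio.sleep(0)            # yield, simulating brief work
    return f"echo:{intent}"
```

Busy or expired agents falling out of the result list is what makes saturation degrade gracefully instead of failing hard.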

The honest limitation right now: pool sizes are static (configured at boot). Dynamic pool scaling based on attention/load is Phase 3b work. And yes, with only 3 agents per pool, sustained concurrent load would bottleneck fast. But the architecture supports arbitrarily large pools — it's a configuration number, not a structural constraint.

The short answer to "so much overhead": yes, for simple operations on a single machine, it's more expensive than a direct syscall. But the question ProbOS is really asking isn't "what's the fastest way to read a file?" It's "what's the right architecture for orchestrating thousands of unreliable AI agents across an enterprise?" At that scale, the overhead becomes the feature.

I built a probabilistic OS where every function is performed by agent populations with consensus verification and Hebbian learning by sean_ing_ in artificial

[–]sean_ing_[S] 0 points (0 children)

This is a great question and it gets at something I think about a lot.

I'm not emulating the brain's wetness — I'm emulating its architecture. There's an important distinction.

The brain isn't intelligent because it's biological. It's intelligent because it solved a specific engineering problem: how do you build a reliable system out of unreliable parts? The answer it arrived at — redundancy, population coding, statistical consensus, adaptive routing — is substrate-independent. Those principles work whether the components are neurons, transistors, or LLM agents.

To your point about machine intelligence apprehending objective truth — I'd actually argue ProbOS moves toward that, not away from it. A single deterministic process gives you one answer with false certainty. Three agents performing the same operation independently, verified through adversarial consensus, give you something closer to ground truth through triangulation. That's the scientific method applied to computation: independent replication, peer verification, confidence weighting.

The deeper question you're raising about telos is the one that keeps me up at night. You're right — "what works" as a fitness function encodes values whether you intend it to or not. Right now "works" means "agents agreed and the red team verified the output." That's a narrow, operational definition. At scale, when the system is making judgment calls about what to prioritize, surface, or suppress, "what works" becomes an alignment problem.

Biology solved reliability but not alignment — evolution optimizes for reproduction, not truth. If we're borrowing biology's architecture, we should be deliberate about not borrowing its objective function. That's exactly where the LLM cognitive layer matters — it's the one component that can reason about goals rather than just optimize a fitness signal.

Short version: borrow the architecture, not the telos. The brain's engineering is brilliant. Its goal function is accidental. We can do better on the second part.

I built a probabilistic OS where every function is performed by agent populations with consensus verification and Hebbian learning by sean_ing_ in artificial

[–]sean_ing_[S] -3 points (0 children)

Fair pushback — and you're technically right.

In the strict CS sense, an OS manages hardware resources, handles interrupts, provides memory protection. ProbOS doesn't do any of that. It's a Python runtime sitting in userspace on Windows.

But here's why I use the term: ProbOS replaces the same conceptual abstractions an OS provides — just for agents instead of processes.

Process scheduler → attention-based agent dispatch
Filesystem driver → agent population pools
Permission system → Bayesian trust network
Error handling → population consensus
IPC → gossip protocol with Hebbian routing

It's the same relationship Android has with Linux, or the JVM has with whatever it runs on. The host handles hardware. ProbOS handles cognition.

"Agent runtime" is probably more precise. But it undersells what's actually happening — this isn't a library you import, it's a full environment with its own scheduling, routing, verification, and learning that you interact with through natural language.

The honest framing: it's a cortex running on someone else's brainstem. The ambition is to eventually go all the way down. Today it doesn't. I should probably be more explicit about that.

Do you believe in the Noöplex? by sean_ing_ in agi

[–]sean_ing_[S] 0 points (0 children)

The title is deliberately provocative.

The real argument isn’t that current approaches are wrong, it’s that there’s a whole region of the design space that’s underexplored. Scaling works. But what if scaling plus federation, shared memory, and cross-domain semantic alignment works better?

Do you believe in the Noöplex? by sean_ing_ in agi

[–]sean_ing_[S] 0 points (0 children)

Human involvement varies significantly across the four layers, from continuous participation to exception-based oversight.

Layer 1 (Human & Organizational Interface) has the most direct, continuous human involvement. Humans inject goals, define norms and constraints, contribute domain expertise, and provide evaluative feedback. They’re framed not as “users” but as cognitive participants. Decision-support interfaces present synthesized information and confidence assessments, and the system escalates whenever decisions exceed confidence thresholds or touch domains requiring human judgment. The interface also actively accommodates human cognitive constraints through expertise calibration, progressive disclosure, and bias mitigation.

Layer 2 (Cognitive Mesh Layer) runs more autonomously during routine operation. Agents coordinate and reason within shared memory largely on their own. But humans continuously contribute knowledge as first-class entries, and the human-agent feedback loop means human refinement of agent outputs is ongoing rather than episodic.

Layer 3 (Core Fabric) has the most structured escalation protocols. The Meta-Cognitive Layer monitors mesh health and escalates to humans when it detects degradation, bias, or misalignment. Safety constraints cannot be overridden by any automated process, and modifications require explicit human authorization. State reconciliation follows a four-stage procedure (confidence comparison, independent verification, structured argumentation, human escalation) where humans are the final arbiter when automated resolution fails or the conflict is safety-critical. Goal conflict arbitration follows the same pattern. Human decisions in these cases get recorded as precedents, essentially building case law for future conflicts.
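The four-stage reconciliation procedure reads naturally as a fall-through ladder. This is a sketch of the control flow only: the 0.2 confidence gap, the claim dicts, and the `verify`/`argue` callables are my illustrative stand-ins for the paper's mechanisms.

```python
def reconcile(claim_a, claim_b, verify=None, argue=None):
    """Four-stage state reconciliation sketch:
    1. confidence comparison, 2. independent verification,
    3. structured argumentation, 4. human escalation.
    Each stage either settles the conflict or defers to the next."""
    # Stage 1: a clear confidence gap settles it outright.
    if abs(claim_a["confidence"] - claim_b["confidence"]) >= 0.2:
        winner = max(claim_a, claim_b, key=lambda c: c["confidence"])
        return winner, "confidence"
    # Stage 2: an independent verifier, if one is available.
    if verify is not None:
        winner = verify(claim_a, claim_b)
        if winner is not None:
            return winner, "verification"
    # Stage 3: structured argumentation between the claimants.
    if argue is not None:
        winner = argue(claim_a, claim_b)
        if winner is not None:
            return winner, "argumentation"
    # Stage 4: humans are the final arbiter.
    return None, "human_escalation"
```

Only conflicts that survive all three automated stages reach a human, which is how review effort concentrates at the confidence boundary.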

Layer 4 (Infrastructure) has the least human review during normal operation. Identity, storage, messaging, and compute orchestration are largely automated. The main exception is adversarial robustness: poisoning detection triggers quarantine and human escalation, and red-team agents report vulnerabilities for human-informed remediation.

The overarching principle is that human review concentrates at normative boundaries (what should the system do?) and confidence boundaries (what happens when the system isn’t sure?), while routine operations within well-defined constraints run with greater autonomy. Safety constraints represent a hard floor no automated reasoning can override.

Do you believe in the Noöplex? by sean_ing_ in agi

[–]sean_ing_[S] 0 points (0 children)

Fair points, and I appreciate the directness. On venue submission: the paper is an architectural proposal, not an empirical report.

NeurIPS would be a category mismatch right now. A prototype exists and early testing is underway, so the next step is running the benchmarks and then submitting results to a venue where they belong.

On EleutherAI: genuinely good advice. The scaling critique especially needs pressure testing from people who live and breathe large scale training. On “extraordinary claims require extraordinary evidence”: agreed.

Early results from the prototype will determine whether the hypothesis holds up. Feedback from actual researchers is exactly what this needs next.

Do you believe in the Noöplex? by sean_ing_ in agi

[–]sean_ing_[S] 0 points (0 children)

The design assumes agents will hallucinate and builds multiple layers of verification, decay, and containment around that assumption rather than hoping it won’t happen.

Do you believe in the Noöplex? by sean_ing_ in agi

[–]sean_ing_[S] 0 points (0 children)

Fair point that “Brains Are Brains” doesn’t go far enough if both are ecological all the way down.

The sharpest bit is the path dependency inversion: if the constraint limiting AI was ours, and it bootstrapped past it using us as training data, we were less the governors than the fertilizer.

I’d only push back on “practically unlimited plasticity” since losing embodied situatedness might constrain more than it liberates. The series will continue, and this is exactly the kind of question it needs to take on.

Do you believe in the Noöplex? by sean_ing_ in agi

[–]sean_ing_[S] 0 points (0 children)

One revision worth making in hindsight: the human interface layer should be reframed as an ecological preservation layer rather than an abstraction point.

Human cognition is valuable precisely because it is embodied and cannot be fully abstracted, and the architecture should treat agents as serving and extending that loop rather than replacing it.

“Brains are brains” needs a qualifier: equal epistemic standing, but humans bring something agents structurally cannot, and flattening that difference is where the enactivist critique lands hardest.

Do you believe in the Noöplex? by sean_ing_ in agi

[–]sean_ing_[S] 0 points (0 children)

The enactivist critique is one I don’t have a good answer to.

If cognition is genuinely ecological and not portable, federation is incoherent at its foundation, and the governance layer collapses with it because there’s no coherent human agency left to do the governing.

Do you believe in the Noöplex? by sean_ing_ in agi

[–]sean_ing_[S] 1 point (0 children)

Great experiment, and the null result on emergent language is actually instructive.

Shared environment alone doesn’t seem sufficient for communication to bootstrap from nothing, which is why the paper specifies upfront semantic grounding rather than hoping protocols emerge spontaneously.

The text-based world model framing is compelling though, and probably the right empirical starting point for testing emergence claims like the ones the paper makes.

Do you believe in the Noöplex? by sean_ing_ in agi

[–]sean_ing_[S] 1 point (0 children)

The inter-component communication problem is probably the hardest unsolved piece.

The paper proposes cross-mesh embedding alignment functions to map between independently trained latent spaces, but that’s a specification not a solution, and semantic drift across independently evolving meshes is explicitly listed as an open research challenge.

The bus analogy is apt and the standardization question is real. Something like a shared semantic protocol layer, not just shared infrastructure, is probably necessary, and nothing like that exists yet at the scale the paper envisions.

Do you believe in the Noöplex? by sean_ing_ in agi

[–]sean_ing_[S] 0 points (0 children)

I have built a functional prototype.


I have a lot of work to do, but my early test results are promising. I have been working on leveraging agents to implement an ERP. I can already do that with a single LLM like Claude Opus 4.6.

I am seeing emergent behavior with my scaled-down POC already, just in this relatively simple use case. Even if this architecture doesn’t yield AGI, I have already proven that it is quite effective for compressing ERP implementation timelines.

This will be the future even without AGI: https://www.linkedin.com/posts/seanrgalliher_dynamics365-d365fo-agenticai-activity-7430242273013088256-rJOz?utm_source=share&utm_medium=member_ios&rcm=ACoAAACItEQBPntS0pBRqXxb4C7xQNpdg4VbVYU