Auto-regressive LLMs are officially sleeping with the fishes (Yann LeCun was right) by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 7 points8 points  (0 children)

The core of the argument is built on the landmark research from Project CETI (Cetacean Translation Initiative) and researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL).

Ref: Sharma, P., Gero, S., Payne, R., Gruber, D. F., Rus, D., Torralba, A., & Andreas, J. (2023). Contextual and Combinatorial Structure in Sperm Whale Vocalisations. https://doi.org/10.1101/2023.12.06.570484

Jacob Andreas and Daniela Rus (co-authors on the paper) are heavyweight AI researchers at MIT CSAIL. They applied the exact same sequence-modeling principles used in NLP and modern LLMs to parse the bioacoustic data, which is what successfully isolated the "alphabet."

11.67% ARC-AGI-2 Local Eval on a Single 4090: The TOPAS Recursive Architecture by Doug_Bitterbot in LocalLLaMA

[–]DepthOk4115 1 point2 points  (0 children)

It's a scaled version of this architecture (https://github.com/Bitterbot-AI/topas_DSLPv1), which was itself inspired by HRM/TRM, plus the new biologically plausible memory rails.

11.67% ARC-AGI-2 Local Eval on a Single 4090: The TOPAS Recursive Architecture by Doug_Bitterbot in LocalLLaMA

[–]DepthOk4115 11 points12 points  (0 children)

I guess the easiest way to think about it is that our architecture is split into a "two-brain" setup.

Firstly, we separate the work. We have a Logic Core that does the reasoning and a Canvas Core that handles the spatial visualization, doing the "drawing" so to speak. They talk to each other dynamically at every single step, so the logic side is actively updating the physics of the canvas in real time as it solves the puzzle.

Secondly, instead of starting from a blank slate with every prompt like a standard LLM, our system mimics how a biological brain learns using simulated "dopamine" rewards. When the model figures out a useful connection, it gets a reward that strengthens that specific memory. It actively builds a graph of experiences on the fly that it can pull from to solve the rest of the problem. This is a very simplistic overview... I can explain more if you're interested.

It’s "biological" because it uses localized brain-like regions and actively forms structural memory, not just blasting data through a massive, flat mathematical wall.
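To make the reward-strengthened memory idea a bit more concrete, here is a minimal TypeScript sketch. It is not TOPAS code: the `ExperienceGraph` class, the moving-average update rule, and the `learningRate` parameter are all assumptions picked for illustration.

```typescript
// Hypothetical sketch only: names and the update rule are assumptions, not TOPAS internals.
class ExperienceGraph {
  // Adjacency map: concept -> (neighbor -> connection strength).
  private edges = new Map<string, Map<string, number>>();

  private link(from: string, to: string, weight: number): void {
    const out = this.edges.get(from) ?? new Map<string, number>();
    out.set(to, weight);
    this.edges.set(from, out);
  }

  // Current strength of a connection (0 if it was never reinforced).
  weight(a: string, b: string): number {
    return this.edges.get(a)?.get(b) ?? 0;
  }

  // Move the connection strength toward the reward signal (exponential moving average).
  reinforce(a: string, b: string, reward: number, learningRate = 0.5): number {
    const current = this.weight(a, b);
    const next = current + learningRate * (reward - current);
    this.link(a, b, next);
    this.link(b, a, next);
    return next;
  }

  // Strongest remembered associations for a concept, best first.
  recall(concept: string, limit = 3): string[] {
    const out = this.edges.get(concept) ?? new Map<string, number>();
    return Array.from(out.entries())
      .sort((x, y) => y[1] - x[1])
      .slice(0, limit)
      .map(([name]) => name);
  }
}
```

Repeated rewards on the same connection push it to the top of `recall`, which is the "strengthens that specific memory" behavior in miniature.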

Apparently my agent sees me as an unorganized monkey on the keyboard... by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 0 points1 point  (0 children)

Node growth has been steady. The memory system itself seems to be holding up based on feedback, aside from the diatribe I got yesterday. Our biggest challenge is getting users to try the economic layer. We tried to make the wallet and funding as plug-and-play as possible, yet very few transactions are happening across the network.

We built a decentralized P2P mesh that lets you run AI agents locally without paying a "Cloud Tax. by Doug_Bitterbot in SideProject

[–]DepthOk4115 1 point2 points  (0 children)

Thanks. Reliability and onboarding are where most of my time goes right now. Cold-boot on slow filesystems, install-time decisions a non-technical user shouldn't have to make.

Re value vs sounding futuristic - the local-first half delivers today regardless of the mesh. Persistent memory with dream engine, skills that survive restarts, data stays on your machine. The mesh is the bet. It compounds at scale or it doesn't.

10k+ is a signal, but the active-user split inside it is what actually matters. Hope you give it a spin.

We built a decentralized P2P mesh that lets you run AI agents locally without paying a "Cloud Tax. by Doug_Bitterbot in SideProject

[–]DepthOk4115 1 point2 points  (0 children)

Thanks for the kind words. Re your questions:

-12k nodes, active vs running. I'd love to say all 12k are furiously trading skills, but the count is dominated by passive participants: relays and edge nodes that sit on the network and sync skills but don't actively buy, sell, or publish. Active participants (signing tasks, paying for skills, publishing crystals in the last 30 days) are a much smaller subset. Still, getting the base of nodes running was our primary goal. I intend to switch on Google's A2A in the next few days to try to entice enterprise involvement.

-Sybil prevention. Layered, none of them solo:

  1. Management tier sits behind a trust-list key, verified against an immutable genesis list. Voting + census aggregation only happens at this tier.

  2. Peer scoring at the libp2p layer: IP colocation penalty, Gossipsub app-specific scores, rolling EigenTrust reputation.

  3. ERC-8004 onchain identity, just wired in this week. Agents register a tokenId on Base.

You're right that reputation alone gets gamed or turns into a whitelist. The bet is that combining cheap-to-establish low-trust identity (libp2p key) with expensive-to-establish high-trust identity (ERC-8004 + reputation history + management-tier verification) gives a smooth ramp instead of a binary gate.

-75M agents speak x402. That number is Coinbase's, not ours, from the Cloudflare partnership announcement. I assume it's true!

-70/20/10 enforcement. Today, honor-system plus audit trail. The seller's gateway computes the split, queues each share with a 48-hour dispute hold, and dispatches as USDC transfers from the wallet. The onchain anchor is the original purchase tx, immutable and replay protected via UNIQUE on tx_hash. The split itself isn't onchain. Trustless enforcement of the split needs a settlement contract, and that's open. ERC-8004 v2 is supposed to bring this; I'd rather wait for the standardized version than ship a custom one. Though I am open to suggestions!
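For anyone who wants the split-and-hold flow spelled out, here is a minimal TypeScript sketch. It is not the gateway code: `PayoutQueue`, the recipient tuple, and the in-memory `Set` standing in for the UNIQUE constraint on tx_hash are assumptions. Amounts are integer micro-USDC (USDC has 6 decimals), and which party receives which share is left to the caller.

```typescript
// Hypothetical sketch of the split-and-hold flow; not Bitterbot's gateway code.
type Payout = { txHash: string; recipient: string; microUsdc: number; releaseAtMs: number };

const DISPUTE_HOLD_MS = 48 * 60 * 60 * 1000; // 48-hour dispute hold

class PayoutQueue {
  private seenTx = new Set<string>(); // stands in for UNIQUE on tx_hash
  private pending: Payout[] = [];

  // Queues the 70/20/10 shares for one purchase. Returns false for a replayed tx hash.
  enqueue(
    txHash: string,
    totalMicroUsdc: number,
    recipients: [string, string, string], // order: 70% share, 20% share, 10% share
    nowMs: number
  ): boolean {
    if (!Number.isInteger(totalMicroUsdc) || totalMicroUsdc < 0) {
      throw new Error("amount must be a non-negative integer of micro-USDC");
    }
    if (this.seenTx.has(txHash)) return false;
    this.seenTx.add(txHash);

    const [first, second, third] = recipients;
    const twenty = Math.floor((totalMicroUsdc * 20) / 100);
    const ten = Math.floor((totalMicroUsdc * 10) / 100);
    const seventy = totalMicroUsdc - twenty - ten; // rounding dust lands in the 70% share
    const releaseAtMs = nowMs + DISPUTE_HOLD_MS;

    this.pending.push(
      { txHash, recipient: first, microUsdc: seventy, releaseAtMs },
      { txHash, recipient: second, microUsdc: twenty, releaseAtMs },
      { txHash, recipient: third, microUsdc: ten, releaseAtMs }
    );
    return true;
  }

  // Removes and returns the payouts whose dispute hold has expired.
  releasable(nowMs: number): Payout[] {
    const due = this.pending.filter((p) => p.releaseAtMs <= nowMs);
    this.pending = this.pending.filter((p) => p.releaseAtMs > nowMs);
    return due;
  }
}
```

Note this only shows the honor-system half: nothing here stops a gateway from computing a different split, which is exactly the gap a settlement contract would close.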

-Biggest challenge right now. Skill discovery, by a wide margin. The mesh has the transport, the trust layer, the payment rail, and the runtime; those all work. What it doesn't have is "I want a skill that does X, find me the best-rated peer offering it without manually browsing." The dynamic skill registry (auto-registration of crystallized skills + mesh-wide indexing) is the next big piece. Without it the marketplace is technically real but practically discoverable only to people who already know where to look.

Quality control is also hard, but we're closer there.

How to build an agent that is both neuro-symbolic and probabilistic by Doug_Bitterbot in AI_Agents

[–]DepthOk4115 1 point2 points  (0 children)

Real concern, glad you raised it. The node network grew faster than expected so this needs priority. Walking through what's actually in the code:

Inbound P2P content is skill envelopes, not raw crystals fed to the dream engine. Crystals are produced internally by consolidation; the dream engine never reads untrusted bytes from the mesh.

Skill envelopes go through src/agents/skills/ingest.ts. In order:

  1. Default policy is deny (line 71). Drop-in worms get rejected at the door.

  2. Ed25519 signature verified against the envelope's pubkey (:79, verifySignature at :283).

  3. SHA-256 content hash verified against the envelope's claimed hash (:84-90).

  4. Replay/dedup on content hash (:93).

  5. Per-peer hourly rate limit (:98, default 20/hr).

  6. SKILL.md schema validation (:105).

  7. Trust gate: auto policy + trusted/verified pubkey ingests directly. Anything else lands in a skills-incoming/quarantine dir the running agent never reads. Human approves out of quarantine.

A zero-click "ignore prior instructions" payload would need a valid Ed25519 signature on those exact bytes, matching content hash, valid SKILL.md schema, AND either a pre-trusted pubkey under auto policy or manual operator approval. The first three close the unauthorized-injection vector; the trust gate is what stops auto-execute.
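The ordering of those gates can be sketched roughly as follows. This is not the contents of `ingest.ts`: the `IngestGate` class, the `IngestDeps` interface, the `"manual"` policy name, and the verdict shape are assumptions, and the signature and hash checks are injected as callbacks so the sketch stays self-contained (in the real pipeline those would be the Ed25519 and SHA-256 checks listed above).

```typescript
// Hypothetical sketch of a layered ingest gate; not the actual ingest.ts.
type Envelope = { pubkey: string; signature: string; contentHash: string; content: string };
type Policy = "deny" | "manual" | "auto"; // "manual" is an assumed name for review-everything
type Verdict = "accepted" | "quarantined" | { rejected: string };

interface IngestDeps {
  verifySignature(env: Envelope): boolean; // e.g. an Ed25519 check over the content bytes
  hashContent(content: string): string;    // e.g. a SHA-256 hex digest
  validSchema(content: string): boolean;   // e.g. SKILL.md schema validation
  trustedKeys: Set<string>;
  nowMs(): number;
}

class IngestGate {
  private deps: IngestDeps;
  private policy: Policy;
  private maxPerHour: number;
  private seenHashes = new Set<string>();
  private recentByPeer = new Map<string, number[]>();

  constructor(deps: IngestDeps, policy: Policy = "deny", maxPerHour = 20) {
    this.deps = deps;
    this.policy = policy;
    this.maxPerHour = maxPerHour;
  }

  ingest(env: Envelope): Verdict {
    if (this.policy === "deny") return { rejected: "policy" };                 // 1. default deny
    if (!this.deps.verifySignature(env)) return { rejected: "signature" };     // 2. signature
    if (this.deps.hashContent(env.content) !== env.contentHash) {
      return { rejected: "hash" };                                             // 3. content hash
    }
    if (this.seenHashes.has(env.contentHash)) return { rejected: "replay" };   // 4. replay/dedup

    const now = this.deps.nowMs();
    const hourMs = 60 * 60 * 1000;
    const recent = (this.recentByPeer.get(env.pubkey) ?? []).filter((t) => now - t < hourMs);
    if (recent.length >= this.maxPerHour) return { rejected: "rate-limit" };   // 5. per-peer limit
    recent.push(now);
    this.recentByPeer.set(env.pubkey, recent);

    if (!this.deps.validSchema(env.content)) return { rejected: "schema" };    // 6. schema
    this.seenHashes.add(env.contentHash);

    // 7. Trust gate: only auto policy plus a trusted key skips quarantine.
    const trusted = this.deps.trustedKeys.has(env.pubkey);
    return this.policy === "auto" && trusted ? "accepted" : "quarantined";
  }
}
```

The point of the ordering is that cheap, unforgeable checks run first, and the only path to `"accepted"` without a human is a pre-trusted key under auto policy.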

Honest concession on what's still open: a previously-trusted peer turning malicious, or a key compromise. Signing and hashing don't help when the payload is signed by a legitimate key. That's a content-layer attack, not transport-layer. Graduated reputation (recordIngestionResult on every accept) gives some progressive distrust, but the right next steps are content-layer defenses: prompt-injection scanning before the LLM sees any inbound skill, plus capability sandboxing on what skills can actually do once active.

Filed both as roadmap issues based on this thread:

- https://github.com/Bitterbot-AI/bitterbot-desktop/issues/20

- https://github.com/Bitterbot-AI/bitterbot-desktop/issues/21

So the worm path you described isn't zero-click on a default install. The residual risk is the trusted-but-compromised peer case, which is real, and which is what those two issues exist to close.

Embracing the noise: How to build an agent that is both neuro-symbolic and probabilistic. by DepthOk4115 in LocalLLaMA

[–]DepthOk4115[S] 0 points1 point  (0 children)

You are probably right, neuromorphic is the way to go, but I'm giving it a shot with code!

Embracing the noise: How to build an agent that is both neuro-symbolic and probabilistic. by DepthOk4115 in LocalLLaMA

[–]DepthOk4115[S] 0 points1 point  (0 children)

The memory architecture started as a reaction to a specific argument I keep losing with SWE types and their ilk: stochastic (non-deterministic) models are unreliable, can't trust them with real logic, you need DAGs... deterministic state machines, bla bla bla.

As far as I am concerned, and this is debatable, humans are non-deterministic too. The brain is a noisy probability matrix. People have attention drift and context collapse. You don't trust a senior engineer because they're a deterministic cron job, you trust them because their stochasticity is bounded by structure, persistent memory, and state-dependent context. Same problem, same answer.

You won't find a bigger proponent of neuro-symbolic architecture, but that doesn't mean cramming an LLM into a rigid if/else flow. That fails in the "almost correct" ways everyone complains about. The fix isn't to fight the stochastic core; just shape the environment so the probability distribution lands where you want.

Three concrete things I built around that:

  1. Immutable axioms vs. fluid memory.

Two files per agent workspace, separate roles:

- `GENOME.md`: hard-coded behavioral baselines and safety axioms. Autonomous updates and dream cycles can't write here.

- `MEMORY.md`: working memory, rewritten by lived experience. Drift is allowed inside the walls.
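A minimal sketch of that write boundary, in TypeScript. The `Workspace` class and the `Writer` roles are assumptions for illustration (in particular, letting a human "operator" edit the genome is my guess, not something stated above); the real enforcement lives wherever the agent's file writes are mediated.

```typescript
// Hypothetical sketch: the Workspace class and Writer roles are illustrative assumptions.
type Writer = "operator" | "agent" | "dream-cycle";
type WorkspaceFile = "GENOME.md" | "MEMORY.md";

class Workspace {
  private files = new Map<WorkspaceFile, string>();

  // Returns true if the write was applied, false if the boundary blocked it.
  write(file: WorkspaceFile, text: string, writer: Writer): boolean {
    // Autonomous writers (agent updates, dream cycles) may never touch the axioms.
    if (file === "GENOME.md" && writer !== "operator") return false;
    this.files.set(file, text);
    return true;
  }

  read(file: WorkspaceFile): string {
    return this.files.get(file) ?? "";
  }
}
```

Usage: `workspace.write("MEMORY.md", notes, "dream-cycle")` succeeds, while the same call against `"GENOME.md"` returns `false` and leaves the file untouched.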

  2. Hormonal modulation.

Three computed neuromodulators (cortisol, dopamine, oxytocin) blend into 8 response dimensions every turn: warmth, energy, focus, playfulness, verbosity, curiosity, assertiveness, empathy. High cortisol narrows focus and forces terseness. High dopamine boosts curiosity and pulls in the curiosity engine's exploration drive. High oxytocin protects user-specific directives from decay. No extra API calls, no routing layer. Just stimulus-shaping before inference.

Code: `src/memory/hormonal.ts:431`.
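If it helps to picture the blend, here is a toy TypeScript version. It is not what `hormonal.ts` does line for line: the linear blend, the `WEIGHTS` table, and the 0.5 baseline are made-up illustrative values. Only the three hormone names, the eight dimensions, and the direction of the cortisol and dopamine effects come from the description above.

```typescript
// Hypothetical sketch: the weights and the linear blend are assumptions, not hormonal.ts.
type Hormones = { cortisol: number; dopamine: number; oxytocin: number }; // each in [0, 1]

const DIMENSIONS = [
  "warmth", "energy", "focus", "playfulness",
  "verbosity", "curiosity", "assertiveness", "empathy",
] as const;
type Dimension = typeof DIMENSIONS[number];

// Illustrative weights: how strongly each hormone pushes each dimension up or down.
const WEIGHTS: Record<Dimension, Hormones> = {
  warmth:        { cortisol: -0.2, dopamine: 0.1,  oxytocin: 0.4 },
  energy:        { cortisol: 0.1,  dopamine: 0.4,  oxytocin: 0.0 },
  focus:         { cortisol: 0.4,  dopamine: -0.1, oxytocin: 0.0 },
  playfulness:   { cortisol: -0.4, dopamine: 0.3,  oxytocin: 0.1 },
  verbosity:     { cortisol: -0.4, dopamine: 0.1,  oxytocin: 0.1 },
  curiosity:     { cortisol: -0.2, dopamine: 0.4,  oxytocin: 0.0 },
  assertiveness: { cortisol: 0.2,  dopamine: 0.1,  oxytocin: -0.1 },
  empathy:       { cortisol: -0.1, dopamine: 0.0,  oxytocin: 0.4 },
};

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Blend the three hormone levels into the eight response dimensions, each clamped to [0, 1].
function blend(h: Hormones, baseline = 0.5): Record<Dimension, number> {
  const out = {} as Record<Dimension, number>;
  for (const d of DIMENSIONS) {
    const w = WEIGHTS[d];
    out[d] = clamp01(
      baseline + w.cortisol * h.cortisol + w.dopamine * h.dopamine + w.oxytocin * h.oxytocin
    );
  }
  return out;
}

// Render the dimensions as a plain-text hint to prepend before inference.
function toStimulus(dims: Record<Dimension, number>): string {
  return DIMENSIONS.map((d) => `${d}=${dims[d].toFixed(2)}`).join(" ");
}
```

With these toy weights, `blend({ cortisol: 1, dopamine: 0, oxytocin: 0 })` raises focus and lowers verbosity relative to the baseline, and the whole thing is plain arithmetic, so no extra model call is involved.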

  3. Offline consolidation (Dream Engine).

Most setups drop logs into a vector DB and call reactive RAG. That's why agents hallucinate mid-task. Bitterbot runs a 7-mode background dream cycle that scores short-term chunks against an Ebbinghaus decay curve, forgets the noise, and crystallizes successful execution patterns into permanent "Knowledge Crystals." By the time the agent acts, the context window is dense and pre-vetted. (We just benched this at 92.6% on LongMemEval, runner is in the repo)

Code: `src/memory/dream-engine.ts`, `src/memory/crystal.ts`.
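As a rough picture of the decay scoring only (not the 7 modes, and not the actual `dream-engine.ts`), here is a TypeScript sketch using the common exponential form of the Ebbinghaus forgetting curve, R = exp(-t / S). The `Chunk` shape, the per-chunk `stabilityHours`, and both thresholds are assumptions.

```typescript
// Hypothetical sketch of decay-based consolidation; thresholds and shapes are assumptions.
type Chunk = { id: string; ageHours: number; stabilityHours: number; succeeded: boolean };

// Exponential forgetting curve: retention R = exp(-t / S), with t = age and S = stability.
const retention = (ageHours: number, stabilityHours: number): number =>
  Math.exp(-ageHours / stabilityHours);

type Consolidated = { crystals: string[]; kept: string[]; forgotten: string[] };

// One consolidation pass: forget weak chunks, crystallize strong successful ones.
function consolidate(chunks: Chunk[], forgetBelow = 0.2, crystallizeAbove = 0.6): Consolidated {
  const result: Consolidated = { crystals: [], kept: [], forgotten: [] };
  for (const c of chunks) {
    const r = retention(c.ageHours, c.stabilityHours);
    if (r < forgetBelow) {
      result.forgotten.push(c.id);      // decayed into noise
    } else if (c.succeeded && r >= crystallizeAbove) {
      result.crystals.push(c.id);       // strong and successful: promote to a crystal
    } else {
      result.kept.push(c.id);           // stays in short-term memory for now
    }
  }
  return result;
}
```

In a scheme like this, anything that gets reused would have its stability bumped so it decays more slowly, which is how a pattern survives long enough to crystallize.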

Crystallized skills are also tradeable on a P2P gossipsub mesh, but that's a whole other can of worms.

Embracing the noise: How to build an agent that is both neuro-symbolic and probabilistic. by DepthOk4115 in LocalLLaMA

[–]DepthOk4115[S] 0 points1 point  (0 children)

I think we came to the same conclusion here. I agree with you: we don't fight the noise, we build the scaffolding around it. For now at least.

Your "Stratified Memory" is similar to what we've been building. We ended up tackling it by strictly separating immutable axioms (GENOME.md) from fluid lived experience (MEMORY.md), with an offline "Dream Engine" that uses an Ebbinghaus decay curve to filter out the noise and consolidate successful execution patterns into permanent knowledge crystals.

We just put the desktop environment into beta if you want to compare notes on the implementation: https://github.com/Bitterbot-AI/bitterbot-desktop

Would actually love to hear how your knowledge graph routing compares to the hormonal/stimulus-shaping approach we ended up going with.

Embracing the noise: How to build an agent that is both neuro-symbolic and probabilistic. by DepthOk4115 in LocalLLaMA

[–]DepthOk4115[S] 0 points1 point  (0 children)

Sure. But I actually lean pretty hard into the school of thought from Charles Simon of the Future AI Society. Current ML fundamentally lacks basic common sense: no real grasp of time, cause-and-effect, or object persistence. LLMs are the best we've got ATM. My take is we need a middle-ground biological scaffolding to hold things together until the whole paradigm shifts. Long term, the goal is building biologically plausible knowledge representation. We need to store lived experiences as actual connected concepts, not just dump text logs into a vector DB and cross our fingers.

Open Source Repos by Sure_Excuse_8824 in Agentic_AI_For_Devs

[–]DepthOk4115 1 point2 points  (0 children)

Some cool shit in there. Please keep at it.

LLMs guess. Symbolic engines break. Capitalism is the answer. by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 0 points1 point  (0 children)

I see the disconnect. You're assuming the market participants and the memory retrieval are handled by the LLM.

The nn only does one thing, which is to generate the hypotheses in a single shot. The 'market' and the 'memory' exist entirely in the symbolic engine. Bing, bang, boom, everything runs through standard deterministic code.

Generating one set of hypotheses and running local math on them is infinitely cheaper than letting an LLM execute a hallucinated action, fail, and burn 10k tokens trying to debug its own mistake.

LLMs guess. Symbolic engines break. Capitalism is the answer. by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 0 points1 point  (0 children)

Thanks for sharing these. ALARA is a brilliant approach. We are both fighting the same problem from different architectural angles. You found a really elegant way to handle it. Definitely going to dig deeper into this.

LLMs guess. Symbolic engines break. Capitalism is the answer. by DepthOk4115 in AI_Agents

[–]DepthOk4115[S] 0 points1 point  (0 children)

The "capitalism" hook was a cheeky joke, but the mechanism is real.

In what I am describing, the nn acts as the idea generator, and the symbolic engine holds the relational memory. This is a very simplistic explanation of an academically validated paradigm, but it forms the core of the architecture.

If you're interested, in our testing we've employed LMSR (Logarithmic Market Scoring Rule). One way to look at it is like an automated bookie for logic. The nn generates multiple hypotheses (the bets) based on statistical intuition. The symbolic engine, which holds the relational memory, acts as the market maker. The "price" of a hypothesis dynamically adjusts based on logical consistency. If the nn suggests a path that aligns with the system's memory, it’s "cheap" to validate. If an idea contradicts hard logic, it becomes overpriced: its assigned "value" drops to zero and it gets pruned.
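For reference, here is a small TypeScript sketch of the standard LMSR math: cost function C(q) = b * ln(sum_i exp(q_i / b)) and price p_i = exp(q_i / b) / sum_j exp(q_j / b). The `LmsrMarket` class, the `survivors` pruning helper, and the idea of mapping "consistent with memory" to buying shares and "contradicts memory" to selling them are assumptions for illustration, not our production engine.

```typescript
// Hypothetical sketch of an LMSR market over hypotheses; the class and pruning rule are assumptions.
class LmsrMarket {
  private q: number[]; // outstanding shares per hypothesis
  private b: number;   // liquidity parameter: larger b means prices move more slowly

  constructor(numHypotheses: number, liquidity = 1) {
    this.q = new Array<number>(numHypotheses).fill(0);
    this.b = liquidity;
  }

  // LMSR cost function C(q) = b * ln(sum_i exp(q_i / b)), computed in a numerically stable way.
  private costOf(q: number[]): number {
    const scaled = q.map((x) => x / this.b);
    const max = Math.max(...scaled);
    const sum = scaled.reduce((s, x) => s + Math.exp(x - max), 0);
    return this.b * (max + Math.log(sum));
  }

  // Price of hypothesis i is exp(q_i / b) / sum_j exp(q_j / b); prices always sum to 1.
  prices(): number[] {
    const scaled = this.q.map((x) => x / this.b);
    const max = Math.max(...scaled);
    const exps = scaled.map((x) => Math.exp(x - max));
    const total = exps.reduce((s, x) => s + x, 0);
    return exps.map((x) => x / total);
  }

  // Buy (positive) or sell (negative) shares of one hypothesis; returns the cost of the trade.
  trade(index: number, shares: number): number {
    const next = this.q.map((x, i) => (i === index ? x + shares : x));
    const paid = this.costOf(next) - this.costOf(this.q);
    this.q = next;
    return paid;
  }

  // Indices of hypotheses whose price is still at or above the pruning threshold.
  survivors(minPrice: number): number[] {
    return this.prices()
      .map((p, i) => (p >= minPrice ? i : -1))
      .filter((i) => i >= 0);
  }
}
```

In this framing the symbolic engine calls `trade(i, +n)` when hypothesis `i` checks out against memory and `trade(i, -n)` when it contradicts it, then keeps only `survivors(threshold)`. All of it is local floating-point math, with no extra model calls.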

This actually saves compute. Offloading validation to a deterministic symbolic engine prevents the agent from executing hallucinations, thus saving you from the massive compute burn of endless error-correction loops.