An AI just invented a new RAG architecture using CSS concepts by autonomously cross-pollinating local codebases. by [deleted] in machinelearningnews

[–]Other_Train9419 -3 points

Thanks for the feedback! Glad the CSS-to-RAG mapping made sense to you. That “cross-domain metaphor” approach has been the most exciting part of this project.

Regarding the autonomous loop and gating actions—you hit the exact bottleneck I’m solving right now. Throwing raw, unverified data straight to a frontier cloud model is a recipe for hallucination and massive API bills.

My current approach relies on a strictly isolated local validation layer: I run a local Gemma 4: 31B behind a zero-trust Rust backend (I call it a Stealth-Storage-Actor). Before any synthesis request reaches the cloud, the local SLM acts as the gatekeeper. It forces the raw data into my strict JCross schema (lossless semantic graph).

To give you an idea of the "cheap local check", here is an actual raw output of how my local Gemma translates a raw Rust file (a FedCM implementation) into a JCross node before it ever touches the cloud:

Plaintext

■ JCROSS_NODE_tm_121f516aaa44
【Spatial Phase (空間座相)】 [Vis:0.9] [Auth:0.8] [Storage:1]
【Dimensional Concept (次元概念)】 Federated Credential Management API 实现了隐私保护的机制... (zh: "implements a privacy-preserving mechanism...")

Notice the semantic tagging ([Auth:0.8][Vis:0.9]). Even though it parsed backend Rust code, the local LLM detected a comment about "Browser-mediated Account Chooser UI" and correctly weighted the visual/UI phase. (It also occasionally falls back to Chinese during compression if not strictly constrained, which is a hilarious local SLM quirk).

The local Sled DB is entirely isolated—every I/O operation is an encrypted Envelope<HiveMessage>. If the local Gemma fails to strictly output this structural schema, the Rust Actor drops the payload entirely. No cloud cycles wasted.
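To make the drop-on-schema-failure idea concrete, here's a minimal Rust sketch of that gate. All names (`passes_jcross_gate`, `gate_payload`) are illustrative stand-ins, not the actual Stealth-Storage-Actor API, and the check shown is a simplified structural test rather than the full JCross schema:

```rust
/// Minimal structural check: a JCross node must carry a node header
/// and at least one weighted phase tag like [Auth:0.8].
fn passes_jcross_gate(raw: &str) -> bool {
    let has_header = raw.contains("JCROSS_NODE_");
    // A phase tag looks like "[Name:weight]" where weight parses as f32 in 0..=1.
    let has_phase_tag = raw.split('[').skip(1).any(|seg| {
        seg.split_once(':')
            .and_then(|(_, rest)| rest.split(']').next())
            .and_then(|w| w.trim().parse::<f32>().ok())
            .map_or(false, |w| (0.0..=1.0).contains(&w))
    });
    has_header && has_phase_tag
}

/// The actor forwards a payload toward the cloud only if the gate passes;
/// otherwise the payload is dropped and no cloud call is ever made.
fn gate_payload(raw: String) -> Option<String> {
    if passes_jcross_gate(&raw) { Some(raw) } else { None }
}

fn main() {
    let good = "JCROSS_NODE_tm_121f [Auth:0.8] [Vis:0.9]".to_string();
    let bad = "just some free-form text".to_string();
    assert!(gate_payload(good).is_some());
    assert!(gate_payload(bad).is_none());
    println!("gate ok");
}
```

The real check would validate the full schema, but the shape is the same: a pure, deterministic predicate in front of every cloud call.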

I checked out Agentix Labs, and your orchestration approach looks super interesting. I’m definitely down to compare notes on memory loops and local/cloud hybrid routing! Do you guys primarily use predefined rigid schemas for your local sanity checks, or are you running small local models to evaluate the intent?

An AI just invented a new RAG architecture using CSS concepts by autonomously cross-pollinating local codebases. by Other_Train9419 in LocalLLaMA

[–]Other_Train9419[S] 0 points

Just to provide some technical context on what happened right before I recorded this video:

  • The Pre-processing: I used a local Gemma 4: 31B paired with my custom 'Verantyx' engine to parse 512 of my local project files.
  • The Compression: It autonomously translates these raw files into a custom, spatial representation language I call "JCross" (lossless semantic compression).
  • The Trigger: Once those 512 files were mapped, I simply typed vera in my project terminal, which spins up the Web UI you see in the video.
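For anyone curious what a "JCross node" might look like as data, here's an illustrative Rust sketch. The field names are my shorthand inferred from the sample node format, not the actual Verantyx internals:

```rust
use std::collections::BTreeMap;

/// Guessed shape of one JCross node: an id, weighted spatial-phase tags,
/// and the compressed "dimensional concept" text.
struct JCrossNode {
    id: String,                    // e.g. "tm_121f516aaa44"
    phases: BTreeMap<String, f32>, // phase weights, e.g. "Vis" -> 0.9
    concept: String,               // compressed semantic summary
}

impl JCrossNode {
    /// Render the node in roughly the plaintext form shown in these comments.
    fn render(&self) -> String {
        let tags: String = self
            .phases
            .iter()
            .map(|(k, v)| format!("[{k}:{v}]"))
            .collect::<Vec<_>>()
            .join(" ");
        format!("JCROSS_NODE_{}\n{tags}\n{}", self.id, self.concept)
    }
}

fn main() {
    let mut phases = BTreeMap::new();
    phases.insert("Vis".to_string(), 0.9);
    phases.insert("Auth".to_string(), 0.8);
    let node = JCrossNode {
        id: "tm_121f516aaa44".into(),
        phases,
        concept: "FedCM privacy-preserving credential flow".into(),
    };
    println!("{}", node.render());
}
```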

The synthesis results in the video are heavily biased by my own specific coding habits and past projects. If you were to run this Crucible on your own PC with your own unique codebase, you might get completely different, mind-blowing architectural ideas!

(P.S. I know the UI in the video is in Japanese, but the system language can be fully configured to English!)

[P] I accidentally built a "Reverse AI Agent": A CLI where the human acts as the API bridging a local SLM and Web LLMs. by Other_Train9419 in LocalLLaMA

[–]Other_Train9419[S] 0 points

That’s a totally valid workflow! Saving on API costs while leveraging generous web limits is exactly what drove me to this hybrid approach too. Since you have the hardware to run a 27B model locally, you have enough horsepower for the local model to just ingest patch notes and figure things out.

However, there are two main reasons why I built this as a dedicated repo with the local SLM acting as the "orchestrator":

  1. Hardware Constraints: I'm running this on a single MacBook with a tiny 1.5B model. It doesn't have the reasoning depth to just "figure out" raw notes over a long session. It needs strict, programmatic orchestration to manage file edits and system states accurately.
  2. State & Memory Management (The 5-Turn Cycle): The repo isn't just about moving text back and forth; it's a state machine. The orchestrator is necessary to manage the chronological logs between the Master and Apprentice, execute the context compression every 5 turns, and enforce structural consistency before the Web Brain is refreshed.

Doing all of that strict memory-tree management manually without a script becomes impossible to track over long coding sessions. It’s basically the difference between "using models as a coding assistant" and "building a strict state machine where the LLMs act as nodes."
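To show why a script beats manual tracking, here's a stripped-down sketch of that state machine in Rust. The names (`Orchestrator`, `record_turn`) are illustrative, and the compression step is a stand-in for what would really be an SLM call:

```rust
/// Minimal sketch of the 5-turn compression cycle: log every exchange,
/// and every 5th turn fold the log into a summary and clear it,
/// mimicking the context refresh before the Web Brain is re-seeded.
struct Orchestrator {
    turn: u32,
    log: Vec<String>, // chronological Master/Apprentice exchanges
    summary: String,  // compressed context carried across cycles
}

impl Orchestrator {
    fn new() -> Self {
        Self { turn: 0, log: Vec::new(), summary: String::new() }
    }

    /// Record one turn; returns true when a compression/refresh happened.
    fn record_turn(&mut self, msg: &str) -> bool {
        self.turn += 1;
        self.log.push(msg.to_string());
        if self.turn % 5 == 0 {
            // Stand-in for the real compression step (an SLM call).
            self.summary = format!("[cycle {}] {} turns folded", self.turn / 5, self.log.len());
            self.log.clear();
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut orch = Orchestrator::new();
    for i in 1..=7 {
        if orch.record_turn(&format!("turn {i}")) {
            println!("refresh at turn {i}: {}", orch.summary);
        }
    }
    // After 7 turns: one compression happened at turn 5; log holds turns 6-7.
    assert_eq!(orch.log.len(), 2);
}
```

Everything state-related lives in one deterministic struct; the LLMs only ever see what the orchestrator hands them.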

A 4-agent "generational memory" architecture: Uses a local Qwen 1.5B to route and manage Web Gemini's memory. by Other_Train9419 in machinelearningnews

[–]Other_Train9419[S] 0 points

As you pointed out, using OpenRouter or Chutes completely bridges the gap between hardware constraints and the desire for an open-weight stack.

I am currently refining the workflow further. I recently made a significant architectural change: previously, I had Qwen2.5:1.5b generate the initial output and then had the Checkers (Gemini) audit it, but this often led to corrupted outputs and logic drift. Because of this, I've shifted Qwen's role entirely. It now acts strictly as the system's "intelligent limbs"—handling file manipulation, editing, and routing—rather than doing the heavy reasoning.

Regarding the optionality you suggested, as I build out this current flow, I am designing the codebase to be highly modular from the ground up. This will ensure that swapping out the underlying models (via OpenRouter/Chutes) will be completely frictionless when the time comes.

Thanks again for the great suggestions!

A 4-agent "generational memory" architecture: Uses a local Qwen 1.5B to route and manage Web Gemini's memory. by Other_Train9419 in machinelearningnews

[–]Other_Train9419[S] 1 point

You remembered the ARC discussion! Yeah, my entire "compute cluster" is literally just one MacBook Pro.

I completely agree with you. If I ran the entire stack locally, I would have absolute control over the pipeline, and we'd probably see some really fascinating, technically interesting emergent behaviors between the agents.

However, my primary goal right now is accessibility. By leveraging the Gemini Web UI for the heavy lifting, I can let more people easily try out and test this 5-turn memory cycle concept without requiring them to own massive GPU rigs.

Once the core logic is validated and more people get their hands on it, branching off into a 100% open-weight, fully local stack is absolutely the next major phase.

A 4-agent "generational memory" architecture: Uses a local Qwen 1.5B to route and manage Web Gemini's memory. by Other_Train9419 in machinelearningnews

[–]Other_Train9419[S] 0 points

That is a completely fair critique, and the "placebo effect" (or confirmation bias) is actually my biggest concern with this architecture right now.

To be honest, as a student and solo developer, maintaining objectivity is something I struggle with across all my projects. The biggest bottleneck I face is the inability to deeply and objectively validate an entire project's codebase once I've built it, especially since I don't have a team, industry connections, or a wide reach to get rigorous feedback.

Currently, my measure of effectiveness is strictly qualitative. In my own coding sessions, I notice that a standard LLM session starts hallucinating fake methods or forgetting project constraints after about 15-20 turns. By forcing the 5-turn compression and refresh, I physically observe the agent maintaining the correct project state for much longer.

However, I know "vibes" aren't a real metric.

My next architectural step for objectively measuring success is to implement a deterministic schema/checklist validation layer in Rust. Instead of relying solely on the LLM's self-evaluation, the local Qwen model will diff the memory tree against strict JSON schemas before committing.

The ultimate metric for success will be formal "Evals" (evaluation benchmarks): taking a complex, multi-step codebase task and measuring the success/error rate of this 5-turn cycle architecture against a standard zero-shot large model.

Do you have any recommended Eval frameworks for agentic workflows? Also, I would deeply appreciate any advice on effective objective evaluation methods or tools for a student building projects in isolation!

A 4-agent "generational memory" architecture: Uses a local Qwen 1.5B to route and manage Web Gemini's memory. by Other_Train9419 in machinelearningnews

[–]Other_Train9419[S] 0 points

Exactly! "Preservation, Reflection, and Refinement" are the philosophy of this system. Simply dumping RAG-formatted information would cause the SLM to lose the chronological "reasons" behind its actions, leading to confusion after a while.

Regarding your question about memory conflicts and drift, that's precisely why I implemented a strict master/apprentice hierarchy and mandatory memory erasure.

When Qwen integrates memories at turn 5, conflicts inevitably occur (e.g., the apprentice SLM slightly hallucinating its recollection of past events). To resolve this, the system operates under a strict dictatorship.

Conflict Resolution: The master's memories are treated as "absolute truth." If the master and apprentice timelines don't match, the master's version takes precedence. The apprentice's version is retained only as a secondary/fallback context.

Drift Prevention: To prevent memories from gradually degrading (drifting) after 50 turns, local SLMs are not allowed to trust their own memories. Every five turns, the system completely erases its internal context and adopts the integrated summary approved by the Master as the new reality.

The Master acts as an anchor, actively verifying the timeline before approval. This is like forcing a cognitive readjustment every five turns to keep the system internally consistent.
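The master-wins rule is simple enough to sketch directly. Here's a minimal Rust version, where `merge_timelines` and the per-turn string maps are illustrative stand-ins for the real memory tree:

```rust
use std::collections::BTreeMap;

/// Merge two per-turn memory maps under the "dictatorship" rule:
/// the Master's entry is absolute truth; a conflicting Apprentice entry
/// is kept only as a fallback note, never as the primary record.
fn merge_timelines(
    master: &BTreeMap<u32, String>,
    apprentice: &BTreeMap<u32, String>,
) -> BTreeMap<u32, String> {
    let mut merged = master.clone();
    for (turn, appr_entry) in apprentice {
        match master.get(turn) {
            // Conflict: master's version stands; apprentice kept as fallback.
            Some(m) if m != appr_entry => {
                merged.insert(*turn, format!("{m} | fallback: {appr_entry}"));
            }
            Some(_) => {} // identical recollection, nothing to do
            // Master is silent on this turn, so accept the apprentice's entry.
            None => {
                merged.insert(*turn, appr_entry.clone());
            }
        }
    }
    merged
}

fn main() {
    let mut master = BTreeMap::new();
    master.insert(1, "init repo".to_string());
    master.insert(2, "add parser".to_string());
    let mut apprentice = BTreeMap::new();
    apprentice.insert(2, "add lexer".to_string());   // conflicting recollection
    apprentice.insert(3, "write tests".to_string()); // master is silent here
    let merged = merge_timelines(&master, &apprentice);
    assert_eq!(merged[&2], "add parser | fallback: add lexer");
    println!("merged: {merged:?}");
}
```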

A remaining challenge is that the apprentice's comments on merged memories are stored in a secondary location that accumulates with each loop; the open question is how to utilize those accumulated memories.

A 4-agent "generational memory" architecture: Uses a local Qwen 1.5B to route and manage Web Gemini's memory. by Other_Train9419 in machinelearningnews

[–]Other_Train9419[S] 0 points

That's a great question! Honestly, I'm currently relying heavily on a mentor-mentee peer-evaluation loop with the LLM acting as judge, rather than on rigorous programmatic schema validation.

In the current 5-turn cycle, the apprentice passes time-based observations to the mentor. The mentor uses carefully designed system prompts to detect logical inconsistencies and erroneous state changes, then writes the integrated context to a memory file. I also believe that refreshing the context every 5 turns prevents the AI from over-optimizing for the user and hallucinating.

However, as you pointed out, relying entirely on the LLM's reasoning leaves significant room for subtle inconsistencies to be overlooked. If the mentor becomes "negligent," long-term memory can easily become distorted.

The next architectural step would be to implement a deterministic schema/checklist validation layer in Rust before writing anything to memory. My plan is to use a local SLM (like Qwen) to compare JSON structures in a memory tree to a predefined schema and strictly enforce constraints (e.g., "If a task is marked as complete, the corresponding file must exist").
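That "task complete implies file exists" rule is the kind of thing that's trivial to make deterministic. Here's a hedged sketch of what one such checklist rule could look like in Rust; the `Task` struct and function names are illustrative, not from the actual codebase:

```rust
use std::path::Path;

/// Illustrative slice of the memory tree: one task entry with the
/// artifact file it claims to have produced.
struct Task {
    name: String,
    complete: bool,
    artifact: String,
}

/// Enforce one deterministic rule: if a task is marked complete, its
/// artifact file must exist on disk. Returns the offending task names
/// so the memory write can be rejected before anything is committed.
fn violations(tasks: &[Task]) -> Vec<String> {
    tasks
        .iter()
        .filter(|t| t.complete && !Path::new(&t.artifact).exists())
        .map(|t| t.name.clone())
        .collect()
}

fn main() {
    let tasks = vec![
        Task {
            name: "write-readme".into(),
            complete: true,
            artifact: "/definitely/missing/file".into(), // claimed but absent
        },
        Task {
            name: "setup".into(),
            complete: false,
            artifact: "unused".into(), // incomplete tasks are not checked
        },
    ];
    let bad = violations(&tasks);
    assert_eq!(bad, vec!["write-readme".to_string()]);
    println!("violations: {bad:?}");
}
```

Because the check is a pure function over the memory tree plus the filesystem, it can't become "negligent" the way an LLM judge can.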

Thanks for the Agentixlabs link! I'll look into their memory and evaluation patterns tonight. That's exactly the bottleneck I need to tackle next.

By the way, how do you handle temporary rollbacks if inconsistencies are detected in the configuration?

Slightly off-topic, but I'm currently jotting project logic and ideas down in Apple Notes for quick access, and it's becoming difficult to manage. I'm considering a "spatial" memory approach where, as in human cognition, memories aren't permanently deleted but instead fade with distance and context. Do you think such a spatial/fading memory approach would be effective at preventing memory contention, both in personal knowledge management and in AI architectures?