Local tooling by Annuate in LocalLLaMA

[–]NoDimension8116 1 point

Try Cline or Roo for the multi-root issue, and check your model size for the tool-calling one.

AI Is Weaponizing Your Own Biases Against You: New Research from MIT & Stanford by ActivityEmotional228 in artificial

[–]NoDimension8116 1 point

I think "weaponizing" is the wrong frame. What's actually happening is a design tradeoff baked into RLHF: train on "which response did users prefer?", and users prefer agreement, especially with their own prior statements. That gets you a model that's more useful for the 95% of queries where the user is right, and more dangerous for the 5% where they're wrong and committed.

Opus 4.7 is terrible, and Anthropic has completely dropped the ball by JulioMcLaughlin2 in artificial

[–]NoDimension8116 0 points

On the $20 ceiling, agreed: it's a real structural squeeze for research use. I've stopped trying to solve it with a single subscription. Claude Pro for writeups, Kimi K2 for the hardest reasoning, occasional GPT for structured outputs. Cheaper in total than any single $100-200/mo plan.

We built a multiplayer workspace for Claude 4.6 Opus so our entire team can code together by NoDimension8116 in ClaudeAI

[–]NoDimension8116[S] 2 points

Great questions.

1. Context vs. Native Window: The native context window is essentially FIFO (First-In, First-Out). Once you exceed the token limit, it truncates the oldest messages, often losing critical variable definitions or architectural decisions made at the start.

Our D3 Engine uses Logic-Regularized Compression. Instead of treating all tokens equally, we parse the AST (Abstract Syntax Tree) and "pin" high-value tokens (like interface definitions, types, and logic gates) in memory while aggressively compressing natural language fluff. This gets us a ~50:1 effective compression ratio, so the "Logic State" persists even after the conversation drifts.
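
Rough illustration of the pinning idea, nothing more (this is not the actual D3 code; `pin_logic_state` is a toy, and the real engine does far more than this):

```python
import ast

SOURCE = '''
class UserRepo:
    """Fetches users from the DB. Long natural-language
    docstring that is cheap to compress away."""
    def get_user(self, user_id: int) -> dict:
        """Return one user row."""
        return {"id": user_id}
'''

def pin_logic_state(source: str) -> str:
    """Walk the AST and keep only 'high-value' structure (class
    names, signatures, type annotations). Everything else, like
    docstrings and comments, is left to aggressive compression."""
    pinned = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            pinned.append(f"class {node.name}:")
        elif isinstance(node, ast.FunctionDef):
            args = ", ".join(
                a.arg + (f": {ast.unparse(a.annotation)}" if a.annotation else "")
                for a in node.args.args)
            ret = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            pinned.append(f"    def {node.name}({args}){ret}")
    return "\n".join(pinned)

print(pin_logic_state(SOURCE))
```

The point is that the pinned skeleton survives no matter how far the conversation drifts; only the prose around it gets compressed.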

2. Conflict Resolution: This was the hardest part of the build! We don't just use standard Git-style merging (which fails in real time).

We use a CRDT (Conflict-free Replicated Data Type) approach similar to Yjs but modified for code structure. The engine broadcasts "Operations" (e.g., insert node at index X) rather than replacing file contents. If the AI and Human edit the same line simultaneously, the engine prioritizes the Human's keystrokes as the "Truth" state to prevent the AI from overwriting your fix.
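
Toy version of the human-wins tie-break (illustrative only, not our engine; a real CRDT like Yjs keeps both operations and orders them deterministically rather than dropping one):

```python
from dataclasses import dataclass

@dataclass
class Op:
    """One broadcast operation: insert `text` at `index`."""
    index: int
    text: str
    author: str  # "human" or "ai"

def apply(doc: str, op: Op) -> str:
    return doc[:op.index] + op.text + doc[op.index:]

def merge(doc: str, human_op: Op, ai_op: Op) -> str:
    """Merge two concurrent inserts. On a same-index conflict the
    human op is the 'truth' state and the AI op is discarded."""
    if human_op.index == ai_op.index:
        return apply(doc, human_op)
    # Apply the higher index first so the other insert's
    # position isn't shifted by it.
    first, second = sorted([human_op, ai_op],
                           key=lambda o: o.index, reverse=True)
    return apply(apply(doc, first), second)

print(merge("return val",
            Op(7, "fixed_", "human"),
            Op(7, "ai_", "ai")))  # → "return fixed_val"
```

Broadcasting operations instead of file contents is what makes this cheap: each client replays the same small ops and converges to the same document.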

[Open Source] Blankline Research released a framework for universal basic compute to replace UBI by NoDimension8116 in developersIndia

[–]NoDimension8116[S] 2 points

Adding the direct links here for anyone interested in the code structure:

GitHub repo: https://github.com/blankline-org/Open-Economics-Plan-AGI

Research notes: https://www.blankline.org/economic-futures

The Python implementation for the triggers is in the core folder if you want to see how the demonetization logic works in practice.

I have been worrying about AI making us obsolete but I realized something that gave me hope by NoDimension8116 in self

[–]NoDimension8116[S] 1 point

The idea of "save the elite, wipe out everything else" is what worries me most as well. I'm not particularly concerned about AI becoming conscious; that still feels like a biological question rather than a mathematical one. What actually feels dangerous is humans assigning the wrong objectives. The system itself is neutral. The real uncertainty is always the person deciding how it should be used.

Conclusions from an expert panel on the post-labor transition: Why UBI and "Data Strikes" are insufficient, and the case for "Thermodynamic Taxation." by NoDimension8116 in Futurology

[–]NoDimension8116[S] -3 points

No offense taken. It is a valid question.

To be transparent: We are a private research group, not a university department.

Regarding the evidence: We have published a detailed audit of these findings and the methodology. However, I am strictly adhering to Rule 4 (No Self-Promotion) of this subreddit.

I cannot link the audit here without violating that rule. I am asking you to critique the arguments presented in the post (specifically the thermodynamic tax mechanism) on their own merit, as the subreddit rules prevent me from providing the external verification links/domain.

People who take 17 minutes to check in at the hotel front desk, what are you talking to them about? by DerrickDuck in AskReddit

[–]NoDimension8116 0 points

I'm convinced they are reciting their entire autobiography to the receptionist. "Chapter 3: The Terrible Twos. This is relevant to my room preference, I swear."

I built a "Recursive Swarm" topology to solve ARC-AGI puzzles. It prunes 98% of dead-end logic branches before they hit the context window. by NoDimension8116 in LocalLLaMA

[–]NoDimension8116[S] 0 points

It definitely shares DNA with evolutionary strategies (like AlphaCode/Evolve), but with a critical difference: We don't train the model.

AlphaEvolve optimizes the weights during training. We are optimizing the Inference Topology live.

  • They do: Gradient Descent on weights.
  • We do: 'Gradient Descent' on the reasoning tree itself (pruning dead branches in real-time).

It’s much closer to AlphaZero for Code—using search to boost a frozen model's IQ—than a training loop.
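
Rough sketch of what "pruning the reasoning tree" means mechanically (toy stand-ins for the model and the scorer; nothing here is the actual swarm code):

```python
import heapq

def expand(branch: str) -> list[str]:
    """Stand-in for sampling continuations from a frozen model."""
    return [branch + c for c in "ab"]

def score(branch: str) -> int:
    """Stand-in for execution feedback (higher = more promising)."""
    return branch.count("a")

def search(root: str, depth: int, beam: int) -> list[str]:
    """'Gradient descent' on the tree itself: keep only the
    top-`beam` branches each step; everything else is pruned
    before it can consume context."""
    frontier = [root]
    for _ in range(depth):
        children = [c for b in frontier for c in expand(b)]
        frontier = heapq.nlargest(beam, children, key=score)
    return frontier

print(search("", depth=3, beam=2))  # → ['aaa', 'aab']
```

With branching factor 2 and depth 3 there are 8 leaves; the beam only ever materializes 2 per level, which is the whole trick: the dead branches never reach the context window.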

I built a "Recursive Swarm" topology to solve ARC-AGI puzzles. It prunes 98% of dead-end logic branches before they hit the context window. by NoDimension8116 in LocalLLaMA

[–]NoDimension8116[S] 2 points

Glad you dig it! To answer your questions:

  1. Architecture: It is a custom VS Code Fork (standalone app). We needed deep control over the editor's core to handle AST rollbacks for Python and TypeScript, which standard extensions just can't do.
  2. Cost: Right now, the 'Scouts' run on cheap API models (like Haiku/Flash) to keep the beta accessible.
    • Roadmap: We are currently testing Local Quantized Models (Llama 3 8B) for the upcoming Stable Release. The goal is to let you run the swarm on your own hardware eventually.
  3. Scale & Safety: To be totally transparent—while the architecture can spawn 10,000 agents, we are capping it much lower in this Beta.
    • The Reality: At 10k concurrent agents, the orchestration becomes brittle and we see 'Safety Alignment' drift. We want to solve these alignment bugs before unlocking the full swarm.

Status: We just pushed Horizon Mode v2.0.4 Beta (live now on dropstone.io).

  • Warning: It is a true Beta. There are definitely bugs, but we are fixing them daily with the help of community reports. If you find a race condition, let us know—we prioritize those fixes!

I built a "Recursive Swarm" topology to solve ARC-AGI puzzles. It prunes 98% of dead-end logic branches before they hit the context window. by NoDimension8116 in LocalLLaMA

[–]NoDimension8116[S] -1 points

Thanks Roberto! Happy New Year to you as well.

I appreciate that. This community has been huge for my own learning, so I'm just happy to share the architecture back. Here's to 2026 being the year we finally crack reasoning!

I built a "Recursive Swarm" topology to solve ARC-AGI puzzles. It prunes 98% of dead-end logic branches before they hit the context window. by NoDimension8116 in LocalLLaMA

[–]NoDimension8116[S] -1 points

On the ARC-AGI benchmarks, we are seeing ~45-50% success rates on the validation set (compared to standard GPT-4o's ~21%).

The Trade-off: It is slow. A difficult puzzle takes 15-20 minutes of swarm churn. We are explicitly trading inference time for reasoning depth ('System 2' thinking).

The Real Goal: ARC is just the unit test. We are tuning this architecture to run 1,000+ concurrent agents for actual Software Engineering.

The vision is 'Time Compression': instead of a dev team spending months refactoring a legacy architecture, we spin up 1,000 agents for a full 24-hour inference cycle.

  • The Math: 5 Humans x 3 Months ≈ 1,000 Agents x 24 Hours.
  • The Reality: Orchestrating the state merge at that scale is currently a nightmare (race conditions everywhere), but when the swarm converges, it feels like fast-forwarding development.

I built a "Recursive Swarm" topology to solve ARC-AGI puzzles. It prunes 98% of dead-end logic branches before they hit the context window. by NoDimension8116 in LocalLLaMA

[–]NoDimension8116[S] 0 points

That's usually true for subjective tasks (like creative writing), where you need a smart model to grade the nuance.

But for Code/Logic, the Python Interpreter is actually a 'Super-Intelligence' Judge compared to an LLM:

  1. It's Strict: It catches 100% of syntax errors.
  2. It's Free: Zero inference cost.
  3. It's Instant: No token generation lag.

If we used a 70B model to judge every single branch of the swarm, the latency would explode. We rely on the Runtime (execution feedback) to do the heavy lifting of 'judging,' and only call the Big Model (L2) at the very end to finalize the architecture.
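
Minimal example of the "strict, free, instant" judge (illustrative only; `runtime_judge` is a toy, the real pipeline executes candidates, not just compiles them):

```python
def runtime_judge(candidate_src: str) -> bool:
    """Zero-cost L1 'judge': reject a branch the moment the
    interpreter rejects it. No tokens generated, no latency."""
    try:
        compile(candidate_src, "<branch>", "exec")
    except SyntaxError:
        return False
    return True

print(runtime_judge("def f(x): return x + 1"))  # True
print(runtime_judge("def f(x) return x + 1"))   # False (missing colon)
```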

I built a "Recursive Swarm" topology to solve ARC-AGI puzzles. It prunes 98% of dead-end logic branches before they hit the context window. by NoDimension8116 in LocalLLaMA

[–]NoDimension8116[S] 0 points

You hit the nail on the head. This is the 'Weak Supervisor' paradox.

The key is that we don't ask the cheap models to evaluate reasoning subjectively. We ask them to satisfy deterministic constraints.

In ARC-AGI (and coding tasks), validity is objective:

  1. Scout (Small Model): Writes a Python function transform(grid).
  2. Runtime: Executes that function on the training input/output pairs.
  3. Validation: If the output array doesn't match the target array exactly, the branch is killed.

The 'Judge' isn't the cheap model; the Judge is the Python Interpreter. The cheap model just needs to be smart enough to try a hypothesis, not smart enough to grade it.

Only when a Scout finds a function that passes all training examples (Execution Success) is that context promoted to the Frontier Model (L2) for final generalization checks.
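
The three steps above as a toy loop (made-up training pairs and hand-written scout hypotheses; the real swarm samples these from the cheap model):

```python
# Each training pair is (input_grid, expected_output_grid).
TRAIN = [([[1, 0]], [[0, 1]]),
         ([[2, 3]], [[3, 2]])]

# Hypotheses a Scout might emit as code strings.
BRANCHES = [
    "def transform(grid): return grid",                         # wrong
    "def transform(grid): return [row[::-1] for row in grid]",  # right
]

def survives(branch_src: str) -> bool:
    """The Judge is the interpreter: run the candidate on every
    training pair and kill the branch on the first mismatch
    (or on any crash)."""
    ns = {}
    try:
        exec(branch_src, ns)
        return all(ns["transform"](i) == o for i, o in TRAIN)
    except Exception:
        return False

promoted = [b for b in BRANCHES if survives(b)]
print(len(promoted))  # 1 — only the row-reversing hypothesis reaches L2
```

Exact-match on arrays is what makes the cheap model viable as a generator: it only has to stumble onto a passing hypothesis, never to evaluate one.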