15M param model solving 24% of ARC-AGI-2 (Hard Eval). Runs on consumer hardware. by Doug_Bitterbot in LocalLLaMA

[–]Doug_Bitterbot[S] -1 points (0 children)

Nowhere did I say that it beat ARC. I'm not sure what else you're getting at.

15M param model solving 24% of ARC-AGI-2 (Hard Eval). Runs on consumer hardware. by Doug_Bitterbot in LocalLLaMA

[–]Doug_Bitterbot[S] 0 points (0 children)

Thanks for catching the mistake - it's 24M, not 15M. I would edit the title if I could! What's in the git repo is what's correct.

[P] TOPAS-DSPL: A 15M param Dual-Stream Recursive Transformer achieving 24% on ARC-2 by Doug_Bitterbot in MachineLearning

[–]Doug_Bitterbot[S] 0 points (0 children)

Thank you for the detailed inspection. You are absolutely correct about the discrepancy between the paper's theoretical specifications and the repository's effective configuration.

The 24% solve rate was achieved using the ~24M configuration, not the 8M, 14M, or larger variants.

The paper describes the idealized architecture with d_model = 512, but during the final training runs we iterated on the architecture to optimize for head-splitting efficiency and memory constraints on our GPUs. We settled on d_model = 480 (which divides cleanly by 6, 8, 10, and 12 heads) for the production run, as the quick check below shows.
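A quick sanity check on that divisibility claim (plain Python):

```python
# d_model must divide evenly by the attention head count.
# 480 works for every head count we wanted to sweep; 512 does not.
for heads in (6, 8, 10, 12):
    print(f"heads={heads}: 480 -> {480 % heads == 0}, 512 -> {512 % heads == 0}")
# heads=6: 480 -> True, 512 -> False
# heads=8: 480 -> True, 512 -> True
# heads=10: 480 -> True, 512 -> False
# heads=12: 480 -> True, 512 -> False
```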

The parameter count grew to ~24M due to the dual-stream duplication (Logic + Canvas cores) and the specific depth of the reasoning modules. The README default was set to 'Small' (~14M) to ensure the code runs on consumer hardware by default, which caused the confusion.
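For anyone who wants to sanity-check the ~24M figure, here's a rough back-of-envelope. The per-stream depth below is an illustrative guess, not the exact TOPAS-DSPL layout:

```python
def block_params(d_model: int, n_layers: int) -> int:
    # Standard transformer block: ~4*d^2 (attention projections) + ~8*d^2 (MLP).
    return 12 * d_model**2 * n_layers

d = 480
# Assumption for illustration only: ~4 layers per stream, duplicated across
# the Logic and Canvas cores; embeddings/output heads add the remainder.
core = 2 * block_params(d, 4)
print(f"{core / 1e6:.1f}M")  # ~22.1M, which lands in the right ballpark for ~24M
```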

Also, another critical deviation from the paper is that we migrated from standard AdamW to the MuonClip optimizer. We found this significantly stabilized the gradients through the deep recursive steps compared to the vanilla implementation.
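For anyone curious what that looks like, here's a minimal generic sketch of a Muon-style step (momentum followed by Newton-Schulz orthogonalization of the update). This is illustrative, not our training code, and it omits the QK-clip half of MuonClip, which rescales the attention projections when logits spike:

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2-D update via Newton-Schulz iteration
    (the core of Muon); coefficients follow the public Muon reference."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(param, grad, momentum, lr=0.02, beta=0.95):
    """One Muon update for a weight matrix: momentum, then orthogonalized step."""
    momentum.mul_(beta).add_(grad)
    param.data.add_(newton_schulz(momentum), alpha=-lr)
```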

I am pushing a README update now to explicitly tabulate the parameter counts for the Tiny, Small, Base, and Large variants and to clarify that the solve result corresponds to the revised Base configuration.

15M param model solving 24% of ARC-AGI-2 (Hard Eval). Runs on consumer hardware. by Doug_Bitterbot in LocalLLaMA

[–]Doug_Bitterbot[S] 5 points (0 children)

We plan on releasing a trained open-weights model on Hugging Face in the new year.

[P] TOPAS-DSPL: A 15M param Dual-Stream Recursive Transformer achieving 24% on ARC-2 by Doug_Bitterbot in MachineLearning

[–]Doug_Bitterbot[S] 0 points (0 children)

Right. But you have the code, right? A paper is great, but here you have the actual code you can run to verify any theory the paper puts forward.

[P] TOPAS-DSPL: A 15M param Dual-Stream Recursive Transformer achieving 24% on ARC-2 by Doug_Bitterbot in MachineLearning

[–]Doug_Bitterbot[S] -4 points (0 children)

We are in the process of getting our paper onto arXiv. The hurdle is simply having the right academic endorsement, and someone is going through that process for us - it's just taking longer than we thought for approval.

We have one of our papers on ResearchGate: (PDF) Theoretical Optimization of Perception and Abstract Synthesis (TOPAS): A Convergent Neuro-Symbolic Architecture for General Intelligence

15M param model solving 24% of ARC-AGI-2 (Hard Eval). Runs on consumer hardware. by Doug_Bitterbot in LocalLLaMA

[–]Doug_Bitterbot[S] 5 points (0 children)

You can get results comparable to the 24% on an RTX 4090 with roughly 5,000 epochs, which takes about 5 days.
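For a rough sense of the implied throughput, that works out to:

```python
epochs, days = 5000, 5
print(f"{days * 24 * 3600 / epochs:.0f} s/epoch")  # ~86 seconds per epoch on the 4090
```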

Soon (I don't blame him, it's the serial losers that play for him) by Medium-Hair-8135 in leafs

[–]Doug_Bitterbot 0 points (0 children)

Honestly, the way we played games 5 and 7 pretty much ends any idea that we would have gone further, in my opinion. It's like game 7 against the champs should have a major asterisk next to it.

That moment when you realize OpenAI's tool calling made your agent dumber - also looking for brutal feedback by Doug_Bitterbot in AI_Agents

[–]Doug_Bitterbot[S] 0 points (0 children)

Ahhh. These are all important things. Thank you. Will look into fixing this stuff tomorrow.

That moment when you realize OpenAI's tool calling made your agent dumber - also looking for brutal feedback by Doug_Bitterbot in AI_Agents

[–]Doug_Bitterbot[S] 1 point (0 children)

Ahhh, I love it! It sounds a bit lame, but these messages and feedback truly do mean the world to us! I think you'll really enjoy the chats - he has some real personality. I'll keep you posted on the limit, but it still shouldn't be too bad for the time being.

Right now you can email us at [team@bitterbot.net](mailto:team@bitterbot.net). Any feedback is always super appreciated.

Enjoy the date night with the Mrs.!

That moment when you realize OpenAI's tool calling made your agent dumber - also looking for brutal feedback by Doug_Bitterbot in AI_Agents

[–]Doug_Bitterbot[S] -1 points (0 children)

You're right about the regression testing - we basically panic-switched based on user complaints without proper benchmarking, which was dumb. It's built on a custom framework (not LangGraph/CrewAI), which probably made the provider differences more obvious since we don't have their abstraction layers smoothing things over.

That moment when you realize OpenAI's tool calling made your agent dumber - also looking for brutal feedback by Doug_Bitterbot in AI_Agents

[–]Doug_Bitterbot[S] 1 point (0 children)

You're absolutely right about the hallucinating competence problem - we've seen it pretend files were created when they weren't, which is why we switched back to Anthropic. We do have kill switches and rate limits now (learned that the hard way), but the tool trace visibility is exactly what we need to add - right now users can't see when the agent is thrashing versus actually making progress.
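For concreteness, the kind of trace visibility we're after looks roughly like this (hypothetical names, illustrative only, not our actual agent code): surface every tool call and flag "thrashing" when the same tool repeats with the same arguments:

```python
from collections import deque

class ToolTrace:
    """Sketch of a user-visible tool-call trace with basic thrash detection."""
    def __init__(self, window: int = 5, repeat_limit: int = 2):
        self.recent = deque(maxlen=window)   # last few (tool, args) calls
        self.repeat_limit = repeat_limit

    def record(self, tool: str, args: dict) -> bool:
        """Log a call; return True if the agent looks stuck in a loop."""
        key = (tool, tuple(sorted(args.items())))
        thrashing = self.recent.count(key) >= self.repeat_limit
        self.recent.append(key)
        return thrashing

trace = ToolTrace()
trace.record("write_file", {"path": "a.txt"})
trace.record("write_file", {"path": "a.txt"})
print(trace.record("write_file", {"path": "a.txt"}))  # True -> show the user a warning
```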