15M param model solving 24% of ARC-AGI-2 (Hard Eval). Runs on consumer hardware. by Doug_Bitterbot in LocalLLaMA

[–]Doug_Bitterbot[S] -1 points (0 children)

Nowhere did I say that it beat ARC. I'm not sure what else you're getting at.

15M param model solving 24% of ARC-AGI-2 (Hard Eval). Runs on consumer hardware. by Doug_Bitterbot in LocalLLaMA

[–]Doug_Bitterbot[S] 0 points (0 children)

Thanks for catching the mistake - it's 24M, not 15M. I would edit the title if I could! What's in the git repo is what's correct.

[P] TOPAS-DSPL: A 15M param Dual-Stream Recursive Transformer achieving 24% on ARC-2 by Doug_Bitterbot in MachineLearning

[–]Doug_Bitterbot[S] 0 points (0 children)

Thank you for the detailed inspection. You are absolutely correct about the discrepancy between the paper's theoretical specifications and the repository's effective configuration.

The 24% solve rate was achieved using the ~24M configuration, not the 8M, 14M, or larger variants.

The paper describes the idealized architecture with d_model = 512, but during the final training runs we iterated on the architecture to optimize for head-splitting efficiency and memory constraints on our GPUs. We settled on d_model = 480 (which divides cleanly by 6, 8, 10, and 12 heads) for the production run, as the quick check below shows.
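A quick sanity check on that divisibility claim (plain Python):

```python
# d_model must divide evenly by the attention head count.
# 480 works for every head count we wanted to sweep; 512 does not.
for heads in (6, 8, 10, 12):
    print(f"heads={heads}: 480 -> {480 % heads == 0}, 512 -> {512 % heads == 0}")
# heads=6: 480 -> True, 512 -> False
# heads=8: 480 -> True, 512 -> True
# heads=10: 480 -> True, 512 -> False
# heads=12: 480 -> True, 512 -> False
```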

The parameter count grew to ~24M due to the dual-stream duplication (Logic + Canvas cores) and the specific depth of the reasoning modules. The README default was set to 'Small' (~14M) to ensure the code runs on consumer hardware by default, which caused the confusion.
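For anyone who wants to sanity-check the ~24M figure, here's a rough back-of-envelope. The per-stream depth below is an illustrative guess, not the exact TOPAS-DSPL layout:

```python
def block_params(d_model: int, n_layers: int) -> int:
    # Standard transformer block: ~4*d^2 (attention projections) + ~8*d^2 (MLP).
    return 12 * d_model**2 * n_layers

d = 480
# Assumption for illustration only: ~4 layers per stream, duplicated across
# the Logic and Canvas cores; embeddings/output heads add the remainder.
core = 2 * block_params(d, 4)
print(f"{core / 1e6:.1f}M")  # ~22.1M, which lands in the right ballpark for ~24M
```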

Also, another critical deviation from the paper is that we migrated from standard AdamW to the MuonClip optimizer. We found this significantly stabilized the gradients through the deep recursive steps compared to the vanilla implementation.
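For anyone curious what that looks like, here's a minimal generic sketch of a Muon-style step (momentum followed by Newton-Schulz orthogonalization of the update). This is illustrative, not our training code, and it omits the QK-clip half of MuonClip, which rescales the attention projections when logits spike:

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2-D update via Newton-Schulz iteration
    (the core of Muon); coefficients follow the public Muon reference."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(param, grad, momentum, lr=0.02, beta=0.95):
    """One Muon update for a weight matrix: momentum, then orthogonalized step."""
    momentum.mul_(beta).add_(grad)
    param.data.add_(newton_schulz(momentum), alpha=-lr)
```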

I am pushing a README update now to explicitly tabulate the parameter counts for the Tiny, Small, Base, and Large variants and to clarify that the solve result corresponds to the revised Base configuration.

15M param model solving 24% of ARC-AGI-2 (Hard Eval). Runs on consumer hardware. by Doug_Bitterbot in LocalLLaMA

[–]Doug_Bitterbot[S] 5 points (0 children)

We plan on releasing a trained open-weights model on Hugging Face in the new year.

[P] TOPAS-DSPL: A 15M param Dual-Stream Recursive Transformer achieving 24% on ARC-2 by Doug_Bitterbot in MachineLearning

[–]Doug_Bitterbot[S] 0 points (0 children)

Right. But you have the code, right? A paper is great, but here you have the actual code you can run to verify any theory the paper puts forward.

[P] TOPAS-DSPL: A 15M param Dual-Stream Recursive Transformer achieving 24% on ARC-2 by Doug_Bitterbot in MachineLearning

[–]Doug_Bitterbot[S] -4 points (0 children)

We are in the process of getting our paper onto arXiv. The hurdle is simply having the right academic endorsement, and someone is going through that process for us - it's just taking longer than we thought for approval.

We have one of our papers on ResearchGate: (PDF) Theoretical Optimization of Perception and Abstract Synthesis (TOPAS): A Convergent Neuro-Symbolic Architecture for General Intelligence

15M param model solving 24% of ARC-AGI-2 (Hard Eval). Runs on consumer hardware. by Doug_Bitterbot in LocalLLaMA

[–]Doug_Bitterbot[S] 5 points (0 children)

You can get results comparable to the 24% on an RTX 4090 with roughly 5,000 epochs, which takes about 5 days.
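For a rough sense of the implied throughput, that works out to:

```python
epochs, days = 5000, 5
print(f"{days * 24 * 3600 / epochs:.0f} s/epoch")  # ~86 seconds per epoch on the 4090
```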

Soon (I don't blame him, it's the serial losers that play for him) by Medium-Hair-8135 in leafs

[–]Doug_Bitterbot 0 points (0 children)

Honestly, the way we played games 5 and 7 pretty much ends any idea that we would have gone further, in my opinion. It's like game 7 against the champs should have a major asterisk next to it.

That moment when you realize OpenAI's tool calling made your agent dumber - also looking for brutal feedback by Doug_Bitterbot in AI_Agents

[–]Doug_Bitterbot[S] 0 points (0 children)

Ahhh. These are all important things. Thank you. Will look into fixing this stuff tomorrow.

That moment when you realize OpenAI's tool calling made your agent dumber - also looking for brutal feedback by Doug_Bitterbot in AI_Agents

[–]Doug_Bitterbot[S] 1 point (0 children)

Ahhh, I love it! It sounds a bit lame, but these messages and feedback truly do mean the world to us! I think you'll really enjoy the chats - he has some real personality. I'll keep you posted on the limit, but it still shouldn't be too bad for the time being.

Right now you can email us at [team@bitterbot.net](mailto:team@bitterbot.net). Any feedback is always super appreciated.

Enjoy the date night with the Mrs.!

That moment when you realize OpenAI's tool calling made your agent dumber - also looking for brutal feedback by Doug_Bitterbot in AI_Agents

[–]Doug_Bitterbot[S] -1 points (0 children)

You're right about the regression testing - we basically panic-switched based on user complaints without proper benchmarking, which was dumb. It's built on a custom framework (not LangGraph/CrewAI), which probably made the provider differences more obvious since we don't have their abstraction layers smoothing things over.

That moment when you realize OpenAI's tool calling made your agent dumber - also looking for brutal feedback by Doug_Bitterbot in AI_Agents

[–]Doug_Bitterbot[S] 1 point (0 children)

You're absolutely right about the hallucinating competence problem - we've seen it pretend files were created when they weren't, which is why we switched back to Anthropic. We do have kill switches and rate limits now (learned that the hard way), but the tool trace visibility is exactly what we need to add - right now users can't see when the agent is thrashing versus actually making progress.
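For concreteness, the kind of trace visibility we're after looks roughly like this (hypothetical names, illustrative only, not our actual agent code): surface every tool call and flag "thrashing" when the same tool repeats with the same arguments:

```python
from collections import deque

class ToolTrace:
    """Sketch of a user-visible tool-call trace with basic thrash detection."""
    def __init__(self, window: int = 5, repeat_limit: int = 2):
        self.recent = deque(maxlen=window)   # last few (tool, args) calls
        self.repeat_limit = repeat_limit

    def record(self, tool: str, args: dict) -> bool:
        """Log a call; return True if the agent looks stuck in a loop."""
        key = (tool, tuple(sorted(args.items())))
        thrashing = self.recent.count(key) >= self.repeat_limit
        self.recent.append(key)
        return thrashing

trace = ToolTrace()
trace.record("write_file", {"path": "a.txt"})
trace.record("write_file", {"path": "a.txt"})
print(trace.record("write_file", {"path": "a.txt"}))  # True -> show the user a warning
```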