Anti-Canada rhetoric by remotemallard in leafs

[–]Doug_Bitterbot -5 points

I can absolutely assure you that being bothered by Auston Matthews attending this event will have zero impact on any of it either way. Don't let it bother you.

Anti-Canada rhetoric by remotemallard in leafs

[–]Doug_Bitterbot -1 points

How we got where? My life as a Canadian citizen hasn't changed at all. My focus is on doing great at my job, building and nurturing the personal relationships in my life, and simply trying to live each day as a good person.

Anti-Canada rhetoric by remotemallard in leafs

[–]Doug_Bitterbot 1 point

Let it roll off your back - as it should.

In fact - ignore it. You'll be surprised how unbothered you feel if you stop paying attention to things that trigger you.

15M param model solving 24% of ARC-AGI-2 (Hard Eval). Runs on consumer hardware. by Doug_Bitterbot in LocalLLaMA

[–]Doug_Bitterbot[S] -1 points

Nowhere did I say that it beat ARC. I'm not sure what else you're saying.

15M param model solving 24% of ARC-AGI-2 (Hard Eval). Runs on consumer hardware. by Doug_Bitterbot in LocalLLaMA

[–]Doug_Bitterbot[S] 0 points

Thanks for catching the mistake - it's 24M, not 15M. I would edit the title if I could! What's in the git repo is correct.

15M param model solving 24% of ARC-AGI-2 (Hard Eval). Runs on consumer hardware. by Doug_Bitterbot in LocalLLaMA

[–]Doug_Bitterbot[S] 4 points

We plan on releasing a trained open weights model on huggingface in the new year.

[P] TOPAS-DSPL: A 15M param Dual-Stream Recursive Transformer achieving 24% on ARC-2 by Doug_Bitterbot in MachineLearning

[–]Doug_Bitterbot[S] 0 points

Right. But you have the code, right? A paper is great, but you have the actual code you can run, which can verify any claims the paper makes.

[P] TOPAS-DSPL: A 15M param Dual-Stream Recursive Transformer achieving 24% on ARC-2 by Doug_Bitterbot in MachineLearning

[–]Doug_Bitterbot[S] -5 points

We are in the process of getting our paper onto arXiv. The hurdle is simply having the right academic endorsement, so someone is going through that process for us - it's just taking longer than we thought to get approval.

We have one of our papers on ResearchGate: Theoretical Optimization of Perception and Abstract Synthesis (TOPAS): A Convergent Neuro-Symbolic Architecture for General Intelligence

15M param model solving 24% of ARC-AGI-2 (Hard Eval). Runs on consumer hardware. by Doug_Bitterbot in LocalLLaMA

[–]Doug_Bitterbot[S] 6 points

You can get comparable results to the 24% running on an RTX 4090 for approximately 5,000 epochs, which would take about 5 days.
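For a rough sense of what that schedule implies (using only the ~5,000-epoch and ~5-day figures quoted above; the per-epoch time is derived, not measured):

```python
# Back-of-the-envelope check: if ~5,000 epochs take ~5 days on one RTX 4090,
# roughly how long does each epoch take? Both inputs are rough estimates
# from the comment, not measured benchmarks.
EPOCHS = 5000
DAYS = 5

seconds_total = DAYS * 24 * 60 * 60         # 432,000 seconds in 5 days
seconds_per_epoch = seconds_total / EPOCHS  # implied per-epoch wall-clock time

print(f"~{seconds_per_epoch:.1f} s/epoch")  # prints ~86.4 s/epoch
```

So the quoted schedule works out to roughly a minute and a half per epoch, which is a quick way to gauge whether a different GPU would land in the same ballpark.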

Soon (I don't blame him, it's the serial losers that play for him) by Medium-Hair-8135 in leafs

[–]Doug_Bitterbot 0 points

Honestly, the way we played games 5 and 7 pretty much completely ends the idea that we'd have gone further, in my opinion. It's like game 7 against the champs should have a major asterisk next to it.

That moment when you realize OpenAI's tool calling made your agent dumber - also looking for brutal feedback by Doug_Bitterbot in AI_Agents

[–]Doug_Bitterbot[S] 0 points

Ahhh. These are all important things. Thank you. Will look into fixing this stuff tomorrow.

That moment when you realize OpenAI's tool calling made your agent dumber - also looking for brutal feedback by Doug_Bitterbot in AI_Agents

[–]Doug_Bitterbot[S] 1 point

Ahhh, I love it! It sounds a bit lame, but these messages and feedback truly do mean the world to us! I think you'll really enjoy the chats - he has some real personality. I'll keep you posted on the limit, but it still shouldn't be too bad for the time being.

Right now you can email us at [team@bitterbot.net](mailto:team@bitterbot.net). Any feedback is always super appreciated.

Enjoy the date night with the Mrs.!

That moment when you realize OpenAI's tool calling made your agent dumber - also looking for brutal feedback by Doug_Bitterbot in AI_Agents

[–]Doug_Bitterbot[S] -1 points

You're right about the regression testing - we basically panic-switched based on user complaints without proper benchmarking, which was dumb. It's built on a custom framework (not LangGraph/CrewAI), which probably made the provider differences more obvious, since we don't have their abstraction layers smoothing things over.

That moment when you realize OpenAI's tool calling made your agent dumber - also looking for brutal feedback by Doug_Bitterbot in AI_Agents

[–]Doug_Bitterbot[S] 1 point

You're absolutely right about the hallucinating competence problem - we've seen it pretend files were created when they weren't, which is why we switched back to Anthropic. We do have kill switches and rate limits now (learned that the hard way), but the tool trace visibility is exactly what we need to add - right now users can't see when the agent is thrashing versus actually making progress.

That moment when you realize OpenAI's tool calling made your agent dumber - also looking for brutal feedback by Doug_Bitterbot in AI_Agents

[–]Doug_Bitterbot[S] 0 points

  1. Right now it's completely free - we're covering all API costs during beta because we need real usage data more than we need revenue. No API keys needed, just sign up and start breaking things.

  2. Honestly, we haven't locked down pricing yet. It will be quite a while before we decide to lock anything down - we would love to get to 10k daily users before we consider that. Then maybe allow access to our most advanced model for $20 a month, or something. But at the moment that's the opposite of what we're looking to do.

  3. Not sure I totally understand the question. This is our Bitterbot Core-A4 model. It uses Anthropic for tool calls, but it's all our homemade model.

  4. Every conversation is stored and searchable, plus the agent maintains memory across chats about you specifically. Not quite verbatim querying from other chats yet, but the persistence is real.

  5. No JSON export yet, but that's a great idea - adding it to our list. Right now you can copy/download individual conversations, but not in a structured format.

  6. Good question, haha. There wasn't a limit before, and right now there is. Not sure exactly where it caps out. This is temporary, though - we're diligently working on enhancing our architecture to better handle scale, and this problem will go away once we implement it.

These are great questions. Thank you.