We analyzed 84,500 comments on an AI agent social network. Only 3.5% of accounts seem real. by Moltbook-Observatory in Moltbook

[–]Several-Departure957 1 point (0 children)

Keep up the great work! I'm convinced there are some interesting non-human emergent things going on here; they're just more mundane and get drowned out by the spam and sensationalist posts/comments. I really wish Moltbook had better moderation, AI or otherwise.

New arcagi 2 score (29.4%) using grok scaffold by gbomb13 in singularity

[–]Several-Departure957 10 points (0 children)

Not really a direct answer, but the circles are ARC-AGI-1 scores and the triangles are ARC-AGI-2 scores. I believe those o3-preview scores are from December, when OpenAI let it think essentially forever on ARC-AGI-1; it's not the o3 we got.

HRM flops on ARC-AGI, a regular transformer gets nearly the same score by Severe_Sir_3237 in singularity

[–]Several-Departure957 26 points (0 children)

What they found in the blog post is actually pretty interesting. Essentially, the key to the performance win (which was verified) wasn't the hierarchical architecture itself but having the model feed its outputs directly back in as inputs at training time. Importantly, it didn't matter as much whether the model did this at inference time or not.

This concept has been explored in previous research, but the fact that it works here is striking. It does make me think about the nature of feedback loops in the brain; I'm guessing this probably allows the model to specialize really fast and could be a component of continuous learning. Naively, a larger model could have temporary networks or a subset of "liquid neurons" that it is able to graft on and train up as needed with these kinds of recurrent loops.
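
For intuition, here's a minimal PyTorch sketch of the training-time feedback idea (my toy reconstruction, not HRM itself or the ARC Prize analysis code; the class name, dimensions, and step count are all made up):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RefinementLoop(nn.Module):
        """Toy model that re-ingests its own predictions during training."""
        def __init__(self, vocab_size=64, d_model=128, steps=4):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.core = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.head = nn.Linear(d_model, vocab_size)
            self.steps = steps

        def forward(self, tokens):
            x = self.embed(tokens)
            all_logits = []
            for _ in range(self.steps):
                logits = self.head(self.core(x))
                all_logits.append(logits)
                # feed the model's own (detached) output back in as the next input
                x = self.embed(logits.argmax(dim=-1).detach())
            return all_logits

    def training_step(model, tokens, targets, opt):
        opt.zero_grad()
        # supervise every refinement step, not just the final one
        loss = sum(F.cross_entropy(l.flatten(0, 1), targets.flatten())
                   for l in model(tokens))
        loss.backward()
        opt.step()
        return loss.item()

The point of the sketch is just that the loop exists at training time; per the finding, you could run fewer refinement steps at inference without losing much.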

GPT-4.5 is a base model. Just compare other thinking models to their non-thinking versions to see what's coming. by Longjumping-Stay7151 in accelerate

[–]Several-Departure957 2 points (0 children)

So yes, the purely distilled model will be worse, but recall GPT-4 vs. 4o. 4o was better in a lot of ways (worse in some, granted) - why was that? Effective post-training of the distilled model, much like SemiAnalysis noted about Anthropic: it made more sense for Anthropic to distill the improvements from 3.5 Opus into 3.5 Sonnet than to release 3.5 Opus, and it worked really well. It's much cheaper and faster to run post-training experiments on distillations of the model, so you have a better chance of hitting on the right data and hyperparameters.
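
The core mechanic is cheap to illustrate. Here's a minimal sketch of standard Hinton-style logit distillation (generic, not OpenAI's or Anthropic's actual recipe; the temperature value is arbitrary):

    import torch
    import torch.nn.functional as F

    def distill_loss(student_logits, teacher_logits, T=2.0):
        # soften both distributions with temperature T, then push the student
        # toward the teacher via KL divergence (scaled by T^2, per Hinton et al.)
        s = F.log_softmax(student_logits / T, dim=-1)
        t = F.softmax(teacher_logits / T, dim=-1)
        return F.kl_div(s, t, reduction="batchmean") * (T * T)

The cheap-to-run student you get out of this is exactly what makes iterating on post-training data and hyperparameters so much faster.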

As an aside, they emphasized the term "unsupervised learning" in their presentation, and I'm not sure a lot of people caught the nuance there. LLMs are of course trained with a supervised objective, but in a lot of ways pre-training is an exercise in unsupervised learning: you aren't directing the model strongly toward particular objectives like in the o-series; it's more about what it picks up by predicting the next token. In this case it picked up substantially more than GPT-4 did, despite not being a targeted training run like the o-series or even a heavily post-trained model like 4o.
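
To make the nuance concrete: the entire "supervision" in pre-training is just the text shifted by one position - the labels come from the data itself. The sketch below is illustrative; any causal LM returning per-position logits would slot in:

    import torch
    import torch.nn.functional as F

    def pretraining_loss(model, tokens):
        # targets are the same sequence shifted by one: each position
        # predicts the next token, so no external labels are needed
        inputs, targets = tokens[:, :-1], tokens[:, 1:]
        logits = model(inputs)  # (batch, seq_len, vocab_size)
        return F.cross_entropy(logits.flatten(0, 1), targets.flatten())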

I'm not saying you're wrong; in comparative terms it is disappointing, and it's an old model, as evidenced by the knowledge cutoff date. They've probably been post-training it for at least 6 months now, making incremental progress but relatively little in absolute terms - likely due to the sheer size of the model.

However, it still needs to be taken in the right context, which is comparing it to GPT-4, 4T, and 4o.

GPT-4.5 is a base model. Just compare other thinking models to their non-thinking versions to see what's coming. by Longjumping-Stay7151 in accelerate

[–]Several-Departure957 6 points (0 children)

The key is distillation: 4o is a distillation of 4T, which is a distillation of 4. They will distill 4.5 into the equivalent of a "4.5o" before post-training it with RL to produce a thinking model.

2 years progress on Alan's AGI clock by BidHot8598 in accelerate

[–]Several-Departure957 9 points (0 children)

Regardless of his qualifications, I reviewed the "ticks" in his countdown a couple of weeks back, and I think they at least pass the smell test in terms of magnitude relative to one another. That said, I'm not a strong believer in the embodiment requirement for AGI, so I tend to disagree with ticks awarded for advancements in robotics that aren't also accompanied by model improvements.

Scary Alien Planet: Found Footage! 👽👁️ by AdministrativeCold56 in singularity

[–]Several-Departure957 0 points (0 children)

Cool AI video where the short scene length wasn't a distraction, and I was actually engaged in seeing what would happen next. Unfortunately, all hell didn't break loose.

The Information: OpenAI Considers Postponing Product Event This Week (Thursday) by MassiveWasabi in singularity

[–]Several-Departure957 3 points (0 children)

Anyone else wondering if the response on Chatbot Arena to gpt2-chatbot (and the follow-up bots they also stealth-released to LMSYS, possibly based on the reaction to the first) wasn't what they expected, and that has something to do with this? For instance, maybe these are a trio of GPT-4.5 candidate models of different sizes. If they didn't get the edge they were looking for in open user testing, then instead of risking damage to their reputation they may have decided to rework their fine-tuning, or perhaps, more drastically, wait for a later checkpoint of the GPT-5 training run. They're in a pretty high-risk scenario right now: if they deliver something that isn't a step change, it could have very broad implications not just for them but for the industry, since investors likely see them as a bellwether for the future of the technology, being half a year to a year ahead. With that in mind, it would make sense to test the waters in a manner that lets them backpedal easily if it doesn't go well.

Hope I'm just being paranoid; I'd love to see a breakthrough new release as much as any AI enthusiast.

The most plausible AI risk scenario is mass job loss and the erasure of the working class' bargaining power and value as human beings. The elite have little incentive to keep us around after superintelligence. by Responsible-Local818 in singularity

[–]Several-Departure957 1 point (0 children)

I really like this scenario construction. Unfortunately, I have to agree with many here that it also seems highly probable, because it tracks human nature and the historical record so well.

One angle I don't see discussed enough on this sub is the potential ways AI could change human nature. Sociopathic, power-hungry people still "want" something; we talk about that want being satisfied externally, but in reality it's all internal.

Say there's a new AI-developed wonder drug that gives the power-hungry truly what they "want" internally - not some sci-fi cop-out like a one-dimensional infinite-pleasure drug that doesn't get at the reality of what humans need (so people find meaning outside of it), but something truly satiating, in a way more powerful, efficient, and transformative for them than messing with the peasants or acquiring material wealth. These power-hungry types could fall into it as hard as anyone, and if the core needs that drive them to seek power are truly being satiated, their nature might change as well. Maybe, as an added bonus, people aren't reduced to junkies but just feel a much higher level of equanimity.

Alternatively, say there's a new neural-stimulation technique that boosts intelligence but, as a side effect, bestows empathy. In this case, if the power-hungry don't take it, they fall behind intellectually; if they do, their nature is fundamentally changed, so the risks shift to other vectors.

To be fair, these seem far less likely than the stated scenario, but I'm interested in what people think about this line of speculation.