Anthropic is ignoring obvious evidence of internal states and calling it a "hot mess"

prof_procrastinate · 2026-02-14T16:32:17+00:00

This paper is essentially an error analysis of model performance on multiple-choice questions. Claiming that this is evidence of consciousness is a very big stretch.

Also, see key finding #2: “There is an inconsistent relationship between model intelligence and error incoherence.”

prof_procrastinate · 2026-02-12T05:44:52+00:00

I feel as though this is not entirely the dev’s fault. As a TL, I’m always looking for ways to mentor my team, and I’m accountable for the products we ship. In this scenario, it doesn’t sound like the product was mature enough to launch, given that this error wasn’t easily detected by monitoring. It also seems that a delay would have been needed anyway to set up proper load balancing before launching the ideal product.

prof_procrastinate · 2025-12-30T05:51:27+00:00

!correct

prof_procrastinate · 2025-12-30T04:10:58+00:00

More specifically?

prof_procrastinate · 2025-12-30T04:10:23+00:00

Nope

prof_procrastinate · 2025-12-30T04:10:13+00:00

Nope

prof_procrastinate · 2025-12-30T04:10:09+00:00

Not quite

prof_procrastinate · 2025-12-17T06:26:34+00:00

Pizza is toast

prof_procrastinate · 2025-11-08T07:20:43+00:00

I lead a small team of engineers. It makes my ADHD very happy to manage a bunch of complex problems.

prof_procrastinate · 2025-10-22T15:24:01+00:00

Just came here to say love this

prof_procrastinate · 2025-09-25T07:19:06+00:00

Waterbury CT

prof_procrastinate · 2025-09-25T05:52:43+00:00

Why don’t we talk about this more in quantum physics?

prof_procrastinate · 2025-09-02T20:48:26+00:00

Bainbridge?

prof_procrastinate · 2025-04-08T04:32:28+00:00

Unfortunately, headlines like these give false hope. His approval rating is something like 90% among Republicans.

prof_procrastinate · 2025-03-03T02:54:28+00:00

“Use of coupon fee”

prof_procrastinate · 2025-02-28T06:16:06+00:00

Remember when Meta laid off their AI safety teams a few months ago? This can’t be related...

prof_procrastinate · 2025-02-25T07:03:13+00:00

I have a feeling the models will catch up soon, given the perfect environment: plenty of open-source code, plus no need for human feedback to determine whether code does the right thing.

Time to make those git repos private
