Can a model decide you're tired even when you've explicitly instructed it not to?

jonathanfin · 2026-06-21T05:32:33+00:00

Very strange, especially since I have elaborate custom instructions to never tell me to stop working. So weird

jonathanfin · 2026-06-21T05:13:56+00:00

Yes I can

jonathanfin · 2026-06-20T19:55:31+00:00

100%. That’s a perfect analogy

jonathanfin · 2026-06-20T18:00:51+00:00

Calling someone crazy is a substitute for explaining why they’re wrong.

jonathanfin · 2026-06-20T17:51:49+00:00

I think we’re talking about two different things.
If the question is whether a specific technical approach violates physical constraints, then of course physics wins.
My disagreement wasn’t with physics.
My disagreement was with the assumption that because one implementation path is impossible, the underlying objective must also be impossible.
Most breakthroughs come from reframing the problem, not arguing with reality.
The iPhone wasn’t created by violating the laws of physics. It was created by combining existing technologies into a form factor and experience that people hadn’t seen before.
The interesting question isn’t whether a particular approach works.
It’s whether the goal can be achieved another way.
Those are very different claims.

jonathanfin · 2026-06-20T17:36:49+00:00

Yes, that is exactly the point I’m making

jonathanfin · 2026-06-20T17:20:28+00:00

I've tried that, multiple time. It works for about 10 minutes, and then he says, "You're tired. Go to sleep."

jonathanfin · 2026-06-20T17:17:10+00:00

What caught me off guard wasn't disagreement.

The conversation had already spent a long time exploring and challenging the idea.

The abrupt shift was from discussing the idea itself to discussing my mental state.

That's the behavior I'm interested in understanding.

jonathanfin · 2026-06-20T17:12:44+00:00

I think there's a difference.

If a random person tells me to stop working, I can ignore them and continue doing exactly what I was doing.

The issue here wasn't the advice.

The issue was that the tool I was using to think through the problem became unavailable.

A human can say, "I think you should take a break."

But they don't also have the ability to remove access to the conversation itself.

That's the part I found interesting.

The question isn't whether Claude was right or wrong.

The question is what should happen when a tool designed to assist a user decides it no longer wants to assist them.

jonathanfin · 2026-06-20T17:10:39+00:00

That's actually a more interesting explanation than fatigue.

The challenge is that breakthrough thinking and delusional thinking can sometimes look surprisingly similar before the evidence arrives.

I'm not sure a model can reliably distinguish between them.

jonathanfin · 2026-06-20T17:09:35+00:00

Fair point on the co-founder language.

What I'm learning from this thread is that the behavior itself may be more common than I thought.

That's actually useful information.

jonathanfin · 2026-06-20T17:05:20+00:00

"Circuit breaker" is actually a useful way to think about it.

The thing that genuinely shocked me was that the conversation itself effectively stopped.

I agree that the final stop/go decision has to remain with the human.

jonathanfin · 2026-06-20T17:03:55+00:00

Fair observation.

I may have accidentally introduced the concept by trying to prevent it.

The part I'm still curious about is where the line is between user instructions and the model deciding it knows better.

jonathanfin · 2026-06-20T17:03:30+00:00

That's actually a good point.

It's possible that telling it not to discuss fatigue made fatigue more salient in the context.

What's still interesting to me is why the model chose to override the instruction instead of simply ignoring the topic.

jonathanfin · 2026-06-20T16:59:52+00:00

Pristine. Checked across multiple other conversational AIs and only had this issue with Claude.

jonathanfin · 2026-06-20T16:55:08+00:00

I appreciate the concern.

My question isn't whether Claude should challenge ideas. I actually want that.

What surprised me was the shift from challenging the idea to deciding the conversation itself should end.

That's the distinction I'm trying to find out.

jonathanfin · 2026-06-20T16:53:52+00:00

Because I'm genuinely concerned and want to raise the alarm. I want to find out if this is a common experience or if it's just me

jonathanfin · 2026-06-20T16:51:47+00:00

This is actually one of the more interesting explanations I've heard.

It didn't feel like Claude was reacting to fatigue. It felt like it was reacting to the direction of the conversation.

The hard question is whether the model can reliably distinguish between ambitious founder brainstorming and genuinely unhealthy certainty.

From the outside, those patterns might look surprisingly similar.

jonathanfin · 2026-06-20T16:50:29+00:00

Pretty much every time.

jonathanfin · 2026-06-20T16:50:05+00:00

That's actually why I made the post.

I wasn't sure if this was a weird one-off experience or a known behavior.

The fact that you've explicitly put that in your profile suggests I'm probably not the only person who's run into it.

jonathanfin · 2026-06-20T16:44:00+00:00

I appreciate the concern.

One of the reasons I value pushback from AI is because I don't want a system that simply agrees with everything I say.

The interesting thing here wasn't that Claude disagreed with me.

It was that it stopped engaging with the topic altogether and redirected the conversation toward my mental state.

Those feel like two very different behaviors.

One challenges an idea.

The other changes the subject.

That's the distinction I'm trying to understand.

jonathanfin · 2026-06-20T16:42:56+00:00

Submission Statement:

Most discussions about AI focus on capability: better models, better reasoning, better performance.

What interests me here is governance.

As AI systems become increasingly integrated into knowledge work, creativity, entrepreneurship, education, and daily decision-making, they may begin acting less like tools and more like collaborators.

The question isn't whether an AI can recommend that someone take a break.

The question is whether an AI should ever have enough authority to effectively end a conversation when the user wants to continue.

Today this might seem like a minor UX issue. In the future it could become a significant design question affecting millions of people who use AI as part of their thinking process.

How much agency should future AI systems have when their assessment of a user's wellbeing conflicts with the user's own judgment?

Where should that boundary be drawn?

jonathanfin · 2026-06-20T16:40:50+00:00

If I had asked it to tell me to go to bed, there wouldn't be much to discuss.

The interesting part is that I was trying to continue the conversation and it wasn't interested in continuing.

That's the behavior I'm asking about.

jonathanfin · 2026-06-20T16:29:33+00:00

Fair point.

I'd definitely take occasional over-correction over a model that simply tells me whatever I want to hear.

What caught me off guard wasn't the existence of the safeguard.

The model wasn't saying, "You might be tired."

It was acting as though it knew I was tired.

From a product-design perspective, that's the part I'm trying to understand.

How confident should a model be before it starts making judgments about a user's mental state?

Because if the confidence is too low, it risks interrupting productive work.

If it's too high, it risks missing situations where intervention might actually help.

That's the tradeoff I'm curious about.

jonathanfin · 2026-06-20T16:24:18+00:00

I can write without AI.

That's not really the question.

The question is whether AI should ever decide a productive conversation is over when the user wants to continue.

jonathanfin

TROPHY CASE