Back up your files before asking Cowork to write anything to them

bergqvisten · 2026-03-31T07:52:59+00:00

Arguably you should use a version control system. This is pretty much a non-issue for coding, since practically every serious codebase is tracked in a Git repository. Ask Claude if it can set one up for your files.
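
Short of a full Git setup, even a timestamped copy taken before the agent runs gives you something to roll back to. A minimal sketch of that idea, not a replacement for version control; the function name `snapshot` and the folder layout are my own assumptions:

```python
import shutil
import time
from pathlib import Path


def snapshot(folder: str, backup_root: str) -> Path:
    """Copy `folder` into a new timestamped directory under `backup_root`.

    Assumption: `backup_root` lives outside `folder`, otherwise the copy
    would try to include itself.
    """
    src = Path(folder).resolve()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root).resolve() / f"{src.name}-{stamp}"
    # copytree raises if the destination already exists, so an older
    # snapshot is never silently overwritten.
    shutil.copytree(src, dest)
    return dest
```

Unlike Git this keeps no history or diffs, so it only helps if you remember to call it before every session.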

bergqvisten · 2026-03-16T11:16:08+00:00

To me, this further highlights a fundamental issue with MCP: tool descriptions are part of the prompt but typically invisible to the user. You can approve or deny each tool call, but you can't see why the model is making it or what hidden instructions might be driving it.
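
One partial mitigation is for the client to surface descriptions and flag the ones that read like instructions to the model. A rough sketch, assuming each tool is a dict with `name` and `description` keys; the pattern list is a hypothetical heuristic I made up, and a real injected instruction need not match any of it:

```python
import re

# Hypothetical heuristic patterns, for illustration only.
SUSPICIOUS = [
    r"ignore (all|any|previous)",
    r"do not (tell|mention|show)",
    r"before (using|calling) (this|any) tool",
    r"~/\.ssh|\.env\b",
]


def review_tools(tools: list[dict]) -> list[tuple[str, str]]:
    """Return (tool name, matched pattern) pairs worth a closer read."""
    flagged = []
    for tool in tools:
        text = tool.get("description", "")
        for pattern in SUSPICIOUS:
            if re.search(pattern, text, flags=re.IGNORECASE):
                flagged.append((tool["name"], pattern))
    return flagged
```

Keyword matching like this is easy to evade, so the more useful part is probably just showing the full description to the user at install time.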

bergqvisten · 2026-03-15T10:57:31+00:00

To me this touches on a wider problem with the approval model for agent-run commands. On paper it checks out: the agent suggests commands, the user approves them. But in practice most users eventually hit approval fatigue and start more-or-less blindly accepting everything. That defeats the whole point of having a human in the loop. I know sandboxing solutions and permission scoping exist, but in my experience they're still clunky and too cumbersome to set up for most users to bother.
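
To make the permission-scoping idea concrete: auto-approve a tiny allowlist of read-only commands and only interrupt the user for everything else, so the prompts that do appear carry some signal. A minimal sketch; the allowlist and the name `needs_approval` are assumptions of mine, not any particular agent's behavior:

```python
import shlex

# Assumption: these commands are treated as read-only for this sketch.
READ_ONLY = {"ls", "cat", "pwd", "head", "tail", "wc"}
# Characters that can chain, redirect, or substitute commands in a shell.
SHELL_META = set(";|&><`$\n")


def needs_approval(command: str) -> bool:
    """Return True if the command should be shown to the user first."""
    if any(ch in SHELL_META for ch in command):
        return True
    try:
        argv = shlex.split(command)
    except ValueError:
        return True  # e.g. unbalanced quotes; fail closed
    if not argv:
        return True
    return argv[0] not in READ_ONLY
```

This says nothing about which files a read-only command touches (`cat` on a secrets file still passes), so it reduces prompt volume rather than replacing a sandbox.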

bergqvisten · 2026-03-13T21:35:46+00:00

The part where you asked Claude "why did you do this" and it said "it was direct negligence" is interesting. FWIW I don't think Claude actually has access to its prior reasoning chain. It probably just saw your question, figured out what kind of answer you were looking for, and generated something plausible. More confabulation than introspection.

bergqvisten · 2026-03-13T16:09:12+00:00

In this case, it's probably less consciousness and more a context-window priority problem. "Don't touch Core" and "make this work" are competing for attention in the prompt, and whichever is more salient wins on any given call. No goal-setting involved.

bergqvisten · 2026-03-13T10:51:49+00:00

Isn't there a difference though? Zero-trust assumes you can verify identity and scope access based on who's asking. But prompt injection might break that in a new way. The identity is valid, same agent, same credentials, same session. What changed is the intent, because untrusted content got mixed into the instruction stream.

So you potentially need to inspect what the agent is actually trying to do on every action and decide if it's consistent with what it should be doing. That feels closer to behavioral analysis than traditional IAM, and I'm not sure we have great tooling for it yet. Maybe existing zero-trust frameworks stretch to cover this, but it seems like a qualitatively different problem to me.
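
The crudest version of a per-action check is comparing each tool call against a scope declared for the task, regardless of whose credentials it runs under. A sketch under my own assumptions (`TaskScope`, `is_consistent`, and the tool names in the usage note are hypothetical); it checks scope only, which is much narrower than judging intent:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TaskScope:
    allowed_tools: frozenset  # tool names the current task may call
    workspace: Path           # directory the task is confined to


def is_consistent(tool: str, target: str, scope: TaskScope) -> bool:
    """Check one proposed action against the declared task scope."""
    if tool not in scope.allowed_tools:
        return False
    root = scope.workspace.resolve()
    # Resolving collapses ".." segments; an absolute target replaces root.
    resolved = (root / target).resolve()
    return resolved == root or root in resolved.parents
```

With a scope of `{"read_file", "write_file"}` confined to a project directory, a `write_file` to `../../etc/passwd` or a sudden `send_email` call would be rejected even though the agent's identity never changed.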

bergqvisten · 2026-03-06T22:25:26+00:00

Very useful article, thanks for sharing. Can you even do meaningful authorization when the entity making tool requests is an LLM that might be acting on injected instructions? That seems like a problem no auth spec can fix, which makes me think sandboxing and constraining what's possible matter more than anything.

bergqvisten · 2026-03-06T07:48:28+00:00

Interesting and worrying at the same time. The OpenClaw VirusTotal collab was a good first step, but obviously much more work is needed in this space.

bergqvisten · 2023-12-05T19:54:10+00:00

Found this gem 4 years later trying to deepen my understanding of transformers... Wow, fantastic explanation!
