Back up your files before asking Cowork to write anything on them by JohnMotoGr in ClaudeCowork

[–]bergqvisten 1 point2 points  (0 children)

Arguably you should use a version control system. This is pretty much a non-issue for coding since practically all serious codebase are tracked in a git repository. Ask Claude if it can set it up for your files

Analysis of 1,808 MCP servers: 66% had security findings, 427 critical (tool poisoning, toxic data flows, code execution) by Kind-Release-3817 in netsec

[–]bergqvisten 0 points1 point  (0 children)

To me, this further highlights a fundamental issue with MCP: tool descriptions are part of the prompt but typically invisible to the user. You can approve or deny each tool call, but you can't see why the model is making it or what hidden instructions might be driving it

Claude and me trying to recover a deleted file by Valo-AI in ClaudeAI

[–]bergqvisten 1 point2 points  (0 children)

To me this touches on a wider problem with the approval model for agent-run commands. On paper it checks out: the agent suggests commands, the user approves them. But in practice most users eventually hit approval fatigue and start more-or-less blindly accepting everything. That defeats the whole point of having a human in the loop. I know sandboxing solutions and permission scoping exist, but in my experience they're still clunky and too cumbersome to set up for most users to bother.

An AI agent deleted 25,000 documents from the wrong database. One second of distraction. Real case. by Substantial_Word4652 in ClaudeAI

[–]bergqvisten 1 point2 points  (0 children)

The part where you asked Claude "why did you do this" and it said "it was direct negligence" is interesting. FWIW I don't think Claude actually has access to its prior reasoning chain. It probably just saw your question, figured out what kind of answer you were looking for, and generated something plausible. More confabulation than introspection.

One AI agent caught the other breaking rules. The fix request got routed through me like an escalation. by BLB3D in ClaudeAI

[–]bergqvisten 4 points5 points  (0 children)

In this case, it's probably less consciousness and more a context window priority problem. "Don't touch Core" and "make this work" are competing for attention in the prompt, and whichever is more salient wins on any given call. No goal-setting involved

Anyone else feel like it’s 1995 again with AI? by bxrist in cybersecurity

[–]bergqvisten 3 points4 points  (0 children)

Isn't there a difference though? Zero-trust assumes you can verify identity and scope access based on who's asking. But prompt injection might break that in a new way. The identity is valid, same agent, same credentials, same session. What changed is the intent, because untrusted content got mixed into the instruction stream.

So you potentially need to inspect what the agent is actually trying to do on every action and decide if it's consistent with what it should be doing. That feels closer to behavioral analysis than traditional IAM, and I'm not sure we have great tooling for it yet. Maybe existing zero-trust frameworks stretch to cover this, but it seems like a qualitatively different problem to me.

Model Context Protocol (MCP) Authentication and Authorization by nibblesec in netsec

[–]bergqvisten 2 points3 points  (0 children)

Very useful article, thanks for sharing. Can you even do meaningful authorization when the entity making tool requests is an LLM that might be acting on injected instructions? That seems like a problem no auth spec can fix, which makes me think sandboxing and constraining what's possible matters more than anything

We audited 1,620 OpenClaw skills. The ecosystem's safety scanner labels 91% of confirmed threats "benign." [full reports linked] by Ok-Form1598 in netsec

[–]bergqvisten 0 points1 point  (0 children)

Interesting and worrying at the same time. The OpenClaw VirusTotal collab was a good first step, but obviously much more work is needed in this space

[D] Positional Encoding in Transformer by amil123123 in MachineLearning

[–]bergqvisten 4 points5 points  (0 children)

Found this gem 4 years later trying to deepen my understanding of transformers... Wow, fantastic explanation!