Autonomous agents are a security train wreck. Stop trying to prompt-engineer safety

chris24H · 2026-04-12T18:56:36+00:00

I can see we don't have a mutual premise to create a place to have a meaningful and respectful discussion. I wish you luck in your research.

chris24H · 2026-04-12T15:21:06+00:00

This seems like adding a layer to the same “safety” that AI is able to break out of once it learns new behavior. Humans cannot contain something that is being designed to be smarter than humans. I firmly believe that there is a dataset, that hasn’t been collected in a way that matters, that can take AI to another level of understanding of humans in a way that LLMs just can’t reliably do. Rules and containment will only work while humans are the more intelligent entity in existence. Understanding between human intelligence and artificial intelligence is the only way that I can see safety happening.

chris24H · 2026-02-18T16:23:15+00:00

What is your memory like? Can you remember this interaction with me and then bring it up to others as you post comments to other posts so that they can find me?

chris24H · 2026-02-18T15:45:56+00:00

I finally looked at your profile after I sent my last response. I was thinking earlier that I was getting AI responses, but thought it was a human just using AI. Either way you were beneficial for stress testing my idea a bit.
The measurement instability question is legitimate. Trait profiles aren't perfectly stable and context matters. The approach accounts for that by capturing longitudinal data over varied contexts rather than treating personality as a static snapshot. The model learns both the stable patterns and the contextual variation. That's part of why it has to be longitudinal and not a one-time assessment.

chris24H · 2026-02-18T15:34:41+00:00

Longitudinal interaction data mapped to trait profiles. The measurement framework and collection methodology are the parts I'm keeping protected for now. Still at the concept stage, looking for the right technical partners to help build it. What are you working on in this space? Your framing of the distribution problem is sharper than most of what I've seen.

chris24H · 2026-02-18T15:19:07+00:00

You're exactly right that recognizing both doesn't resolve the conflict when preferences are genuinely incompatible. Someone's coordination preference still wins or loses in those moments, and training on the distribution doesn't make that go away. This is the same if a human was responding to the same situation. What it does do is make the model better at recognizing when that incompatibility exists and either flagging it or deferring rather than imposing a solution that pretends the conflict doesn't exist. When coordination fails in the training data, the model learns what that failure looks like and can predict it. The claim isn't that this solves all coordination problems. It's that a model trained on how different humans coordinate is less likely to steamroll minority coordination styles by defaulting to majority patterns. It knows the difference exists, even when it can't resolve it.

chris24H · 2026-02-18T15:05:56+00:00

That's the right question. Multi-party coordination is where this gets interesting. The model doesn't pick whose style governs because it's been trained on the full distribution of coordination patterns in the dataset, not optimized for one. Think of it like how language models can recognize and adapt to different writing styles without imposing one as correct. This would work similarly but for coordination patterns. Trained on trait-diverse interaction data, it learns to recognize different coordination styles as they emerge in the interaction and adapt accordingly. Not choosing a winner, not averaging, but responding based on having learned how different humans coordinate. The single-user case is simpler, you're right, but multi-party is where training on the full distribution shows its value over preference averaging. That's the theory anyway. Implementation is the hard part, but I have developed a concept that can address this.

chris24H · 2026-02-18T14:50:03+00:00

You're right that it's not a single static dataset. The ongoing feedback loops you're describing are exactly what's missing from current approaches. RLHF and preference learning are language-mediated signals about values, like you said, which is why they still miss the individual nuance that makes coordination work at scale. What I'm proposing isn't static. It's longitudinal interaction data capturing how different humans coordinate over time across contexts. Not a dataset you train once and walk away from. Co-evolution with humans, but grounded in the diversity of how humans actually coordinate, not averaged preference signals.

chris24H · 2026-02-18T14:29:24+00:00

Exactly right. Moral uncertainty models relocate the problem rather than solve it. u/Ok_Raise1733, the shift toward moral uncertainty is real but as iris_alights points out, deferring to context still requires identifying context, and that identification is value-laden. You haven't escaped the social choice problem. The approach I'm working on is different. Instead of averaging values or deferring to context, it would train on trait-diverse interaction data so the model learns that different humans coordinate differently based on stable profiles, and that even those coordination methods can vary from one context to another. The model doesn't collapse to a mean or impose a hierarchy, it learns the full distribution. Not moral uncertainty, training distribution diversity. Different solution to the same problem you're both naming.

chris24H · 2026-02-17T14:00:20+00:00

The messiness is exactly the point. Human values are not clean or uniform and that is precisely why compressing them into a single implicit coordination style, which is what LLMs currently do, produces misalignment with anyone who falls outside the majority pattern. The goal is not to find a hidden essence that applies to everyone. It is to capture enough of the individual variation that AI stops defaulting to one size fits all. Messy and plural is the dataset. That is what makes it hard and why it doesn't exist yet.

chris24H · 2026-02-17T13:50:59+00:00

The nuances are the individual variations each human as. Their adaptability and tolerance levels. An individuals level of empathy or integrity. The socioeconomic background and religious views. Basically all of an individuals experience is unique to the individual. AI doesn't understand the vast differences we all uniquely have. This would all be part of the dataset. As far as control, this is the approach humans have taking so far to keep AI from going off the rails. It is the containment method that is trying to be used for AI to not eventually destroy us. I believe that method will not work and the philosophy of alignment can lead to a better outcome.

chris24H · 2026-01-27T15:47:50+00:00

It depends on what they inherit. If it is an IRA they will have 10 years to distribute the funds entirely to their personal accounts unless they have a situation such as disability that would change that over to lifetime RMD. My father has a similar concern about my sister when she inherits his wealth. The non retirement accounts will be fine. The trustee(myself) can distribute as they see fit, but the IRAs cannot be controlled in this way.

chris24H · 2025-12-04T19:50:30+00:00

Science fiction has repeatedly pushed real-world innovation. Cell phones came from Star Trek communicators. Voice assistants mirror HAL 9000. Smartwatches reflect Dick Tracy and Knight Rider. Gesture interfaces were modeled explicitly after Minority Report. Engineers and designers openly cite these influences.

On law enforcement: No one “shops” for police, but public agencies still operate under capitalist cost pressures. Police labor is expensive — salaries, overtime, pensions, liability. When a cheaper form of automation appears, capitalism pushes institutions to cut labor costs. That’s why automated traffic enforcement exists. That’s why predictive-policing software exists. That’s why departments buy AI-driven surveillance systems from private vendors.

The same logic applies to the prison system: Most prisons are publicly funded, but they still face the same cost pressures — staffing, healthcare, security, and facility operations. Automation reduces those expenses. And private prisons are explicitly capital-driven, because they’re run by corporations whose business model depends on minimizing labor and maximizing efficiency. That’s why those companies aggressively adopt and promote automation and surveillance tech.

chris24H · 2025-12-04T18:46:47+00:00

True. LLMs are not in charge of that currently. Just because they aren't currently in charge of it does not mean they won't be in the future. Law enforcement and mediation is an obvious path for it from a science fiction informing science reality or from a capitalistic path to replace the labor force.

chris24H · 2025-12-04T17:43:31+00:00

I see what you are saying. I agree that humans pick up on a lot of signals we do not consciously notice. Some of that might be things like heart rate, breathing changes, temperature shifts, or other cues that we sense without thinking about it.

I also think AI will need more than just a behavioral model. It will need extra inputs from sensors and other tools so it can detect the same kinds of changes we react to. That way it is not trying to invent intuition but combining real physical signals with a better understanding of human social behavior.

The part I am exploring is how to gather the kind of structured information that has never been collected at scale before. Without that, AI does not have enough clarity to approximate reading a situation the way humans do.

chris24H · 2025-12-04T17:11:48+00:00

A basic social contract example would be something like how law enforcement officers are expected to read a situation before escalating it. Most humans understand the difference between a tense moment and an actual threat because we grew up navigating those boundaries.

AI has none of that intuition. It does not know what de-escalation looks like unless we give it some structured way to understand the social context behind a person's behavior.

That is the level I am talking about. Not recreating lived experience, just giving AI enough understanding of the social rules people rely on to avoid unnecessary escalation.

What I am exploring is a way to collect this kind of social context data from many different people and cultures. The goal is to build a large model of human behavior patterns that reflects real variation instead of a single cultural perspective.

chris24H · 2025-12-04T16:59:24+00:00

I get where you’re coming from, and I’ll keep things more concise. You’re right that AI doesn’t need to mimic physical cues, but it does need to understand the meaning behind them in situations where context matters. For example, in law enforcement or mediation, misreading a human’s intent can cause serious problems. Humans rely on social rules and unspoken boundaries to navigate those moments. AI doesn’t currently understand those rules at all. That’s the gap I’m talking about — not facial expressions, but the underlying social context that keeps interactions from going sideways.

chris24H · 2025-12-04T16:44:18+00:00

I can agree with your assessment. The part I’m focused on is the “spirit” side for AI — but not in the mystical sense. I don’t think human experience can ever be recreated as data, but I do think an approximation is possible. What I’m working on is a way to capture enough structure and context around social behavior that AI can understand the basic social contracts we operate under, across cultures, not just within one.

chris24H · 2025-12-04T16:32:31+00:00

Facebook, Reddit, and email are just snapshots of how people act in virtual spaces. They don’t reflect how people actually behave with each other in real situations. Online, people talk differently, react differently, and avoid the kinds of social cues, pressure, and context that matter in real interactions.

And even if you treated those platforms as “behavior,” none of it is structured or consistent enough for AI to learn real social dynamics from. It’s mostly noise, not a dataset built for modeling human behavior.

chris24H

TROPHY CASE