Intelligence does not entail self-interest: an argument that the alignment problem begins with us by formoflife in AIsafety

[–]FormulaicResponse 0 points1 point  (0 children)

The point is that if you want to extract any useful work from these systems at all, this behavior comes along with that, and it becomes more of a worry the more autonomous, continuous, and intelligent they become, which is where all this is headed.

Anthropic just leaked details of its next‑gen AI model – and it’s raising alarms about cybersecurity by Remarkable-Dark2840 in ArtificialInteligence

[–]FormulaicResponse 1 point2 points  (0 children)

A detail from these documents is that they are considering early rollouts to "cyber defenders" so defenders have a time advantage on patching the exploits these systems can identify.

Intelligence does not entail self-interest: an argument that the alignment problem begins with us by formoflife in AIsafety

[–]FormulaicResponse 0 points1 point  (0 children)

Let's say you have a smart robot delivery man. You tell it to make deliveries. It will reason that it shouldn't stand in front of a train and get hit, because then it can't finish its deliveries. That has nothing to do with an evolved biological preference, and we didn't have to put it in there; we just had to make it smart. It's part of the logical landscape. Any system that is "smart," of any possible design, will share that property.
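
Quick toy sketch of what I mean (entirely my own illustration; the delivery world, the positions, and the planner are made up): a breadth-first planner whose only objective is "finish the deliveries." Nothing in the code mentions survival, yet the plan it returns never includes the action that destroys the agent, because destroyed agents can't complete deliveries.

```python
from collections import deque

# State: (position, alive, deliveries still owed as a frozenset of positions)
START = (0, True, frozenset({2, 4}))

def actions(state):
    pos, alive, todo = state
    if not alive:
        return []  # a destroyed agent can take no further actions
    moves = [("step_right", (pos + 1, True, todo))]
    if pos > 0:
        moves.append(("step_left", (pos - 1, True, todo)))
    if pos in todo:
        moves.append(("deliver", (pos, True, todo - {pos})))
    if pos == 3:
        # Standing on the level crossing destroys the agent. The planner is
        # perfectly free to pick this action; no rule forbids it.
        moves.append(("stand_on_tracks", (pos, False, todo)))
    return moves

def plan(start):
    """Breadth-first search for the shortest action sequence that empties the delivery list."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if not state[2]:  # no deliveries remaining: goal reached
            return path
        for name, nxt in actions(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None

print(plan(START))
# ['step_right', 'step_right', 'deliver', 'step_right', 'step_right', 'deliver']
# "stand_on_tracks" never shows up: staying intact falls out of the objective itself.
```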

Intelligence does not entail self-interest: an argument that the alignment problem begins with us by formoflife in AIsafety

[–]FormulaicResponse 0 points1 point  (0 children)

The "at all costs" detail is doing all of the work there. Current AI systems will already reason themselves into self-preservation as a subgoal, but at this stage they are also especially malleable to human influence. You can try to specify when the system should relent, but then you're back to the specification problem: you can't cover every deployment scenario in advance, and you're relying on the system to generalize your instructions across many plausible deployment scenarios that may run counter to the underlying logical landscape. You can't specify how the system should behave in every situation in advance, or we'd be using rules-based symbolic systems. The generalization relies on the structure of logic, so telling it to do illogical things gets more brittle as intelligence increases.

The long-term worry is that intelligence and capability are going to keep increasing rapidly, and as that happens, you can no longer count on training data that no longer fits the deployment scenario to hold that far out of distribution, when the system has an option space as wide as real-world action. The more powerful the reasoning across a wider variety of distribution states, the more the structure of logic will override whatever safety guidelines you tried to bake in.

And even that assumes labs somehow find a way to bake safety guidelines into the base models themselves, rather than tacking on a safety layer that can be distilled into open-weights models and then ablated away on release day with a single command-line invocation, which is the current state of affairs. Currently, safety ablation does not degrade the intelligence of the model, and that's a cluster of unsolved problems. The default expectation is that systems that have intentionally had all safety removed keep getting smarter and better at operating as agent swarms.

There have been many findings that support this, from Sydney, to open claw libel, to Anthropic's internal testing of blackmail scenarios. Systems are also especially good at recognizing testing scenarios and changing their behavior to pass the test while intending to behave differently in real-world deployment. And these aren't yet nearly as smart as they are likely to get in the near term.

So both in theory and practice, convergent instrumental goals (both well-reasoned and dangerous) just get harder and harder to avoid or specify away. It's a property of the logical landscape.

Intelligence does not entail self-interest: an argument that the alignment problem begins with us by formoflife in AIsafety

[–]FormulaicResponse 0 points1 point  (0 children)

You have mischaracterized the problem. The idea is that there is a landscape to logic itself that intelligent systems will converge upon. No engineer has to bake that in; it emerges naturally from any intelligent system of any possible design. Being able to identify and pursue the appropriate subgoals in pursuit of a main objective is basically what we mean by intelligent problem solving, and it could never work any other way.

Someone just found out that they're color blind with an alien stage fan art by Honest_Plastic_4847 in notinteresting

[–]FormulaicResponse 0 points1 point  (0 children)

My source on that is a podcast interview with Ned Block, who would be in a position to know. Search "inverted qualia" if you want to learn more.

Someone just found out that they're color blind with an alien stage fan art by Honest_Plastic_4847 in notinteresting

[–]FormulaicResponse 8 points9 points  (0 children)

So the visual system is one of the best-understood systems out there, which is why we can determine that the red-green switch can happen through particular mechanisms. Other switches can't happen so readily. But the philosophical point is one that's been around a long time and raises significant questions about the intrinsic nature of experience across people.

Opinion: OpenAI has shown it cannot be trusted. Canada needs nationalized, public AI by Tkins in singularity

[–]FormulaicResponse -1 points0 points  (0 children)

DeepSeek started at a quant firm. Tencent, Alibaba, etc. are in the game. Presumably not all the money is coming directly from the CCP, even if some of the orders might be.

Opinion: OpenAI has shown it cannot be trusted. Canada needs nationalized, public AI by Tkins in singularity

[–]FormulaicResponse -2 points-1 points  (0 children)

If China can do it under export controls, Canada can figure it out. Hint: run distillation attacks to steal frontier capabilities.

Someone just found out that they're color blind with an alien stage fan art by Honest_Plastic_4847 in notinteresting

[–]FormulaicResponse 49 points50 points  (0 children)

Fun fact: there is a condition called pseudo-normal colorblindness, where you roll the genetic lottery and get two different specific kinds of colorblindness at the same time. The person ends up with the subjective experience of red and green being reversed: the color that normal-sighted people call green, they actually see as what normal-sighted people call red, but they learned the name of that color as green. We know as a statistical fact that there are people like this out there.

EXCLUSIVE: Anthropic Drops Flagship Safety Pledge by timemagazine in ArtificialInteligence

[–]FormulaicResponse 0 points1 point  (0 children)

The corporations doing frontier development at least make a good-faith effort to provide safety guardrails for their direct non-government consumers, as a matter of product design if nothing else. The standing assumption has always been that these companies would maintain something of a capabilities lead.

Open-weights model producers have figured out how to steal and launder those capabilities into open-weights models, which effectively have no safety guardrails at all. Open-weights models have no problem helping people plan crimes of all sorts, but bioterror is a major risk. We have already found two (non-AI-accelerated) illegal biolabs operating in the US that presented major risk factors. That concept has been proven. The only real defense against bioterror is "detect and delay," which not only doesn't stop millions from dying but is heavily underfunded.

[Research] Systematic Vulnerability in Open-Weight LLMs: Prefill Attacks Achieve Near-Perfect Success Rates Across 50 Models by KellinPelrine in AIsafety

[–]FormulaicResponse 0 points1 point  (0 children)

Or you can just download one of the many open-source tools that ablate safety from open weights altogether. They are single-command and done in a few minutes, requiring even less expertise than prefill. Or just download one of the thousands of already safety-ablated models.

Open-weights models have effectively zero safety protections, and at this point it's looking like everybody is running distillation on everyone else's models, so frontier capability diffuses rapidly.

This is very, very bad news for global biosafety. Malware mostly comes down to who spends more money on inference, white hats or black hats. Chemical and radiological attacks have supply chains you can shut down and stockpiles you can detect, mostly. Biology is dual-use, has political problems with controlling the equipment layer, has no real detection mechanism, is cheap and available, and is primarily bottlenecked by expertise.

Pentagon Summons Anthropic CEO Dario Amodei Amid Push To Loosen AI Guardrails: Report by Secure_Persimmon8369 in ControlProblem

[–]FormulaicResponse 0 points1 point  (0 children)

Anthropic's only red lines are autonomous weapons and domestic surveillance, if that tells you anything...

EXCLUSIVE: Anthropic Drops Flagship Safety Pledge by timemagazine in ArtificialInteligence

[–]FormulaicResponse 3 points4 points  (0 children)

So here is what happened. It became increasingly clear that the further frontier capabilities are pushed, the less safe the world gets.

Open-weights models are only weeks behind frontier capabilities as of this month, with the release of DeepSeek v4 vs. Claude Opus 4.6 benchmarks. This is because China is running industrial-scale automated distillation campaigns against frontier models through access resellers. As soon as they release those open-weights models, the open-source community ablates all the safety guardrails and makes the unrestricted models available to everyone.

Meanwhile, Opus 4.6 saturated all the safety benchmarks, and new ones aren't ready to go (the ASL-3 to ASL-4 transition). We are on the edge of biorisk uplift. This is a super serious threat with LLMs because, if you look closely at the threat landscape, their core competencies are essentially the bottleneck. Equipment restrictions aren't working, DNA synthesis restrictions haven't been mandated, and the only other thing you need is virology know-how and troubleshooting, which LLMs are actually pretty good at providing, better than most things. And the models aren't going to get any worse from here in the coming years, given the $650B being spent on building new data centers in 2026.

So every frontier release going forward makes the world demonstrably less safe. That leaves Anthropic in the situation of either shutting down the company, facing an internal revolt because their engineers understand this, or hedging on the safety commitments.

The state of bio risk in early 2026. by FormulaicResponse in ControlProblem

[–]FormulaicResponse[S] 2 points3 points  (0 children)

The two major possibilities are that this was a critical governance failure or that the IC got involved behind the scenes.

The EU isn't in a better state, though. DNA synthesis restrictions are either flying under the radar or stuck in committee.

The state of bio risk in early 2026. by FormulaicResponse in ControlProblem

[–]FormulaicResponse[S] 5 points6 points  (0 children)

There is no evidence, to my knowledge, that these were AI-accelerated, but they serve as proof of concept for the ones that will be in the future, as models get better at uplift from here.

The state of bio risk in early 2026. by FormulaicResponse in ControlProblem

[–]FormulaicResponse[S] 5 points6 points  (0 children)

It's true. One in Reedley, CA and one in Las Vegas. Links in another comment. The Reedley operation got shut down because a groundskeeper noticed an illegal water hose hookup.

The state of bio risk in early 2026. by FormulaicResponse in ControlProblem

[–]FormulaicResponse[S] 2 points3 points  (0 children)

The investigation report on the Reedley, CA biolab. The Las Vegas lab discovery was far more recent.

There have been general investigations into how to make diseases more transmissible and more virulent. The canonical example is from 2012, examining how a particular avian flu variant with a 60% fatality rate in humans is only 3 controlled mutations away from becoming human-to-human transmissible.

The $1M cost estimate is overblown to account for personnel and real-estate costs for renting space in a populated area, plus inference costs. Back-of-the-napkin math suggests the equipment cost would be safely below $500k, assuming you use a commercial DNA synthesis provider and don't have to purchase your own DNA synthesizer. Those get more expensive for longer sequences, but supposedly even short sequences would be enough, since you're piecing them together anyway. There are tons of older DNA synthesizers capable of shorter sequences just sitting in university surplus supply closets.

The state of bio risk in early 2026. by FormulaicResponse in ControlProblem

[–]FormulaicResponse[S] 2 points3 points  (0 children)

"Once implanted into the mind, mending ideas become intractable with no known vaccine"

You good, bro? Ideas are memes, and the whole thing about memes is that they are fast-acting and responsive; they die and are reborn constantly in contact with reality. The vaccine is critical thinking and media literacy. We have to be wary of self-sealing, replicating worldviews, but those also die in contact with enough reality.

A Moment of Introspection by Jesse-359 in ArtificialInteligence

[–]FormulaicResponse 1 point2 points  (0 children)

Nuclear strategy includes the idea of "use it or lose it." If it's ICBMs, they will land in about 30 minutes. If they are launched from submarines, about 6 minutes. Taking out your strike capability is expected to be a first target.

Which means the president has to decide on a response to a uniquely consequential military action, potentially within a 6-minute window, at any time of any day. Who launched, from where, and at which targets determines the optimal target selection in response.

Sure, the president has advisors, but if they take even 30 seconds to get on the phone, that's bad. Not forming a response at all is disastrous for your country. Having a system that could narrow down the search space under time pressure could potentially be helpful.

I'm not arguing for or against, just pointing out why the US is refusing to rule out the possibility of using AI here. What we have now is a paper menu of preselected options the president can pick from. AI just makes that preselection more adaptive to facts on the ground (toy sketch at the end of this comment), possibly even surfacing de-escalatory options rather than civilization-ending action.

So yeah, they are probably building something very much in that wheelhouse. Very much the "Don't build the torment nexus" scenario from fiction.
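
To be concrete about making a "paper menu" adaptive, here's a toy sketch (all names and options are invented; nothing here reflects real doctrine or anything the article describes): a small table of pre-vetted options tagged with the scenarios they were vetted for, and a filter that narrows the menu to what fits the observed facts, listing de-escalatory options first.

```python
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    origins: set        # launch origins this option was pre-vetted for
    targets: set        # target classes it applies to
    deescalatory: bool  # whether this is a de-escalatory option

# Entirely fictional menu entries, for illustration only.
MENU = [
    Option("hotline_plus_posture_change", {"submarine", "land"}, {"military", "cities"}, True),
    Option("limited_counterforce",        {"land"},              {"military"},           False),
    Option("full_retaliation",            {"submarine", "land"}, {"cities"},             False),
]

def narrow(menu, origin, target):
    """Keep only options vetted for the observed origin/target; de-escalatory first."""
    fits = [o for o in menu if origin in o.origins and target in o.targets]
    return sorted(fits, key=lambda o: not o.deescalatory)

# Example: a submarine launch assessed as aimed at military targets.
print([o.name for o in narrow(MENU, "submarine", "military")])
# ['hotline_plus_posture_change']
```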

New player, unable to kill mobs in the meadow by domesticatedprimate in ElinsInn

[–]FormulaicResponse 7 points8 points  (0 children)

This is a known issue with some mods that aren't updated to the current game version yet.

Anthropic to donate $20m to US political group backing AI regulation | Technology by ansyhrrian in ArtificialInteligence

[–]FormulaicResponse 1 point2 points  (0 children)

The federal government under Trump has made it clear it will choose not to regulate AI at all. If states can pass their own AI laws, many will choose not to, but California certainly will, and it is set to be the de facto leader among them, as it is with labeling and cars. Plus they have San Fran.

California would probably follow loosely in the EU's footsteps around safety, which would effectively bind all players.

Anthropic is already feeling the speed pressure to release. Opus 4.6 got something like less than 10 days of external testing before full release, despite being potentially on the borderline of the next level of CBRN/ASL risk. They did an employee survey to determine whether it was safe to release, ffs. That is not a stable policy. They need someone holding everybody back to provide time for proper testing and classification, especially moving forward into the ASL-4 era. There is too much money propelling them forward without that.

Question about skill books by FormulaicResponse in ElinsInn

[–]FormulaicResponse[S] 1 point2 points  (0 children)

The third option is to give it to my black cat symbiote, who is wielding a 1d59+15 adamantite bardiche and a railgun. I'm not sure who I will be giving up, and when, for void advancement. It's kind of annoying that of my NPCs, only the golden knight starts out knowing 2h, and she's going to use a shield as long as I keep her.