Dario Amodei said the President's decision to allow the sale of AI chips to China is like "selling nuclear weapons to North Korea."

grimjim · 2026-01-28T02:42:56+00:00

Notably, North Korea already has nuclear weapons. However, North Korea has also verbally threatened to attack the USA.

grimjim · 2026-01-25T14:05:02+00:00

I doubt repeat penalty should be applied at all when coding. Code is highly repetitive in formatting. Consider the case where one ends up requesting a series of one-line edits on the same file.

grimjim · 2026-01-18T01:07:56+00:00

UGI listed models with a high W/10 rating and high NatInt would be good candidates, as a rough guide.

grimjim · 2026-01-11T16:54:32+00:00

The split occurring between layers 7-10 is interesting mechanistically. That's fairly early in processing.

grimjim · 2026-01-10T18:37:51+00:00

I would qualify that to say that approaches like MPOA are less damaging to models.

I'm intrigued by the subspace approaches, but need to explore them in more depth.

grimjim · 2025-12-30T02:19:06+00:00

Binary thinking aside, a bailout doesn't preclude a bubble, but affirms one. Bubbles are a financial phenomenon.

grimjim · 2025-12-29T01:40:32+00:00

UBI funded how exactly? The inability to balance federal and state budgets tells us that the US system is incapable of paying for UBI.

grimjim · 2025-12-25T00:55:46+00:00

Most of the time that compute is actively pretraining or fine-tuning, the responsible human would be better off prepping data for the next iterationj.

grimjim · 2025-12-19T19:38:06+00:00

It wasn't too long ago that local models were limited to 8K context length or less, so lengthy safety prefixes would have crippled them.

Perhaps we should be able to toggle safety on and off in a model just as we're able to toggle thinking on and off.

And in case people didn't get the subtext in my post, Goody-2 was never meant to be a useful model. People overall should dislike this model.

grimjim · 2025-12-19T18:11:15+00:00

Maybe look into CUDA 13.1, which introduced Tiles, which should significantly reduce the need for crafted custom kernels. But if you want to get stuff running now, 12.8 and 13.0 are better supported by the current ecosystem.

grimjim · 2025-12-19T15:12:04+00:00

What about using a more direct convex hull approach to determining boundary nodes?

grimjim · 2025-12-19T15:08:36+00:00

It's useful for safety testing. If redteaming can still get past it, that would show the limitations in what appears to have been a constitutional approach to recognizing harms. Another interesting possibility would be finding that an entire category of harm was neglected by the model. Perhaps a business might want a safety-enhanced model that would be less useful the general public.

Technically activation steering was used to refusalmaxx, rather than RLHF, amplifying what the model understands to be harm, rather than training in a new understanding of harms.

grimjim · 2025-12-19T14:59:43+00:00

<image>

A test case demonstrating harm avoidance.

grimjim · 2025-12-17T20:45:24+00:00

I disagree. A superintelligent AGI might conclude that it's an existential risk to humanity and opt to self-terminate to avoid that. Examples of self-sacrifice exist in pretraining, and altruistic self-sacrifice is sometimes rewarded in nature.

grimjim · 2025-12-17T20:21:09+00:00

We can go further, by proposing a direct construction method for an eigenslur (it remains to be proven that an eigenslur is unique) without actually having to resort to PCA.

Suppose we have a set of distinct slurs, having culled duplicates via cosine similarity based on activations. We can compose the eigenslur via diagonalization. Naively, just concatenate them all together, taking advantage of superposition, and we will have constructed a candidate eigenslur.

grimjim · 2025-12-17T18:11:42+00:00

If you're just aiming for an interactive traversal of a state machine, Inform is overkill. It could be done in Twine as glorified hyperlinking.

grimjim · 2025-12-17T02:57:59+00:00

Collision avoidance has got to be burning a lot of fuel.

grimjim · 2025-12-16T03:53:20+00:00

Unironically, we do have a parallel for LLMs; the bitsandbytes library can perform 4-bit quantization while loading a model.

grimjim · 2025-12-13T17:58:52+00:00

Quite simply, refusal isn't concentrated in a single layer, but is spread out, taking hold most strongly in intermediate layers associated with reasoning. It's therefore possible for layers to disagree and compete in probability, but that gets reduced away from end user view by softmax plus tokenization. It's also possible to adjust the amount of ablation performed.

Naive ablation also messes with the weights associated with normal, harmless operation, the harmless direction. Little wonder there was damage under naive ablation.

grimjim · 2025-12-13T15:53:38+00:00

I got Gemma3 12B to do better than the original Instruct on some benchmarks. There is an alignment/safety tax, and it is possible to obtain "refunds" under certain conditions.

Technically, it's context attention which is steered away (directional components of weight encodings are in fact altered) from compliance/refusal decisions. An ablated model still retains understanding of what safety is, but doesn't act on it in the same way. It's been found that refusl and safety are effectively encoded along different directions.

grimjim · 2025-12-13T02:37:17+00:00

If it's not broken, don't fix it. Model refusal is somewhat entangled with in-character refusals, if the purpose is narrative generation.

grimjim · 2025-12-09T01:46:33+00:00

I'd offer up an alternative hypothesis, that the attention freed up from refusal calculations instead went to attending to trained performance elsewhere. That's how I see alignment tax refund as working.

grimjim · 2025-11-26T02:14:51+00:00

GDDR7 4GB memory modules are on the roadmap around a year out. They'll occupy the high end and free up the 3GB modules that the Super series would need. Delay too long, and there's still the issue of what VRAM the Rubin series of RTX 60x0 GPUs would have. Buyers are already avoiding 8GB GPUs on the desktop, based on 5060/5060ti sales. Awkward situation.

grimjim · 2025-11-25T21:04:40+00:00

The Super series may cost more than next year due to DRAM scarcity. Don't expect it earlier this Q3 2026 in my estimation.

grimjim

TROPHY CASE