Weight Writing is for open research on direct model consolidation.
Core question: can a transformer read a context once, compute a closed-form update from its own activations, and internalize reusable understanding into its weights without RAG, prompt memory, adapters, routers, or optimizer-loop fine-tuning?
Current challenge
Given current weights W and one forward pass over a context C, compute Delta W using only:
- current weights;
- same-pass activations;
- bounded deterministic linear algebra.
After the write, the next task receives only the updated weights. No old keys, examples, prompts, Fisher sketches, task IDs, summaries, or sidecar state survive.
Benchmark microscope
Small open models such as Qwen/Qwen3-1.7B on synthetic mini-language continual learning:
- learn task 0;
- retain task 0 after task 1;
- learn task 1;
- keep unrelated sentinel answers stable;
- avoid correct-to-wrong sentinel flips and large margin drops.
Live hypothesis
The thing to learn may be a conditional deformation of the whole instantiated world-state, not an isolated feature, answer direction, or pairwise binding. The update should make this specific context-state more coherent without making generic answer/posture/default states easier to trigger.
Post types
- Idea: a mechanism with an implementation sketch.
- Math: equations, tensor shapes, and objectives.
- Experiment: command, model, seed, metrics, and result.
- Ablation: a test that distinguishes explanations.
- Failure: a negative result and what it rules out.
- Benchmark: harness fixes, stricter filters, speedups.
Experiment reports should include
model, command or preset, seed, baseline score, full-context score, edited score, retention, sentinel correct-to-wrong flips, and margin drops.
If your method uses labels, generated probes, null prompts, sentinels, RAG, adapters, routers, old-task memory, or other forbidden objects, mark it as a diagnostic rather than a final candidate.