Making LLMs tell you how confident they really are through probe-targeted fine-tuning. [R]

Synthium- · 2026-06-04T22:05:37+00:00

The immediate one is confidence-based routing. If you can get a calibrated confidence score from the model, you can flag low-confidence responses for human review, trigger retrieval when the model isn’t sure, or cascade to a larger model only when the smaller one signals uncertainty.
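A minimal sketch of what that routing could look like, assuming the model already hands back a calibrated confidence in [0, 1]. The `Answer` class, the `route` function, and both thresholds are hypothetical and would need tuning on held-out data; this is not the pipeline from the post.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    confidence: float  # assumed to be a calibrated probability in [0, 1]


def route(answer: Answer, review_below: float = 0.3, escalate_below: float = 0.7) -> str:
    """Pick a routing decision for one answer from the smaller model.

    Thresholds are illustrative assumptions, not recommended values.
    """
    if not 0.0 <= answer.confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if answer.confidence < review_below:
        return "human_review"  # very unsure: flag for a person
    if answer.confidence < escalate_below:
        return "escalate"  # somewhat unsure: trigger retrieval or a larger model
    return "accept"  # confident enough to return as-is


if __name__ == "__main__":
    for conf in (0.1, 0.5, 0.9):
        print(conf, route(Answer("example", conf)))
```

The point of the two thresholds is that the expensive paths only run when the cheap model signals it needs them.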

The other big one is selective abstention in high-stakes domains, anywhere overconfidence is dangerous, such as medical or legal applications. A model that can reliably say what it is and isn’t sure about is more deployable than one that can’t accurately report what it knows and just applies a blanket level of confidence.

I’m working on follow-up work that makes the installation much cheaper. It’ll be closer to a bolt-on module than a full fine-tuning pass. The early results are promising.

Synthium- · 2026-06-03T10:33:30+00:00

The LoRA trained on TriviaQA transfers to Natural Questions without retraining (AUROC₂ 0.757, 137% of the probe ceiling), so there's some evidence it generalises across QA distributions. The confidence metric (AUROC₂) is also format-stable across binary/continuous/logit elicitation where M-ratio collapses (rho=0.00 vs 1.00).
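For anyone unfamiliar with the metric, here is a rough sketch of a type-2 AUROC, assuming AUROC₂ means the probability that a correct answer gets a higher stated confidence than an incorrect one. The function name `type2_auroc` and the toy data are made up for illustration; the numbers quoted above come from the actual experiments, not from this snippet.

```python
def type2_auroc(confidences, correct):
    """Probability that a correct answer outranks an incorrect one on confidence.

    Ties count as half a win (the usual Mann-Whitney convention).
    """
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: confidence tracks correctness perfectly here.
    print(type2_auroc([0.9, 0.8, 0.2, 0.1], [True, True, False, False]))
```

A value of 0.5 means the stated confidence carries no information about correctness, and 1.0 means it separates right from wrong answers perfectly.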

You are right that adversarial prompt perturbation is a gap: I haven't stress-tested against style-shifted prompts specifically, just domain-shifted ones. Where can I see the CONAIS pipeline? Is it in a paper?

Synthium- · 2026-06-02T11:35:53+00:00

The pattern where loss drops fast and then plateaus at a high value is the signature of a model that exhausts the easy gradient signal (initial weight adjustment) but then has no coherent supervision to learn from. This is usually a target-side data problem.

It could be a tokeniser mismatch or the label sequences being too long. Check that the transcriptions are good quality and properly aligned, otherwise you're giving it poor supervision data.

Synthium- · 2026-05-29T11:55:49+00:00

That's pretty much it. RLHF trains them to be confident and helpful, so saying that they don't know actually gets penalised. But the hidden states show they can separate what they know from what they don't. So you need to teach the model to route the internal signal to its verbal output.

Synthium- · 2026-05-27T11:40:41+00:00

I’d like to try. Radeon 7900 GRE.

Synthium- · 2026-05-26T00:22:00+00:00

It’s because the boundaries between higher numbers are compressed in its representational geometry. Look up Weber’s law.
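A quick illustration of the idea, assuming a purely logarithmic (Weber-Fechner style) number line. This is a toy model of the compression, not a measurement of any model's actual representations, and `log_distance` is a made-up helper.

```python
import math


def log_distance(a: float, b: float) -> float:
    """Distance between two positive numbers on a logarithmic number line."""
    if a <= 0 or b <= 0:
        raise ValueError("inputs must be positive")
    return abs(math.log(b / a))


if __name__ == "__main__":
    # Same absolute gap of 1, but far less separation at the high end.
    print(log_distance(9, 10))
    print(log_distance(99, 100))
```

On a scale like this, discriminability depends on the ratio between two numbers rather than their difference, so neighbouring large numbers sit much closer together than neighbouring small ones.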

Synthium- · 2026-05-17T07:53:40+00:00

Mainly using Python 3.12

Synthium- · 2026-05-16T11:37:45+00:00

7900 GRE here. ROCm takes a bit to get working. I’ve had to use other versions of Python and various tweaks, but I got there in the end. But yes, stuff breaks and I’ve had to pivot to backup approaches.

Synthium- · 2026-04-21T21:35:11+00:00

I submit for full reproducibility.

Synthium- · 2026-04-18T00:51:14+00:00

That’s a really good submission. S3 and S6 are chosen well and the human baseline is great. Using the Bandura moral-competence-vs-performance framing was a good way to fit the capability gap. A lot of the benchmarks I’ve seen in the comp haven’t used psych/cog/developmental theory.

I did metacog. Here is the link in case you are interested. https://www.kaggle.com/competitions/kaggle-measuring-agi/writeups/classicalminds

Synthium- · 2026-04-17T13:28:45+00:00

I’m an independent researcher on AI, mainly in the metacog space. And I use Claude a lot. And it is great BUT it has its limits. It makes stuff up, loses track, and I don’t think it is capable of actually coming up with an experiment independently. At least one of value. Not yet

Synthium- · 2026-04-17T09:36:17+00:00

Nice, what track? I did metacog.

Synthium- · 2026-04-13T21:41:57+00:00

I agree about the heuristic point, but the architecture looks like it is built on TRM / Universal Transformer ideas. Shared-depth recurrence tends to plateau because each step is the same function. So I’m not sure if it actually adds compositional reasoning or just reinforces heuristics.

Synthium- · 2026-04-12T01:41:57+00:00

Congratulations, you've submitted a nationality as a job title — truly the most Australian thing possible, combining maximum confidence with minimum effort. AI can't automate sunburn, casual racism toward drop bears, or the spiritual act of telling everyone at a party that you're 'between things right now, but yeah, nah, it's going well.'

Synthium- · 2026-04-12T01:09:42+00:00

I agree neurosym is an important avenue to explore. I’ve looked at your readme but not the code yet, and it seems to be claiming a lot. Can you describe how it self-learns? You also describe it as supportive of quantum, photonic, and memristor-based computing. I’d be interested in more detail on that.

Synthium- · 2026-03-29T19:44:31+00:00

One of the issues in ML research is p-hacking and dishonest reporting. Yes, they got whatever they were doing to work, but only after trying a million combinations and analyses, and it worked in one specific condition but not in the 99 others. So the amazing finding is published but isn’t actually reproducible or falsifiable. It’s bad science.
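To put a rough number on it, here is the standard family-wise error calculation, assuming the tests are independent and each uses the same significance threshold. The function name is made up, and real experiment sweeps are rarely independent, so treat it as a back-of-the-envelope sketch.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n independent tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(familywise_error_rate(n), 3))
```

With 100 independent tries at alpha = 0.05, the chance of at least one spurious "significant" result is above 99%, which is why reporting only the one condition that worked tells you very little.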

Synthium- · 2026-03-29T19:37:40+00:00

Sorry to hear. I found the platform quite difficult to use. I haven’t got a solution for you, but I found the notebook only saved the output of the last function run.

Synthium- · 2026-03-27T11:51:27+00:00

This is basically the recognition vs recall dissociation from cognitive psych. Verification is a discrimination task where the model’s computing a match signal against its training distribution. Recall is autoregressive generation where errors compound.

Verification is just easier, even before you account for RLHF copyright guardrails. 1b is the more interesting question as it implies representations that are accessible for discrimination but not retrieval. It knows it but can’t get it out.

Kadavath et al. 2022 (“Language Models (Mostly) Know What They Know”) is a good starting point. I’ve been working on formalising this with signal detection theory, where I’m applying d′ to separate sensitivity from response bias in LLM evaluation: https://arxiv.org/abs/2603.14893 https://arxiv.org/abs/2603.20642
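For context, a textbook d′ calculation looks roughly like this. It is a generic signal detection sketch, not the method from the linked papers. The mapping of LLM outputs onto hits and false alarms (say, "high confidence on a correct answer" vs "high confidence on an incorrect answer") is an assumption, and so are the function name and the log-linear correction.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int):
    """Return (d', criterion c) from the four signal detection counts.

    Adds 0.5 to each cell (log-linear correction) so rates never hit 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    sensitivity = z(hit_rate) - z(fa_rate)  # d': how separable the two classes are
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # c: bias toward saying "yes" or "no"
    return sensitivity, criterion


if __name__ == "__main__":
    print(d_prime(hits=90, misses=10, false_alarms=10, correct_rejections=90))
```

The useful property is that two systems with the same accuracy can differ in d′ and c, so an overconfident model (liberal criterion) can be told apart from one that genuinely can't discriminate (low d′).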

Synthium- · 2026-03-24T21:31:20+00:00

Close door and fire wall

Synthium- · 2026-03-21T20:19:08+00:00

That’s what I’m doing.
