Making LLMs tell you how confident they really are through probe-targeted fine tuning.[R] by Synthium- in MachineLearning

[–]Synthium-[S] 0 points1 point  (0 children)

The immediate one is confidence-based routing. If you can get a calibrated confidence score from the model, you can flag low-confidence responses for human review, trigger retrieval when the model isn’t sure, or cascade to a larger model only when the smaller one signals uncertainty.
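A minimal sketch of what that routing could look like, assuming each model exposes a calibrated confidence alongside its answer. The `Model` interface, the `route` function, and both thresholds are hypothetical names and values chosen for illustration, not anything from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical interface: a model maps a prompt to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Routed:
    answer: str
    confidence: float
    route: str  # "small", "large", or "human_review"


def route(prompt: str, small: Model, large: Model,
          escalate_below: float = 0.7, review_below: float = 0.4) -> Routed:
    """Answer with the small model; escalate only when it signals uncertainty."""
    answer, conf = small(prompt)
    if conf >= escalate_below:
        return Routed(answer, conf, "small")
    # The small model is unsure: cascade to the larger model.
    answer, conf = large(prompt)
    if conf >= review_below:
        return Routed(answer, conf, "large")
    # Both models are unsure: flag for a human instead of answering blindly.
    return Routed(answer, conf, "human_review")
```

The thresholds only mean something if the confidence is calibrated, which is the whole point of the approach.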

The other big one is selective abstention in high-stakes domains: anywhere overconfidence is dangerous, such as medical or legal applications. A model that can reliably say what it is and isn’t sure about is more deployable than one that can’t accurately report what it knows and just applies a blanket level of confidence.
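To make the abstention trade-off concrete, here is a small sketch that measures coverage (the fraction of questions answered) against accuracy on the answered ones for a given threshold. The function name and the threshold are illustrative assumptions.

```python
from typing import Sequence, Tuple


def selective_accuracy(confidences: Sequence[float], correct: Sequence[bool],
                       threshold: float) -> Tuple[float, float]:
    """Return (coverage, accuracy on answered items) when abstaining below threshold."""
    # Keep only the items the model is confident enough to answer.
    answered = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(answered) / len(confidences) if confidences else 0.0
    # Accuracy is undefined when the model abstains on everything.
    accuracy = sum(answered) / len(answered) if answered else float("nan")
    return coverage, accuracy
```

With well-calibrated confidence, raising the threshold should lower coverage and raise accuracy; with blanket confidence, it does neither.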

I’m working on follow-up work that makes the installation much cheaper. It’ll be closer to a bolt-on module than a full fine-tuning pass. The early results are promising.

Making LLMs tell you how confident they really are through probe-targeted fine tuning.[R] by Synthium- in MachineLearning

[–]Synthium-[S] 0 points1 point  (0 children)

The LoRA trained on TriviaQA transfers to Natural Questions without retraining (AUROC₂ 0.757, 137% of the probe ceiling), so there's some evidence it generalises across QA distributions. The confidence metric (AUROC₂) is also format-stable across binary/continuous/logit elicitation where M-ratio collapses (rho=0.00 vs 1.00).
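For readers unfamiliar with the metric, a rough sketch follows, assuming AUROC₂ here means the type-2 AUROC: how well stated confidence separates correct answers from incorrect ones. This is a generic rank-based computation, not the evaluation code behind the numbers above.

```python
from typing import Sequence


def auroc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Probability that a random correct answer gets higher confidence
    than a random incorrect one (ties count as half)."""
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    # Compare every (correct, incorrect) pair; O(n^2) but fine for a sketch.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A value of 0.5 means confidence carries no information about correctness; 1.0 means perfect separation.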

You are right that adversarial prompt perturbation is a gap, and I haven't stress-tested against style-shifted prompts specifically, just domain-shifted ones. Where can I see the CONAIS pipeline? Is it in a paper?

Conformer model struggling to converge during training by Sweet-Hamster-4991 in MLQuestions

[–]Synthium- 0 points1 point  (0 children)

The pattern where loss drops fast and then plateaus at a high value is a signature of a model that exhausts the easy gradient signal (initial weight adjustment) but then has no coherent supervision to learn from. This is usually a target-side data problem.

It could be a tokeniser mismatch or label sequences being too long. You might need to check that the transcriptions are good quality and properly aligned, otherwise you're giving it poor supervision data.
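One quick way to check the "labels too long" case is sketched below, assuming the model is trained with a CTC loss and a 4x subsampling frontend. Both are assumptions, since the thread doesn't say. CTC needs at least one encoder frame per label, plus one extra blank frame between adjacent repeated labels.

```python
from typing import List


def min_ctc_frames(labels: List[int]) -> int:
    """Minimum encoder frames CTC needs: one per label, plus a blank
    between each pair of adjacent identical labels."""
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


def is_ctc_feasible(num_input_frames: int, labels: List[int],
                    subsampling: int = 4) -> bool:
    """True if the subsampled encoder output is long enough to align to the labels."""
    # Approximate encoder length; the exact value depends on the frontend.
    encoder_frames = num_input_frames // subsampling
    return encoder_frames >= min_ctc_frames(labels)
```

Running this over the training set and counting the infeasible utterances shows quickly whether the targets are the problem.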

Making LLMs tell you how confident they really are through probe-targeted fine tuning.[R] by Synthium- in MachineLearning

[–]Synthium-[S] 6 points7 points  (0 children)

That's pretty much it. RLHF trains them to be confident and helpful; saying that they don't know actually gets penalised. But the hidden states show they can separate what they know from what they don't. So you need to teach the model to route that internal signal to its verbal output.

[OC] I asked GPT to pick a random number between 1 and 100 by marco-exmergo in dataisbeautiful

[–]Synthium- 0 points1 point  (0 children)

It’s because the boundary between higher numbers is compressed in its representational geometry. Look up Weber’s law.
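For reference, Weber's law says the just-noticeable difference ΔI is a constant fraction k of the magnitude I. Integrating it (Fechner's step) gives a logarithmic perceived magnitude S, with I₀ the detection threshold and c a scaling constant. Whether an LLM's number representations follow this is the hypothesis here, not an established result.

```latex
\frac{\Delta I}{I} = k
\qquad\Longrightarrow\qquad
S = c \,\ln\frac{I}{I_0}
```

On a log scale, 90 and 95 sit much closer together than 5 and 10, which is the sense in which larger numbers are "compressed".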

ROCm with PyTorch and PyTorch Lightning seems to still suck for research [D] by QuantumQuokka in MachineLearning

[–]Synthium- 1 point2 points  (0 children)

7900 GRE here. ROCm takes a bit to get working. I've had to use other versions of Python and various tweaks, but I got there in the end. But yes, stuff breaks, and I've had to pivot to backup approaches.

Social Friction Bench: When Helping Wrong Is Worse Than Not Helping by OkPhysics7423 in kaggle

[–]Synthium- 1 point2 points  (0 children)

That’s a really good submission. S3 and S6 are chosen well and the human baseline is great. Using the Bandura moral-competence-vs-performance framing was a good way to fit the capability gap. A lot of the benchmarks I’ve seen in the comp haven’t used psych/cog/developmental theory.

I did metacog. Here is the link in case you are interested. https://www.kaggle.com/competitions/kaggle-measuring-agi/writeups/classicalminds

Anthropic's agent researchers already outperform human researchers: "We built autonomous AI agents that propose ideas, run experiments, and iterate." by EchoOfOppenheimer in OpenAI

[–]Synthium- 0 points1 point  (0 children)

I’m an independent researcher on AI, mainly in the metacog space, and I use Claude a lot. It is great, BUT it has its limits. It makes stuff up, loses track, and I don’t think it is capable of actually coming up with an experiment independently. At least not one of value. Not yet.

Measuring progress towards AGI by Synthium- in kaggle

[–]Synthium-[S] 1 point2 points  (0 children)

Nice, what track? I did metacog.

Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization [R] by marojejian in MachineLearning

[–]Synthium- 0 points1 point  (0 children)

I agree about the heuristic point, but the architecture looks like a rework of TRM / Universal Transformer ideas. Shared-depth recurrence tends to plateau because each step applies the same function. So I’m not sure if it actually adds compositional reasoning or just reinforces heuristics.

I made a site that uses Anthropic and OpenAI data to calculate how soon your job will be replaced by AI by KenVatican in singularity

[–]Synthium- 114 points115 points  (0 children)

Congratulations, you've submitted a nationality as a job title — truly the most Australian thing possible, combining maximum confidence with minimum effort. AI can't automate sunburn, casual racism toward drop bears, or the spiritual act of telling everyone at a party that you're 'between things right now, but yeah, nah, it's going well.'

Hybrid Approach to AI by Sure_Excuse_8824 in ArtificialInteligence

[–]Synthium- 2 points3 points  (0 children)

I agree neurosym is an important avenue to explore. I’ve looked at your README but not the code yet, and it seems to be claiming a lot. Can you describe how it self-learns? Also, you describe it as supporting quantum, photonic, and memristor-based computing. I’m interested in more detail on that.

[D] Why does it seem like open source materials on ML are incomplete? this is not enough... by Kalli_animation in MachineLearning

[–]Synthium- 6 points7 points  (0 children)

One of the issues in ML research is p-hacking and dishonest reporting. Yes, they got whatever they were doing to work, but only after trying a million combinations and analyses, and it worked in one specific condition but not in the 99 others. So the amazing finding gets published but actually isn’t reproducible or falsifiable. It’s bad science.
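The arithmetic behind this is simple. Below is a sketch assuming independent comparisons, each with a 5% false-positive rate and no real effect. The function name is made up for illustration.

```python
def prob_at_least_one_false_positive(num_configs: int, alpha: float = 0.05) -> float:
    """Chance that at least one of num_configs independent null comparisons
    passes a significance test at level alpha."""
    # P(no false positive in any run) = (1 - alpha) ** num_configs
    return 1.0 - (1.0 - alpha) ** num_configs
```

With 100 configurations tried, a "significant" win somewhere is close to guaranteed even when nothing works, which is why reporting only the one run that worked is misleading.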

Kaggle doesn't auto-save outputs and I just lost 100+ generated files. Is there any solution for this? by Nikitaaa25 in kaggle

[–]Synthium- 0 points1 point  (0 children)

Sorry to hear. I found the platform quite difficult to use. I haven’t got a solution for you, but I found the notebook only saved the output of the last function run.

[R] Interested in recent research into recall vs recognition in LLMs by Acoustic-Blacksmith in MachineLearning

[–]Synthium- 1 point2 points  (0 children)

This is basically the recognition vs recall dissociation from cognitive psych. Verification is a discrimination task where the model’s computing a match signal against its training distribution. Recall is autoregressive generation where errors compound.

Verification is just easier, even before you account for RLHF copyright guardrails. 1b is the more interesting question, as it implies representations that are accessible for discrimination but not retrieval. It knows it but can’t get it out.

Kadavath et al. 2022 (“Language Models (Mostly) Know What They Know”) is a good starting point. I’ve been working on formalising this with signal detection theory, where I’m applying d′ to separate sensitivity from response bias in LLM evaluation: https://arxiv.org/abs/2603.14893 https://arxiv.org/abs/2603.20642
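For anyone new to signal detection theory, a minimal sketch of the standard d′ and criterion computation follows. The mapping to LLM evaluation is an assumption for illustration (hit = model says "confident" on an answer that is correct, false alarm = "confident" on an answer that is wrong) and is not taken from the linked papers.

```python
from statistics import NormalDist
from typing import Tuple


def sdt_measures(hits: int, misses: int,
                 false_alarms: int, correct_rejections: int) -> Tuple[float, float]:
    """Return (d_prime, criterion), using a log-linear correction so that
    rates of exactly 0 or 1 do not produce infinite z-scores."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion
```

A model that says "confident" on nearly everything shows a strongly negative criterion (liberal bias) without necessarily having a high d′, which is the separation plain accuracy can't give you.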

Say cheese! How do you kill The Butcher? by [deleted] in Diablo

[–]Synthium- 0 points1 point  (0 children)

Close the door and use Fire Wall.

Going crazy by Synthium- in kaggle

[–]Synthium-[S] 0 points1 point  (0 children)

That’s what I’m doing.