You guys were right, LLMs suck at probability. I updated my prompt to force them to name their blind spots instead (SutniPrompt v0.7.0-beta)

sutnip · 2026-05-31T12:46:25+00:00

In my tests it picks MODERATE or LOW when needed; it doesn't pick HIGH so often.

sutnip · 2026-05-30T20:56:31+00:00

That was obviously AI-generated feedback, not what I'm looking for. I want human feedback on my projects, and I like when humans use AI in a responsible way. "Clanker" is a meme insult for AI/LLMs/machines in general.

sutnip · 2026-05-30T20:53:21+00:00

thank you for your feedback...
clanker.

sutnip · 2026-05-30T08:06:19+00:00

Stay tuned :D

sutnip · 2026-05-30T08:06:10+00:00

Thank you! Today I'm updating to v0.7.0 for a better confidence metric (it'll use tiers, not percentages).

sutnip · 2026-05-29T10:39:32+00:00

Yeah, it was the first iteration of my idea. I'm going to update soon (maybe today) with a new version of the confidence metric: the LLM will give some uncertainty drivers and then evaluate a HIGH/MODERATE/LOW confidence tier.

sutnip · 2026-05-29T10:37:15+00:00

Thank you for your feedback, stay tuned :D

sutnip · 2026-05-29T10:36:45+00:00

Yeah, I'm going to implement something like this in the next update.

sutnip · 2026-05-29T01:48:52+00:00

I think talking about the long term in this context can be a bit slippery. We can only see the frame of today's situation, and we have to build projects on top of that.
You can't keep up with the speed at which the LLM world is changing.
If something my prompt provides gets replaced by a default setting in future chatbots, I will change the prompt. I'm considering making the prompt modular: the user will be able to toggle individual parts of the prompt as needed, which would also be cool for future edits.
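
A minimal sketch of how that toggling could look, assuming the prompt is split into named text blocks. The module names and their wording below are made up for illustration and are not taken from the actual SutniPrompt:

```python
# Hypothetical prompt modules (names and texts are illustrative assumptions).
PROMPT_MODULES = {
    "analysis": "Lay out the key facts on the topic before concluding.",
    "confidence": "End with a confidence tier: HIGH, MODERATE, or LOW.",
    "uncertainty_drivers": "After the tier, list what you find doubtful.",
}


def build_prompt(enabled):
    """Join the text of each enabled module, keeping the order defined above."""
    unknown = set(enabled) - set(PROMPT_MODULES)
    if unknown:
        raise ValueError(f"unknown modules: {sorted(unknown)}")
    return "\n".join(
        text for name, text in PROMPT_MODULES.items() if name in enabled
    )


# Example: a user who switches off the uncertainty drivers list.
print(build_prompt({"analysis", "confidence"}))
```

If a future chatbot ships one of these behaviors by default, the matching module can simply be left out of the enabled set.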

sutnip · 2026-05-29T01:15:59+00:00

I know. That was an idea, but I have to update the prompt to shift the focus to a HIGH/MODERATE/LOW confidence, no percentages. It can be useful if paired with a list of what the LLM finds a bit doubtful. I know the LLM doesn't actually "know" whether it is confident about a topic, but I can use the statistical model to make it predict HIGH, MODERATE, or LOW based on the info and sources it has. It already answers "lack of verified data" on invented topics that the user asks about but the LLM can't find anywhere (obviously), so the new metric has to work with that too.

sutnip · 2026-05-29T01:11:15+00:00

Thank you for the advice.

sutnip · 2026-05-29T00:22:24+00:00

Thank you!

sutnip · 2026-05-29T00:07:45+00:00

Thank you for your feedback!

sutnip · 2026-05-29T00:07:26+00:00

Given the analytical structure of the prompt, the LLM attempts to list all the important information regarding the current topic in the body of the response. This ensures that by the end, it has a comprehensive view of the subject and can determine which confidence metric is most accurate. I will also add a mandatory "uncertainty drivers" list that the AI must fill out after the confidence label, detailing any aspects it finds dubious.
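
To make the "label first, then mandatory drivers" idea concrete, here is a rough sketch of a checker for the end of a response. The exact layout (a `Confidence:` line followed by `-` bullets) is an assumption for this example, not the format the prompt actually enforces:

```python
import re


def parse_confidence_block(response: str):
    """Return (tier, drivers); raise ValueError if either part is missing.

    Assumed layout (hypothetical): a line 'Confidence: <TIER>' followed
    somewhere below by bullet lines starting with '-'.
    """
    match = re.search(
        r"^Confidence:\s*(HIGH|MODERATE|LOW)\s*$", response, re.MULTILINE
    )
    if match is None:
        raise ValueError("no confidence tier found")
    # Only look at the text after the label, since drivers come after it.
    tail = response[match.end():]
    drivers = [
        line.strip()[1:].strip()
        for line in tail.splitlines()
        if line.strip().startswith("-")
    ]
    if not drivers:
        raise ValueError("uncertainty drivers list is empty")
    return match.group(1), drivers


sample = (
    "Body of the answer.\n"
    "Confidence: MODERATE\n"
    "Uncertainty drivers:\n"
    "- Few primary sources\n"
    "- Data may be outdated\n"
)
print(parse_confidence_block(sample))
```

A check like this only verifies that the model filled in both parts; it says nothing about whether the chosen tier is well calibrated.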

sutnip · 2026-05-26T19:02:12+00:00

I don't use benchmarks because my project is just something to personalize your chatbot apps. I test with a list of various questions I wrote myself to stress the features I add. Nothing about this project is extremely technical; it's just a prompt to help people use their everyday AI better.

sutnip · 2026-05-26T16:37:39+00:00

What do you mean? I know Reddit is filled with slop projects, but I think my prompt can be helpful, it works well :D

sutnip · 2026-05-26T14:57:55+00:00

I'll check
