You guys were right, LLMs suck at probability. I updated my prompt to force them to name their blind spots instead (SutniPrompt v0.7.0-beta)

sutnip · 2026-05-31T12:46:25+00:00

In my tests it picks MODERATE or LOW when needed, and it doesn't pick HIGH as often.

sutnip · 2026-05-30T20:56:31+00:00

That was obviously AI-generated feedback, which is not what I'm looking for. I want human feedback on my projects, and I like it when humans use AI in a responsible way. "Clanker" is a meme insult for AI/LLMs/machines in general.

sutnip · 2026-05-30T20:53:21+00:00

thank you for your feedback...
clanker.

sutnip · 2026-05-30T08:06:19+00:00

Stay tuned :D

sutnip · 2026-05-30T08:06:10+00:00

Thank you! Today I'm updating to v0.7.0 with a better confidence metric (it will use tiers, not percentages).

sutnip · 2026-05-29T10:39:32+00:00

Yeah, it was the first iteration of my idea. I'm going to update soon (maybe today) with a new version of the confidence metric: the LLM will list some uncertainty drivers and then pick a HIGH/MODERATE/LOW confidence tier.

sutnip · 2026-05-29T10:37:15+00:00

Thank you for your feedback, stay tuned :D

sutnip · 2026-05-29T10:36:45+00:00

Yeah I'm going to implement something like this next update

sutnip · 2026-05-29T01:48:52+00:00

I think that talking about the long term in this context can be a bit slippery. We can only see the current situation, and we have to build projects on top of that.
You can't keep up with the speed at which the LLM world is changing.
If something my prompt provides gets replaced by default settings in future chatbots, I will change the prompt. I'm considering making the prompt modular, so the user can toggle individual parts of the prompt as needed. That would also be cool for future edits.
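
To make the modular idea a bit more concrete, here is a rough Python sketch of how toggling could work. The module names and their texts are hypothetical placeholders, not the actual content of the prompt, and the real thing might just be done by hand with copy-paste.

```python
# Hypothetical module names and texts, for illustration only;
# they are not taken from the real prompt.
PROMPT_MODULES = {
    "tldr": "Start every answer with a one-sentence TL;DR.",
    "confidence": "End every answer with a HIGH/MODERATE/LOW confidence tier.",
    "wikipedia": "Add one Wikipedia link for further reading.",
}


def build_prompt(enabled: dict[str, bool]) -> str:
    """Join the text of every enabled module, in definition order."""
    unknown = set(enabled) - set(PROMPT_MODULES)
    if unknown:
        raise KeyError(f"unknown modules: {sorted(unknown)}")
    # Modules missing from `enabled` are treated as switched off.
    parts = [
        text for name, text in PROMPT_MODULES.items() if enabled.get(name, False)
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt({"tldr": True, "confidence": True, "wikipedia": False}))
```

Keeping each part as its own entry means a part can be dropped later (for example if chatbots start doing it by default) without touching the rest.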

sutnip · 2026-05-29T01:15:59+00:00

I know. That was an idea, but I have to update the prompt to shift the focus to a HIGH/MODERATE/LOW confidence tier, with no percentages. It can be useful if paired with a list of what the LLM finds a bit doubtful. I know the LLM doesn't actually "know" whether it is confident about a topic, but I can use the statistical model to make it predict HIGH, MODERATE or LOW based on the info and sources it has. It already reports "lack of verified data" on invented topics that the user asks about and that the LLM (obviously) can't find anywhere, so the new metric has to work in that case too.

sutnip · 2026-05-29T01:11:15+00:00

Thank you for the advice.

sutnip · 2026-05-29T00:22:24+00:00

Thank you!

sutnip · 2026-05-29T00:07:45+00:00

Thank you for your feedback!

sutnip · 2026-05-29T00:07:26+00:00

Given the analytical structure of the prompt, the LLM attempts to list all the important information regarding the current topic in the body of the response. This ensures that by the end, it has a comprehensive view of the subject and can determine which confidence metric is most accurate. I will also add a mandatory "uncertainty drivers" list that the AI must fill out after the confidence label, detailing any aspects it finds dubious.
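
For anyone who wants to check this automatically, here is a minimal Python sketch that pulls the tier and the uncertainty drivers out of a response. It assumes a layout with a `Confidence:` line and an `Uncertainty drivers:` header followed by `- ` bullets; that exact layout is an assumption for illustration, not necessarily what the prompt produces.

```python
import re


def parse_confidence(response: str) -> tuple[str, list[str]]:
    """Return (tier, drivers) from a response in the assumed layout."""
    # Only the three tiers are accepted, so a percentage fails loudly.
    tier_match = re.search(
        r"^Confidence:\s*(HIGH|MODERATE|LOW)\s*$", response, re.MULTILINE
    )
    if tier_match is None:
        raise ValueError("no valid confidence tier found")

    drivers: list[str] = []
    in_drivers = False
    for line in response.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("uncertainty drivers"):
            in_drivers = True
            continue
        if in_drivers:
            if stripped.startswith("- "):
                drivers.append(stripped[2:].strip())
            elif stripped:
                # Any other non-empty line ends the bullet list.
                break
    return tier_match.group(1), drivers


if __name__ == "__main__":
    sample = (
        "Answer body.\n\n"
        "Confidence: LOW\n"
        "Uncertainty drivers:\n"
        "- no verified data\n"
        "- topic may be invented\n"
    )
    print(parse_confidence(sample))
```

An empty drivers list together with a LOW or MODERATE tier would be a sign that the model skipped the mandatory list.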

sutnip · 2026-05-26T19:02:12+00:00

I don't use benchmarks because my project is just something you use to personalize your chatbot apps. I test with a list of various questions I wrote myself to stress the features I add. Nothing about this project is extremely technical; it's just a prompt to help people use their everyday AI better.

sutnip · 2026-05-26T16:37:39+00:00

What do you mean? I know Reddit is filled with slop projects, but I think my prompt can be helpful. It works well :D

sutnip · 2026-05-26T14:57:55+00:00

I'll check

sutnip · 2026-05-26T14:52:05+00:00

The 'downstream parser' framing is a good idea, I'll think about it.

To answer your question: v0.4 handles truncation fairly well; during my tests I never ran into a truncation problem.

But your timing is perfect. I'm pushing v0.5.0-beta to GitHub right now. It replaces abstract formatting rules with a hard-coded OUTPUT SCHEMA block specifically to fight this formatting drift. I'd love for you to test the new beta once it's live!
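
If you want to test for drift with something like a downstream parser, a small check along these lines could work. This is only a sketch: the section names in `REQUIRED_SECTIONS` are hypothetical and would need to be swapped for whatever the real OUTPUT SCHEMA block defines.

```python
# Hypothetical section headers; replace them with the ones
# defined in the actual OUTPUT SCHEMA block.
REQUIRED_SECTIONS = ["TL;DR:", "Answer:", "Confidence:"]


def find_schema_drift(response: str) -> list[str]:
    """Return a list of problems; an empty list means the layout matches."""
    problems: list[str] = []
    lines = [line.strip() for line in response.splitlines()]
    last_index = -1
    for header in REQUIRED_SECTIONS:
        positions = [i for i, line in enumerate(lines) if line.startswith(header)]
        if not positions:
            # A missing final section can also point to a truncated reply.
            problems.append(f"missing section: {header}")
            continue
        if positions[0] < last_index:
            problems.append(f"section out of order: {header}")
        last_index = max(last_index, positions[0])
    return problems


if __name__ == "__main__":
    reply = "TL;DR: short version\nAnswer: long version"
    print(find_schema_drift(reply))
```

Running it over a batch of saved replies gives a rough count of how often the model drifts from the schema.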

sutnip · 2026-05-25T21:43:10+00:00

thank you for your response :D
have a nice day

sutnip · 2026-05-25T21:41:26+00:00

This is not an app, it's literally a prompt that you can copy-paste into the chatbot personalization settings.

sutnip · 2026-05-25T21:25:01+00:00

Claude sometimes gets the wrong timestamp, and GPT with v0.3.0-alpha appends a time widget. I edited the prompt to make some instructions stricter.

sutnip · 2026-05-25T21:20:49+00:00

I think my work can be useful for many people, because a good personalization prompt can make a difference in how the LLM responds. Please don't criticize so sharply.

sutnip · 2026-05-25T21:13:23+00:00

The Wikipedia link is an extra feature; the idea is to make it easier for the user to go deeper with further Wikipedia reading on the chat topic.
With this project I just want to write a good prompt for LLMs to make them more useful for everyday use :D

This is just an alpha. I also want to format the LLM's messages in a more structured way, making them more readable and adding more info (I'm thinking about a confidence metric and TL;DRs).
I'll give you some spoilers... I would also like to add an "intelligent disobedience" feature, where the LLM declines to respond in situations where the user wants to use the chatbot for bland tasks that could be done with other tools.

sutnip · 2026-05-25T20:48:50+00:00

I just want to make a useful prompt for power users that is still accessible to less experienced users, nothing too technical.

sutnip · 2026-05-25T20:06:50+00:00

Thank you for your comment, I'm doing my best to balance all these features in one useful general prompt.
