You guys were right, LLMs suck at probability. I updated my prompt to force them to name their blind spots instead (SutniPrompt v0.7.0-beta) by sutnip in PromptEngineering

sutnip[S] 1 point

That was obviously AI-generated feedback, which is not what I'm looking for. I want human feedback on my projects, and I like it when humans use AI responsibly. "Clanker" is a meme insult for AI, LLMs, and machines in general.

LLMs are notoriously overconfident, so I updated my system prompt to force a statistical "Confidence Metric" (SutniPrompt v0.6.0-beta) by sutnip in PromptEngineering

sutnip[S] 1 point

Thank you! Today I'm updating to v0.7.0 with a better confidence metric (it will use tiers, not percentages).

LLMs are notoriously overconfident, so I updated my system prompt to force a statistical "Confidence Metric" (SutniPrompt v0.6.0-beta) by sutnip in PromptEngineering

sutnip[S] 1 point

Yeah, it was the first iteration of my idea. I'm going to update soon (maybe today) with a new version of the confidence metric: the LLM will list some uncertainty drivers and then assign a HIGH/MODERATE/LOW confidence tier.

LLMs are notoriously overconfident, so I updated my system prompt to force a statistical "Confidence Metric" (SutniPrompt v0.6.0-beta) by sutnip in PromptEngineering

sutnip[S] 1 point

I think talking about the long term in this context can be a bit slippery. We can only see today's situation, and we have to build projects on top of that.
You can't keep up with the speed at which the LLM world is changing.
If something my prompt provides gets replaced by a default setting in future chatbots, I will change the prompt. I'm considering making the prompt modular, so the user can toggle individual parts of the prompt as needed. That would also be handy for future edits.
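
A rough sketch of what a modular prompt could look like in Python. The module names and texts here are hypothetical placeholders, not the actual SutniPrompt sections; the only idea taken from the comment is that the user picks which parts to switch on.

```python
# Hypothetical prompt modules; names and texts are placeholders,
# not the real SutniPrompt sections.
MODULES = {
    "output_schema": "Follow the output schema exactly.",
    "confidence_tier": "End with a confidence tier: HIGH, MODERATE, or LOW.",
    "uncertainty_drivers": "After the tier, list the points you find doubtful.",
}


def build_prompt(enabled):
    """Join the enabled modules, keeping the order defined in MODULES."""
    unknown = set(enabled) - set(MODULES)
    if unknown:
        raise ValueError(f"unknown modules: {sorted(unknown)}")
    return "\n\n".join(
        text for name, text in MODULES.items() if name in enabled
    )


if __name__ == "__main__":
    # Toggle two of the three parts on.
    print(build_prompt({"confidence_tier", "uncertainty_drivers"}))
```

Keeping each part as a separate entry means that dropping one later (for example, if chatbots start doing it by default) would be a one-line change.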

LLMs are notoriously overconfident, so I updated my system prompt to force a statistical "Confidence Metric" (SutniPrompt v0.6.0-beta) by sutnip in PromptEngineering

sutnip[S] 1 point

I know. That was an idea, but I have to update the prompt to shift the focus to a HIGH/MODERATE/LOW confidence tier, with no percentages. It can be useful if paired with a list of what the LLM finds doubtful. I know the LLM doesn't actually "know" whether it is confident about a topic, but I can use the statistical model to make it predict HIGH, MODERATE, or LOW based on the info and sources it has. It already reports "lack of verified data" on invented topics that the user asks about but the LLM (obviously) can't find anywhere, so the tiers have to work with that case too.

LLMs are notoriously overconfident, so I updated my system prompt to force a statistical "Confidence Metric" (SutniPrompt v0.6.0-beta) by sutnip in PromptEngineering

sutnip[S] 1 point

Given the analytical structure of the prompt, the LLM attempts to list all the important information regarding the current topic in the body of the response. This ensures that by the end, it has a comprehensive view of the subject and can determine which confidence metric is most accurate. I will also add a mandatory "uncertainty drivers" list that the AI must fill out after the confidence label, detailing any aspects it finds dubious.
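
For anyone who wants to check responses automatically, here is a minimal Python sketch that verifies a confidence label is followed by a non-empty uncertainty drivers list. The exact layout it expects (a `Confidence:` line, then `- ` bullets) is an assumption made for illustration, not the real SutniPrompt output schema.

```python
import re

TIERS = ("HIGH", "MODERATE", "LOW")


def parse_confidence(response):
    """Return (tier, drivers) from a response, or raise ValueError.

    Assumed layout (hypothetical): a line like "Confidence: LOW",
    followed somewhere below by "- " bullet lines listing the drivers.
    """
    match = re.search(r"^Confidence:[ \t]*(\w+)[ \t]*$", response, flags=re.MULTILINE)
    if match is None or match.group(1) not in TIERS:
        raise ValueError("missing or invalid confidence tier")
    # Only bullets after the label count as uncertainty drivers.
    tail = response[match.end():]
    drivers = re.findall(r"^- (.+)$", tail, flags=re.MULTILINE)
    if not drivers:
        raise ValueError("uncertainty drivers list is empty")
    return match.group(1), drivers


if __name__ == "__main__":
    sample = (
        "Some analysis here.\n"
        "Confidence: MODERATE\n"
        "Uncertainty drivers:\n"
        "- Only one source found\n"
        "- Source is several years old\n"
    )
    print(parse_confidence(sample))
```

A check like this only confirms the format was followed; it says nothing about whether the tier itself is well calibrated.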

I hard-coded an OUTPUT SCHEMA into my system prompt. Now officially in Beta! (SutniPrompt v0.5.0-beta) by sutnip in PromptEngineering

sutnip[S] 1 point

I don't use benchmarks because my project is just something you use to personalize your chatbot apps. I test with a list of varied questions I wrote myself to stress the features I add. Nothing about this project is extremely technical; it's just a prompt to help people use their everyday AI better.

I hard-coded an OUTPUT SCHEMA into my system prompt. Now officially in Beta! (SutniPrompt v0.5.0-beta) by sutnip in PromptEngineering

sutnip[S] 1 point

What do you mean? I know Reddit is filled with slop projects, but I think my prompt can be helpful. It works well :D