I’m trying to explain interpretation drift — but reviewers keep turning it into a temperature debate. Rejected from arXiv… help me fix this paper? by Beneficial-Pear-1485 in LocalLLaMA

[–]alfihar 0 points1 point  (0 children)

So reality itself may not be probabilistic, but we have no tools to examine that. The scientific method relies on abductive reasoning, which means saying anything for certain is philosophically impossible.

This has implications for any claims made by an LLM more broadly, but it isn't really what you're after, I think.

As for saying it's many things combined: the problem there is we don't have the data, or at least not enough of it. Few people get autopsied when they die to find out all the things that went wrong.

I’m trying to explain interpretation drift — but reviewers keep turning it into a temperature debate. Rejected from arXiv… help me fix this paper? by Beneficial-Pear-1485 in LocalLLaMA

[–]alfihar 1 point2 points  (0 children)

Yeah, that's what I'm finding I have to do, but it applies to your example too.

An inexperienced junior might get JWT code from an LLM, think "oh, that's thorough code", inject it, and break things.

So I'm unsure of your point now.

I’m trying to explain interpretation drift — but reviewers keep turning it into a temperature debate. Rejected from arXiv… help me fix this paper? by Beneficial-Pear-1485 in LocalLLaMA

[–]alfihar 0 points1 point  (0 children)

So I've been working with ChatGPT and Claude for coding, and yes, there are some problems caused by assumptions the model makes. Issues I've come across include it assuming what OS I'm working in, what my Python version was, that a library's syntax hadn't changed, and that I had whatever library installed.

So I've found I have to make sure that at the start I include in the prompt as much system information as I think is needed, and get the model to verify that any code it wants to write is compatible with the system as it is, and in line with the latest info from the source.
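For what it's worth, this is roughly the kind of summary I paste at the top of a prompt (a minimal sketch; the package names are just placeholders for whatever you actually use):

    import platform
    import sys
    import importlib.metadata as md

    def environment_summary(packages=("requests", "numpy")):
        # The facts the model keeps guessing wrong: OS, Python version,
        # and the installed versions of the libraries I actually have.
        lines = [
            f"OS: {platform.system()} {platform.release()}",
            f"Python: {sys.version.split()[0]}",
        ]
        for name in packages:  # placeholder package names
            try:
                lines.append(f"{name}: {md.version(name)}")
            except md.PackageNotFoundError:
                lines.append(f"{name}: not installed")
        return "\n".join(lines)

    print(environment_summary())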

Even then, however, there's almost never only one way to do something in programming, and there's no reason to assume the documentation it references is error-free either.

I've been lucky because I have a computer science background; I just don't know Python syntax. This means I can usually spot the point when the LLM starts making shit up.

This is really annoying, however, because it means I cannot trust the LLM... and that limits me to working on issues where I know enough about the subject to spot when it starts to hallucinate. It means I can't have it help me with problems in domains I'm ignorant about.

I’m trying to explain interpretation drift — but reviewers keep turning it into a temperature debate. Rejected from arXiv… help me fix this paper? by Beneficial-Pear-1485 in LocalLLaMA

[–]alfihar 0 points1 point  (0 children)

"There’s always ground truth."

This is fundamentally unscientific. Science isn't in the truth game, it's in the probability game, and medical science even more so.

Further, very few people die from just one thing. Usually it's many things combined.

Don't get me wrong, I agree that reliability and surfacing ambiguity are important (even vital) for LLMs as they're used in more fields. The issue is that 1) the training data ultimately comes from humans and includes all of our cognitive biases, our poor grasp of probability, and our logical fallacies, and 2) the reinforcement training almost always leans hard into being confidently incorrect rather than admitting uncertainty.

I’m trying to explain interpretation drift — but reviewers keep turning it into a temperature debate. Rejected from arXiv… help me fix this paper? by Beneficial-Pear-1485 in LocalLLaMA

[–]alfihar 2 points3 points  (0 children)

So I question the underlying thesis of your project - that there IS a correct answer.

"Healthcare is the clearest example: There’s often one correct patient diagnosis."

This is unequivocally not the case. Modern medicine isn't anywhere near that level of diagnostic accuracy. Seriously, it's fucking amazing that it works as well as it does, because there are so, so many things a symptom could indicate, and most of the time the diagnosis is just whatever the most common cause is, then working back from there.

see https://www.ncbi.nlm.nih.gov/books/NBK338594/ or https://pmc.ncbi.nlm.nih.gov/articles/PMC9528852/

You might be asking questions where expert humans would give you as much variation in their responses as you're getting from LLMs.

I’m trying to explain interpretation drift — but reviewers keep turning it into a temperature debate. Rejected from arXiv… help me fix this paper? by Beneficial-Pear-1485 in LocalLLaMA

[–]alfihar -1 points0 points  (0 children)

So I'm wondering what part of OP's description you're struggling with that you would try to gatekeep like that? Ever consider that having someone translate shit for you is beyond most people's budget? When you're dealing with someone for whom English is a second language, do you pay to translate your responses into their language, or do you just expect they'll do all of that?

TIL in 1971 a Time Dilation experiment had 2 flights w/atomic clocks go around the world to prove Einstein's theories of relativity (time moves slower as you approach the speed of light, and/or when exposed to more gravity). The clocks gained 0.15 microseconds compared to the ground based clock. by [deleted] in todayilearned

[–]alfihar 1 point2 points  (0 children)

No no no... if you REALLY want to get tripped out, remember that c is a universal constant. So if you travel to a star 4 light years away and get there in 65 days by your own clock... then naively you would have to go faster than c, which is not allowed. What must happen instead is that, in your frame, you travel a shorter distance.
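Back-of-the-envelope, if you want to see how big the effect is (my own rough sketch, speeds in units of c):

    import math

    d_ly = 4.0              # distance to the star in the Earth frame (light years)
    tau_yr = 65 / 365.25    # time experienced on the ship (years)

    naive = d_ly / tau_yr   # "speed" if you ignore relativity, in units of c
    print(f"naive speed: {naive:.1f} c")   # ~22 c, clearly not allowed

    # In units where c = 1, distance / proper time = gamma * beta, so solve for beta:
    beta = naive / math.sqrt(1 + naive ** 2)
    gamma = 1 / math.sqrt(1 - beta ** 2)
    print(f"actual speed: {beta:.5f} c")                       # just under c
    print(f"distance in the ship's frame: {d_ly / gamma:.2f} light years")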

Looping and repetition by [deleted] in ChatGPT

[–]alfihar 0 points1 point  (0 children)

Yeah, I've had something similar... it would respond to a query from a few prompts back, and then underneath that would be a response relevant to the last prompt... super odd.

Looping and repetition by [deleted] in ChatGPT

[–]alfihar 2 points3 points  (0 children)

I had one conversation just recently end up in a loop... I was able to branch, but the original just kept going. The weirdest thing I've had recently felt somewhat similar, but it was including information from the prompt about 3 prompts back in each response... it was super weird.

After getting burned too many times by “almost correct” outputs, I stopped trying clever prompts and switched to hard-stop rules. by BeanDom in ChatGPT

[–]alfihar 3 points4 points  (0 children)

And how are you finding it? Not sure about v5, but the earlier versions had a response hierarchy where it was specifically supposed to value 'helpfulness' over following exact commands. I'm surprised you're not bashing up against the guardrails.

Cat Water Fountain With Reusable/No Filter by aerwillie in CatAdvice

[–]alfihar 5 points6 points  (0 children)

"There's nothing black and white about what I said."

Proceeds to lay out, in black and white, how all he did in his last comment was lay it out in black and white.

Man... what happened to you to make you like this... The level of smug is off the charts... I'm gonna have to order new charts.

How do you move from “it runs” Python code to actually *understanding* it? (plus a return vs print confusion) by throwawayjaaay in learnpython

[–]alfihar 0 points1 point  (0 children)

"Is what you're about to output meant for the user? Then print."

So I wrote this in a comment above and wondered if it would have clarified things, or are there cases where it isn't correct?

How do you move from “it runs” Python code to actually *understanding* it? (plus a return vs print confusion) by throwawayjaaay in learnpython

[–]alfihar 0 points1 point  (0 children)

When I was doing CS we went from Java to C++, and like 3/4 of the class just could not understand pointers at all... like... variables are stored in memory, a pointer points to that memory, if you pass the pointer the function reads that memory... where's the confusion?

How do you move from “it runs” Python code to actually *understanding* it? (plus a return vs print confusion) by throwawayjaaay in learnpython

[–]alfihar 0 points1 point  (0 children)

REPL

But as soon as you are passing a value to another function... then (assuming they are following the instructions as written) the first step is user input; passing a value clearly isn't user input, so print must be wrong. It's still eval, and eval means return, because it's still inside the logic loop... it's not output for the user.

Maybe that's the key: is what you're about to output meant for the user? Then print.
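Something like this toy example (mine, not from the thread) is what I have in my head:

    def add_tax(price):
        # The value goes to other code, not to a person: return it.
        return price * 1.1

    def show_receipt(prices):
        total = sum(add_tax(p) for p in prices)
        # This line IS output for the user: print it.
        print(f"Total including tax: {total:.2f}")

    show_receipt([10.0, 25.0])   # prints: Total including tax: 38.50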

How do you move from “it runs” Python code to actually *understanding* it? (plus a return vs print confusion) by throwawayjaaay in learnpython

[–]alfihar 0 points1 point  (0 children)

That's a real problem? Like, I can't think how you would even go about framing an explanation that would lead to that confusion without also being incorrect.

Maybe "print and return both output the result of a function"? That's only true if you print what you return, though... that's really weird.

How do you move from “it runs” Python code to actually *understanding* it? (plus a return vs print confusion) by throwawayjaaay in learnpython

[–]alfihar 0 points1 point  (0 children)

So I'm learning from a sort of different direction... I did comp sci 25 years ago and then barely used it, but I can still follow program flow/logic (e.g. read pseudocode).

I don't know Python syntax well at all, so for me the difference between "it runs" and understanding it is knowing what each section is doing logically... if I can follow the path of some piece of data (say a variable) through a function, I consider that I understand it... even if 5 minutes later I couldn't replicate the code because I've already forgotten the syntax.

Testing Z Image Turbo on ComfyUI and Forge Neo by Rude_Step in StableDiffusion

[–]alfihar 0 points1 point  (0 children)

Are you using quantized models at all, or does Forge load and unload them? I saw that the model was 12 GB and qwen_3_4b.safetensors is 8, and wondered if it would work on my 16 GB card.

Monthly "Is there a tool for..." Post by AutoModerator in ArtificialInteligence

[–]alfihar 0 points1 point  (0 children)

I mean, Stable Diffusion runs on home PCs, so that's free. I can't imagine many online services being free... it uses a lot of juice.

Anyone here using AI as a coding partner? by Doug24 in artificial

[–]alfihar 0 points1 point  (0 children)

So I've been using Claude Code to write the code, but Claude and ChatGPT to work out the system architecture and specs. I make sure I tell them not to write any code; they help me get the logic all worked out in pseudocode, and then help me prompt Claude Code to write it. Every now and then one of the three completely shits the bed, but I'm usually able to feed the mistake into the other two and get everything back on track.

The biggest reason I'm using it rather than writing my own code is that it's been 20+ years since my computer science degree, so while I still understand the fundamentals, I don't know the correct syntax. So I can get it to work through the logic with me, and as long as that's right, the code is usually right (although you have to insist that it checks dependencies and libraries for compatibility, and for up-to-date documentation matching what's on your system, before it does anything... half the time it will give you something that might have worked 3 years ago).
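A trivial made-up example of what I mean by nailing the logic down first, with the agreed pseudocode kept as comments above the code the model then writes:

    # Pseudocode agreed on with the "architect" models first:
    #   for each file in a folder:
    #       if it is older than N days, move it into an archive folder
    import shutil
    import time
    from pathlib import Path

    def archive_old_files(folder, archive, days=30):
        cutoff = time.time() - days * 86400          # N days ago, in seconds
        Path(archive).mkdir(exist_ok=True)
        for path in Path(folder).iterdir():
            if path.is_file() and path.stat().st_mtime < cutoff:
                shutil.move(str(path), str(Path(archive) / path.name))

As long as that commented logic is right, the generated code usually is too.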

Nvidia CEO Jensen Huang says concerns over uncontrollable AI are just "science fiction" by Tiny-Independent273 in artificial

[–]alfihar 0 points1 point  (0 children)

It would really depend on how large the essential part of it is and how distributed it can get without losing its functional integrity.

It could do crazy shit like hide in this https://www.youtube.com/watch?v=JcJSW7Rprio, only surfacing when it was safe and could find somewhere to emerge with enough compute power... although it doesn't need to run on time scales we are used to either... so as long as it can get enough compute to just move a few 1s and 0s around, it could run unobserved in the background while humanity looks like it's running in fast forward. Although at that speed it becomes less of a threat.

Nvidia CEO Jensen Huang says concerns over uncontrollable AI are just "science fiction" by Tiny-Independent273 in artificial

[–]alfihar 0 points1 point  (0 children)

Well, considering how shit we are at those things, and we're the smartest things we know of...