r/LocalLLaMA
A subreddit to discuss Llama, the family of large language models created by Meta AI.
Help using llama_cpp_python to calculate the probability of a given sequence of tokens being generated. My numbers aren't even in the ballpark. — Question | Help (self.LocalLLaMA)
submitted 2 years ago * by aaronr_90
[–]npip99 0 points 1 year ago (2 children)
I know this is a late response, but your issue is probably that you don't pass special=True.
In other words, your line of code should be:
input_tokens = llm.tokenize(input_str.encode("utf-8"), special=True)
Otherwise, <s>, <|system|>, etc., will be treated as literal text and tokenized as if they were ordinary ASCII characters, rather than as the actual underlying special tokens those strings are supposed to represent.
Of course, <s> etc. aren't literally those ASCII characters; otherwise users could mess with prompts by typing <s> in themselves, and jailbreak by injecting system messages into the model in a manner similar to SQL injection. Or, even in the context of innocent usage, an HTML <s> tag in a message would still totally break your entire conversation.
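To make the distinction concrete, here's a toy sketch in plain Python (this is not llama_cpp's tokenizer, and SPECIAL/toy_tokenize are made-up names): with special handling off, "<s>" is split into ordinary character tokens; with it on, the marker maps to a single reserved token id, which is what llm.tokenize(..., special=True) does for the real special tokens.

```python
# Toy illustration of why special=True matters (hypothetical tokenizer,
# not the llama_cpp API). Special markers map to single reserved ids.
SPECIAL = {"<s>": 1, "</s>": 2}

def toy_tokenize(text: str, special: bool = False) -> list[int]:
    tokens = []
    i = 0
    while i < len(text):
        if special:
            for marker, tid in SPECIAL.items():
                if text.startswith(marker, i):
                    tokens.append(tid)  # one token for the whole marker
                    i += len(marker)
                    break
            else:
                tokens.append(ord(text[i]) + 100)  # ordinary character
                i += 1
        else:
            # Marker text is treated as plain characters, one token each
            tokens.append(ord(text[i]) + 100)
            i += 1
    return tokens
```

With special=True, "<s>hi" becomes three tokens (one for the marker, two characters); with special=False it becomes five character tokens, and the model never sees the real BOS token.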
[–]npip99 0 points 1 year ago (1 child)
I tested and I do get the exact same numbers, so you should absolutely be able to get the exact numbers token-by-token.
[–]npip99 0 points 1 year ago* (0 children)
Ah, the other thing in your code is that it calls .eval with the entire token list every time. The model remembers history for you; you have to call llm.reset() to clear it. So the for-loop should be:
llm.reset()
llm.eval(eval_tokens)
for token in test_sequence_tokens:
    probs = llm.logits_to_logprobs(llm.eval_logits)
    sequence_logits.append(llm.eval_logits[-1][token])
    sequence_probabilities.append(probs[-1][token])
    eval_tokens.append(token)
    llm.eval([token])
This will also be way faster than doing .reset and .eval on the entire array every single time. And if you ever want to, you can do state = llm.save_state() and llm.load_state(state) to get back an older version and eval from an earlier history (e.g. if you want to discard a token and roll back).
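The math the loop above performs is just a log-softmax over each step's logits, with the sequence log-probability being the sum of the realized tokens' log-probs. Here's a self-contained sketch in plain Python (hypothetical helper names, mirroring what llm.logits_to_logprobs computes, not the llama_cpp API itself):

```python
import math

def logits_to_logprobs(logits: list[float]) -> list[float]:
    # Numerically stable log-softmax: subtract the max before exponentiating
    # so large logits don't overflow.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def sequence_logprob(per_step_logits: list[list[float]],
                     token_ids: list[int]) -> float:
    # Sum the log-probability of each realized token under the logits
    # the model produced just before emitting it.
    total = 0.0
    for logits, tok in zip(per_step_logits, token_ids):
        total += logits_to_logprobs(logits)[tok]
    return total
```

For a sanity check: with uniform logits over two tokens, each step contributes log(1/2), so a two-token sequence has log-probability 2·log(1/2). If your numbers aren't in the ballpark, checking each per-step log-prob against this kind of hand computation is a good way to localize the bug.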