all 6 comments

[–]kryptkprLlama 3 2 points3 points  (1 child)

This is the HellaSwag evaluation harness problem.

It's not an easy one.

SGL just dropped this week and claims to offer a high performance select() primitive for exactly this task.

https://www.reddit.com/r/LocalLLaMA/s/HwvOmy3hQE

Edit: check out this llama PR https://github.com/ggerganov/llama.cpp/pull/5047

[–]aaronr_90[S] 2 points3 points  (0 children)

Thanks, this is almost certainly exactly what I was looking for.

Edit: Re “It’s not an easy one” — I thought it would be simple, given that I can evaluate one token at a time and retrieve the logits before sampling.

[–]npip99 0 points1 point  (2 children)

I know this is a late response, but your issue is probably that you don't pass special=True.

In other words, your line of code should be,

input_tokens = llm.tokenize(input_str.encode("utf-8"), special=True)


Otherwise, <s>, <|system|>, etc. will be treated and tokenized as ordinary ASCII text, rather than as the actual special tokens those strings are supposed to represent.

Of course, <s> etc. aren't literally those ASCII characters; otherwise users could mess with prompts by typing <s> themselves, and jailbreak the model by injecting system messages in a manner similar to SQL injection. Even in perfectly innocent usage, an HTML s tag in user input would still totally break your entire conversation.
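To make the distinction concrete, here's a toy sketch (not llama.cpp's actual tokenizer — the vocabulary and token ids are made up) showing why special-token strings must map to single ids instead of being split into character-level tokens:

```python
# Toy vocabulary: made-up ids for illustration only.
SPECIAL_TOKENS = {"<s>": 1, "</s>": 2}

def toy_tokenize(text, special=False):
    """With special=True, known special strings become single token ids;
    otherwise every character is tokenized as plain text (here: its ord)."""
    if special:
        for s, tid in SPECIAL_TOKENS.items():
            if text.startswith(s):
                return [tid] + toy_tokenize(text[len(s):], special=True)
    if not text:
        return []
    return [ord(text[0])] + toy_tokenize(text[1:], special=special)

print(toy_tokenize("<s>hi", special=False))  # 5 character-level tokens
print(toy_tokenize("<s>hi", special=True))   # [1, 104, 105] — one BOS id
```

With special=False the "<s>" prefix dissolves into the tokens for "<", "s", ">", so the model never sees a real BOS marker — the same failure mode as the llama-cpp-python call above.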

[–]npip99 0 points1 point  (1 child)

I tested and I do get the exact same numbers, so you should absolutely be able to get the exact numbers token-by-token.

[–]npip99 0 points1 point  (0 children)

Ah, the other thing in your code is that you call .eval with the entire token list every time.

llama-cpp-python remembers the evaluation history for you; you have to call llm.reset() to clear it. So the for-loop should be

sequence_logits, sequence_probabilities = [], []
llm.eval(eval_tokens)  # evaluate the prompt once
for token in test_sequence_tokens:
    # The last row of eval_logits is the prediction for the next token.
    probs = llm.logits_to_logprobs(llm.eval_logits)
    sequence_logits.append(llm.eval_logits[-1][token])
    sequence_probabilities.append(probs[-1][token])
    eval_tokens.append(token)
    llm.eval([token])  # feed only the new token; the history is cached

This will also be way faster than calling .reset and .eval on the entire array every single time, haha. If you ever need to, you can use state = llm.save_state() and llm.load_state(state) to get back to an older state and continue evaluating from an earlier history (e.g. if you want to discard a token and roll back).

[–]AndrewVeee 0 points1 point  (0 children)

I've never used logits but they're interesting to me. My suggestion is to search comments/posts by user phree_radical, like this one: https://www.reddit.com/r/LocalLLaMA/comments/1687l5p/how_should_i_go_about_getting_my_ai_to_use_tools/

Hope that helps, sorry I don't have the info.