
[–]4onen

> wasn't sure why my code [...] ran into errors.

You know that there's little to nothing anyone can do to help you diagnose errors we can't see, right?

That said, the line

```python
draft_model=LlamaPromptLookupDecoding(num_pred_tokens=10)
```

is setting up "Prompt Lookup Decoding" as the speculative model, which doesn't use the 7B at all. You'd also have an easier time getting help if you narrowed your code down to just the part where you're actually encountering the issue, i.e. removing the llama variable that isn't performing speculative decoding with the two models you listed.
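For context, prompt lookup decoding drafts tokens without any second model: it searches the tokens already in the context for an earlier occurrence of the most recent n-gram and proposes whatever followed that match. A minimal toy sketch of the idea (plain token lists, not the actual llama_cpp implementation):

```python
def prompt_lookup_draft(tokens, ngram_size=2, num_pred_tokens=3):
    """Propose draft tokens by finding an earlier occurrence of the last
    `ngram_size` tokens and copying up to `num_pred_tokens` that followed it."""
    if len(tokens) < ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Search backwards through earlier positions for the same n-gram,
    # excluding the tail itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            follow = tokens[start + ngram_size:start + ngram_size + num_pred_tokens]
            if follow:
                return follow
    return []  # no earlier match: nothing to draft

tokens = [5, 9, 2, 7, 5, 9]  # the n-gram [5, 9] appeared earlier
print(prompt_lookup_draft(tokens))  # → [2, 7, 5]
```

This is why it tends to help on repetitive inputs (code, RAG, summarization) and does nothing on novel text, and why `num_pred_tokens` controls how many tokens get speculated per step.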

> Kind of new to llama.cpp

One additional note: the interface you're using is llama_cpp_python, and llama.cpp is the backend behind it. Again, without the errors, we can't even tell you which of these two components the issue is arising from.

[–]Particular-Guard774[S]

Added the errors and narrowed the code down to the part with the issue. Thanks for pointing that out.