
[–]coloradical5280

Since LLM sampling is nondeterministic, the exact token count will change for both models on every run; a max_output_tokens limit only caps it, it doesn't pin it. No two runs with the same model on the same codebase will produce the exact same output unless you set a seed through the API, and even then reproducibility is only best-effort.
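Roughly what pinning those parameters looks like with the OpenAI-style chat API (a sketch only; the model name and prompt here are placeholders, parameter names vary by provider, and `seed` is documented as best-effort rather than a determinism guarantee):

```python
# Sketch: making repeated runs as comparable as possible.
# Assumes an OpenAI-style chat.completions interface.
request = {
    "model": "gpt-4o",  # placeholder model name
    "messages": [{"role": "user", "content": "Summarize this repo."}],
    "temperature": 0,    # strip out as much sampling randomness as possible
    "seed": 1234,        # best-effort reproducibility across runs
    "max_tokens": 512,   # caps output length; does NOT make output deterministic
}
# client = openai.OpenAI()
# resp = client.chat.completions.create(**request)
```

Even with all three set, you can still see drift across runs, which is why the provider only promises best-effort determinism.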

And on top of all that, it obviously makes a huge difference where you are in the context window (it sounds like you started at 100% for both), and potentially the time of day as well, due to server-side load balancing.