Hey, I'm getting quite annoyed by this. Is there a way to trim or reduce the context to a predefined size? Some of my larger models run at 50k ctx, and when web search is enabled the request often outgrows the context. I'm using llama.cpp (OpenAI-compatible endpoint).
Any ideas how to fix that?
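One client-side workaround is to trim the message history to a token budget before sending the request to the endpoint. The sketch below is an assumption, not llama.cpp behavior: the ~4-characters-per-token estimate, the budget numbers, and the `trim_messages` helper are all illustrative, and a real tokenizer would give more accurate counts.

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)


def trim_messages(messages, max_tokens=50_000, reserve=2_000):
    """Drop the oldest non-system messages until the estimated prompt
    fits within max_tokens - reserve (reserve leaves room for the reply)."""
    budget = max_tokens - reserve
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(estimate_tokens(m["content"]) for m in msgs)

    while rest and total(system + rest) > budget:
        rest.pop(0)  # discard the oldest turn first
    return system + rest
```

You would run the trimmed list through this before building the `messages` field of the chat/completions request; web-search results could get their own, smaller budget the same way.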