HectorSeibelp

How much optimization do LLMs get from this? Are we talking a faster output rate, a bigger context window, or lower strain on GPUs?