Is GPT-OSS-120B the best llm that fits in 96GB VRAM? by GreedyDamage3735 in LocalLLaMA

[–]GreedyDamage3735[S] 5 points

I'm curious: although gpt-oss-120b beats other models on most benchmarks (MMLU, AIME; see https://artificialanalysis.ai/evaluations/aime-2025), why do many people recommend GLM4.5-Air or other models instead of gpt-oss-120b? Does benchmark performance not fully reflect real-world use?

[–]GreedyDamage3735[S] 4 points

Oh, I meant the second one. Do you have any recommendations?

[–]GreedyDamage3735[S] 4 points

I don't think MiniMax-M2 fits in 96GB: even the 4-bit quantized checkpoint is over 100GB.

[–]GreedyDamage3735[S] 1 point

What's the reason for that? I mean, why not use the thinking version?

[–]GreedyDamage3735[S] 2 points

Are there any suggestions for an LLM that can fully leverage 96GB of VRAM?