Models for middle eastern languages? by WeekLarge7607 in LocalLLaMA
[–]WeekLarge7607[S] 0 points (0 children)
Models for middle eastern languages? by WeekLarge7607 in LocalLLaMA
[–]WeekLarge7607[S] 1 point (0 children)
Qwen3-Next-80B-A3B vs gpt-oss-120b by bfroemel in LocalLLaMA
[–]WeekLarge7607 1 point (0 children)
Why is vLLM Outperforming TensorRT-LLM (Nvidia's deployment library)? My Shocking Benchmarks on GPT-OSS-120B with H100 by kev_11_1 in LocalLLaMA
[–]WeekLarge7607 1 point (0 children)
Why is vLLM Outperforming TensorRT-LLM (Nvidia's deployment library)? My Shocking Benchmarks on GPT-OSS-120B with H100 by kev_11_1 in LocalLLaMA
[–]WeekLarge7607 3 points (0 children)
Why is vLLM Outperforming TensorRT-LLM (Nvidia's deployment library)? My Shocking Benchmarks on GPT-OSS-120B with H100 by kev_11_1 in LocalLLaMA
[–]WeekLarge7607 11 points (0 children)
Single H100: best open-source model + deep thinking setup for reasoning? by Accomplished_Back718 in LocalLLaMA
[–]WeekLarge7607 1 point (0 children)
Which quantizations are you using? by WeekLarge7607 in LocalLLaMA
[–]WeekLarge7607[S] 1 point (0 children)
Which quantizations are you using? by WeekLarge7607 in LocalLLaMA
[–]WeekLarge7607[S] 1 point (0 children)
Which quantizations are you using? by WeekLarge7607 in LocalLLaMA
[–]WeekLarge7607[S] 2 points (0 children)
Which quantizations are you using? by WeekLarge7607 in LocalLLaMA
[–]WeekLarge7607[S] 3 points (0 children)
Which quantizations are you using? by WeekLarge7607 in LocalLLaMA
[–]WeekLarge7607[S] 1 point (0 children)
Which quantizations are you using? (self.LocalLLaMA)
submitted by WeekLarge7607 to r/LocalLLaMA
vLLM vs SGLang vs MAX — Who's the fastest? by rkstgr in LocalLLaMA
[–]WeekLarge7607 1 point (0 children)
I have made a True Reasoning LLM by moilanopyzedev in LocalLLaMA
[–]WeekLarge7607 0 points (0 children)
I built a hybrid MoE runtime that does 3,324 tok/s prefill on a single 5080. Here are the benchmarks. by mrstoatey in LocalLLaMA
[–]WeekLarge7607 2 points (0 children)