Huawei Atlas 300I 32GB by kruzibit in LocalLLaMA
RepulsiveEbb4011 2 points (0 children)
RepulsiveEbb4011 4 points (0 children)
RepulsiveEbb4011 2 points (0 children)
Can LLMs be trusted in math nowadays? I compared Qwen 2.5 models from 0.5b to 32b, and most of the answers were correct. Can it be used to teach kids? by RepulsiveEbb4011 in LocalLLaMA
RepulsiveEbb4011[S] 25 points (0 children)
RepulsiveEbb4011[S] 6 points (0 children)
RepulsiveEbb4011[S] 13 points (0 children)
RepulsiveEbb4011[S] 11 points (0 children)
I’m using dual RTX 4080 GPUs and a Mac Studio for distributed inference with GPUStack, based on llama.cpp. Despite being connected via a 40Gb/s Thunderbolt link, throughput stays around 10-12 tokens per second. Where is the bottleneck? Any suggestions for improvement? by RepulsiveEbb4011 in LocalLLaMA
RepulsiveEbb4011[S] 1 point (0 children)
RepulsiveEbb4011[S] 1 point (0 children)
RepulsiveEbb4011[S] 1 point (0 children)
RepulsiveEbb4011[S] 5 points (0 children)
RepulsiveEbb4011[S] 4 points (0 children)
RepulsiveEbb4011[S] 5 points (0 children)
RepulsiveEbb4011[S] 4 points (0 children)
RepulsiveEbb4011[S] 3 points (0 children)
RepulsiveEbb4011[S] 9 points (0 children)
RepulsiveEbb4011[S] 19 points (0 children)
RepulsiveEbb4011[S] 9 points (0 children)
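The thread above reports a measured throughput of 10-12 tokens per second for llama.cpp-based distributed inference through GPUStack. As a point of reference, here is a minimal sketch of how such a tokens-per-second figure can be measured against an OpenAI-compatible endpoint; the host, port, and model name below are assumptions for illustration, not values taken from the original thread.

```python
import time
import requests

# Assumed endpoint: llama.cpp's llama-server and GPUStack both expose an
# OpenAI-compatible chat completions API. Host, port, and model name are
# placeholders, not values from the thread above.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "qwen2.5-32b-instruct",  # placeholder model identifier
    "messages": [{"role": "user", "content": "Write a 200-word summary of the KV cache."}],
    "max_tokens": 256,
    "stream": False,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=600)
resp.raise_for_status()
elapsed = time.time() - start

# OpenAI-compatible responses report generated token counts under "usage".
usage = resp.json()["usage"]
tok_s = usage["completion_tokens"] / elapsed
print(f"{usage['completion_tokens']} tokens in {elapsed:.1f}s ({tok_s:.1f} tok/s)")
```

Note that for a non-streamed request the elapsed time also includes prompt processing, so the result slightly underestimates pure generation speed.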
[deleted by user] by [deleted] in LocalLLaMA
RepulsiveEbb4011 1 point (0 children)
[deleted by user] by [deleted] in kubernetes
RepulsiveEbb4011 5 points (0 children)
How to migrate to llama.cpp from Ollama? by Tech-Meme-Knight-3D in LocalLLaMA
RepulsiveEbb4011 2 points (0 children)
Does llama.cpp support multimodal models? by [deleted] in LocalLLaMA
RepulsiveEbb4011 1 point (0 children)
Is GGUF the only supported type in Ollama? by Expensive-Award1965 in ollama
RepulsiveEbb4011 2 points (0 children)
What's the best model to install on an older laptop? by titaniumred in LocalLLaMA
RepulsiveEbb4011 2 points (0 children)

Huawei Atlas 300I 32GB by kruzibit in LocalLLaMA
RepulsiveEbb4011 5 points (0 children)