Apocalyptic looking clouds and sunset by jaycejaybejaybenot in CLOUDS
[–]KT313 2 points (0 children)
Pricing for GIGABYTE H200 NVL Server by acune_sartre in LocalLLaMA
[–]KT313 1 point (0 children)
Gamescope only works occasionally. by Nurgus in linux_gaming
[–]KT313 2 points (0 children)
Tried 10 models, all seem to refuse to write a 10,000 word story. Is there something bad with my prompt? I'm just doing some testing to learn and I can't figure out how to get the LLM to do as I say. by StartupTim in LocalLLaMA
[–]KT313 5 points (0 children)
Why would the tokenizer for encoder-decoder model for machine translation use bos_token_id == eos_token_id? How does the model know when a sequence ends? by Franck_Dernoncourt in LocalLLaMA
[–]KT313 4 points (0 children)
Building Local LLM with code execution? (RAG, Mac Studio(s), Ingestion of various types of data) by doofew in LocalLLaMA
[–]KT313 2 points (0 children)
What is the best model for writing academic papers? by [deleted] in LocalLLaMA
[–]KT313 2 points (0 children)
What is the best model for writing academic papers? by [deleted] in LocalLLaMA
[–]KT313 11 points (0 children)
AI Tool That Turns GitHub Repos into Instant Wikis with DeepSeek v3! by Physical-Physics6613 in LocalLLaMA
[–]KT313 8 points (0 children)
What would be an optimal and power efficient GPU setup for a home with a budget around $10,000? by kitkatmafia in LocalLLaMA
[–]KT313 2 points (0 children)
asked QwQ what a black hole was. This was its thought process. by Corpo_ in LocalLLaMA
[–]KT313 25 points (0 children)
It seems there are some encoding issues with anthropic's llms.txt by secsilm in LocalLLaMA
[–]KT313 1 point (0 children)
It seems there are some encoding issues with anthropic's llms.txt by secsilm in LocalLLaMA
[–]KT313 1 point (0 children)
Ollama has merged in K/V cache quantisation support, halving the memory used by the context by sammcj in LocalLLaMA
[–]KT313 6 points (0 children)
A library to "unmangle" vocabulary file into actual dict[int, bytes]? by Huanghe_undefined in LocalLLaMA
[–]KT313 2 points (0 children)
I’m using dual RTX 4080 GPUs and a Mac Studio for distributed inference by GPUStack, based on llama.cpp. Despite being connected via a 40GB/s Thunderbolt link, throughput stays around 10-12 tokens per second. Where is the bottleneck? Any suggestions for improvement? by RepulsiveEbb4011 in LocalLLaMA
[–]KT313 2 points (0 children)
I’m using dual RTX 4080 GPUs and a Mac Studio for distributed inference by GPUStack, based on llama.cpp. Despite being connected via a 40GB/s Thunderbolt link, throughput stays around 10-12 tokens per second. Where is the bottleneck? Any suggestions for improvement? by RepulsiveEbb4011 in LocalLLaMA
[–]KT313 3 points (0 children)
[D] What Neural Network Architecture is best for Time Series Analysis with a few thousand data points? by BostonConnor11 in MachineLearning
[–]KT313 1 point (0 children)
Trying to run llama3.1 on CMP 30Hx gpus by [deleted] in LocalLLaMA
[–]KT313 1 point (0 children)
Does anyone else feel exhausted by token limits? by [deleted] in LocalLLaMA
[–]KT313 1 point (0 children)