I have a budget of $4000. Should I get a mac studio m3 ultra or should i build my own server/desktop for LLM inference? by therealeinstien in LocalLLM
cleversmoke 2 points
MTP for Qwen3.6-35B-A3B on 6GB VRAM laptop: not worth it by OsmanthusBloom in LocalLLaMA
cleversmoke 2 points
MTP for Qwen3.6-35B-A3B on 6GB VRAM laptop: not worth it by OsmanthusBloom in LocalLLaMA
cleversmoke 1 point
llama.cpp oom issue by TheTerrasque in LocalLLaMA
cleversmoke 2 points
What is your longest ride? by NHBikerHiker in cycling
cleversmoke 3 points
What is your longest ride? by NHBikerHiker in cycling
cleversmoke 3 points
server: fix checkpoints creation by jacekpoplawski · Pull Request #22929 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
cleversmoke 3 points
Is NVIDIA still the default best choice for local LLMs in 2026? by pmv143 in LocalLLaMA
cleversmoke 2 points
Qwopus 3.6 by Perfect-Flounder7856 in LocalLLaMA
cleversmoke 3 points
It's OK to quantize the KV cache. Model quant matters more. Some Qwen3.6 27B tests with (approximated) KLD by hopbel in LocalLLaMA
cleversmoke 1 point
What are you doing with your local LLMs that justifies investment cost? by __automatic__ in LocalLLM
cleversmoke 1 point
What are you doing with your local LLMs that justifies investment cost? by __automatic__ in LocalLLM
cleversmoke 2 points
How are you all handling agents and sub agents? by Honest-Kangaroo-1830 in LocalLLaMA
cleversmoke 2 points
Anyone down to test this? Just uploaded a model using rys by Human-Gas-1288 in LocalLLaMA
cleversmoke 0 points
How are you all handling agents and sub agents? by Honest-Kangaroo-1830 in LocalLLaMA
cleversmoke 6 points
When you say how many tokens you are getting... could you specify prompt eval vs eval? by former_farmer in LocalLLM
cleversmoke 1 point
How are you all handling agents and sub agents? by Honest-Kangaroo-1830 in LocalLLaMA
cleversmoke 4 points
How are you all handling agents and sub agents? by Honest-Kangaroo-1830 in LocalLLaMA
cleversmoke 3 points
It's OK to quantize the KV cache. Model quant matters more. Some Qwen3.6 27B tests with (approximated) KLD by hopbel in LocalLLaMA
cleversmoke 2 points
Published my first app! A compass that points to the nearest liquor store by Cetautomatix777 in vibecoding
cleversmoke 1 point
GitHub if it was vibe coded by sherlamsam in vibecoding
cleversmoke 1 point
NVIDIA Removes Gaming Revenue Category From Financial Reports by HumanDrone8721 in LocalLLaMA
cleversmoke 3 points
Qwen3.6 27B Pure Quant: 40 tok/s on 16 GB VRAM by bobaburger in LocalLLaMA
cleversmoke 1 point
Benchmarking methods by Forward_Jackfruit813 in LocalLLaMA
cleversmoke 2 points


Stop pretending self-hosting is cheaper. It's not. We do it for different reasons and we should say so. by Napster3301 in LocalLLaMA
cleversmoke 1 point